By James, Katherine and Amara.
The subtitle of our project is ‘Re-evaluating Women’s work in archaeology, history and heritage in Britain, 1870 – 1950’. We aim to create a large-scale study of women’s contributions to these fields by rendering visible previously unknown ‘professional’ (both salaried and unsalaried) activities of women through the study of archival sources. This requires us to identify instances in our sources of women conducting various activities we would categorise as ‘work’, and to express those as data. But how do we identify these women?
This has led the project team to chew over two issues in recent meetings:
- How to work within, or modify, the conventions of Wikidata to deal with gender (and indeed other sensitive personal) characteristics in our database
- The ethics of ascription of gender to people in the past
To begin, let’s consider the conventions governing how the property of gender (or ‘sex or gender’, since Wikidata currently conflates the two – and this is in itself controversial) is handled in Wikidata, the collaboratively edited knowledge graph whose linked data underpins Wikipedia and Google Search. Like other systems for the organization and systematization of knowledge, Wikidata operates by using (relatively) controlled vocabularies: lists of key terms with agreed definitions that allow records to be tagged/described in ways that render them searchable. To link our data to other data sets, it is important to work, to some extent, within these conventions. If we were to create an entirely unique, bespoke set of categories to classify our historical data, our database would be limited in usefulness, since it would not be discoverable through standard searches or interoperable with other data ecologies. On the other hand, conventions developed in Wikidata (term lists, etc.) are simplifying/flattening and can be inadequate when dealing with historically constructed categories, including gender. Because Wikidata is edited by crowd-sourcing (in which we are participating), these conventions can also be changed and modified.
We are constructing our database by writing statements that ascribe information to individual items/objects or classes of them, including people, by linking them with particular values of properties (known in data design as key-value pairs). For example, the English-language statement “milk is white” would be encoded by a statement pairing the property ‘color’ (P462) with the value ‘white’ (Q23444) under the item ‘milk’ (Q8495).
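As a minimal sketch of this structure (not our production tooling), a statement can be modelled in Python as an item paired with a property and a value. The `Statement` class and its field names are illustrative assumptions; only the identifiers Q8495, P462 and Q23444 come from the Wikidata example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One claim about an item: a property paired with a value (hypothetical model)."""
    item: str   # the subject of the claim, e.g. 'Q8495' (milk)
    prop: str   # the property being asserted, e.g. 'P462' (color)
    value: str  # the value of that property, e.g. 'Q23444' (white)


# "milk is white" expressed as an item carrying one property-value pair
milk_is_white = Statement(item="Q8495", prop="P462", value="Q23444")
print(milk_is_white)
```

Keeping the item, property and value as separate identifiers is what allows a statement like this to be linked to, and queried alongside, statements made in other data sets.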
The category (property) of ‘sex or gender’ (P21) is defined in Wikidata as follows:
sex or gender identity of human or animal. For human: male, female, non-binary, intersex, transgender female, transgender male, agender. For animal: male organism, female organism.
The issue here is not only that this gloss is contestable (and indeed contested – check out the discussion on the property talk page (content warning: transphobic language)). It is also that no caveats exist around the ascription of values of this property to historical or living individuals.
The absence of caveats for ‘sex or gender’ contrasts with cases such as ‘ethnic group’ (P172) and ‘sexual orientation’ (P91), properties whose definitions are hedged about with caveats:
subject’s ethnicity (consensus is that a VERY high standard of proof is needed for this field to be used. In general this means 1) the subject claims it themselves, or 2) it is widely agreed on by scholars, or 3) is fictional and portrayed as such)
the sexual orientation of the person — use IF AND ONLY IF they have stated it themselves, unambiguously, or it has been widely agreed upon by historians after their death
We are not the only people to notice Wikidata’s blunt flattening of sex and gender. The wonderful Homosaurus, a linked data vocabulary of lesbian, gay, bisexual, transgender, queer/questioning, and other (LGBTQ+) terms, gives us a range of narrower terms we might use instead: gender identity, gender expression, assigned gender.
These narrower definitions draw attention to what we, as historians, are doing in this project. We are dealing overwhelmingly in assigned gender rather than gender identity or gender expression, i.e. gender as ascribed to historical agents in our sources and/or as perceived by us in our interpretation of those sources. We have no direct access to the gender identity of the majority of our subjects (they do not ‘state it themselves, unambiguously’). And gender expression varies over time and between places, making our particular perception of gender a determinant of how we ascribe gender.
In the sources we have been looking at so far, sources that (partially) record work in archaeology, history and heritage in Britain, ‘sex or gender’ (P21) property-values such as womanhood are either ascribed to the people that feature in them, or our sources are silent on the matter. Sometimes the (ascribed) gender of individuals in our sources is signalled in explicit fashion, e.g. by use of gendered titles such as ‘Mr’, ‘Mrs’ or ‘Miss’. In other cases there is indirect or implicit evidence of gender-ascription – not least, evidence of the various kinds of barriers and exclusions to which women were subject in 19th- and early 20th-century Britain. Most obviously, individuals to whom womanhood was ascribed were excluded from being Fellows of the Society of Antiquaries until 1920, but (in the UK until 1918/1928) they were also excluded from suffrage, from taking their degrees in certain universities, from pursuing certain kinds of professional work once married, and so on.
Data Feminism gives us ways to respond to the inadequacies of Wikidata P21, both as a tool for representing the past lives that are the focus of our study and – in turn – all people effectively misgendered by its flattening effect and binary assumptions. As D’Ignazio and Klein write, “data feminism requires us to challenge the gender binary, along with other systems of counting and classification that perpetuate oppression” (D’Ignazio and Klein, Data Feminism (2020), 97). Not only is questioning a classification system a feminist move, so is acting in opposition to it, refusing to contribute to it on its terms. If what gets counted counts, we need to ensure that not only are more women counted, but that they are counted in ways that make clear when their womanhood is an ascription, an identity, and an expression.
What does this mean in practice? It means a number of interventions in the way we make statements about gender and sex, none of which we claim to have got entirely right, all of which we are working through in real time as we encounter the archive and the lives therein. These include assigning gender as ‘woman’ (Q3 in our data) if:
Where gendered honorifics are absent and only initials and surname are given, even if the individual’s name appears in relation to a context and activity in which normative actors in our period are men, we do not assume that the individual indicated is a man. Rather, we investigate that name, indicate uncertainty when ascribed gender is unclear, and record ‘unknown value’ when no evidence can be found.
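The three outcomes described above (an ascribed value, an ascribed value flagged as uncertain, and ‘unknown value’) can be sketched in Python as follows. This is a hedged illustration rather than our actual implementation: `GenderClaim`, `SnakType` and `ascribe_gender` are hypothetical names, the people in the usage lines are invented, and the string `somevalue` is used on the assumption that we follow the Wikibase data model's label for ‘unknown value’.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SnakType(Enum):
    VALUE = "value"        # a specific value is recorded
    UNKNOWN = "somevalue"  # 'unknown value': something exists, but we cannot say what


@dataclass
class GenderClaim:
    person: str
    snak_type: SnakType
    value: Optional[str] = None     # e.g. 'Q3' (woman in our data)
    uncertain: bool = False         # True when the ascription is unclear
    evidence: Optional[str] = None  # the source on which the ascription rests


def ascribe_gender(person, value=None, evidence=None, uncertain=False):
    """Record a gender ascription only when there is evidence for it."""
    if value is None or evidence is None:
        # No evidence found: record 'unknown value' rather than assume a man.
        return GenderClaim(person, SnakType.UNKNOWN)
    return GenderClaim(person, SnakType.VALUE, value, uncertain, evidence)


# Hypothetical usage: initials and surname only, nothing found after investigation
unknown = ascribe_gender("A. B. Smith")
# Hypothetical usage: an ascription made, but flagged as uncertain
unclear = ascribe_gender("E. Jones", value="Q3",
                         evidence="pronoun in a later obituary", uncertain=True)
```

The point of the design is that the absence of evidence produces an explicit ‘unknown value’ record, never a silent default.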
Finally, we are committed to using a technical infrastructure that tracks our changes, timestamps them and gives each edit an author. This enables our attempts to resist the presumption of gender ascription to be recorded, and when new information is found that revises a claim, ensures that our uncertainty – however fleeting – remains entangled with the linked data we produce.
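To illustrate the idea of an edit history (and only as a sketch, since the infrastructure we use does this for us), the following Python models a claim whose every revision keeps its author and timestamp. `Revision` and `TrackedClaim` are hypothetical names, and the authors, values and notes in the usage lines are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    author: str          # who made the edit
    timestamp: datetime  # when the edit was made (UTC)
    value: str           # the value asserted by this edit
    note: str            # why the edit was made


@dataclass
class TrackedClaim:
    subject: str
    history: List[Revision] = field(default_factory=list)

    def revise(self, author: str, value: str, note: str) -> None:
        """Append a new revision; earlier revisions are never overwritten."""
        stamp = datetime.now(timezone.utc)
        self.history.append(Revision(author, stamp, value, note))

    @property
    def current(self) -> Optional[str]:
        """The most recently asserted value, or None if nothing is recorded."""
        return self.history[-1].value if self.history else None


# Hypothetical usage: an initial 'unknown value', later revised
claim = TrackedClaim(subject="A. B. Smith")
claim.revise("editor-1", "unknown value", "no evidence of ascribed gender found")
claim.revise("editor-2", "Q3", "gendered title found in a later source")
```

Because revisions are appended rather than replaced, the earlier ‘unknown value’ stays in the record alongside the later claim.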
All these solutions are provisional and imperfect. We welcome constructive feedback on the procedures we have developed so far.