Building the Beyond Notability Knowledge Base: 4 reasons why we chose Wikibase

By James Baker (Co-Investigator, Beyond Notability)

For all that people like to moan about the things that are wrong on Wikipedia (and there is much that is wrong on Wikipedia), it is the place people go to when they want to know something: together with the other sites run by the Wikimedia Foundation, Wikipedia is the knowledge infrastructure of the web. Since 2010, cultural institutions have formally contributed to this ecosystem through Wikimedian-in-Residence programmes, typically resulting in digitised material appearing on Wikimedia Commons, the home for every media artefact you encounter when browsing Wikipedia.

More recently, a number of those Wikimedian-in-Residence programmes have directed attention towards Wikidata, a multilingual knowledge graph that is a common source of open data used on Wikipedia. More significantly, every time you search Google and a little info box pops up on the right side of the screen containing useful – typically biographical – information, that is probably drawn from Wikidata. In turn, a person without a Wikidata page is unlikely to get a box. And so if fewer than 20% of Wikipedia biographies are about women, and if most Wikipedia biographies have a corresponding Wikidata page, then it follows that enriching Wikidata with otherwise neglected histories of women active in archaeology, history and heritage is something worth attention. Hence, our project.

Wikidata is a wiki (a collaboratively edited hypertext publication) whose technical infrastructure is based on a combination of the software MediaWiki and a set of knowledge graph MediaWiki extensions known as Wikibase, the workings of which are explained in the ‘Introducing Our Database’ post. We have built the Beyond Notability Knowledge Base on the same infrastructure, using Wikibase-as-a-service, first via WbStack (with amazing support from Adam Shorland) and latterly via the Wikimedia Deutschland hosted Wikibase Cloud (with thanks to Mohammed Sadat). In this blog we list the top four reasons why we took this approach.

1. Aligning Biographical Approaches

We can’t record the evidence we find directly onto Wikidata because many of the women we encounter in our research do not meet Wikidata’s ‘notability threshold’ – in some cases because evidence for their work in archaeology, history, and heritage is fragmentary, in other cases because the evidence needs to be assembled first to get over that threshold. Despite this, it wouldn’t make much sense for us to design a biographical database from scratch. And so we align our approach with Wikidata, in part because it gives us an ontological platform to build on: a template for how to represent things like familial relations, office holding, and residences.

2. Beyond Notability as a Trusted Source

It made sense then to use the same technical infrastructures as Wikidata for our knowledge base. But whilst alignment is useful we cannot – as discussed in our recent blog ‘On Working with Gender’ – faithfully follow the Wikidata model for representing biographical information. The historically-specific circumstances in which women were working in the late-nineteenth and early-twentieth century are an awkward fit for a data model orientated around modern ways of being in the Global North; indeed, our project is a test of the capacities of data models like Wikidata to capture and represent these women’s lives. Given the choices we are making to diverge from Wikidata-as-canon, using the same software platform as Wikidata, with the same visual and ontological aesthetic, supports our ambition for the Beyond Notability Knowledge Base to be regarded as a trusted source of biographical information. This is important because we think our work can make vital contributions to Wikidata. Take as an example Gwenllian Morgan, the subject of our previous blog. Prior to our project she was not listed on Wikidata as being a Fellow of the Society of Antiquaries (which is represented on Wikidata using the ‘award received’ property). But now she is, with the amended Wikidata entry using Beyond Notability as the source of this information.

3. Querying Between Knowledge Bases

Recording Gwenllian Morgan as a Fellow of the Society of Antiquaries (FSA) means that any queries that use Wikidata to return a list of FSAs will now include her, as one of the many people that link to the Wikidata item Fellow of the Society of Antiquaries (Q26196499). These queries can be made through the Wikidata Query Service, a SPARQL endpoint, “SPARQL” here meaning the query language used to interrogate graph databases. Building the Beyond Notability Knowledge Base on the same technologies as Wikidata means not only that we too have a SPARQL Query Service but also that both sets of data are organised using the same underlying principles, allowing us to more easily write queries that simultaneously interrogate both knowledge bases (and, indeed, any other knowledge bases that take a similar form).
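As a rough illustration of the kind of Wikidata query described above, a list of FSAs could be requested along the following lines. This is a minimal sketch, not a query taken from the project: it assumes that fellowship is recorded with the ‘award received’ property (P166 on Wikidata) pointing at the item Q26196499, and the variable names are chosen only for this example.

```sparql
# Sketch only: intended for the Wikidata Query Service.
# The prefixes below are predefined there, but are declared for clarity.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT ?person ?personLabel
WHERE {
  # Assumption: FSA status is modelled as 'award received' (P166)
  ?person wdt:P166 wd:Q26196499 .

  # Ask the label service for a human-readable name for each person
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

A query of this shape touches only Wikidata. Interrogating two knowledge bases at once additionally needs a federated SERVICE clause that points at the second SPARQL endpoint.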

We are already doing this kind of cross-querying to help our data entry. For example, we are using this..

PREFIX bnwd: <http://beyond-notability.wiki.opencura.com/entity/>
PREFIX bnwds: <http://beyond-notability.wiki.opencura.com/entity/statement/>
PREFIX bnwdv: <http://beyond-notability.wiki.opencura.com/value/>
PREFIX bnwdt: <http://beyond-notability.wiki.opencura.com/prop/direct/>
PREFIX bnp: <http://beyond-notability.wiki.opencura.com/prop/>
PREFIX bnps: <http://beyond-notability.wiki.opencura.com/prop/statement/>
PREFIX bnpq: <http://beyond-notability.wiki.opencura.com/prop/qualifier/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd:  <http://www.wikidata.org/entity/>

SELECT ?person ?personLabel ?item ?WD_DOB ?WD_DOD
WHERE {  
  ?person bnwdt:P16 ?isFSA . #select FSA
  FILTER NOT EXISTS {?person bnwdt:P4 bnwd:Q12 .} #filter out project team
  ?person bnwdt:P14 ?url . #look for wikidata URL on person page
  BIND(IRI(REPLACE(?url,"https://www.wikidata.org/wiki/","http://www.wikidata.org/entity/")) as ?item ) 
  
  SERVICE <https://query.wikidata.org/sparql> {
        ?item wdt:P21 wd:Q6581072 . #select women
        OPTIONAL {?item wdt:P569 ?WD_DOB . } #recall date of birth
        OPTIONAL {?item wdt:P570 ?WD_DOD . } #recall date of death
      }
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en-gb". } 
}

..query to return a list of all women on our knowledge base with corresponding Wikidata entries and – where present – their dates of birth and death as listed on Wikidata (and yes, it could be a better query, I’m still learning). This is important to know because, for those women who are on Wikidata, we intend to use Wikidata to run queries that rely on this information, for example to return all the women who became Fellows of the Society of Antiquaries before they were 40 (for those who aren’t, we will record that data on our knowledge base).
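The age-at-election example mentioned above might be sketched against Wikidata as follows. This is an illustration rather than a query the project runs: it assumes that the date of election is recorded as a ‘point in time’ (P585) qualifier on the ‘award received’ (P166) statement, which will not hold for every entry, and it approximates age by subtracting calendar years.

```sparql
# Sketch only: intended for the Wikidata Query Service.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT ?item ?itemLabel ?dob ?elected
WHERE {
  ?item wdt:P21 wd:Q6581072 ;   # select women
        wdt:P569 ?dob ;         # date of birth is required here
        p:P166 ?award .         # full 'award received' statement
  ?award ps:P166 wd:Q26196499 ; # the award is the FSA item
         pq:P585 ?elected .     # assumption: election date stored as 'point in time'

  # Approximation: compares calendar years only, so it can miss someone
  # elected at 39 whose 40th birthday fell later in the same year
  FILTER (YEAR(?elected) - YEAR(?dob) < 40)

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

Women whose Wikidata entries lack a date of birth or a dated election will simply not appear in the results, which is one reason for recording that data on our own knowledge base.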

As we develop more research-orientated queries, using a comparable infrastructure to Wikidata gives us more example queries to draw on for inspiration and guidance. One such query is helping to develop our understanding of the interpersonal connections that women relied on to get recognition for their work, and of who were key allies for women in the period. We are starting to imagine other queries, and this is helping shape the data we include in the Beyond Notability Knowledge Base. For example, in order to successfully run a query that returns a list of all women in our knowledge base who undertook professional activities within 3 years of becoming a mother, we need a record of when their children were born, data which only exists in Wikidata for women whose children are all considered ‘notable’. We have therefore started to formulate plans for how to record information about motherhood, and other life events, in a way that preserves our imperative to centre women in our data.

4. A Community

Finally, we chose Wikibase because it isn’t just a piece of software, it is a community. The Wikibase Stakeholder Group is providing a space where we can gain expertise, share ideas, and demonstrate our commitment to trustworthy linked open data infrastructures. Our particular thanks go to Adam Shorland, Laurence ‘GreenReaper’ Parry, Lozana Rossenova, Maarten Brinkerink, and Maarten Zeinstra. We look forward to continuing to work with you over the next few years of our project.