Getting started with wikibase.cloud for heritage projects

By James Baker (Co-Investigator)

The Beyond Notability Knowledge Base stores biographical information about women’s work in archaeology, history, and heritage in Britain between 1870 and 1950, information gathered during the course of our AHRC-funded research. We create information in the form of semantic triples – machine- and human-readable statements that describe the relationship between two things: that Miss Hemming lived in Uxbridge, that Louisa Elizabeth Deane was a donor to the Society of Antiquaries of London in 1887, that Harriet Loyd Lindsay destroyed the Yew Down barrow in 1906.

Wikidata, which celebrates its 10th birthday this Autumn, is the pre-eminent knowledge base for machine-readable linked data describing the relationships between people and things. Whilst we are adding to and enriching Wikidata, and whilst we use it as a source of information we choose not to duplicate, we maintain our research on a separate knowledge base because we need to describe relationships that are too particular to us to represent on Wikidata, and because we diverge from the Wikidata community on how some concepts – such as gender expression – should be described.

If you visually compare our knowledge base with Wikidata you’ll notice that they look remarkably similar. This is because they use the same underlying software – Wikibase – to create, maintain, manage, and query semantic triples. Since June this year, our Beyond Notability Wikibase instance has been hosted by Wikimedia Deutschland via their wikibase.cloud service. wikibase.cloud enables people who want to run a Wikibase but don’t have the (technical or financial) capacity to run their own instance to create one on a shared hosting platform with minimal configuration.

This post describes how to get started, key points to consider, and some basic things to do to make your work with wikibase.cloud easier.

Create an Instance

At the time of writing, wikibase.cloud is in a closed beta, which means they are not openly accepting new accounts. However, you can sign up for early access and join the community mailing list.

Once you have a login, you can create a new wiki by choosing a site name, deciding on the prefix that will precede .wikibase.cloud in your site’s URL, and then creating your wiki. From there you have a few important configuration options:

  • to set a site logo;
  • to choose your site skin from three options (ours is “Vector”);
  • to select whether users of your Wikibase can create accounts and edit straight away, or require your approval (we have the latter);
  • whether or not to map your properties to those on Wikidata (we don’t, for reasons).

Editing pages

Editing a page – e.g. a landing page or a list of queries – on your wikibase.cloud instance is the same as editing a page on Wikipedia in that both use the same syntax: so, ==HEADING== for a heading, * for a bullet, [http://www.foo.bar My Website] for a link, etc.

If you aren’t sure where to start, hit the View source link on another Wikibase – like ours! – borrow the code, and start playing around. Anything you get wrong can be reverted via the View history tab, so little can really go wrong.

Note that to make a new page, there is no new page button of the kind you might be used to on WordPress or similar sites. To create a new page you need to manually enter the URL you want for your new page – such as https://beyond-notability.wikibase.cloud/wiki/Project:MyNewPage – in your browser, and then hit the create this page button to create the page from scratch.

Give your collaborators edit access

Once you are logged into your Wikibase, you will see a Special pages link in the left-hand sidebar. Here you can find lots of useful pages for maintaining your site. One is the Create account page. Use this to add new people who will be collaborating with you on the Wikibase. Their user privileges can then be maintained via links in the Users and rights section of Special pages.

Create some linked data

Linked data is made up of Subject-Predicate-Object triples. These are both human and machine readable, meaning that – on our Wikibase – Margaret Sefton-Jones (Subject) was a member of (Predicate) the Royal Archaeological Institute (Object) is the same as bnwd:Q507 bnwdt:P67 bnwd:Q35.

Subjects and objects can change position (so, the Royal Archaeological Institute (Subject) has archives at (Predicate) the Society of Antiquaries of London (Object)). On Wikibase – as on Wikidata – both subjects and objects are represented by Q numbers and called “Items”. Predicates are the glue in the middle, represented by P numbers and called “Properties”. A Q-P-Q triple is known as a “Statement”.
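
To give a flavour of why this form matters, here is a minimal sketch of a query you could run once you reach the Query Service described below: replace the subject of the statement above with a variable, and the same Q-P-Q pattern asks who was a member of the Royal Archaeological Institute. The bnwd:/bnwdt: prefixes point at our instance’s entity and direct-property namespaces, so substitute your own.

PREFIX bnwd:  <http://beyond-notability.wiki.opencura.com/entity/>
PREFIX bnwdt: <http://beyond-notability.wiki.opencura.com/prop/direct/>

# Who was a member of (P67) the Royal Archaeological Institute (Q35)?
SELECT ?person ?personLabel
WHERE {
  ?person bnwdt:P67 bnwd:Q35 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en-gb". }
}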

To make a new item, hit New Item in the left-hand sidebar. To make a new property, hit New Property in the left-hand sidebar. Note that you must select a Data type for new properties, otherwise they can’t be used to make statements. In most cases, the Data type will be Item, meaning that the property takes a Q number as its object. Common alternatives are Point in time or EDTF Date/Time (used for dates) and Monolingual text (used for adding free text).

Once you’ve made two items and a property, you can combine them into a statement. That works as follows:

  • Go to the item page for the item you want to be a subject, hit add statement, type in your P number (note that you can start typing the label for a P or Q in this box, but new items and properties won’t appear immediately because the search index for wikibase.cloud refreshes occasionally – usually daily at the slowest – to minimise resource use/impact) and click it.
  • Add your Q number in the next box and hit save to create your statement.
  • For more complex statements, create qualifiers to add detail to your statements and/or references to show where you got the information from. Qualifiers work the same way as statements so should feel intuitive (even if the logic takes a while to figure out – dig around our Wikibase and look at pages for individuals such as Margerie Venables Taylor if you need some guidance).
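
Once you start querying (more on the Query Service below), qualifiers appear as extra triples attached to a statement node rather than to the item itself. The following is a rough, hedged sketch of how to read them back for Margaret Sefton-Jones’s memberships: the entity URIs are our instance’s, and the bnp:/bnps: prefixes address the full statement node and its main value, while qualifier predicates live under /prop/qualifier/.

PREFIX bnwd: <http://beyond-notability.wiki.opencura.com/entity/>
PREFIX bnp:  <http://beyond-notability.wiki.opencura.com/prop/>
PREFIX bnps: <http://beyond-notability.wiki.opencura.com/prop/statement/>

# Membership statements (P67) on Margaret Sefton-Jones (Q507), together with
# any qualifiers attached to each statement node.
SELECT ?statement ?organisation ?qualifierProperty ?qualifierValue
WHERE {
  bnwd:Q507 bnp:P67 ?statement .
  ?statement bnps:P67 ?organisation .
  OPTIONAL {
    ?statement ?qualifierProperty ?qualifierValue .
    FILTER(STRSTARTS(STR(?qualifierProperty), "http://beyond-notability.wiki.opencura.com/prop/qualifier/"))
  }
}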

See who has been making what

Special pages are your friend. One really useful section is Recent changes and logs, which can give you a sense of what changes have been made recently, who has been doing what, and the new items that have been created in your Wikibase. If you are planning quality assurance work on your Wikibase, these logs are the place to start.

Use the ‘what links here’ pages

On the left side of each item and property page is the link What links here. This is an incredibly useful resource for navigating your emerging knowledge base, getting reports on usage of particular properties, and spotting quirks (and errors!) in the implementation of your data model.

For example, the What links here page for Margerie Venables Taylor gives you a quick sense of all the items – mostly for people – that link to her, in most cases because of her role in putting other women forward as Fellows of the Society of Antiquaries.

Equally, the What links here page for Oxford gives a sense of that place as a hub for women’s intellectual communities in our period.

And the What links here page can also be useful for properties. For example, the What links here page for the property Archaeology Data Service person ID gives a list of all the people with ADS IDs in our Wikibase. That the result (at the time of writing) is only 304 of the 489 women in our Wikibase indicates the way our sources are revealing voices thus far unrecorded on other canonical services and persistent identifier infrastructures.
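
The Query Service (introduced in the next section) can produce the same sort of property-usage report. As a hedged sketch – and rather than hard-coding the ADS property’s P number, which we don’t quote here – this looks the property up by its label and counts the people who use it:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Count the people in the knowledge base with an Archaeology Data Service
# person ID, finding the property via its label rather than its P number.
SELECT (COUNT(DISTINCT ?person) AS ?peopleWithADSIDs)
WHERE {
  ?property rdfs:label ?propertyLabel ;
            wikibase:directClaim ?directProperty .
  FILTER(STR(?propertyLabel) = "Archaeology Data Service person ID")
  ?person ?directProperty ?adsID .
}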

Write your first query

You can query your data with your ‘Query Service’, which can be accessed from the left pane. The Wikibase query service uses a query language called SPARQL, a standard query language for linked data. I have had a long and painful relationship with SPARQL – it isn’t all that easy to get your head around. Thankfully there are amazing resources out there to support query writing, notably Bob DuCharme’s book Learning SPARQL, and the Wikidata community maintains a range of example queries which give a sense of what is possible. Because a lot is possible.
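
If you want something to type in before reaching for Bob’s book, a first query might simply list everything stated directly about a single item. This sketch does that for Margaret Sefton-Jones (Q507), returning one row per property/value pair (again, the entity URIs are our instance’s, so swap in your own):

PREFIX bnwd: <http://beyond-notability.wiki.opencura.com/entity/>

# Everything stated directly about Margaret Sefton-Jones (Q507),
# returned as property label / value pairs.
SELECT ?propertyLabel ?value
WHERE {
  bnwd:Q507 ?directProperty ?value .
  ?property wikibase:directClaim ?directProperty .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en-gb". }
}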

We use SPARQL queries not only for analysing our data (for example, a query that returns people in our knowledge base sorted by the number of places they lived, including the number of cities/towns/villages in which they lived), but also for auditing our data: for example, to return lists of people whose gender we’ve been unable to assign, or of people in the knowledge base listed alongside the external identifiers – e.g. Wikidata IDs – that we’ve been able to find. These connections with external IDs enable our linked data to link to other linked data, and are particularly powerful in enabling us – for example – to recover familial connections from Wikidata (where people have Wikidata IDs, and to the extent that their familial connections are listed on Wikidata).
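
As a flavour of those auditing queries, here is a sketch of the external-identifier report. On our Wikibase, P14 holds the Wikidata URL recorded for a person; substitute whatever property plays that role on yours.

PREFIX bnwdt: <http://beyond-notability.wiki.opencura.com/prop/direct/>

# People in the knowledge base alongside the Wikidata URLs (P14) we have
# been able to find for them.
SELECT ?person ?personLabel ?wikidataURL
WHERE {
  ?person bnwdt:P14 ?wikidataURL .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en-gb". }
}
ORDER BY ?personLabel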

By building up our data, and connecting to external sources, we hope – in time – to be able to write more complex queries that support our research, including queries that return lists of women who undertook work within two years of having their first child, or those people who used their position in the field to bring women into the profession (a hacky version of which we’ve made a start on), and so on.

Join the community

When I run out of SPARQL talent (which happens often), Bob’s book and the example queries on Wikidata often help me realise how to write the query I want. But if I’m totally stuck, I’ve also found that the Wikibase community is full of wonderful people willing to offer advice and guidance. Questions on Twitter are responded to. The Wikibase community on Telegram is a constant source of support and insight. And public tickets on Phabricator – where fixes and feature additions are proposed, prioritised, and tracked – help reveal which problems are your own and which are shared, as well as being a space to log problems and suggest features. Like many such open source communities, the Wikibase community – and the wider Wikidata community – is welcoming to beginners, full of expertise, and provides sustainability to the technology: software is, after all, about people. So, if you are thinking of using Wikibase, join the community, dig around its activity, don’t be afraid to ask questions, and contribute when you have insights to share.

License your data

People need to know how they can use your data. So make it easy for them. Good linked data enables data to be connected, queried across, and assembled from various sources. So clearly state your terms of use (data on Wikidata is available under the Creative Commons CC0 License) so that it can be used. Better still, use your Wikibase to document your data so that people using it get a sense of the decisions you’ve made, the absences you are aware of, and the uses you think would be inappropriate or might cause harm. If you are not sure where to start, see Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford’s wonderful ‘Datasheets for Datasets’ (2020) – you don’t need to follow every suggestion, but given that you will be creating machine-readable data on your Wikibase, it is sure to provide inspiration.


Building the Beyond Notability Knowledge Base: 4 reasons why we chose Wikibase

By James Baker (Co-Investigator, Beyond Notability)

For all that people like to moan about the things that are wrong on Wikipedia (and there is much that is wrong on Wikipedia), it is the place people go to when they want to know something: together with the other sites run by the Wikimedia Foundation, Wikipedia is the knowledge infrastructure of the web. Since 2010 cultural institutions have formally contributed to this ecosystem through Wikimedian-in-Residence programmes, typically resulting in digitised material appearing on Wikimedia Commons, the home for every media artefact you encounter when browsing Wikipedia.

More recently a number of those Wikimedian-in-Residence programmes have directed attention towards Wikidata, a multilingual knowledge graph that is a common source of open data used on Wikipedia. More significantly, every time you search Google and a little info box pops up on the right side of the screen containing useful – typically biographical – information, that information is probably drawn from Wikidata. In turn, a person without a Wikidata page is unlikely to get a box. And so if fewer than 20% of Wikipedia biographies are about women, and if most Wikipedia biographies have a corresponding Wikidata page, then it follows that enriching Wikidata with otherwise neglected histories of women active in archaeology, history, and heritage is something worthy of attention. Hence, our project.

Wikidata is a wiki (a collaboratively edited hypertext publication) whose technical infrastructure is based on a combination of the software MediaWiki and a set of knowledge graph MediaWiki extensions known as Wikibase, the workings of which are explained in the ‘Introducing Our Database’ post. We have built the Beyond Notability Knowledge Base on the same infrastructure, using Wikibase-as-a-service, first via WbStack (with amazing support from Adam Shorland) and latterly via the Wikimedia Deutschland hosted Wikibase Cloud (with thanks to Mohammed Sadat). In this blog post we list the top 4 reasons why we took this approach.

1. Aligning Biographical Approaches

We can’t record the evidence we find directly onto Wikidata because many of the women we encounter in our research do not meet Wikidata’s ‘notability threshold’ – in some cases because evidence for their work in archaeology, history, and heritage is fragmentary, in other cases because the evidence needs to be assembled first to get over that threshold. Despite this, it wouldn’t make much sense for us to design a biographical database from scratch. And so we align our approach with Wikidata because, in part, it gives us an ontological platform to build on, a template for how to represent things like familial relations, office holding, and residences.

2. Beyond Notability as a Trusted Source

It made sense, then, to use the same technical infrastructure as Wikidata for our knowledge base. But whilst alignment is useful, we cannot – as discussed in our recent blog ‘On Working with Gender’ – faithfully follow the Wikidata model for representing biographical information: the historically-specific circumstances in which women were working in the late-nineteenth and early-twentieth century are an awkward fit for a data model orientated around modern ways of being in the Global North. Indeed, our project is a test of the capacities of data models like Wikidata’s to capture and represent these women’s lives. Given this need to diverge – the choices we are making to depart from Wikidata-as-canon – using the same software platform as Wikidata, with the same visual and ontological aesthetic, supports our ambition for the Beyond Notability Knowledge Base to be regarded as a trusted source of biographical information. This is important because we think our work can make vital contributions to Wikidata. Take as an example Gwenllian Morgan, the subject of our previous blog. Prior to our project she was not listed on Wikidata as being a Fellow of the Society of Antiquaries (the construction of which on Wikidata uses the ‘award received’ property). But now she is, with the amended Wikidata entry using Beyond Notability as the source of this information.

3. Querying Between Knowledge Bases

Recording Gwenllian Morgan as a Fellow of the Society of Antiquaries (FSA) means that any queries that use Wikidata to return a list of FSAs will now include her, as one of the many people that link to the Wikidata item Fellow of the Society of Antiquaries (Q26196499). These queries can be made through the Wikidata Query Service, a SPARQL endpoint, “SPARQL” here meaning the query language used to interrogate graph databases. Building the Beyond Notability Knowledge Base on the same technologies as Wikidata means not only that we too have a SPARQL Query Service but also that both sets of data are organised using the same underlying principles, allowing us to more easily write queries that simultaneously interrogate both knowledge bases (and, indeed, any other knowledge bases that take a similar form).

We are already doing this kind of cross-querying to help our data entry. For example, we are using this…

PREFIX bnwd: <http://beyond-notability.wiki.opencura.com/entity/>
PREFIX bnwds: <http://beyond-notability.wiki.opencura.com/entity/statement/>
PREFIX bnwdv: <http://beyond-notability.wiki.opencura.com/value/>
PREFIX bnwdt: <http://beyond-notability.wiki.opencura.com/prop/direct/>
PREFIX bnp: <http://beyond-notability.wiki.opencura.com/prop/>
PREFIX bnps: <http://beyond-notability.wiki.opencura.com/prop/statement/>
PREFIX bnpq: <http://beyond-notability.wiki.opencura.com/prop/qualifier/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd:  <http://www.wikidata.org/entity/>

SELECT ?person ?personLabel ?item ?WD_DOB ?WD_DOD
WHERE {  
  ?person bnwdt:P16 ?isFSA . #select FSA
  FILTER NOT EXISTS {?person bnwdt:P4 bnwd:Q12 .} #filter out project team
  ?person bnwdt:P14 ?url . #look for wikidata URL on person page
  BIND(IRI(REPLACE(?url,"https://www.wikidata.org/wiki/","http://www.wikidata.org/entity/")) as ?item ) 
  
  SERVICE <https://query.wikidata.org/sparql> {
        ?item wdt:P21 wd:Q6581072 . #select women
        OPTIONAL {?item wdt:P569 ?WD_DOB . } #recall date of birth
        OPTIONAL {?item wdt:P570 ?WD_DOD . } #recall date of death
      }
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en-gb". } 
}

…query to return a list of all the women in our knowledge base with corresponding Wikidata entries and – where present – their dates of birth and death as listed on Wikidata (and yes, it could be a better query, I’m still learning). This is important to know because we intend to use Wikidata to run queries that rely on this information – for example, return all the women who became Fellows of the Society of Antiquaries before they were 40 – for those women on Wikidata (for those who aren’t, we will record that data on our knowledge base).

As we develop more research-orientated queries, using an infrastructure comparable to Wikidata’s gives us more example queries to draw on for inspiration and guidance. One such query is helping to develop our understanding of the interpersonal connections that women relied on to get recognition for their work, and of who the key allies for women in the period were. We are starting to imagine other queries, and this is helping to shape the data we include in the Beyond Notability Knowledge Base. For example, in order to successfully run a query that returns a list of all the women in our knowledge base who undertook professional activities within 3 years of becoming a mother, we need a record of when their children were born, data which only exists on Wikidata for women whose children are all considered ‘notable’. We have therefore started to formulate plans for how to record information about motherhood, and other life events, in a way that preserves our imperative to centre women in our data.

4. A Community

Finally, we chose Wikibase because it isn’t just a piece of software, it is a community. The Wikibase Stakeholder Group provides a space where we can gain expertise, share ideas, and demonstrate our commitment to trustworthy linked open data infrastructures. Our particular thanks go to Adam Shorland, Laurence ‘GreenReaper’ Parry, Lozana Rossenova, Maarten Brinkerink, and Maarten Zeinstra. We look forward to continuing to work with you over the next few years of our project.