Getting started with wikibase.cloud for heritage projects

By James Baker (Co-Investigator)

The Beyond Notability Knowledge Base stores biographical information about women’s work in archaeology, history, and heritage in Britain between 1870 and 1950, information gathered during the course of our AHRC-funded research. We create information in the form of semantic triples, machine- and human-readable statements that describe the relationship between two things: that Miss Hemming lived in Uxbridge, that Louisa Elizabeth Deane was a donor to the Society of Antiquaries of London in 1887, that Harriet Loyd Lindsay destroyed the Yew Down barrow in 1906.

Wikidata, which celebrates its 10th birthday this Autumn, is the pre-eminent knowledge base for machine readable linked data describing the relationship between people and things. Whilst we are adding to and enriching Wikidata, and whilst we use it as a source of information we choose not to duplicate, we maintain our research on a separate knowledge base because we need to describe relationships that are too particular to us to represent on Wikidata, and because we diverge from the Wikidata community in how some concepts – such as gender expression – should be described.

If you visually compare our knowledge base with Wikidata you’ll notice that they look remarkably similar. This is because they use the same underlying software – Wikibase – to create, maintain, manage, and query semantic triples. Since June this year, our Beyond Notability Wikibase instance has been hosted by Wikimedia Deutschland via their wikibase.cloud service. wikibase.cloud enables people who want to run a Wikibase, but don’t have the (technical or financial) capacity to run their own instance, to create a Wikibase on a shared hosting platform with minimal configuration.

This post describes how to get started, key points to consider, and some basic things to do to make your work with wikibase.cloud easier.

Create an Instance

At the time of writing, wikibase.cloud is in a closed beta, which means they are not accepting account requests. However, you can sign up for early access and join the community mailing list.

Once you have a login, you can create a new wiki by choosing a site name, deciding on a prefix for .wikibase.cloud, and then creating your wiki. From there you have a few important configuration options:

  • to set a site logo;
  • to edit your site skin from three options (ours is “Vector”);
  • to select whether users of your Wikibase can create accounts and edit straight away, or require your approval (we have the latter);
  • whether or not to map your properties to those on Wikidata (we don’t, for reasons).

Editing pages

Editing a page – e.g. a landing page or a list of queries – on your wikibase.cloud instance is the same as editing a page on Wikipedia in that both use the same syntax: so, ==HEADING== for a heading, * for a bullet, [http://www.foo.bar My Website] for a link, etc.
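As a small illustration of that syntax, the sketch below assembles the wikitext for a simple page that lists links under a heading. It is written in Python purely for illustration: the function name and the page title are hypothetical, and you would still paste the resulting text into the edit box yourself.

```python
def build_link_list_page(title, links):
    """Return wikitext for a page with one heading and a bulleted list of links.

    `links` is a list of (url, label) pairs.
    """
    lines = [f"=={title}=="]
    for url, label in links:
        # "* " starts a bullet; "[url label]" is an external link.
        lines.append(f"* [{url} {label}]")
    return "\n".join(lines)


page_text = build_link_list_page("Queries", [("http://www.foo.bar", "My Website")])
print(page_text)
```

The output is plain text, so it can be reviewed before it goes anywhere near your wiki.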

If you aren’t sure where to start, hit the View source link on another Wikibase – like ours! – borrow the code, and start playing around. Anything you get wrong can be reverted via the View history tab, so little can really go wrong.

Note that to make a new page, there is no new page button of the kind you might be used to on WordPress or similar sites. To create a new page you need to manually enter the URL you want for your new page – such as https://beyond-notability.wikibase.cloud/wiki/Project:MyNewPage – in your browser, and then hit the create this page button to create the page from scratch.
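If you plan to create several pages, it can help to generate the addresses rather than type them. The following is a minimal sketch, assuming the /wiki/Namespace:PageName pattern shown in the example URL above; the helper name is hypothetical, and the script only builds the address (you still visit it and hit create this page).

```python
from urllib.parse import quote


def new_page_url(base_url, namespace, page_name):
    """Build the address to visit in order to create (or view) a wiki page."""
    # MediaWiki page titles conventionally use underscores in place of spaces.
    title = f"{namespace}:{page_name}".replace(" ", "_")
    # Percent-encode unusual characters, but keep ":" and "/" readable.
    return f"{base_url.rstrip('/')}/wiki/{quote(title, safe=':/')}"


url = new_page_url("https://beyond-notability.wikibase.cloud", "Project", "MyNewPage")
print(url)
```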

Give your collaborators edit access

Once you are logged into your Wikibase, you will see a Special pages link on the left-side tower. Here you can find lots of useful pages for maintaining your site. One is the Create account page. Use this to add new people who will be collaborating with you on the Wikibase. Their user privileges can then be maintained via links in the Users and rights section of Special pages.

Create some linked data

Linked data is made up of Subject-Predicate-Object triples. These are both human and machine readable, meaning that – on our Wikibase – Margaret Sefton-Jones (Subject) was a member of (Predicate) the Royal Archaeological Institute (Object) is the same as bnwd:Q507 bnwdt:P67 bnwd:Q35.

Subjects and objects can change position (so, the Royal Archaeological Institute (Subject) has archives at (Predicate) the Society of Antiquaries of London (Object)). On Wikibase – as on Wikidata – both subjects and objects are represented by Q numbers and called “Items”. Predicates are the glue in the middle, represented by P numbers and called “Properties”. A Q-P-Q triple is known as a “Statement”.
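One way to picture a statement is as a three-part record. The sketch below models the Margaret Sefton-Jones example in Python and renders it in the prefixed form used above. The class and function names are hypothetical, and this is only a mental model, not how Wikibase stores data internally.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # Q number of the item the statement is about
    predicate: str  # P number of the property
    obj: str        # Q number of the item on the other side


def is_well_formed(triple):
    """Check the Q-P-Q shape: items start with Q, properties with P."""
    return (
        triple.subject.startswith("Q")
        and triple.predicate.startswith("P")
        and triple.obj.startswith("Q")
    )


def to_prefixed(triple):
    """Render a triple with the bnwd:/bnwdt: prefixes used in this post."""
    return f"bnwd:{triple.subject} bnwdt:{triple.predicate} bnwd:{triple.obj}"


# Margaret Sefton-Jones (Q507) was a member of (P67)
# the Royal Archaeological Institute (Q35).
membership = Triple("Q507", "P67", "Q35")
print(is_well_formed(membership), to_prefixed(membership))
```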

To make a new item, hit New Item on the left-side tower. To make a new property, hit New Property on the left-side tower. Note that you must select a Data type for new properties otherwise they can’t be used to make statements. In most cases, the Data type will be Item, meaning that the property takes a Q number as its object. Common alternatives are Point in time or EDTF Date/Time (used for dates) and Monolingual text (used for adding free text).

Once you’ve made two items and a property you can make them into a statement. This works as follows:

  • Go to the item page for the item you want to be a subject, hit add statement, type in your P number (note that you can start typing the label for a P or Q in this box, but new items and properties won’t appear immediately because the search index for wikibase.cloud refreshes occasionally – usually daily at the slowest – to minimise resource use/impact) and click it.
  • Add your Q number in the next box and hit save to create your statement.
  • For more complex statements, create qualifiers to add detail to your statements and/or references to show where you got the information from. Qualifiers work the same way as statements so should feel intuitive (even if the logic takes a while to figure out – dig around our Wikibase and look at pages for individuals such as Margerie Venables Taylor if you need some guidance).
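The relationship between a statement, its qualifiers, and its references can be sketched as a nested record. The example below uses the Louisa Elizabeth Deane statement from the start of this post, with the year as a qualifier. Every ID in it is a placeholder rather than a real entry in our Wikibase, the class name is hypothetical, and treating the year as a qualifier is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QualifiedStatement:
    subject: str
    prop: str
    value: str
    # (property, value) pairs that add detail to the statement.
    qualifiers: list = field(default_factory=list)
    # (property, value) pairs that record where the information came from.
    references: list = field(default_factory=list)

    def describe(self):
        """Return a readable, indented summary of the statement."""
        parts = [f"{self.subject} {self.prop} {self.value}"]
        parts += [f"  qualifier: {p} = {v}" for p, v in self.qualifiers]
        parts += [f"  reference: {p} = {v}" for p, v in self.references]
        return "\n".join(parts)


# Placeholder IDs: "Louisa Elizabeth Deane was a donor to the
# Society of Antiquaries of London in 1887".
donation = QualifiedStatement(
    subject="Q_DEANE",
    prop="P_DONOR_TO",
    value="Q_SOCIETY_OF_ANTIQUARIES",
    qualifiers=[("P_POINT_IN_TIME", "1887")],
    references=[("P_SOURCE", "Q_SOURCE_DOCUMENT")],
)
print(donation.describe())
```

The point of the shape is that qualifiers and references hang off one statement rather than off the item as a whole.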

See who has been making what

Special pages are your friend. One really useful section is Recent changes and logs, which can give you a sense of what changes have been made recently, who has been doing what, and the new items that have been created in your Wikibase. If you are planning quality assurance work on your Wikibase, these logs are the place to start.

Use the ‘what links here’ pages

On the left side of each item and property page is the link What links here. This is an incredibly useful resource for navigating your emerging knowledge base, getting reports on usage of particular properties, and spotting quirks (and errors!) in the implementation of your data model.

For example, the What links here page for Margerie Venables Taylor gives you a quick sense of all the items – mostly for people – that link to her, in most cases because of her role in putting other women forward as Fellows of the Society of Antiquaries.

Equally, the What links here page for Oxford gives a sense of that place as hub for women’s intellectual communities in our period.

And the What links here page can also be useful for properties. For example, the What links here page for the property Archaeology Data Service person ID gives a list of all the people with ADS IDs in our Wikibase. That the result (at the time of writing) is 304 of 489 women in our Wikibase indicates the way our sources are revealing voices thus far unrecorded on other canonical services and persistent identifier infrastructures.

Write your first query

You can query your data with your ‘Query Service’, which can be accessed from the left pane. The Wikibase query service uses a query language called SPARQL, a standard query language for linked data. I have had a long and painful relationship with SPARQL – it isn’t all that easy to get your head around. Thankfully there are amazing resources out there to support query writing, notably Bob DuCharme’s book Learning SPARQL, and the Wikidata community maintains a range of example queries which give a sense of what is possible. Because a lot is possible.
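To give a flavour of a first query, the sketch below builds a SPARQL query that asks for every item linked to the Royal Archaeological Institute (Q35) by the was a member of property (P67), reusing the IDs from the triple example earlier in this post. The endpoint address, the prefix URIs, and the format=json parameter are assumptions based on common Wikibase setups, so check your own Query Service page for the real values. The code only builds the query text and a request URL; it does not send anything.

```python
from urllib.parse import urlencode

# Assumed values: confirm these against your own instance.
ENDPOINT = "https://beyond-notability.wikibase.cloud/query/sparql"
PREFIXES = (
    "PREFIX bnwd: <https://beyond-notability.wikibase.cloud/entity/>\n"
    "PREFIX bnwdt: <https://beyond-notability.wikibase.cloud/prop/direct/>\n"
)


def members_query(organisation_q, member_of_p="P67", limit=10):
    """Build a query listing items linked to an organisation by one property."""
    return (
        PREFIXES
        + "SELECT ?person WHERE {\n"
        + f"  ?person bnwdt:{member_of_p} bnwd:{organisation_q} .\n"
        + "}\n"
        + f"LIMIT {limit}"
    )


def request_url(query, endpoint=ENDPOINT):
    """Return a GET URL asking the endpoint for JSON results (nothing is sent)."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


query = members_query("Q35")
print(query)
print(request_url(query))
```

The printed query can also be pasted straight into the Query Service box, which is usually the easier place to experiment.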

We use SPARQL queries not only for analysing our data (for example, a query that returns people in our knowledge base sorted by the number of places they lived, including the number of cities/towns/villages in which they lived), but also for auditing our data: for example, to return lists of people whose gender we’ve been unable to assign or people in the knowledge base listed alongside the external identifiers – e.g. Wikidata IDs – that we’ve been able to find. These connections with external IDs enable our linked data to link to other linked data, and are particularly powerful in enabling us – for example – to recover familial connections from Wikidata (where people have Wikidata IDs, and to the extent to which their familial connections are listed on Wikidata).
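Query results can also be post-processed outside the Query Service. The sketch below assumes results in the standard SPARQL JSON results format, with one row per person and place pair, and ranks people by the number of distinct places they lived. The sample response, the variable names person and place, and the example.org URIs are all invented for illustration; they are not our data.

```python
import json
from collections import Counter

# Invented sample in the SPARQL JSON results shape (not real data).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["person", "place"]},
    "results": {"bindings": [
        {"person": {"type": "uri", "value": "https://example.org/entity/Q1"},
         "place": {"type": "uri", "value": "https://example.org/entity/Q10"}},
        {"person": {"type": "uri", "value": "https://example.org/entity/Q1"},
         "place": {"type": "uri", "value": "https://example.org/entity/Q11"}},
        {"person": {"type": "uri", "value": "https://example.org/entity/Q2"},
         "place": {"type": "uri", "value": "https://example.org/entity/Q10"}},
        # A repeated row, which should only be counted once.
        {"person": {"type": "uri", "value": "https://example.org/entity/Q1"},
         "place": {"type": "uri", "value": "https://example.org/entity/Q10"}},
    ]},
})


def places_per_person(response_text):
    """Return (person, number of distinct places) pairs, most places first."""
    data = json.loads(response_text)
    seen = set()
    counts = Counter()
    for row in data["results"]["bindings"]:
        pair = (row["person"]["value"], row["place"]["value"])
        if pair not in seen:
            seen.add(pair)
            counts[pair[0]] += 1
    return counts.most_common()


for person, number_of_places in places_per_person(SAMPLE_RESPONSE):
    print(person, number_of_places)
```

The same counting can be done in SPARQL itself with GROUP BY and COUNT; doing it afterwards is simply an alternative when the query is already hard enough to read.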

By building up our data, and connecting to external sources, we hope – in time – to be able to write more complex queries that support our research, including queries that return lists of women who undertook work within two years of having their first child, or those people who used their position in the field to bring women into the profession (a hacky version of which we’ve made a start on), and so on.

Join the community

When I run out of SPARQL talent (which happens often), Bob’s book and the Wikidata examples often help me realise how to write the query I want. But if I’m totally stuck, I’ve also found that the Wikibase community is full of wonderful people willing to offer advice and guidance. Questions on Twitter are responded to. The Wikibase community on Telegram is a constant source of support and insight. And public tickets on Phabricator – where fixes and feature additions are proposed, prioritised, and tracked – help reveal which problems are your own, and which are shared; as well as being a space to log problems and suggest features. Like many such open source communities, the Wikibase community – as well as the wider Wikidata community – is welcoming to beginners, full of expertise, and provides sustainability to the technology – software is, after all, about people. So, if you are thinking of using Wikibase, join the community, dig around the community activity, don’t be afraid to ask the community for help, and contribute back when you have insights to share.

License your data

People need to know how they can use your data. So make it easy for them. Good linked data enables data to be connected, queried across, and assembled from various sources. So clearly state your terms of use (data on Wikidata is available under the Creative Commons CC0 License) so that your data can be used. Better still, use your Wikibase to document your data so that people using it get a sense of the decisions you’ve made, the absences you are aware of, and the uses you think would be inappropriate or might cause harm. If you are not sure where to start, see Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford’s wonderful ‘Datasheets for Datasets’ (2020) – you don’t need to follow every suggestion, but given that you will be creating machine readable data on your Wikibase, it is sure to provide inspiration.