RDF, XSD dates, and the beginning of time

February 24, 2015

Let's explore what happens to unusual dates when they are exported from Wikidata (via RDF) and imported into Blazegraph.

Let's look up the start time (P580) of the Universe (Q1):

prefix wdq: <http://www.wikidata.org/entity/>
select ?x WHERE {
  wdq:Q1 wdq:P580s ?x
}
x
<http://www.wikidata.org/entity/Q1S789eef0c-4108-cdda-1a63-505cdd324564>

This means the data contains a triple linking Q1 to a statement node:

wdq:Q1 wdq:P580s wdq:Q1S789eef0c-4108-cdda-1a63-505cdd324564

Let's find out about wdq:Q1S789eef0c-4108-cdda-1a63-505cdd324564:

prefix wdq: <http://www.wikidata.org/entity/>
select ?x ?y WHERE {
  wdq:Q1S789eef0c-4108-cdda-1a63-505cdd324564 ?x ?y
}
x y
<http://www.w3.org/ns/prov#wasDerivedFrom> <http://www.wikidata.org/entity/Rd7fdd0d3f5ec956ea699679ee7109c9e>
<http://www.wikidata.org/entity/P459q> <http://www.wikidata.org/entity/Q15605>
<http://www.wikidata.org/entity/P459q> <http://www.wikidata.org/entity/Q76250>
<http://www.wikidata.org/entity/P580v> <http://www.wikidata.org/entity/VT392fa31586a0bde63ee928c91b586004>
<http://www.wikidata.org/entity/P805q> <http://www.wikidata.org/entity/Q500699>
rdf:type <http://www.wikidata.org/ontology#Statement>
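The predicate names appear to follow a suffix pattern. P580s leads from the item to its statement, P580v leads from the statement to its value, and P459q and P805q look like qualifiers. The following minimal sketch sorts the rows above by that suffix. It assumes the letter meanings (v = value, q = qualifier), which are inferred from this output rather than taken from a published spec.

```python
# Sketch: sort the statement node's rows by predicate suffix.
# ASSUMPTION: "v" = value and "q" = qualifier, inferred from the
# names in the query output; anything else is lumped into "other".
WD = "http://www.wikidata.org/entity/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

rows = [
    ("http://www.w3.org/ns/prov#wasDerivedFrom", WD + "Rd7fdd0d3f5ec956ea699679ee7109c9e"),
    (WD + "P459q", WD + "Q15605"),
    (WD + "P459q", WD + "Q76250"),
    (WD + "P580v", WD + "VT392fa31586a0bde63ee928c91b586004"),
    (WD + "P805q", WD + "Q500699"),
    (RDF_TYPE, "http://www.wikidata.org/ontology#Statement"),
]

ROLES = {"v": "value", "q": "qualifier"}  # assumed suffix meanings

def short(iri):
    """Drop the entity namespace so IDs read like the query output."""
    return iri[len(WD):] if iri.startswith(WD) else iri

def classify(rows):
    """Group (predicate, object) rows into {role: [(property, object), ...]}."""
    grouped = {}
    for pred, obj in rows:
        local = short(pred)
        role = ROLES.get(local[-1], "other") if pred.startswith(WD) else "other"
        prop = local[:-1] if role != "other" else local
        grouped.setdefault(role, []).append((prop, short(obj)))
    return grouped

for role, items in classify(rows).items():
    print(role, items)
```

Only one row lands in the `value` group, and that row is the one worth following next.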

That value for P580v looks interesting. Let's check it out:

prefix wdq: <http://www.wikidata.org/entity/>
select ?x ?y WHERE {
  wdq:VT392fa31586a0bde63ee928c91b586004 ?x ?y
}
x y
<http://www.wikidata.org/ontology#preferredCalendar> <http://www.wikidata.org/entity/Q1985727>
<http://www.wikidata.org/ontology#time> 1196
<http://www.wikidata.org/ontology#timePrecision> 1
rdf:type <http://www.wikidata.org/ontology#TimeValue>

According to Blazegraph, the Universe began in 1196 AD. That doesn't seem right, considering that Wikidata lists it as 13798 million years BCE.

If we look into the Wikidata RDF dump, we see:

$ grep VT392fa31586a0bde63ee928c91b586004 wikidata-statements.nt
<http://www.wikidata.org/entity/Q1S789eef0c-4108-cdda-1a63-505cdd324564> <http://www.wikidata.org/entity/P580v> <http://www.wikidata.org/entity/VT392fa31586a0bde63ee928c91b586004> .
<http://www.wikidata.org/entity/VT392fa31586a0bde63ee928c91b586004> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.wikidata.org/ontology#TimeValue> .
<http://www.wikidata.org/entity/VT392fa31586a0bde63ee928c91b586004> <http://www.wikidata.org/ontology#time> "-13800000000"^^<http://www.w3.org/2001/XMLSchema#gYear> .
<http://www.wikidata.org/entity/VT392fa31586a0bde63ee928c91b586004> <http://www.wikidata.org/ontology#timePrecision> "1"^^<http://www.w3.org/2001/XMLSchema#int> .
<http://www.wikidata.org/entity/VT392fa31586a0bde63ee928c91b586004> <http://www.wikidata.org/ontology#preferredCalendar> <http://www.wikidata.org/entity/Q1985727> .
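To double-check the dump side, here is a minimal sketch that parses the five grep lines and follows P580v to ontology#time. It uses a deliberately simplified N-Triples regex that handles IRIs and typed literals only, with no blank nodes, language tags, or escapes. The literal still carries the full value and its gYear datatype, which suggests the change to 1196 happens on the Blazegraph side.

```python
import re

# The five lines printed by grep above, copied verbatim.
NT = """\
<http://www.wikidata.org/entity/Q1S789eef0c-4108-cdda-1a63-505cdd324564> <http://www.wikidata.org/entity/P580v> <http://www.wikidata.org/entity/VT392fa31586a0bde63ee928c91b586004> .
<http://www.wikidata.org/entity/VT392fa31586a0bde63ee928c91b586004> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.wikidata.org/ontology#TimeValue> .
<http://www.wikidata.org/entity/VT392fa31586a0bde63ee928c91b586004> <http://www.wikidata.org/ontology#time> "-13800000000"^^<http://www.w3.org/2001/XMLSchema#gYear> .
<http://www.wikidata.org/entity/VT392fa31586a0bde63ee928c91b586004> <http://www.wikidata.org/ontology#timePrecision> "1"^^<http://www.w3.org/2001/XMLSchema#int> .
<http://www.wikidata.org/entity/VT392fa31586a0bde63ee928c91b586004> <http://www.wikidata.org/ontology#preferredCalendar> <http://www.wikidata.org/entity/Q1985727> .
"""

# Simplified pattern: object is either <IRI> or "lexical"^^<datatype>.
TRIPLE = re.compile(r'<([^>]*)> <([^>]*)> (?:<([^>]*)>|"([^"]*)"\^\^<([^>]*)>) \.')

def parse(text):
    """Yield (subject, predicate, object); a literal object is (lexical, datatype)."""
    for line in text.splitlines():
        m = TRIPLE.fullmatch(line.strip())
        if m:
            s, p, iri, lex, dt = m.groups()
            obj = iri if iri is not None else (lex, dt)
            yield s, p, obj

def follow(triples, subject, predicate):
    """Return the first object of (subject, predicate, ?o), or None."""
    return next((o for s, p, o in triples if s == subject and p == predicate), None)

WD = "http://www.wikidata.org/entity/"
ONT = "http://www.wikidata.org/ontology#"

triples = list(parse(NT))
value_node = follow(triples, WD + "Q1S789eef0c-4108-cdda-1a63-505cdd324564", WD + "P580v")
raw_time = follow(triples, value_node, ONT + "time")
print(raw_time)
# ('-13800000000', 'http://www.w3.org/2001/XMLSchema#gYear')
```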

The dump gives the date as the year -13,800,000,000. That value seems reasonable, so let's investigate its data type, XMLSchema#gYear.

We see that the "value space of gYear is the set of Gregorian calendar years as defined in § 5.2.1 of ISO 8601", so let's check out ISO 8601.

It appears that ISO 8601 years are restricted to four digits, from 0000 to 9999. The standard allows years outside this range through an expanded representation, but only by mutual agreement between the producer and the consumer of the data. In this case, it looks like Blazegraph is not prepared for a year like -13,800,000,000.
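Here is a minimal sketch of the four-digit rule described above. It assumes only that an ISO 8601 basic year lies in 0000 to 9999 and that anything outside that range needs an agreed expanded representation. It does not model how Blazegraph itself parses xsd:gYear.

```python
# Sketch of the four-digit rule: a basic ISO 8601 year is 0000-9999;
# anything else needs an expanded representation agreed between the
# producer and the consumer. Not a model of Blazegraph's parser.

def fits_basic_year(lexical: str) -> bool:
    """True if the gYear lexical value (no timezone) is an integer in 0000-9999."""
    year = int(lexical)  # int() accepts the leading "-" used for BCE years
    return 0 <= year <= 9999

for lexical in ["1196", "-13800000000"]:
    print(lexical, fits_basic_year(lexical))
# 1196 True
# -13800000000 False
```

The year the Universe actually got, 1196, fits comfortably in the basic range. The year in the dump does not.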