Looking for…science fiction movies on the Linked Data Cloud

When working in technology, you sometimes find yourself daydreaming about artificial intelligence, spaceships and aliens. The trip can be intensified by a legacy of science fiction movies that inspire you and motivate you to work harder on your current project. The bad part about being a science fiction junkie is that you are always searching for new movies worth watching. Over the years you become pickier and pickier; it’s never enough.

After re-reading the Foundation series by Isaac Asimov, you crave more and feel the urge to find movies with a solid background in literature. That seems a good filter for quality:

You want to watch all sci-fi movies inspired by books.

Before watching them, of course, you need to list them. There are many resources on the web for the task:

1 – IMDb, Rotten Tomatoes: they offer detailed information about each movie, e.g. what it is about, who the actors are, and some reviews. There are interesting user-curated lists that partially satisfy your requirements, for example a list of the best sci-fi movies from 2000 to now. These websites are good resources to get started, but they don’t let you search on the specific details you care about.

2 – Individual blogs: you may get lucky if a search engine has indexed an article that answers exactly your question. The web is huge, and someone might have been brave enough to do the research themselves and generous enough to put it online. That is not always the case, and it is hardly reliable.

3 – Linked Data Cloud: the Web of Data is a powerful resource for querying the web down to atomic detail. Both DBpedia and Wikidata, the Linked Open Data (LOD) counterparts of Wikipedia, contain thousands of movies and plenty of details about each. Since the LOD cloud is a graph of data published on the public web, you can freely ask very specific, cross-domain questions and get back pure, diamond data. Technically this is more challenging, and some would say “developer only”, but at InsideOut we are working to democratize this opportunity.

From the title of the post you may already know which option we like best, so let’s get into the “how to”.
We need to send a query to the Wikidata public SPARQL endpoint. Let’s start from a visual depiction of our query, expressing concepts (circles) and the relations between them (arrows).

[Figure: movies_based_on_scifi, the query drawn as concepts (circles) and relations (arrows)]

Let’s color in white what we want the query to output, and in blue what we already know.

[Figure: movies_based_on_scifi, with the query outputs in white and the known values in blue]

– Why is it necessary to specify that m is a Movie?
Writings can inspire many things, for example a song or a political movement, but we only want Movies.

– Why don’t we also specify that w is a Writing and p is a Person?
Movies can be based on novels, short stories and sometimes even science essays. We want our movies to be inspired by something written, and that is already implied by the “author” relation. Likewise, the fact that p is a person is implied by the fact that only persons write science fiction (at least as of 2015).

Let’s reframe the picture as a set of triples (subject-predicate-object statements), the kind of language a graph database understands. We call m (movie), w (writing) and p (person) our unknowns, then we define the properties and relations they must match.

  • m is a Movie
  • m is based on w
  • w is written by p
  • p is a science fiction writer
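A graph database answers this by looking for values of the unknowns that make all four statements true at once. The deliberately naive Python sketch below illustrates that idea: each statement is a (subject, predicate, object) tuple, terms starting with ? are unknowns, and the matcher tries every triple, binds the unknowns and recurses on the remaining patterns (a nested-loop join). The predicate names (isA, basedOn, writtenBy, worksAs) and the ex: identifiers in the toy graph are hypothetical stand-ins, not Wikidata terms.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]

# The four statements above as triple patterns; "?" marks an unknown.
# Predicate and class names are hypothetical readable stand-ins.
PATTERN: list[Triple] = [
    ("?m", "isA", "Movie"),
    ("?m", "basedOn", "?w"),
    ("?w", "writtenBy", "?p"),
    ("?p", "worksAs", "ScienceFictionWriter"),
]

# Tiny toy graph with hypothetical ex: identifiers (not real Wikidata IDs).
GRAPH: set[Triple] = {
    ("ex:BladeRunner", "isA", "Movie"),
    ("ex:BladeRunner", "basedOn", "ex:DoAndroidsDream"),
    ("ex:DoAndroidsDream", "writtenBy", "ex:PhilipKDick"),
    ("ex:PhilipKDick", "worksAs", "ScienceFictionWriter"),
    # A song based on the same book: rejected, because it is not a Movie.
    ("ex:SomeSong", "basedOn", "ex:DoAndroidsDream"),
    # A movie whose source author is not a sci-fi writer: rejected too.
    ("ex:SomeFilm", "isA", "Movie"),
    ("ex:SomeFilm", "basedOn", "ex:SomeNovel"),
    ("ex:SomeNovel", "writtenBy", "ex:SomeAuthor"),
}


def unify(term: str, value: str, binding: Binding) -> bool:
    """A variable binds to value (or must agree with an earlier binding);
    a constant must be equal to value."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def match(pattern: list[Triple], graph: set[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding of the unknowns that makes all patterns true in graph."""
    binding = {} if binding is None else binding
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        candidate = dict(binding)  # copy, so a failed attempt leaves no trace
        if all(unify(t, v, candidate) for t, v in zip(first, triple)):
            yield from match(rest, graph, candidate)


for result in match(PATTERN, GRAPH):
    print(result)
# {'?m': 'ex:BladeRunner', '?w': 'ex:DoAndroidsDream', '?p': 'ex:PhilipKDick'}
```

Both the song and the film based on a non-sci-fi author drop out, which is exactly why the Movie constraint matters. Real SPARQL engines rely on indexes and query planning rather than this brute-force loop.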

Since the graph we are querying is the LOD cloud, the components of our triples are internet addresses (URIs), and the query language we use is SPARQL. See below how the triples above translate into actual Wikidata classes and properties: P31 is “instance of”, Q11424 is “film”, P144 is “based on”, P50 is “author”, P106 is “occupation” and Q18844224 is “science fiction writer”. Keep in mind that movies, persons and writings are unknowns, so they are written as variables with the ?x syntax. Everything after # is a comment.

?m <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q11424> .
# ?m is a Movie
?m <http://www.wikidata.org/prop/direct/P144> ?w .
# ?m is based on ?w
?w <http://www.wikidata.org/prop/direct/P50> ?p .
# ?w written by ?p
?p <http://www.wikidata.org/prop/direct/P106> <http://www.wikidata.org/entity/Q18844224> .
# ?p is a science fiction writer
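The translation step is mechanical enough to sketch in a few lines of Python. This is a hypothetical helper, assuming a small hand-written lookup table of the Wikidata IDs used in this post; it rebuilds the full-URI triple patterns above from readable names.

```python
# Hypothetical helper: turn readable triples into full-URI SPARQL patterns,
# using a hand-written table of the Wikidata IDs that appear in this post.
ENTITY = "http://www.wikidata.org/entity/"
PROP_DIRECT = "http://www.wikidata.org/prop/direct/"

PROPERTIES = {"instance of": "P31", "based on": "P144",
              "author": "P50", "occupation": "P106"}
ITEMS = {"film": "Q11424", "science fiction writer": "Q18844224"}


def node(value: str) -> str:
    """Variables stay as they are; known items become full <URI>s."""
    if value.startswith("?"):
        return value
    return f"<{ENTITY}{ITEMS[value]}>"


def triple_line(subject: str, prop: str, obj: str) -> str:
    """Render one triple pattern, with the property in the prop/direct namespace."""
    return f"{node(subject)} <{PROP_DIRECT}{PROPERTIES[prop]}> {node(obj)} ."


READABLE = [
    ("?m", "instance of", "film"),
    ("?m", "based on", "?w"),
    ("?w", "author", "?p"),
    ("?p", "occupation", "science fiction writer"),
]

for s, p, o in READABLE:
    print(triple_line(s, p, o))
```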

As you can see, the triples’ components are links: if you point your browser at one of them, you can fetch the triples in which that link is the subject. That’s the major innovation of the semantic web compared to any other kind of graph database: it is as huge and distributed as the web. Take a few moments to appreciate the idea and send your love to Tim Berners-Lee.
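Dereferencing a link can also be done from code. The sketch below only builds the HTTP request with Python’s standard library; it assumes, as is common practice for linked data, that the server chooses the RDF serialization from the Accept header (text/turtle here). Wikidata typically answers such a request with a redirect to a data page for the entity. The actual fetch is left commented out because it needs network access.

```python
from urllib.request import Request


def describe_request(uri: str, rdf_format: str = "text/turtle") -> Request:
    """Build (without sending) a GET request asking for an RDF description of uri.

    Assumes the server honours HTTP content negotiation via the Accept header.
    """
    return Request(uri, headers={"Accept": rdf_format})


req = describe_request("http://www.wikidata.org/entity/Q11424")  # Q11424 = film
print(req.get_method(), req.full_url, req.get_header("Accept"))

# Fetching needs network access; uncomment to try it:
# from urllib.request import urlopen
# with urlopen(req) as resp:
#     print(resp.read(500).decode("utf-8"))
```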

As in SQL, in SPARQL we select data with the SELECT…WHERE keywords. The PREFIX declarations make our query more readable by shortening the URIs:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?w ?m ?p WHERE {
?m wdt:P31 wd:Q11424 . # ?m is a Movie
?m wdt:P144 ?w . # ?m is based on ?w
?w wdt:P50 ?p . # ?w written by ?p
?p wdt:P106 wd:Q18844224 . # ?p is a science fiction writer
}
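What PREFIX does is plain string substitution: a prefixed name such as wdt:P31 stands for the declared namespace URI followed by the local part. A minimal Python sketch of that expansion, assuming only the two prefixes declared above (the function names are ours, for illustration):

```python
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}


def expand(prefixed_name: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as 'wdt:P31' into a full <URI>."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no declared prefix in {prefixed_name!r}")
    return f"<{prefixes[prefix]}{local}>"


def prefix_header(prefixes: dict[str, str] = PREFIXES) -> str:
    """Render the PREFIX declarations that open the query."""
    return "\n".join(f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items())


print(prefix_header())
print(expand("wdt:P31"))    # <http://www.wikidata.org/prop/direct/P31>
print(expand("wd:Q11424"))  # <http://www.wikidata.org/entity/Q11424>
```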

If you run the query above, you get back a set of addresses: the URIs of the movies, writings and persons we searched for. We would rather query directly for their names, so let’s introduce ml (a label for the movie), wl (a label for the writing) and pl (a label for the person). We also require the labels to be in English, via the FILTER command.

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?pl ?wl ?ml WHERE {
?m wdt:P31 wd:Q11424 . # ?m is a Movie
?m wdt:P144 ?w . # ?m is based on ?w
?w wdt:P50 ?p . # ?w written by ?p
?p wdt:P106 wd:Q18844224 . # ?p is a science fiction writer
?p rdfs:label ?pl . # ?p name is ?pl
?w rdfs:label ?wl . # ?w name is ?wl
?m rdfs:label ?ml . # ?m name is ?ml
FILTER(LANG(?pl) = "en") # ?pl is in english
FILTER(LANG(?wl) = "en") # ?wl is in english
FILTER(LANG(?ml) = "en") # ?ml is in english
}
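To run the query from a program rather than from the web interface, you can send it over HTTP to the public endpoint. The sketch below assumes the endpoint URL https://query.wikidata.org/sparql and the standard SPARQL protocol GET form: the query text goes in a query parameter, and JSON results are requested via the Accept header. The User-Agent string is a placeholder: Wikimedia asks clients to identify themselves, so replace it with your own. Only the request is built; sending it needs network access.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Public Wikidata SPARQL endpoint (assumed URL; check the service docs).
ENDPOINT = "https://query.wikidata.org/sparql"

# The label query from above as a Python string (comments dropped).
QUERY = """PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?pl ?wl ?ml WHERE {
?m wdt:P31 wd:Q11424 .
?m wdt:P144 ?w .
?w wdt:P50 ?p .
?p wdt:P106 wd:Q18844224 .
?p rdfs:label ?pl .
?w rdfs:label ?wl .
?m rdfs:label ?ml .
FILTER(LANG(?pl) = "en")
FILTER(LANG(?wl) = "en")
FILTER(LANG(?ml) = "en")
}"""


def build_request(query: str) -> Request:
    """Build (without sending) a SPARQL protocol GET request asking for JSON results."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Placeholder: identify your own script and contact address here.
        "User-Agent": "scifi-movies-example/0.1 (you@example.org)",
    }
    return Request(url, headers=headers)


req = build_request(QUERY)
print(req.full_url[:60])

# Sending it needs network access:
# import json
# from urllib.request import urlopen
# with urlopen(req) as resp:
#     results = json.load(resp)
```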

Let’s run the query on the dedicated Wikidata query service. You can imagine this process as SPARQL trying to match the pattern in our picture against all of its data, and giving back only the values of our unknowns that satisfy all the constraints. The results are:

pl (writer) | wl (writing) | ml (movie)
Carl Sagan | Contact | Contact
Philip K. Dick | Do Androids Dream of Electric Sheep? | Blade Runner
Philip K. Dick | Paycheck | Paycheck
Philip K. Dick | The Golden Man | Next
H. G. Wells | The War of the Worlds | War of the Worlds
H. G. Wells | The War of the Worlds | The War of the Worlds
H. G. Wells | The War of the Worlds | War of the Worlds 2: The Next Wave
H. G. Wells | The War of the Worlds | H. G. Wells’ War of the Worlds
Mikhail Bulgakov | Ivan Vasilievich | Ivan Vasilievich: Back to the Future
Mikhail Bulgakov | Heart of a Dog | Cuore di cane
Mikhail Bulgakov | The Master and Margarita | Pilate and Others
H. G. Wells | The Shape of Things to Come | Things to Come
H. G. Wells | The Time Machine | The Time Machine
H. G. Wells | The Island of Doctor Moreau | The Island of Dr. Moreau
H. G. Wells | The Time Machine | The Time Machine
H. G. Wells | The Invisible Man | The Invisible Man
H. G. Wells | The First Men in the Moon | First Men in the Moon
H. G. Wells | The Invisible Man | The Invisible Woman
Isaac Asimov | The Bicentennial Man | Bicentennial Man
Isaac Asimov | I, Robot | I, Robot
Isaac Asimov | The Caves of Steel | I, Robot
Philip K. Dick | Adjustment Team | The Adjustment Bureau
Philip K. Dick | Second Variety | Screamers
Philip K. Dick | Impostor | Impostor
Philip K. Dick | Radio Free Albemuth | Radio Free Albemuth
Philip K. Dick | We Can Remember It for You Wholesale | Total Recall
Philip K. Dick | The Minority Report | Minority Report
Philip K. Dick | A Scanner Darkly | A Scanner Darkly
Daniel Keyes | Flowers for Algernon | Charly
Kingsley Amis | Lucky Jim | Lucky Jim (1957 film)
Kingsley Amis | That Uncertain Feeling | Only Two Can Play
John Wyndham | The Midwich Cuckoos | Village of the Damned
Fritz Leiber | Conjure Wife | Night of the Eagle
Brian Aldiss | Super-Toys Last All Summer Long | A.I. Artificial Intelligence
John Steakley | Vampire$ | Vampires
Iain Banks | Complicity | Complicity
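When you call the endpoint from code, these rows come back in the W3C SPARQL 1.1 Query Results JSON format. The Python sketch below parses a hand-written sample in that format (a few rows copied from the table above, not a real response) and groups movies by writer, dropping repeated titles such as the two I, Robot rows for Isaac Asimov. The helper names are ours, for illustration.

```python
import json
from collections import defaultdict


def lit(value: str) -> dict[str, str]:
    """An English literal as it appears in SPARQL 1.1 JSON results."""
    return {"type": "literal", "xml:lang": "en", "value": value}


# Hand-written sample with a few rows from the table above (not a real response).
SAMPLE = json.dumps({
    "head": {"vars": ["pl", "wl", "ml"]},
    "results": {"bindings": [
        {"pl": lit("Carl Sagan"), "wl": lit("Contact"), "ml": lit("Contact")},
        {"pl": lit("Isaac Asimov"), "wl": lit("The Bicentennial Man"), "ml": lit("Bicentennial Man")},
        {"pl": lit("Isaac Asimov"), "wl": lit("I, Robot"), "ml": lit("I, Robot")},
        {"pl": lit("Isaac Asimov"), "wl": lit("The Caves of Steel"), "ml": lit("I, Robot")},
    ]},
})


def rows(results_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into plain {variable: value} rows."""
    data = json.loads(results_json)
    names = data["head"]["vars"]
    return [
        {name: b[name]["value"] for name in names if name in b}
        for b in data["results"]["bindings"]
    ]


def movies_by_writer(table: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group movie titles by writer, keeping each title once per writer."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in table:
        titles = grouped[row["pl"]]
        if row["ml"] not in titles:
            titles.append(row["ml"])
    return dict(grouped)


print(movies_by_writer(rows(SAMPLE)))
# {'Carl Sagan': ['Contact'], 'Isaac Asimov': ['Bicentennial Man', 'I, Robot']}
```

Labels are not unique identifiers, so grouping by title can merge distinct films that share a name (the two The Time Machine rows are likely such a case). To tell them apart, also select ?m and group by its URI.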

Now you have new quality movies to buy and watch to satisfy your sci-fi addiction. Our query is just a hint of the immense power unleashed by linked data. Stay tuned for more tutorials, and check out WordLift, the plugin we are launching to manage and produce linked data directly from WordPress.

Some fun exercises:

  • EASY: get the movies inspired by writings of Isaac Asimov
  • MEDIUM: get all the movies inspired by women writers
  • HARD: get all music artists whose songs were featured in a TV series