WordLift 3.0: A brief semantic story – part 1

In the world of digital networks, the term knowledge is used generically to identify and justify all sorts of activities aimed at improving the collection and organization of data.

Knowledge improves when information is made available for a variety of readings and reports that aim to interpret reality, speculate on trends, on evolution and on possible futures, in order to somehow control or dominate them.

Project processes include a necessary preparatory activity, the identification of the reference scenario. In short, it consists in discovering and assimilating the background contexts that set the scene into which the subject of the study, as if it were an actor, steps in order to explain the reasons for the foreground.

In computing, knowledge is part of artificial intelligence, a field whose aim is (or was) to achieve automation through trial-and-error strategies. This way of sketching a scenario is called Knowledge Representation (Wikipedia EN). Such symbolic representation was limited by the difficulty of relating different scenarios to one another. We owe its evolution to the usual Tim Berners-Lee, still a leader of the WWW: through the W3C he launched in 1996 the work on the XML standard, which allows semantic information to be added to content so that it can be related. This was the beginning of the Semantic Web, which made it possible to publish, alongside documents, information and data in a format that machines can process automatically.

“Most of the information content in today’s web is designed to be read only by human beings…” (Tim Berners-Lee again) “computers cannot process the language in web pages”.

Semantic web means a web whose content is structured so that software can read it, answer questions about it and interact with users.

Introduction freely adapted from .. and for whoever wants to know the whole story.

Having introduced the value of any operation aimed at developing what will automatically set or suggest the metadata to be attached to content in order to make it machine-readable, we still have to understand and define the following: what are the components of this structure, or metadata? How can the significant elements be extracted uniformly, regardless of language? Which types of ontological categorisation and which relations must be activated in a piece of content for it to become part of a semantic web for all? And above all: how can all this be done at once?

And this is where the whole research and development area that revolves around semantic technologies got stuck. We believe this impasse was caused partly by the lack of agreement among the various scientific disciplines needed to achieve any kind of standardization, and partly by language and lexical differences, which the web itself and the technologies distributed on it push towards a kind of ‘local’ multi-language system.

Considering the topic and the context of this post, we should leap from 1986, when the first markup languages were born, to 1998, when the XML standard was defined, and finally to today, November 2015. We have performed this leap, at least partially, by means of a query on Wikidata (described below).

The path we have followed (considering that our group does not have scientific skills spread across all the fields of knowledge involved) consists of:

  • accepting that semantic technologies as they had been conceived and applied could not fully meet our need to make the machines understand and order content;
  • redefining the context after the cultural and economic rise of the open data world and of the Linked Open Data structure.

Therefore, remembering the lesson of the Austrian logician, mathematician and philosopher Gödel (also loved by the computing world), who stated that a system cannot be understood from inside the system itself, and that to understand any of it we have to step out and observe it from the outside, we first deconstructed the problem by grouping into sets everything that would necessarily be part of the final solution, and then we turned to the world that preceded the current one: the analogue world, and how it had tackled and answered the problems arising from the organization and classification of large amounts of “knowledge”.

A study/guide was very useful to us (and we therefore thank its authors): Organizzare la conoscenza: dalle biblioteche all’architettura dell’informazione per il web (Claudio Gnoli, Vittorio Marino and Luca Rosati).

The query on Wikidata to reconstruct the story of markup languages

Below is the query, which you can run with a click (the results are incomplete because we only get languages whose creation date has a value in Wikidata – this value is expressed by Property:P571). A variant that falls back to English labels follows the query.

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
PREFIX q: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?entity ?ml ?sl WHERE {
 ?entity wdt:P31 wd:Q37045 . # ?entity is a markup language
 ?entity wdt:P571 ?sl . # ?sl is the inception date of ?entity
 ?entity rdfs:label ?ml . # ?entity name is ?ml
 FILTER(LANG(?ml) = "it") # ?ml is in Italian
}
ORDER BY ?sl
LIMIT 100
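
As a variation – a minimal sketch, assuming the query runs on the Wikidata Query Service (query.wikidata.org), whose Blazegraph backend provides the wikibase:label service – the same list can fall back to English labels whenever an Italian label is missing:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT DISTINCT ?entity ?entityLabel ?sl WHERE {
 ?entity wdt:P31 wd:Q37045 . # ?entity is a markup language
 ?entity wdt:P571 ?sl . # ?sl is the inception date of ?entity
 # ?entityLabel is bound automatically by the label service,
 # preferring Italian and falling back to English
 SERVICE wikibase:label { bd:serviceParam wikibase:language "it,en". }
}
ORDER BY ?sl
LIMIT 100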

…. continues


Looking for…science fiction movies on the Linked Data Cloud

When working in technology, you sometimes find yourself day-dreaming about Artificial Intelligence, space ships, aliens. The trip can be intensified by a legacy of science fiction movies that inspire you and give you motivation to work harder on the actual project. The bad part about being a science fiction junkie is that you are always searching for new movies worth watching. Over the years you become pickier and pickier; it’s never enough.

After re-reading the Foundation series by Isaac Asimov, you crave more and have the urge to find available movies with a solid background in literature. That seems to be a good filter for quality:

You want to watch all sci-fi movies inspired by books.

Before watching them of course you need to list them. There are many resources on the web to accomplish the task:

1 – IMDb, Rotten Tomatoes: they offer detailed information about each movie, e.g. what it is about, who the actors are, some reviews. There are interesting user-curated lists that partially satisfy your requirements, for example a list of the best sci-fi movies from 2000 to now. These websites are good resources to get started, but they don’t let you search for the details you care about.

2 – individual blogs: you may be lucky and find that a search engine has indexed an article that answers exactly your question. The web is huge, and someone might have been brave enough to do the research themselves and generous enough to put it online. That is not always the case, and it is absolutely not reliable.

3 – Linked Data Cloud: the web of data is a powerful resource for querying the web at atomic detail. Both DBpedia and Wikidata, the Linked Open Data counterparts of Wikipedia, contain thousands of movies and plenty of details for each one. Since the LOD cloud is a graph database hosted on the public web, you can freely ask very specific, domain-crossing questions and obtain pure, diamond-grade data as a result. Technically this is more challenging, some would say “developer only”, but at InsideOut we are working to democratize this opportunity.

From the title of the post you may already know what option we like most, so let’s get into the “how to”.
We need to send a query to the Wikidata public SPARQL endpoint. Let’s start from a visual depiction of our query, expressing concepts (circles) and the relations between them (arrows).


Let’s color in white what we want in output from the query and in blue what we already know.


– Why is it necessary to specify that m is a Movie?
Writings can inspire many things, for example a song or a political movement, but we only want Movies.

– Why don’t we also specify that w is a Writing and p is a Person?
Movies come out of books, short stories and sometimes science essays. We want our movies to be inspired by something that was written, and this is implied by the “author” relation. The fact that p is a person follows from the fact that only persons write science fiction (at least as of 2015).

Let’s reframe the picture as a set of triples (subject-predicate-object statements), the kind of language a graph database can understand. We call m (movie), w (writing) and p (person) our unknowns, then we define the properties and relations they must match.

  • m is a Movie
  • m is based on w
  • w is written by p
  • p is a science fiction writer

Since the graph we are querying is the LOD cloud, the components of our triples are internet addresses and the query language we use is SPARQL. See below how we translate the triples above into actual Wikidata classes and properties. Keep in mind that movies, persons and writings are unknowns, so they are expressed with the ?x syntax. Everything after # is a comment.

?m <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q11424> .
# ?m is a Movie
?m <http://www.wikidata.org/prop/direct/P144> ?w .
# ?m is based on ?w
?w <http://www.wikidata.org/prop/direct/P50> ?p .
# ?w written by ?p
?p <http://www.wikidata.org/prop/direct/P106> <http://www.wikidata.org/entity/Q18844224> .
# ?p is a science fiction writer

As you can see, the triples’ components are links, and if you point your browser at them you can fetch the triples in which that link itself is the subject. That’s the major innovation of the semantic web compared to any other kind of graph database: it is as huge and distributed as the web. Take a few moments to appreciate the idea and send your love to Tim Berners-Lee.
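
If you prefer to stay inside the query language, the same idea can be sketched with a DESCRIBE query – a minimal example, assuming the endpoint supports the standard SPARQL DESCRIBE form (the Wikidata service does) – which asks for every triple the endpoint holds about a single resource:

PREFIX wd: <http://www.wikidata.org/entity/>

# Return every triple the endpoint holds about the "film" item itself
DESCRIBE wd:Q11424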

Similarly to SQL, we can express in SPARQL that we are selecting data with the SELECT…WHERE keywords. The PREFIX syntax makes our query more readable by making the URIs shorter:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?w ?m ?p WHERE {
?m wdt:P31 wd:Q11424 . # ?m is a Movie
?m wdt:P144 ?w . # ?m is based on ?w
?w wdt:P50 ?p . # ?w written by ?p
?p wdt:P106 wd:Q18844224 . # ?p is a science fiction writer
}

If you run the query above you will get as a result a set of addresses: the URIs of the movies, writings and persons we searched for. We would rather query directly for the names, so let’s introduce ml (a label for the movie), wl (a label for the writing) and pl (a label for the person). We also require the label language to be English, via the FILTER command.

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?pl ?wl ?ml WHERE {
?m wdt:P31 wd:Q11424 . # ?m is a Movie
?m wdt:P144 ?w . # ?m is based on ?w
?w wdt:P50 ?p . # ?w written by ?p
?p wdt:P106 wd:Q18844224 . # ?p is a science fiction writer
?p rdfs:label ?pl . # ?p name is ?pl
?w rdfs:label ?wl . # ?w name is ?wl
?m rdfs:label ?ml . # ?m name is ?ml
FILTER(LANG(?pl) = "en") # ?pl is in English
FILTER(LANG(?wl) = "en") # ?wl is in English
FILTER(LANG(?ml) = "en") # ?ml is in English
}

Let’s run the query on the dedicated Wikidata service. You can imagine this process as SPARQL trying to match the pattern in our picture against all its data, returning only the values of our unknowns that satisfy the constraints. The results are:

pl (writer) | wl (writing) | ml (movie)
Carl Sagan | Contact | Contact
Philip K. Dick | Do Androids Dream of Electric Sheep? | Blade Runner
Philip K. Dick | Paycheck | Paycheck
Philip K. Dick | The Golden Man | Next
H. G. Wells | The War of the Worlds | War of the Worlds
H. G. Wells | The War of the Worlds | The War of the Worlds
H. G. Wells | The War of the Worlds | War of the Worlds 2: The Next Wave
H. G. Wells | The War of the Worlds | H. G. Wells’ War of the Worlds
Mikhail Bulgakov | Ivan Vasilievich | Ivan Vasilievich: Back to the Future
Mikhail Bulgakov | Heart of a Dog | Cuore di cane
Mikhail Bulgakov | The Master and Margarita | Pilate and Others
H. G. Wells | The Shape of Things to Come | Things to Come
H. G. Wells | The Time Machine | The Time Machine
H. G. Wells | The Island of Doctor Moreau | The Island of Dr. Moreau
H. G. Wells | The Time Machine | The Time Machine
H. G. Wells | The Invisible Man | The Invisible Man
H. G. Wells | The First Men in the Moon | First Men in the Moon
H. G. Wells | The Invisible Man | The Invisible Woman
Isaac Asimov | The Bicentennial Man | Bicentennial Man
Isaac Asimov | I, Robot | I, Robot
Isaac Asimov | The Caves of Steel | I, Robot
Philip K. Dick | Adjustment Team | The Adjustment Bureau
Philip K. Dick | Second Variety | Screamers
Philip K. Dick | Impostor | Impostor
Philip K. Dick | Radio Free Albemuth | Radio Free Albemuth
Philip K. Dick | We Can Remember It for You Wholesale | Total Recall
Philip K. Dick | The Minority Report | Minority Report
Philip K. Dick | A Scanner Darkly | A Scanner Darkly
Daniel Keyes | Flowers for Algernon | Charly
Kingsley Amis | Lucky Jim | Lucky Jim (1957 film)
Kingsley Amis | That Uncertain Feeling | Only Two Can Play
John Wyndham | The Midwich Cuckoos | Village of the Damned
Fritz Leiber | Conjure Wife | Night of the Eagle
Brian Aldiss | Super-Toys Last All Summer Long | A.I. Artificial Intelligence
John Steakley | Vampire$ | Vampires
Iain Banks | Complicity | Complicity

Now you have new quality movies to buy and watch to satisfy your sci-fi addiction. Our query is just a hint of the immense power unleashed by linked data. Stay tuned for more tutorials, and check out WordLift, the plugin we are launching to manage and produce linked data directly from WordPress.

Some fun exercises:

  • EASY: get the movies based on writings by Isaac Asimov (a possible sketch follows this list)
  • MEDIUM: get all the movies inspired by women writers
  • HARD: get all music artists whose songs were featured in a TV series
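
For the EASY exercise, a minimal sketch could look like the query below – assuming Q34981 is the Wikidata item for Isaac Asimov (do check the identifier on wikidata.org before relying on it):

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?ml ?wl WHERE {
 ?m wdt:P31 wd:Q11424 . # ?m is a Movie
 ?m wdt:P144 ?w . # ?m is based on ?w
 ?w wdt:P50 wd:Q34981 . # ?w written by Isaac Asimov (Q34981, assumed ID)
 ?m rdfs:label ?ml . # ?m name is ?ml
 ?w rdfs:label ?wl . # ?w name is ?wl
 FILTER(LANG(?ml) = "en")
 FILTER(LANG(?wl) = "en")
}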



MICO Testing: One, Two, Three…

We’ve finally reached an important milestone in our validation work in the MICO project…we can begin testing and integrating our toolset with the first release of the platform to evaluate the initial set of media extractors. 

This blog post is more or less a diary of our first attempts in using MICO in conjunction with our toolset that includes:

  • HelixWare – the Video Hosting Platform (our online video platform that allows publishers and content providers to ingest, encode and distribute videos across multiple screens)
  • WordLift – the Semantic Editor for WordPress (assisting the editors writing a blog post and organising the website’s contents using semantic fingerprints)
  • Shoof – a UGC video recording application (this is an Android native app providing instant video-recording for people living in Cairo)

The workflow we’re planning to implement aims at improving content creation, content management and content delivery phases. 

Combined Deliverable 7.2.1 & 8.2.1 Use Cases – First Prototype

The diagram describes the various steps involved in the implementation of the scenarios we will use to run the tests. At this stage the main goal is to:

  • a) ingest videos in HelixWare,
  • b) process these videos with MICO and
  • c) add relevant metadata that will be further used by the client applications WordLift and Shoof.  

While we’re working to see MICO in action in real-world environments, the tests we’ve designed aim at providing valuable feedback to the developers of each specific module in the platform.

These low-level components (called Technology Enablers or simply TE) include the extractors to analyse and annotate media files as well as modules for data querying and content recommendation. We’re planning to evaluate the TEs that are significant for our user stories and we have designed the tests around three core objectives:

  1. output accuracy – how accurate, detailed and meaningful each single response is when compared to other available tools;
  2. technical performance – how much time each task requires and how scalable the solution is when we increase the volume of content being analysed;
  3. usability – evaluated in terms of integration, modularity and usefulness.

As of today, with everything still extremely experimental, we’re using a dedicated MICO platform running in a protected and centralised cloud environment. This machine has been installed directly by the technology partners of the project: this makes it easier for us to test and simpler for them to keep developing, hot-fixing and stabilising the platform.

Let’s start

By accessing the MICO Admin UI (available from the `/mico-configuration` directory), we’ve been able to select the analysis pipeline. MICO orchestrates different extractors and combines them into pipelines. At this stage the developer has to choose one pipeline at a time.


Upon startup we can see the status of the platform by reading the command output window; while not standardised, this already provides an overview of the startup of each media extractor in the pipeline.


To install and configure the MICO platform you can read the end-user documentation; at this stage, though, I would recommend waiting until everything becomes more stable (here is a link to the MICO end-user documentation)!

After starting up the system, using the platform’s REST APIs we’ve been able to successfully send the first video file and request its processing. This is done in three main steps:

1. Create a Content Item
curl -X POST http://<mico_platform>/broker/inject/create


2. Create a Content Part
curl -X POST "http://<mico_platform>/broker/inject/add?ci=http%3A%2F%2Fdemo2.mico-project.eu%3A8080%2Fmarmotta%2F322e04a3-33e9-4e80-8780-254ddc542661&type=video%2Fmp4&name=horses.mp4" --data-binary @Bates_2045015110_512kb.mp4



3. Submit for processing
curl -v -X POST "http://

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Length: 0
Date: Wed, 08 Jul 2015 08:08:11 GMT

In the next blog posts we will see how to consume the data coming from MICO and how this data will be integrated in our application workflows.
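
Just to give a taste of what consuming that data might look like, here is a purely illustrative SPARQL sketch (Marmotta, which stores the content items, exposes a SPARQL endpoint; the content-item URI below is the one returned in step 2, while the exact endpoint path and the vocabulary of the annotations are assumptions we will verify in the next post):

# List every statement attached to the content item created above
SELECT ?p ?o WHERE {
  <http://demo2.mico-project.eu:8080/marmotta/322e04a3-33e9-4e80-8780-254ddc542661> ?p ?o .
}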

In the meantime, if you’re interested in knowing more about MICO and how it could benefit your existing applications you can read:

Stay tuned for the next blog post!


One Week at the Monastery building WordLift.

Last week our team gathered for a focused face-to-face session in a remote location in Abruzzo, right in the center of Italy: the Abbey of Santo Spirito d’Ocre. Like early agile teams, we’ve discovered once again the importance of working together in close proximity and the value of keeping a healthy balance between hard work and quality of life. We’ve also found ourselves in love with the product we’re building and happy to share it with others.

Our business is made of small distributed teams organised in startups, each one self-sufficient and focused on a specific aspect of our technology (we help businesses and individuals manage and organise content, be it text, data, images or videos).

As peers gathered from different time zones in this unique little Cistercian monastery, we began executing our plan.

And yes, getting to the meeting with a clear agenda did help. All the issues on our list had been summarised and shared in a dedicated Trello board. These mainly covered the work we’ve been doing over the last few years on WordLift, our semantic editor for WordPress.

Cistercians (at least in the old days) kept manual labour as a central part of their monastic life. In our case we’ve managed to structure most of our days around three core activities: business planning, bug fixing and software documentation. At the base of it all we’ve kept the happiness of working together on something we like.

Empathic vs Lean: setting up the Vision.

Most of the work in startups is governed by lean principles; the tools and the mindset of the people have been strongly influenced by the work of Eric Ries, who first published a book to promote the lean approach and to share his recipe for continuous innovation and business success.

After over three years of work on WordLift we can say that we’ve worked in a completely different way. Lean demands that teams get out of the building and look for problems to solve. In our case, while we’ve spent most of our time “out of the building” (and with our clients), we started our product journey from a technology, inspired by the words of Tim Berners-Lee on building a web of open, linked data, and we studied all possible ways to create an emotional connection with bloggers and writers.

Not just that: we have also dedicated time to analysing the world of journalism, its changes and how it will continue evolving according to journalistic celebrities like David Carr (a famous New York Times journalist who died earlier this year) and many others like him, as well as the continuously emerging landscape of news algorithms that help people reach the content they want.

Establish the Vision of WordLift
Understanding that WordLift, and the go-to-market process that will follow, shall be empathy-driven rather than lean is one of the greatest outcomes of our “monastic” seminar in the valley of L’Aquila.

To use an empathy-driven expression: we’ve finally set the Vision.

Organise the Workflow: getting things done.

Like most of today’s open-source software, WordLift is primarily built on GitHub.

While GitHub can be seen and used as a programming tool, GitHub – being the largest digital space for collaborative work – also embeds your workflow.

While working at the monastery we’ve been able to discuss, test and implement a Gitflow workflow.

The Gitflow workflow is designed around a strict branching model for project releases. While somewhat complicated in the beginning, we see it as a robust framework for continuing the development of WordLift and for fixing bugs without waiting for the next release cycle.

Documentation. Documentation. Documentation.

Writing (hopefully good) software documentation helps the users of any product and should provide a general understanding of its core features and functions.

The reality is that, when documenting your own software, the advantages go way beyond the original aim of helping end-users.

By updating the WordLift documentation we were able to get a clearer overview of all the actions an editor needs to take when composing a blog post and/or creating and publishing the knowledge graph. We have also been able to detect flaws in the code and in the user experience.

Most importantly, we’ve found that writing the documentation (this is also done collaboratively on GitHub) can be a great way to keep everyone in sync between the “Vision” of the product (how to connect with our users) and the existing implementation (what we are offering with this release).

Next steps

Now it’s time to organise the team and start the next iterations by engaging with the first users of this new release while fine-tuning the value proposition (below, the empathic view of @mausa89 on the USP of WordLift).


As this is the very first blog post I’m writing with WordLift v3, I can tell you I’m very happy about it. If you would like to test it too, join our newsletter... we will keep you posted while we continue playing!



The closing of this blog post is @colacino_m‘s night improvisation from the monastery.

