WordLift 3.0: A brief semantic story – part 1

In the world of digital networks, the term knowledge is used generically to identify and justify all activities, of any kind, aimed at improving the collection and organization of data.

Knowledge improves when information is made available for a variety of readings and analyses that aim to interpret reality, to speculate about trends, evolution, and possible futures, in order to somehow control or master them.

Every project process includes a necessary preparatory activity, called the identification of the reference scenario. In short, it consists of discovering and assimilating the background contexts, those that set the scene into which the subject of the study, as if it were an actor, steps in to explain why it belongs in the foreground.

In computing, knowledge is part of artificial intelligence. In this field the aim is (or was) to achieve automation through trial-and-error strategies. This way of sketching a scenario is called Knowledge Representation. Such symbolic representation was limited by the difficulty of relating different scenarios. The ever-present Tim Berners-Lee, still a WWW leader, is the one responsible for its evolution: through the W3C he launched the XML standard in 1996, allowing semantic information to be added to content so that pieces of content could be related to one another. This was the beginning of the Semantic Web, which made it possible to publish, alongside documents, information and data in a format that machines can process automatically.

“Most of the information content in today’s web is designed to be read only by human beings …” (Tim Berners-Lee again) “computers cannot process the language in web pages”.

Semantic web means a web whose content is structured so that software can read it, answer questions, and interact with users.

This introduction is freely adapted from .. for whoever wants to know the whole story.

Having established the value of any effort to automatically set or suggest the metadata attached to content, so as to make it machine-readable, one still has to understand and define the following: what are the components of this structure, i.e. the metadata? How can the significant elements be extracted uniformly, regardless of language? Which types of ontological categorisation and which relations must be activated within a piece of content for it to become part of a semantic web for all? And above all: how can all of this be done simultaneously?

And this is where the whole research and development area revolving around semantic technologies got stuck. We believe this impasse was caused partly by the lack of agreement among the various scientific disciplines needed to reach any kind of standardization, and partly by language and lexical differences, which the web itself and the technologies it distributes push towards a kind of ‘local’ multi-language system.

Considering the topic and the context of this post, we should leap from 1986, when the first markup languages were born, to 1998, when the XML standard was defined, and finally to today, November 2015. We have performed this leap, at least partially, by means of a query on Wikidata (described below).

The path we have followed (given that our group does not have scientific expertise spanning all the fields of knowledge involved) consists of:

  • accepting that semantic technologies, as they had been conceived and applied, could not fully meet our need to make machines understand and organize content;
  • redefining the context after the cultural and economic rise of the open data world and of the Linked Open Data structure.

Therefore, remembering the lesson of the Austrian logician, mathematician and philosopher Gödel (much loved by the computing world), who argued that a system cannot be fully understood from inside the system itself, and that to understand any of it we have to step outside and observe it from the outside, we first deconstructed the problem by grouping into sets everything that would necessarily be part of the final solution. We then turned to the world that preceded the current one: the analogue world, and how it had tackled and answered the problems raised by organizing and classifying large amounts of “knowledge”.

One study/guide was particularly useful to us (and we therefore thank its authors): Organizzare la conoscenza: dalle biblioteche all’architettura dell’informazione per il web (“Organizing knowledge: from libraries to information architecture for the web”), by Claudio Gnoli, Vittorio Marino and Luca Rosati.

The query on Wikidata to reconstruct the history of markup languages

Here below is the query, which you can run with a click (the results are incomplete because we only included languages whose creation date has a value in Wikidata – this value is expressed by Property:P571).

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
PREFIX q: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?entity ?ml ?sl WHERE {
  ?entity wdt:P31 wd:Q37045 .  # ?entity is a markup language
  ?entity wdt:P571 ?sl .       # ?sl is the inception date of ?entity
  ?entity rdfs:label ?ml .     # the name of ?entity is ?ml
  FILTER(LANG(?ml) = "it")     # ?ml is in Italian
}
ORDER BY ?sl
LIMIT 100
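
For readers who prefer to run the query programmatically rather than through the web interface, here is a minimal sketch in Python using only the standard library. The endpoint URL and the User-Agent string are assumptions on our part; note also that the Wikidata Query Service predeclares the standard wd:/wdt:/rdfs: prefixes, so the PREFIX block can be omitted when querying it directly.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public Wikidata Query Service SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT DISTINCT ?entity ?ml ?sl WHERE {
  ?entity wdt:P31 wd:Q37045 .  # ?entity is a markup language
  ?entity wdt:P571 ?sl .       # ?sl is the inception date of ?entity
  ?entity rdfs:label ?ml .     # the name of ?entity is ?ml
  FILTER(LANG(?ml) = "it")     # ?ml is in Italian
}
ORDER BY ?sl
LIMIT 100
"""

def build_request(query: str) -> Request:
    """Build a GET request asking the endpoint to return JSON results."""
    url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    # A descriptive User-Agent is polite towards the Wikidata servers.
    return Request(url, headers={"User-Agent": "markup-history-demo/0.1"})

def run_query(query: str) -> list:
    """Execute the query and return the list of result bindings."""
    with urlopen(build_request(query)) as resp:
        data = json.load(resp)
    return data["results"]["bindings"]
```

Each binding returned by `run_query(QUERY)` is a dictionary mapping the variable names (`entity`, `ml`, `sl`) to their values, so printing the language names and inception dates is a simple loop away.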

… to be continued: here is part 2 of this article.