Reimagining the Video Player with the help of MICO – Part 1

How more context can help us revamp HelixWare’s Video player to boost user engagement on news sites.

originally posted on http://www.mico-project.eu/reimagining-the-video-player/

Our use case in MICO focuses on news and media organisations that want to offer more context and better navigation to the users who visit their news outlets. A well-known (in the media industry at least) report from Cisco that came out this last February predicts that nearly three-fourths (75 percent) of the world’s mobile data traffic will be attributed to video by 2020.


The latest video content meetup – organized in Cairo this March with the Helixware’s team at Injaz. (Images courtesy of Insideout Today)

While working with our fellow bloggers and editorial teams we’ve been studying how these news organizations, particularly those with text at their core, can be helped to craft their stories with high-quality videos.

Moreover, the question we want to answer is: can videos become a pathway to deeper engagement?

With the help of MICO’s media extractors we can add semantic annotations to videos on demand. These annotations take the form of media fragments that can be used as input for both the embeddable video player of HelixWare and the video thumbnails created by the HelixWare WordPress plugin. Media fragments, in our showcase, are generated by the face detection extractor and are both temporal (there is a face at this time in the video) and spatial (the face within these frames is located at the given x, y, width and height coordinates).
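For reference, annotations of this kind can be expressed using the W3C Media Fragments URI syntax; the values below are purely illustrative:

http://example.org/video.mp4#t=10,20
http://example.org/video.mp4#t=10,20&xywh=160,120,320,240

The first URI identifies a temporal fragment (seconds 10 to 20 of the video), while the second also selects a spatial region: a 320×240 pixel box whose top-left corner sits at coordinates 160,120.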

The new HelixWare video player that we’re developing as part of our effort in MICO aims at creating an immersive experience for end users. The validation of both the video player and the video thumbnail will be done using A/B testing against a set of metrics our publishers are focused on: time per session, number of videos played per session, and number of shares of the videos over social media (content virality).

Now let’s review the design assumptions we’ve worked on so far to reimagine the video player; in the next post we will present the first results.

1. Use faces to connect with users

Thumbnails, when done right, are generally key to ensuring a high level of engagement on web pages. This seems to be particularly true when thumbnails feature human faces, which are considered “powerful channels of non-verbal communication” in social networks. With MICO we can now offer editors a better tool to engage audiences by integrating a new set of UI elements that use human faces in the video thumbnail. The study documenting this point is “Faces Engage Us: Photos with Faces Attract More Likes and Comments on Instagram”, authored by S. Bakhshi, D. A. Shamma and E. Gilbert in 2014.

2. Increase the saturation by 20%-30% to boost engagement

Another interesting finding backing up our work is that filtered photos are 21% more likely to be viewed and 45% more likely to be commented on by consumers of photographs. Specifically, filters that increase warmth, exposure and contrast boost engagement the most.

3. Repeat text elements

As seen in most of the custom thumbnail tutorials for YouTube available online, adding some elements of the title (or the entire title) using a bold font over a clear background can make the video more compelling and, according to some, significantly increase the click-through rate. One of the goals of the demo will be to provide a simple and appealing UI where text and image cooperate to offer a more engaging user experience, removing any external information that could distract the viewer.

4. Always keep the editor in full control

We firmly believe machines will help journalists and bloggers focus on what matters most – writing stories that people want to read. This means that, whatever workflow we plan to implement, there shall always be a human (the editor) behind the scenes validating the content produced by technologies such as MICO.

This is particularly true when dealing with sensitive materials such as human faces depicted in videos. There might be obvious privacy concerns for which an editor might choose to use a landscape rather than a face for a video thumbnail, and we shall make sure this option always remains available.

We will continue documenting this work in the next blog post and as usual we look forward to hearing your thoughts and ideas – please email us anytime.

 

Build Your Knowledge Graph with WordLift. Introducing version 3.4

We love the Web. We’ve been using the Internet in various forms since the ’90s. We believe that the increasing amount of information should be structured beforehand by the same individuals who create the content.

With this idea in mind, and with the aim of empowering journalists and bloggers, we’ve created WordLift: a semantic editor for WordPress.

With the latest release (version 3.4) we are introducing a Dashboard to provide a quick overview of the website’s Knowledge Graph.

[Image: the WordLift dashboard]

What the heck is a Knowledge Graph?

Knowledge Graphs are all around us. Internet giants like Google, Facebook, LinkedIn and Amazon are all running their own Knowledge Graphs and, willingly or not, we are all contributing to them.

Knowledge Graphs are networks of all kinds of things that are relevant to a specific knowledge domain or organization. They contain abstract concepts and relations as well as instances of all sorts of ‘things’, including persons, companies, documents and datasets.

Knowledge Graphs are intelligent data models made of descriptive metadata that can be used to categorise content.

Why should you bring all your content and data into a Knowledge Graph?

The answer is simple. You want to be relevant for your audience. A Knowledge Graph allows machines (including voice-enabled assistants, smartphone apps and search crawlers) to make complex queries over the entirety of your content. Let’s break this down into benefits:

  • Facilitate smarter and more relevant search results and recommendations
  • Support intelligent personal assistants like Apple Siri and Google Now in understanding natural language requests, by providing the vocabulary needed to identify content
  • Get richer insights on how content is performing and how it is being received by the audience. Some call it Semantic Analytics (more on this topic soon)
  • Sell advertising more wisely by providing in-depth context to advertising networks
  • Create new services that drive reader engagement
  • Share (or sell) metadata with the rest of the world

So what makes WordLift special?

WordLift allows anyone to build his/her own Knowledge Graph. The plugin adds a layer of structured metadata to the content you write in WordPress. Every article is classified with named entities, and these classifications are used to provide relevant recommendations that boost the articles of your site through widgets like the navigator and the faceted search. There is more.
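To give a feel for how such recommendations can be computed, here is a rough SPARQL sketch that ranks other articles by the number of entities they share with a given post. The article URI and the schema.org mentions property are assumptions for illustration only; the actual predicates used by WordLift may differ:

PREFIX schema: <http://schema.org/>

SELECT ?related (COUNT(?entity) AS ?sharedEntities) WHERE {
  <http://example.org/my-article> schema:mentions ?entity .   # entities mentioned by the current article
  ?related schema:mentions ?entity .                          # other articles mentioning the same entities
  FILTER(?related != <http://example.org/my-article>)
}
GROUP BY ?related
ORDER BY DESC(?sharedEntities)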

The deep vocabulary can be used to understand natural language requests like “Who is the founder of company [X]?”. Let’s dig deeper. Here is an example that uses a general-purpose question-answering tool called Platypus.

[Image: the Platypus question-answering tool]

Platypus leverages the Wikidata Knowledge Graph. Now, if I asked “Who is the founder of Insideout10?”, Wikidata would probably politely answer “I’m sorry but I don’t have this information”.

Now, the interesting part is that, for this specific question, this same blog holds the correct answer.

As named entities are described along with their properties, I can consult the metadata about Insideout10 and even have applications like Platypus run a SPARQL query on my graph.
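As a rough sketch of what such a query could look like (the schema.org name and founder properties below are assumptions; the exact predicates depend on how the graph is modelled):

PREFIX schema: <http://schema.org/>

SELECT DISTINCT ?founder WHERE {
  ?organization schema:name "Insideout10" .   # the entity describing the company
  ?organization schema:founder ?founder .     # its founders
}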

[Image: running the SPARQL query on Redlink]

This query returns two entities:

Who owns the data?

The site owner does. Every website has its own graph published on data.wordlift.it (or any custom domain name you might like), and the creator of the website holds all licensing rights to his/her data. In the upcoming release a dedicated rights statement will be added to all graphs published with WordLift (here you’ll find the details of this issue).

So how big is this graph?

If we combine all existing beta testers we reach a total of 37,714 triples (the triple being the unit of measurement of the information stored in a Knowledge Graph). Here is a chart representing this data.

While this is a very tiny fraction of the world’s knowledge (Wikidata holds 975,989,631 triples – here is a simple query to check this information on their systems), it is relevant for the readers of this blog and contributes to the veracity of big data (“Is this data accurate?”).
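Counting triples is a one-liner in SPARQL; the sketch below works on any endpoint that supports SPARQL 1.1 aggregates, although it may be slow or time out on large public endpoints:

SELECT (COUNT(*) AS ?triples) WHERE {
  ?s ?p ?o .   # every triple in the store
}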

Happy blogging!

 

Innovation in Digital Journalism – a report from the second year of FP7-MICO

This blog post summarises the work done in the context of the EU project FP7-MICO in the area of digital journalism.

This last December we attended in Cairo one of the most exciting events in the Middle East and North Africa region on Entrepreneurship and Hi-Tech startups: RiseUp Summit 2015. We engaged with the overwhelming crowd of startuppers and geeks at our HelixWare booth and in a separate Meetup organised at the Greek Campus.

We had the opportunity, during these two hectic days, to share the research work done in MICO for extending the publishing workflows of independent news organizations with cross-media analysis, natural language processing and linked data querying tools.

… 

 

Introducing WordLift new Vocabulary

When we first started WordLift, we envisioned a simple way for people to structure their content using Semantic Fingerprinting and named entities. Over the last few weeks we’ve seen the Vocabulary become the central place to manage all named entities on a website. Moreover, we’ve started to see named entities play an important role in making the site more visible to both humans and bots (mainly search engines at this stage).

Here is an overview of the number of weekly organic search visits from Google on this blog (while the numbers are still small, we’ve seen 110% growth).

[Image: weekly organic search visits from Google]

To help editors increase the quality of their entity pages, today, we are launching our new Vocabulary along with version 3.3.

[Image: the new WordLift Vocabulary]

The Vocabulary can be used as a directory for entities. Entities can now be filtered using the “Who“, “Where“, “When” and “What” categories and most importantly entities have a rating and a traffic light to quickly see where to start improving.

Until now it was hard to have a clear overview (thumbnails have also been introduced); it was also hard to see what was missing and where. The principles for creating successful entity pages can be summarised as follows:

  1. Every entity should be linked to one or more related posts. Every entity has a corresponding web page. This web page acts as a content hub (here is what we have to say about AI on this blog, for example) – this means that we shall always have articles linked to our entities. This is not a strict rule though, as we might also use the entity pages to build our website (rather than to organise our blog posts).
  2. Every entity should have its own description. And this description shall express the editor’s own vision on a given topic.
  3. Every entity should link to other entities. When we choose other entities to enrich the description of an entity we create relationships within our knowledge graph, and these relationships are precious and can be used in many different ways (the entity on AI on this blog is connected, for instance, with the entity John McCarthy, who first coined the term in 1955).
  4. Entities, just like any post in WordPress, can be kept as drafts. Only when we publish them do they become available in the analysis, and we can use them to classify our content.
  5. Images are worth a thousand words, as someone used to say. When we add a featured image to an entity we’re also adding the schema.org image attribute to the entity.
  6. Every entity (unless you’re creating something completely new) should be interlinked with the same entity on at least one other dataset. This is called data interlinking and can be done by adding a link to the equivalent entity using the sameAs attribute (here we have, for instance, the same John McCarthy in the Yago Knowledge Base); see the query sketch after this list.
  7. Every entity has a type (i.e. Person, Place, Organization, …) and every type has its own set of properties. When we complete all the properties of an entity we increase its visibility.
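As a rough sketch, the properties mentioned in points 5, 6 and 7 can be inspected directly on the published graph. The schema.org name, image and sameAs properties below are assumptions about how the entity is modelled; WordLift may use slightly different predicates:

PREFIX schema: <http://schema.org/>

SELECT ?type ?image ?equivalent WHERE {
  ?entity schema:name "John McCarthy" ;   # the entity, looked up by name
          a ?type .                       # its type (Person, Place, ...)
  OPTIONAL { ?entity schema:image ?image . }        # the featured image, if set
  OPTIONAL { ?entity schema:sameAs ?equivalent . }  # sameAs links to other datasets
}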

Happy blogging!

 

WordLift 3.0: A brief semantic story – part 2

Classifications help us find the material we are looking for.

Here is part 1 of this article.


By now, the web has such a great amount of content that it has become impossible to apply homogeneous classification schemes to organize knowledge and make it available, unless only a specific domain is considered (more than 2.5 million new articles are published each and every day).

Classification schemes are structures that use results and relations as information to be added to content. Four types can be identified: hierarchical, tree, faceted, and classification according to reference models (or paradigms).

Structured information storage is ultimately aimed at improving human knowledge.

Our goal with WordLift is to develop an application that structures content so as to simultaneously represent various classification methods for machines, enabling the latter to organize the content published on digital networks and make it usable in different ways.

Due to the impasse met by semantic technologies, introduced in part 1, in the first phase of our analysis we excluded the digital world as the mandatory recipient of our solution.

Therefore, during the first phase we looked at the classification systems that mankind used to organize its knowledge before the computing era; then we considered the evolution of faceted interfaces, the technologies that relate the different web environments to each other, and the consolidated practices on the web regarding these topics (interlinking with DBpedia, Freebase and GeoNames, and the methodologies required by search engines to classify and publish content).

It’s not easy to identify the answers, especially because the essential technological component is continually evolving. In the book “Organizzare la Conoscenza” (in Italian), already mentioned in the previous post, the essential categories – those having various facets in common and valid for all disciplines – are introduced at a certain point in Chapter 2.

They were introduced by the Indian mathematician Shiyali Ramamrita Ranganathan, who was the first – around 1930 – to talk about this kind of analysis, which consists of breaking down a topic into components and then building it up again based on a code. He chose five essential categories: space and time, on which everyone agrees; energy, referring to activities or dynamism and indicating the ‘action’ in semantics; matter, for example a material and its properties; and personality, to indicate the main subject of that context, even when it’s not a human being.

These categories are considered abstract, but nevertheless we used them to design the back-end interface for the editors, and mapped them to the corresponding types in the schema.org vocabulary.

WordLift is indeed an editor built on top of the universally recognised vocabulary of concepts published by http://schema.org/, consisting so far of more than 1,200 items divided into nine essential categories: Action, CreativeWork, Event, Intangible, Medical Entity, Organization, Person, Place, Product.

As of November 2015, the schema.org vocabulary is used on over 217 million pages (URLs), containing a total of more than six billion triples.

WordLift 3.0 is a semantic editor that analyses content and automatically suggests metadata according to schema.org vocabulary categories, which we have somewhat simplified for users, dividing them in this first experimental phase into four essential categories: Who (Person, Organization), Where (Place), When (Event), What (CreativeWork, Product, Intangible). However, users can add any number of entries to those suggested by the application, thus creating a personal vocabulary within the application.

The next release, which will complete the experimental phase in January 2016, will make it possible to assign different levels of importance to the results, creating a hierarchical, tree-like classification (by using the mainEntity property that schema.org provides to mark up articles).
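As a rough idea of what this enables, once articles carry a mainEntity annotation the graph can be queried for the main topic of each article. The sketch below assumes articles are typed as schema.org Article; the actual modelling may differ:

PREFIX schema: <http://schema.org/>

SELECT ?article ?mainEntity WHERE {
  ?article a schema:Article ;              # every article in the graph
           schema:mainEntity ?mainEntity . # the entity the article is mainly about
}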

For the future we are considering the Dewey Decimal Classification, the hierarchical classification used in libraries across the world.

This is the general process that has led us to design a solution in which semantic technologies work jointly with relational technologies to automatically associate a set of metadata, or a semantic graph, to a specific content.

Identifying the technological development and services for users was not simple, but on the other hand the maturation and affirmation of the Linked Open Data cloud and of DBpedia (as well as Freebase and GeoNames) was essential to enable the WordLift 3.0 editor to generate reusable datasets.

[The first blog post of this brief Semantic Story is here.]

 

WordLift 3.0: A brief semantic story – part 1

In the world of digital networks, the term knowledge is generically used to identify and justify all activities aimed at improving data collection and organization. Of all sorts.

Knowledge can be improved when information is made available for a variety of readings and reports aimed at interpreting reality, speculating on trends, evolution and possible futures, in order to somehow control or dominate it.

Project processes have a necessary, preparatory activity in a project programme, called identification of the reference scenario. In short, it consists of discovering and assimilating the background contexts, those that prepare the scene into which the subject of the study, as if it were an actor, inserts itself to explain what comes to the foreground.

In computing, knowledge is part of artificial intelligence. In this field the aim is (was) to achieve automation through strategies of trial and error. This way of sketching a scenario is called Knowledge Representation. This symbolic representation was limited by the difficulty of relating the various scenarios. As usual, Tim Berners-Lee, still a leader of the WWW, is the one responsible for its evolution. Through the W3C he launched in 1996 the work on the XML standard (finalised in 1998), allowing semantic information to be added to content so it could be related. It was the beginning of the Semantic Web, which made it possible to publish, alongside documents, information and data in a format that allows machines to process them automatically.

“Most of the information content in today’s web is designed to be read only by human beings …” (Tim Berners-Lee again) “computers cannot process the language in web pages”.

Semantic web means a web whose content is structured so that software can read it, answer questions and interact with users.

Introduction freely adapted from .. and for whoever wants to know the whole story.

Having introduced the value of any operation aimed at developing what will automatically set or suggest the metadata to be attached to content in order to make it readable by machines, one still has to understand and define the following: what are the components of this structure, or metadata? How can the significant elements be extracted uniformly, regardless of the language? Which types of ontological categorisation and which relations must be activated in a piece of content in order for it to become part of a semantic web for all? And especially: how can all this be done simultaneously?

And this is where the whole research and development area that revolves around semantic technologies got stuck. We believe this impasse was also caused by the lack of agreement among the various scientific paths necessary to achieve any kind of standardization, and by language and lexical differences, which are pushed towards a kind of ‘local’ multi-language system by the web itself and by the technologies being distributed.

Considering the topic and the context of this post, we should leap from 1986, when the first markup languages were born, to 1998, when the XML standard was defined, and finally to today, November 2015. We have performed this leap, at least partially, by means of a query (described below) on Wikidata.

The path we have followed (considering that our group lacks scientific skills distributed among all the included fields of knowledge) involves:

  • accepting that semantic technologies as they had been conceived and applied could not fully meet our need to make the machines understand and order content;
  • redefining the context after the cultural and economic affirmation of the open data world and the data structure of the Linked Open Data.

Therefore, remembering what the Austrian logician, mathematician and philosopher Gödel (also loved by the computing world) taught us (a system cannot be understood from inside the system itself; to understand any of it, we have to step out and observe it from the outside), we initially deconstructed the problem by enclosing in sets all that would necessarily be part of the final solution, and then we turned to the world that preceded the current one: the analogue world, and how it had tackled and answered the problems arising from the organization and classification of large amounts of “knowledge”.

A study/guide was very useful to us (and we therefore thank its authors): Organizzare la conoscenza: dalle biblioteche all’architettura dell’informazione per il web (Claudio Gnoli, Vittorio Marino and Luca Rosati).

The query on Wikidata to reconstruct the story of markup languages

Here below is the query you can run with a click (the results are incomplete because we only entered languages whose creation date has a value in Wikidata – this value is expressed by Property:P571).

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
PREFIX q: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?entity ?ml ?sl WHERE {
 ?entity wdt:P31 wd:Q37045 . # ?entity is a markup language
 ?entity wdt:P571 ?sl . # ?sl is the inception date of ?entity
 ?entity rdfs:label ?ml . # ?entity name is ?ml
 FILTER(LANG(?ml) = "it") # ?ml is in Italian
 }
 ORDER by ?sl
 LIMIT 100

… continues, and here is part 2 of this article.

 

Looking for…science fiction movies on the Linked Data Cloud

When working in technology, sometimes you find yourself day-dreaming about Artificial Intelligence, Space Ships, Aliens. The trip can be intensified by a legacy of Science Fiction movies that inspire and give motivation to work harder on the actual project. The bad part about being a science fiction junkie is that you are always searching for new movies worth watching. Over the years you become pickier and pickier; it’s never enough.

After re-reading the Foundation series by Isaac Asimov, you crave more and have the urge to find available movies with a solid background in literature. That seems to be a good filter for quality:

You want to watch all sci-fi movies inspired by books.

Before watching them of course you need to list them. There are many resources on the web to accomplish the task:

1 – IMDb, Rotten Tomatoes: they offer detailed information about the movie, e.g. what the movie is about, who the actors are, some reviews. There are interesting user-curated lists that partially satisfy your requirements, for example a list of the best sci-fi movies from 2000 to now. These websites are good resources to get started, but they don’t offer a search over the details you care about.

2 – individual blogs: you may be lucky if a search engine indexed an article that exactly answers your question. The web is huge, and someone might have been brave enough to do the research themselves and generous enough to put it online. That’s not always the case, though, and it’s absolutely not reliable.

3 – Linked Data Cloud: the web of data comes as a powerful resource to query the web at atomic detail. Both DBpedia and Wikidata, the LOD versions of Wikipedia, contain thousands of movies and plenty of details for each. Since the LOD cloud is a graph database hosted on the public web, you can freely ask very specific and domain-crossing questions, obtaining as a result pure, diamond-grade data. Technically this is more challenging, some would say “developer only”, but at InsideOut we are working to democratize this opportunity.

From the title of the post you may already know what option we like most, so let’s get into the “how to”.
We need to send a query to the Wikidata public SPARQL endpoint. Let’s start from a visual depiction of our query, expressing concepts (circles) and relations between them (arrows).

[Image: visual depiction of the query – concepts as circles, relations as arrows]

Let’s color in white what we want in output from the query and in blue what we already know.

[Image: the same diagram, with the unknowns in white and the known values in blue]

– Why is it necessary to specify that m is a Movie?
Writings can inspire many things, for example a song or a political movement, but we only want Movies.

– Why is it not also specified that w is a Writing and p is a Person?
Movies come out of books, short stories and sometimes science essays. We want our movies to be inspired by something that was written, and this relation is implied by the “author” relation. The fact that p is a person is implied by the fact that only persons write science fiction (at least as of 2015).

Let’s reframe the picture as a set of triples (subject-predicate-object statements), the kind of language a graph database can understand. We call m (movie), w (writing) and p (person) our unknowns, then we define the properties and relations they must match.

  • m is a Movie
  • m is based on w
  • w is written by p
  • p is a science fiction writer

Since the graph we are querying is the LOD cloud, the components of our triples are internet addresses and the query language we use is SPARQL. See below how we translate the triples above into actual Wikidata classes and properties. Keep in mind that movies, persons and writings are unknowns, so they are expressed with the ?x syntax. Everything after # is a comment.

?m <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q11424> .
# ?m is a Movie
?m <http://www.wikidata.org/prop/direct/P144> ?w .
# ?m is based on ?w
?w <http://www.wikidata.org/prop/direct/P50> ?p .
# ?w written by ?p
?p <http://www.wikidata.org/prop/direct/P106> <http://www.wikidata.org/entity/Q18844224> .
# ?p is a science fiction writer

As you can see the triples’ components are links, and if you point your browser there you can fetch triples in which the link itself is the subject. That’s the major innovation of the semantic web in relation to any other kind of graph database: it is as huge and distributed as the web. Take a few moments to appreciate the idea and send your love to Tim Berners-Lee.

Similarly to SQL, we can express in SPARQL that we are selecting data with the SELECT…WHERE keywords. The PREFIX syntax makes our query more readable by making the URIs shorter:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?w ?m ?p WHERE {
?m wdt:P31 wd:Q11424 . # ?m is a Movie
?m wdt:P144 ?w . # ?m is based on ?w
?w wdt:P50 ?p . # ?w written by ?p
?p wdt:P106 wd:Q18844224 . # ?p is a science fiction writer
}

If you run the query above you will get as a result a set of addresses: the URIs of the movies, writings and persons we searched for. We would rather query directly for the names, so let’s introduce ml (a label for the movie), wl (a label for the writing) and pl (a label for the person). We also impose the label language to be English, via the FILTER command.

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?pl ?wl ?ml WHERE {
?m wdt:P31 wd:Q11424 . # ?m is a Movie
?m wdt:P144 ?w . # ?m is based on ?w
?w wdt:P50 ?p . # ?w written by ?p
?p wdt:P106 wd:Q18844224 . # ?p is a science fiction writer
?p rdfs:label ?pl . # ?p name is ?pl
?w rdfs:label ?wl . # ?w name is ?wl
?m rdfs:label ?ml . # ?m name is ?ml
FILTER(LANG(?pl) = "en") # ?pl is in english
FILTER(LANG(?wl) = "en") # ?wl is in english
FILTER(LANG(?ml) = "en") # ?ml is in english
}

Let’s run the query on the dedicated Wikidata service. You can imagine this process as SPARQL trying to match the pattern in our picture against all its data, giving back as a result only the values of our unknowns that satisfy the constraints. The results are:

pl (writer) | wl (writing) | ml (movie)
Carl Sagan | Contact | Contact
Philip K. Dick | Do Androids Dream of Electric Sheep? | Blade Runner
Philip K. Dick | Paycheck | Paycheck
Philip K. Dick | The Golden Man | Next
H. G. Wells | The War of the Worlds | War of the Worlds
H. G. Wells | The War of the Worlds | The War of the Worlds
H. G. Wells | The War of the Worlds | War of the Worlds 2: The Next Wave
H. G. Wells | The War of the Worlds | H. G. Wells’ War of the Worlds
Mikhail Bulgakov | Ivan Vasilievich | Ivan Vasilievich: Back to the Future
Mikhail Bulgakov | Heart of a Dog | Cuore di cane
Mikhail Bulgakov | The Master and Margarita | Pilate and Others
H. G. Wells | The Shape of Things to Come | Things to Come
H. G. Wells | The Time Machine | The Time Machine
H. G. Wells | The Island of Doctor Moreau | The Island of Dr. Moreau
H. G. Wells | The Time Machine | The Time Machine
H. G. Wells | The Invisible Man | The Invisible Man
H. G. Wells | The First Men in the Moon | First Men in the Moon
H. G. Wells | The Invisible Man | The Invisible Woman
Isaac Asimov | The Bicentennial Man | Bicentennial Man
Isaac Asimov | I, Robot | I, Robot
Isaac Asimov | The Caves of Steel | I, Robot
Philip K. Dick | Adjustment Team | The Adjustment Bureau
Philip K. Dick | Second Variety | Screamers
Philip K. Dick | Impostor | Impostor
Philip K. Dick | Radio Free Albemuth | Radio Free Albemuth
Philip K. Dick | We Can Remember It for You Wholesale | Total Recall
Philip K. Dick | The Minority Report | Minority Report
Philip K. Dick | A Scanner Darkly | A Scanner Darkly
Daniel Keyes | Flowers for Algernon | Charly
Kingsley Amis | Lucky Jim | Lucky Jim (1957 film)
Kingsley Amis | That Uncertain Feeling | Only Two Can Play
John Wyndham | The Midwich Cuckoos | Village of the Damned
Fritz Leiber | Conjure Wife | Night of the Eagle
Brian Aldiss | Super-Toys Last All Summer Long | A.I. Artificial Intelligence
John Steakley | Vampire$ | Vampires
Iain Banks | Complicity | Complicity

You’ve got new quality movies to buy and watch to satisfy the sci-fi addiction. Our query is just a hint of the immense power unleashed by linked data. Stay tuned to get more tutorials, and check out WordLift, the plugin we are launching to manage and produce linked data directly from WordPress.

Some fun exercises:

  • EASY: get the movies inspired by the writings of Isaac Asimov (a possible solution is sketched below)
  • MEDIUM: get all the movies inspired by women writers
  • HARD: get all music artists whose songs were featured in a TV series
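As a hint for the EASY exercise, here is one possible sketch that reuses the properties from the queries above and matches the author by English label (matching by label is fragile; looking up the author’s Wikidata identifier would be more robust):

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?ml WHERE {
?m wdt:P31 wd:Q11424 . # ?m is a Movie
?m wdt:P144 ?w . # ?m is based on ?w
?w wdt:P50 ?p . # ?w written by ?p
?p rdfs:label "Isaac Asimov"@en . # ?p is labelled "Isaac Asimov"
?m rdfs:label ?ml . # ?m name is ?ml
FILTER(LANG(?ml) = "en") # ?ml is in English
}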

 

 

MICO Testing: One, Two, Three…

We’ve finally reached an important milestone in our validation work in the MICO project…we can begin testing and integrating our toolset with the first release of the platform to evaluate the initial set of media extractors. 

This blog post is more or less a diary of our first attempts in using MICO in conjunction with our toolset that includes:

  • HelixWare – the Video Hosting Platform (our online video platform that allows publishers and content providers to ingest, encode and distribute videos across multiple screens)
  • WordLift – the Semantic Editor for WordPress (assisting the editors writing a blog post and organising the website’s contents using semantic fingerprints)
  • Shoof – a UGC video recording application (this is an Android native app providing instant video-recording for people living in Cairo)

The workflow we’re planning to implement aims at improving the content creation, content management and content delivery phases.

[Diagram: Combined Deliverable 7.2.1 & 8.2.1 Use Cases – First Prototype]

The diagram describes the various steps involved in the implementation of the scenarios we will use to run the tests. At this stage the main goal is to:

  • a) ingest videos in HelixWare,
  • b) process these videos with MICO and
  • c) add relevant metadata that will be further used by the client applications WordLift and Shoof.  

While we’re working to see MICO in action in real-world environments, the tests we’ve designed aim at providing valuable feedback for the developers of each specific module in the platform.

These low-level components (called Technology Enablers or simply TE) include the extractors to analyse and annotate media files as well as modules for data querying and content recommendation. We’re planning to evaluate the TEs that are significant for our user stories and we have designed the tests around three core objectives:

  1. output accuracy – how accurate, detailed and meaningful each single response is when compared to other available tools;
  2. technical performance – how much time each task requires and how scalable the solution is when we increase the volume of content being analysed;
  3. usability – evaluated in terms of integration, modularity and usefulness.

As of today, with everything still extremely experimental, we’re using a dedicated MICO platform running in a protected and centralised cloud environment. This machine has been installed directly by the technology partners of the project: this makes it easier for us to test and simpler for them to keep on developing, hot-fixing and stabilising the platform.

Let’s start

By accessing the MICO Admin UI (available from the `/mico-configuration` directory), we’ve been able to select the analysis pipeline. MICO orchestrates different extractors and combines them in pipelines. At this stage the developer shall choose one pipeline at a time.

[Image: the MICO Admin UI – selecting the analysis pipeline]

Upon startup we can see the status of the platform by reading the command output window; while not standardised, this already provides an overview of the startup of each media extractor in the pipeline.

[Image: the command output window during platform startup]

For installing and configuring the MICO platform you can read the end-user documentation; at this stage, though, I would recommend waiting until everything becomes more stable (here is a link to the MICO end-user documentation)!

After starting up the system, using the platform’s REST APIs we’ve been able to successfully send the first video files and request their processing. This is done mainly in three steps:

1. Create a Content Item
Request
curl -X POST http://<mico_platform>/broker/inject/create
Response
{"uri":"http://<mico_platform>/marmotta/322e04a3-33e9-4e80-8780-254ddc542661"}

2. Create a Content Part
Request
curl -X POST "http://<mico_platform>/broker/inject/add?ci=http%3A%2F%2Fdemo2.mico-project.eu%3A8080%2Fmarmotta%2F322e04a3-33e9-4e80-8780-254ddc542661&type=video%2Fmp4&name=horses.mp4" --data-binary @Bates_2045015110_512kb.mp4
Response
{"uri":"http://<mico_platform>/marmotta/322e04a3-33e9-4e80-8780-254ddc542661/8755967a-6e1d-4f5e-a85d-4e692f774f76"}

3. Submit for processing
Request
curl -v -X POST "http://<mico_platform>/broker/inject/submit?ci=http%3A%2F%2Fdemo2.mico-project.eu%3A8080%2Fmarmotta%2F322e04a3-33e9-4e80-8780-254ddc542661"
Response
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Length: 0
Date: Wed, 08 Jul 2015 08:08:11 GMT

In the next blog posts we will see how to consume the data coming from MICO and how this data will be integrated in our application workflows.

In the meantime, if you’re interested in knowing more about MICO and how it could benefit your existing applications you can read:

Stay tuned for the next blog post!

 

One Week at the Monastery building WordLift.

Last week our team gathered for a focused face-to-face session in a remote location of Abruzzo, right in the center of Italy: the Abbey of Santo Spirito d’Ocre. Like early agile teams, we’ve discovered once again the importance of working together in close proximity and the value of keeping a healthy balance between hard work and quality of life. We’ve also found ourselves in love with the product we’re building and happy to share it with others.

Our business is made of small distributed teams organised in startups, each one self-sufficient and focused on a specific aspect of our technology (we help businesses and individuals manage and organise content, be it text, data, images or videos).

As peers gathered from different time zones in this unique little monastery of the Cistercian order, we began executing our plan.

And yes, getting to the meeting with a clear agenda did help. All the issues on our list had been summarised and shared in a dedicated Trello board. These mainly covered the work we’ve been doing in the last years on WordLift, our semantic editor for WordPress.

Cistercians (at least in the old days) kept manual labour as a central part of their monastic life. In our case we’ve managed to structure most of our days around three core activities: business planning, bug fixing and software documentation. At the very basis we’ve kept the happiness of working together on something we like.

Empathic vs Lean: setting up the Vision.

Most of the work in startups is governed by lean principles; the tools and the mindset of the people have been strongly influenced by the work of Eric Ries, who first published a book to promote the lean approach and to share his recipe for continuous innovation and business success.

After over three years of work on WordLift we can say that we’ve worked in a completely different way. Lean demands teams to get out of the building and look for problems to be solved. In our case, while we’ve spent most of our time “out of the building” (and with our clients), we’ve started our product journey from a technology, inspired by the words of Tim Berners-Lee on building a web for open, linked data, and we’ve studied all possible ways to create an emotional connection with bloggers and writers.

Not just that, we have also dedicated time to analyzing the world of journalism, its changes and how it will continue evolving according to journalistic celebrities like David Carr (a famous journalist of the New York Times who died earlier this year) and many others like him, as well as the continuously emerging landscape of news algorithms that help people reach the content they want.

Establish the Vision of WordLift

Understanding that WordLift, and the go-to-market process that will follow, shall be empathy-driven rather than lean is one of the greatest outcomes of our “monastic” seminar in the valley of L’Aquila.

By using an empathy-driven expression: we’ve finally set the Vision.

Organise the Workflow: getting things done.

Like most of today’s open-source software, WordLift is primarily built on GitHub.

While GitHub can be used and seen as a programming tool, GitHub – being the largest digital space for collaborative works – embeds your workflow.

While working at the monastery we’ve been able to discuss, test and implement a GitFlow workflow.

The Gitflow Workflow is designed around a strict branching model for project releases. While somewhat complicated in the beginning, we see it as a robust framework for continuing the development of WordLift and for doing bug fixing without waiting for the next release cycle.

Documentation. Documentation. Documentation.

Writing (hopefully good) software documentation helps the users of any product and shall provide a general understanding of core features and functions.

The reality is that, when documenting your own software, the advantages go way beyond the original aim of helping end-users.

By updating the WordLift documentation we were able to get a clearer overview of all the actions required of an editor when composing his/her blog post and/or creating and publishing the knowledge graph. We have also been able to detect flaws in the code and in the user experience.

Most importantly, we’ve found that writing the documentation (and this is also done collaboratively over GitHub) can be a great way to keep everyone in sync between the “Vision” of the product (how to connect with our users) and the existing implementation (what we are offering with this release).

Next steps

Now it’s time to organise the team and start the next iterations by engaging with the first users of this new release while fine-tuning the value proposition (below is the empathic view of @mausa89 on the USP of WordLift).

[Image: WordLift go-to-market strategy]

As this is the very first blog post I’m writing with WordLift v3, I can tell you I’m very happy about it, and if you would like to test it too, join our newsletter… we will keep you posted while we continue playing!


The closing of this blog post is @colacino_m‘s night improvisation from the monastery.

Retreat at the Monastery

 

WordLift powered by MICO at the European Semantic Web Conference 2015

Yes! Time to start presenting WordLift v3 to the world and how we’re planning to help Greenpeace Italy with MICO cross-media analysis and… a lot more.

… 

 

User-Generated-Content for News and Media.

User-generated content (UGC) plays an amazing role on-air and online in our everyday information diet. …

 

WordLift Hackathon

There is no better way for a startup like us to kickstart the new year than with a hackathon to create WordLift’s product roadmap. …