Reimagining the Video Player with the help of MICO – Part 1

How more context can help us revamp HelixWare’s Video player to boost user engagement on news sites.

originally posted on

Our use case in MICO is focused on news and media organisations willing to offer more context and better navigation to users who visit their news outlets. A well-known (in the media industry at least) report from Cisco that came out this last February predicts that nearly three-fourths (75 percent) of the world’s mobile data traffic will be attributed to video by 2020.

The latest video content meetup, organized in Cairo this March with the HelixWare team at Injaz. (Images courtesy of Insideout Today)

While working with our fellow bloggers and editorial teams we’ve been studying how these news organizations, particularly those with text at their core, can be helped in crafting their stories with high-quality videos.

Moreover, the question we want to answer is: can videos become a pathway to deeper engagement?

With the help of MICO’s media extractors we can add semantic annotations to videos on demand. These annotations take the form of media fragments that can be used as input for both the embeddable video player of HelixWare and the video thumbnails created by the HelixWare WordPress plugin. Media fragments, in our showcase, are generated by the face detection extractor and are both temporal (there is a face at this time in the video) and spatial (the face within these frames is located at xywh).
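To make the temporal and spatial dimensions more concrete, here is a minimal sketch of how a client might read a fragment expressed with the `t` and `xywh` keys of the W3C Media Fragments URI syntax. The URL, the function name and the returned dictionary layout are assumptions made for illustration; the post does not show the exact format produced by the MICO extractors.

```python
from urllib.parse import urlsplit, parse_qs


def parse_media_fragment(uri):
    """Read the temporal (t) and spatial (xywh) parts of a media fragment URI.

    Only plain seconds are handled for the temporal part; clock formats
    such as hh:mm:ss are out of scope for this sketch.
    """
    params = parse_qs(urlsplit(uri).fragment)
    result = {}

    if "t" in params:
        # Temporal dimension: "start,end" in seconds, optionally prefixed with "npt:".
        value = params["t"][0]
        if value.startswith("npt:"):
            value = value[len("npt:"):]
        start, _, end = value.partition(",")
        result["start"] = float(start) if start else 0.0
        result["end"] = float(end) if end else None

    if "xywh" in params:
        # Spatial dimension: x, y, width, height, optionally prefixed with "pixel:" or "percent:".
        value = params["xywh"][0]
        unit = "pixel"
        if ":" in value:
            unit, value = value.split(":", 1)
        x, y, w, h = (int(n) for n in value.split(","))
        result["region"] = {"unit": unit, "x": x, "y": y, "w": w, "h": h}

    return result


# Hypothetical fragment: a face visible from 12.5s to 18s inside a 320x240 box.
fragment = parse_media_fragment("http://example.org/video.mp4#t=12.5,18&xywh=160,120,320,240")
print(fragment)
```

A player could use the `start` and `end` values to seek to the moment a face appears, while a thumbnail generator could use the `region` to crop the frame around it.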

The new HelixWare video player that we’re developing as part of our effort in MICO aims at creating an immersive experience for end users. The validation of both the video player and the video thumbnail will be done using A/B testing against a set of metrics our publishers are focused on: time per session, number of videos played per session, and number of shares of the videos over social media (content virality).

Now let’s review the design assumptions we’ve worked on so far to reimagine the video player and in the next post we will present the first results.

1. Use faces to connect with users

Thumbnails, when done right, are generally key to ensuring a high level of engagement on web pages. This seems to be particularly true when thumbnails feature human faces, which are considered “powerful channels of non-verbal communication in social networks”. With MICO we can now offer editors a better tool to engage audiences by integrating a new set of UI elements that use human faces in the video thumbnail. The study documenting this point is “Faces Engage Us: Photos with Faces Attract More Likes and Comments on Instagram”, authored by S. Bakhshi, D. A. Shamma and E. Gilbert in 2014.

2. Increase the saturation by 20%-30% to boost engagement

Another interesting finding backing up our work is that filtered photos are 21% more likely to be viewed and 45% more likely to be commented on by consumers of photographs. Specifically, filters that increase warmth, exposure and contrast boost engagement the most.
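As a rough illustration of what a saturation increase in the 20%-30% range means at the pixel level, the sketch below scales the saturation of a single RGB colour using Python’s standard `colorsys` module. The function name and the default factor of 1.25 are assumptions chosen for this example; a real thumbnail pipeline would apply the same idea to every pixel with an imaging library.

```python
import colorsys


def boost_saturation(rgb, factor=1.25):
    """Return an (r, g, b) tuple (0-255) with its saturation scaled by factor."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, s * factor)  # clamp so saturation never exceeds 100%
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(channel * 255) for channel in (r, g, b))


# A muted orange becomes more vivid; a neutral grey is left untouched.
print(boost_saturation((200, 150, 100)))
print(boost_saturation((128, 128, 128)))
```

Hue and brightness are preserved, so the face in the thumbnail keeps its tone while the overall image looks warmer and more vivid.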

3. Repeat text elements

As seen in most of the custom thumbnail tutorials for YouTube available online, adding some elements of the title, or the entire title, in a bold font over a clear background can make the video more compelling and, according to some, significantly increase the click-through rate. One of the goals of the demo will be to provide a simple and appealing UI where text and image cooperate to offer a more engaging user experience, removing any external information that could distract the viewer.

4. Always keep the editor in full control

We firmly believe machines will help journalists and bloggers focus on what matters most – writing stories that people want to read. This means that whatever workflow we plan to implement, there shall always be a human (the editor) behind the scenes validating the content produced by technologies such as MICO.

This is particularly true when dealing with sensitive materials such as human faces depicted in videos. There might be obvious privacy concerns for which an editor might choose to use a landscape rather than a face for the video thumbnail, and we shall make sure this option always remains available.

We will continue documenting this work in the next blog post and as usual we look forward to hearing your thoughts and ideas – please email us anytime.


Build Your Knowledge Graph with WordLift. Introducing version 3.4

We love the Web. We’ve been using the Internet in various forms since the 90’s. We believe that the increasing amount of information should be structured beforehand by the same individuals that create the content.

With this idea in mind and willing to empower journalists and bloggers we’ve created WordLift: a semantic editor for WordPress.

With the latest release (version 3.4) we are introducing a Dashboard to provide a quick overview of the website’s Knowledge Graph.


What the heck is a Knowledge Graph?

Knowledge Graphs are all around us. Internet giants like Google, Facebook, LinkedIn and Amazon are all running their own Knowledge Graphs and, willingly or not, we are all contributing to them.

Knowledge Graphs are networks of all kinds of things that are relevant to a specific knowledge domain or within an organization. They contain abstract concepts and relations as well as instances of all sorts of ‘things’, including persons, companies, documents and datasets.

Knowledge Graphs are intelligent data models made of descriptive metadata that can be used to categorise contents.

Why should you bring all your content and data into a Knowledge Graph?

The answer is simple. You want to be relevant for your audience. A Knowledge Graph allows machines (including voice-enabled assistants, smartphone apps and search crawlers) to make complex queries over the entirety of your content. Let’s break this down into benefits:

  • Facilitate smarter and more relevant search results and recommendations
  • Help intelligent personal assistants like Apple Siri and Google Now understand natural language requests by providing the vocabulary needed to identify content
  • Get richer insights on how content is performing and being received by the audience. Some call this Semantic Analytics (more on this topic soon)
  • Sell advertising more wisely by providing in-depth context to advertising networks
  • Create new services that drive reader engagement
  • Share (or sell) metadata to the rest of the World

So what makes WordLift special?

WordLift allows anyone to build his/her own Knowledge Graph. The plugin adds a layer of structured metadata to the content you write on WordPress. Every article is classified with named entities and these classifications are used to provide relevant recommendations that boost the articles of your site with widgets like the navigator and the faceted search. There is more.

The deep vocabulary can be used to understand natural language requests like – “Who is the founder of company [X]?”. Let’s dig deeper. Here is an example that uses a generalist question answering tool called Platypus.


Platypus leverages the Wikidata Knowledge Graph. Now if I asked “Who is the founder of Insideout10?”, Wikidata would probably politely answer “I’m sorry but I don’t have this information”.

Now, the interesting part is that, for this specific question, this same blog holds the correct answer.

As named entities are described along with their properties, I can consult the metadata about Insideout10 and eventually have applications like Platypus run a SPARQL query on my graph.
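The exact query run against this blog’s graph is not reproduced here. As a hedged sketch of what such a request could look like, the helper below builds a SPARQL query string for the question “Who is the founder of [X]?”. It assumes the graph describes organizations with schema.org terms (`schema:Organization`, `schema:name`, `schema:founder`); the function name and these vocabulary choices are illustrative assumptions, not WordLift’s confirmed data model.

```python
def founder_query(organization_name):
    """Build a SPARQL query asking who founded the named organization.

    Assumes schema.org terms are used in the graph; adjust the prefixes
    and properties to match the vocabulary actually published.
    """
    # Escape backslashes and double quotes so the name is a valid SPARQL string literal.
    escaped = organization_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX schema: <http://schema.org/>\n"
        "SELECT ?founder ?founderName WHERE {\n"
        "  ?org a schema:Organization ;\n"
        "       schema:name ?name ;\n"
        "       schema:founder ?founder .\n"
        "  OPTIONAL { ?founder schema:name ?founderName }\n"
        f'  FILTER (STR(?name) = "{escaped}")\n'
        "}"
    )


print(founder_query("Insideout10"))
```

The resulting string can be sent to any SPARQL endpoint that exposes the site’s graph; each row of the result would be one entity linked to the organization as a founder.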


This query returns two entities:

Who owns the data?

The site owner does. Every website has its own graph published on (or any custom domain name you might like) and the creator of the website holds all licensing rights to his/her data. In the next release a dedicated rights statement will be added to all graphs published with WordLift (here you’ll find the details of this issue).

So how big is this graph?

If we combine all existing beta testers we reach a total of 37,714 triples (this being the unit of measurement of information stored in a Knowledge Graph). Here is a chart representing this data.

While this is a very tiny fraction of the World’s knowledge (Wikidata holds 975,989,631 triples – here is a simple query to check this information on their systems), it is relevant for the readers of this blog and contributes to the veracity of big data (“Is this data accurate?”).
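For readers new to the term, a triple is a single statement made of a subject, a predicate and an object. The sketch below models a tiny graph as a set of Python tuples and counts them; the `ex:` identifiers are hypothetical placeholders, and the SPARQL string at the end is the generic way to count triples on an endpoint, not necessarily the query linked above.

```python
# Each triple is one (subject, predicate, object) statement.
# The "ex:" identifiers are made up for this example.
triples = {
    ("ex:Insideout10", "rdf:type", "schema:Organization"),
    ("ex:Insideout10", "schema:name", "Insideout10"),
    ("ex:WordLift", "rdf:type", "schema:SoftwareApplication"),
}


def count_triples(graph):
    """Return the number of statements stored in the graph."""
    return len(graph)


# Generic SPARQL 1.1 query that counts every triple exposed by an endpoint.
COUNT_QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"

print(count_triples(triples))
print(COUNT_QUERY)
```

Using a set mirrors how a graph behaves: stating the same fact twice does not increase the count.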

Happy blogging!


Innovation in Digital Journalism – a report from the second year of FP7-MICO

This blog post summarises the work done in the context of the EU project FP7-MICO in the area of digital journalism.

This last December in Cairo we attended one of the most exciting events in the Middle East and North Africa region on entrepreneurship and hi-tech startups: RiseUp Summit 2015. We engaged with the overwhelming crowd of startuppers and geeks at our HelixWare booth and in a separate Meetup organised at the Greek Campus.

We had the opportunity, during these two hectic days, to share the research work done in MICO for extending the publishing workflows of independent news organizations with cross-media analysis, natural language processing and linked data querying tools.



Introducing WordLift’s new Vocabulary

When we first started WordLift, we envisioned a simple way for people to structure their content using Semantic Fingerprinting and named entities. Over the last few weeks we’ve seen the Vocabulary as the central place to manage all named entities on a website. Moreover, we’ve started to see named entities playing an important role in making the site more visible to both humans and bots (mainly search engines at this stage).

Here is an overview of the number of weekly organic search visits from Google on this blog (while the numbers are still small, we’ve seen 110% growth).


To help editors increase the quality of their entity pages, today, we are launching our new Vocabulary along with version 3.3.


The Vocabulary can be used as a directory for entities. Entities can now be filtered using the “Who”, “Where”, “When” and “What” categories and, most importantly, entities have a rating and a traffic light to quickly see where to start improving.

Until now it was hard to get a clear overview (thumbnails have also been introduced); it was also hard to see what was missing and where. The principles for creating successful entity pages can be summarised as follows:

  1. Every entity should be linked to one or more related posts. Every entity has a corresponding web page. This web page acts as a content hub (here is what we have to say about AI on this blog for example) – this means that we shall always have articles linked to our entities. This is not a strict rule though as we might also use the entity pages to build our website (rather than to organise our blog posts).
  2. Every entity should have its own description. And this description shall express the editor’s own vision on a given topic. 
  3. Every entity should link to other entities. When we choose other entities to enrich the description of an entity we create relationships within our knowledge graph, and these relationships are precious and can be used in many different ways (the entity on AI on this blog is connected, for instance, with the entity John McCarthy, who was the first to coin the term in 1955).
  4. Entities, just like any post in WordPress, can be kept as drafts. Only when we publish them do they become available in the analysis, and only then can we use them to classify our contents.
  5. Images are worth a thousand words, as someone used to say. When we add a featured image to an entity we’re also adding the schema-org:image attribute to the entity.
  6. Every entity (unless you’re creating something completely new) should be interlinked with the same entity on at least one other dataset. This is called data interlinking and can be done by adding a link to the equivalent entity using the sameAs attribute (here we have for instance the same John McCarthy in the Yago Knowledge Base).
  7. Every entity has a type (e.g. Person, Place, Organization, …) and every type has its own set of properties. When we complete all the properties of an entity we increase its visibility.
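To show how some of these principles translate into structured data, here is a minimal sketch of an entity expressed as schema.org JSON-LD, together with a small helper that lists which fields are still missing. The URLs, the description text and the `missing_fields` check are hypothetical and only loosely mirror the idea of a rating; they are not WordLift’s actual scoring logic.

```python
import json

# Hypothetical entity: a type, a description, an image and a sameAs link.
entity = {
    "@context": "http://schema.org",
    "@type": "Person",
    "name": "John McCarthy",
    "description": "Computer scientist who coined the term artificial intelligence.",
    "image": "https://example.org/images/john-mccarthy.jpg",
    "sameAs": ["https://example.org/other-dataset/John_McCarthy"],
}


def missing_fields(candidate, required=("@type", "description", "image", "sameAs")):
    """Return the required fields that are absent or empty on the entity."""
    return [field for field in required if not candidate.get(field)]


print(json.dumps(entity, indent=2))
print(missing_fields({"@type": "Person", "name": "Draft entity"}))
```

An editor-facing tool could turn the length of that list into a simple traffic light: nothing missing is green, everything missing is red.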

Happy blogging!


MICO Testing: One, Two, Three…

We’ve finally reached an important milestone in our validation work in the MICO project…we can begin testing and integrating our toolset with the first release of the platform to evaluate the initial set of media extractors. 

This blog post is more or less a diary of our first attempts in using MICO in conjunction with our toolset that includes:

  • HelixWare – the Video Hosting Platform (our online video platform that allows publishers and content providers to ingest, encode and distribute videos across multiple screens)
  • WordLift – the Semantic Editor for WordPress (assisting the editors writing a blog post and organising the website’s contents using semantic fingerprints)
  • Shoof – a UGC video recording application (this is an Android native app providing instant video-recording for people living in Cairo)

The workflow we’re planning to implement aims at improving content creation, content management and content delivery phases. 

Combined Deliverable 7.2.1 & 8.2.1 Use Cases- First Prototype

The diagram describes the various steps involved in the implementation of the scenarios we will use to run the tests. At this stage the main goal is to:

  • a) ingest videos in HelixWare,
  • b) process these videos with MICO and
  • c) add relevant metadata that will be further used by the client applications WordLift and Shoof.  

While we’re working to see MICO in action in real-world environments, the tests we’ve designed aim at providing valuable feedback for the developers of each specific module in the platform.

These low-level components (called Technology Enablers or simply TE) include the extractors to analyse and annotate media files as well as modules for data querying and content recommendation. We’re planning to evaluate the TEs that are significant for our user stories and we have designed the tests around three core objectives:

  1. output accuracy: how accurate, detailed and meaningful each single response is when compared to other available tools;
  2. technical performance: how much time each task requires and how scalable the solution is when we increase the volume of content being analysed;
  3. usability: evaluated in terms of integration, modularity and usefulness.

As of today, with everything still extremely experimental, we’re using a dedicated MICO platform running in a protected and centralised cloud environment. This machine has been installed directly by the technology partners of the project: this makes it easier for us to test and simpler for them to keep on developing, hot-fixing and stabilising the platform.

Let’s start

By accessing the MICO Admin UI (this is accessible from the `/mico-configuration` directory), we’ve been able to select the analysis pipeline. MICO orchestrates different extractors and combines them in pipelines. At this stage the developer shall choose one pipeline at a time.


Upon startup we can see the status of the platform by reading the command output window; while not standardised this already provides an overview on the startup of each media extractor in the pipeline.


To install and configure the MICO platform you can read the end-user documentation; at this stage I would recommend waiting until everything becomes more stable (here is a link to the MICO end-user documentation)!

After starting up the system, we’ve been able to use the platform’s REST APIs to successfully send the first video files and request their processing. This is done in three main steps:

1. Create a Content Item
curl -X POST http://<mico_platform>/broker/inject/create


2. Create a Content Part
curl -X POST "http://<mico_platform>/broker/inject/add?" --data-binary @Bates_2045015110_512kb.mp4



3. Submit for processing
curl -v -X POST “http://


HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Length: 0
Date: Wed, 08 Jul 2015 08:08:11 GMT
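The same three steps can be sketched in Python with the standard library. Only the `/broker/inject/create` and `/broker/inject/add` paths come from the commands above; the query string of the second call and the whole URL of the third are cut off in this post, so they are left as parameters the caller must supply. The function names and the `mico.example` host are assumptions made for this example.

```python
from urllib.request import Request


def create_content_item(base_url):
    """Step 1: prepare the POST to /broker/inject/create (path taken from the post)."""
    return Request(f"{base_url}/broker/inject/create", data=b"", method="POST")


def add_content_part(base_url, query, payload):
    """Step 2: prepare the POST of the raw file bytes to /broker/inject/add.

    The query string is elided in the post, so it is passed in by the caller.
    """
    return Request(f"{base_url}/broker/inject/add?{query}", data=payload, method="POST")


def submit_for_processing(submit_url):
    """Step 3: the endpoint is truncated in the post, so the full URL is a parameter."""
    return Request(submit_url, data=b"", method="POST")


# Building a request does not contact the server; nothing is sent here.
request = create_content_item("http://mico.example")
print(request.get_method(), request.full_url)
```

To actually send a prepared request, pass it to `urllib.request.urlopen` and check the returned status code, which should be 200 as in the response shown above.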

In the next blog posts we will see how to consume the data coming from MICO and how this data will be integrated in our application workflows.

In the meantime, if you’re interested in knowing more about MICO and how it could benefit your existing applications you can read:

Stay tuned for the next blog post!


One Week at the Monastery building WordLift.

Last week our team gathered for a focused face-to-face session in a remote location of Abruzzo, right in the center of Italy: the Abbey of Santo Spirito d’Ocre. Like early agile teams we’ve discovered once again the importance of working together in close proximity and the value of keeping a healthy balance between hard work and quality of life. We’ve also found ourselves in love with the product we’re building and happy to share it with others.

Our business is made of small distributed teams organised in startups, each one self-sufficient and focused on a specific aspect of our technology (we help businesses and individuals manage and organise content, be it text, data, images or videos).

As peers gathered from different time zones in this unique little monastery of the Cistercian order, we began executing our plan.

And yes, getting to the meeting with a clear agenda did help. All the issues we had on our list had been summarised and shared in a dedicated Trello board. These mainly covered the work we’ve been doing in the last years on WordLift, our semantic editor for WordPress.

Cistercians (at least in the old days) kept manual labour as a central part of their monastic life. In our case we’ve managed to structure most of our days around three core activities: business planning, bug fixing and software documentation. At the very basis we’ve kept the happiness of working together around something we like.

Empathic vs Lean: setting up the Vision.

Most of the work in startups is governed by lean principles; the tools and the mindset of the people have been strongly influenced by the work of Eric Ries, who first published a book to promote the lean approach and to share his recipe for continuous innovation and business success.

After over three years of work on WordLift we can say that we’ve worked in a completely different way. Lean demands that teams get out of the building and look for problems to be solved. In our case, while we’ve spent most of our time “out of the building” (and with our clients), we started our product journey from a technology, inspired by the words of Tim Berners-Lee on building a web for open, linked data, and we’ve studied all possible ways to create an emotional connection with bloggers and writers.

Not just that: we have also dedicated time to analyzing the world of journalism, its changes and how it will continue evolving according to journalistic celebrities like David Carr (a famous journalist of the New York Times who died earlier this year) and many others like him, as well as the continuously emerging landscape of news algorithms that help people reach the content they want.

Understanding that WordLift, and the go-to-market process that will follow, shall be empathy-driven rather than lean is one of the greatest outcomes of our “monastic” seminar in the valley of L’Aquila.

By using an empathy-driven expression: we’ve finally set the Vision.

Organise the Workflow: getting things done.

Like most of today’s open-source software, WordLift is primarily built over GitHub.

While GitHub can be seen purely as a programming tool, it is also the largest digital space for collaborative work, and as such it embeds your workflow.

While working at the monastery we’ve been able to discuss, test and implement a Gitflow workflow.

The Gitflow Workflow is designed around a strict branching model for project releases. While somewhat complicated in the beginning, we see it as a robust framework for continuing the development of WordLift and for fixing bugs without waiting for the next release cycle.

Documentation. Documentation. Documentation.

Writing (hopefully good) software documentation helps the users of any product and shall provide a general understanding of core features and functions.

The reality is that, when documenting your own software, the advantages go way beyond the original aim of helping end-users.

By updating the WordLift documentation we were able to get a clearer overview of all actions required of an editor in composing his/her blog post and/or in creating and publishing the knowledge graph. We have also been able to detect flaws in the code and in the user experience.

Most importantly we’ve found that writing the documentation (and this is also done collaboratively over GitHub) can be a great way to keep everyone in sync between the “Vision” of the product (how to connect with our users) and the existing implementation (what we are offering with this release).

Next steps

Now it’s time to organise the team and start the next iterations by engaging with the first users of this new release while fine-tuning the value proposition (below the empathic view of @mausa89 on the USP of WordLift).


As this is the very first blog post I’m writing with WordLift v3, I can tell you I’m very happy about it. If you would like to test it too, join our newsletter: we will keep you posted while we continue playing!

[hewa_player asset_id=”156″]

The closing of this blog post is @colacino_m‘s night improvisation from the monastery.

Retreat at the Monastery