Scaling Semantic Technology to Increase User Engagement - FT.com

Industry

As part of its digital-first strategy, the Financial Times decided not to chase its consumers onto whatever platform they might be using today, or to try to predict what they will be using tomorrow. Instead it adopted a universal, platform-independent publishing strategy. This meant APIs for delivering services, good tools for creating content, and good semantic metadata for assets, content and users to connect the two. By augmenting user data and content data with semantics, the FT was able to increase reader engagement, which is a key performance indicator for its digital-first strategy.

"Everyone forgets about metadata. They think they can just make stuff and then forget about how it is organised in terms of how you describe your content. But all your assets are useless to you unless you have metadata – your archive is full of stuff that is of no value because you can’t find it and don’t know what it’s about." 
- John O'Donovan, CTO of the Financial Times

To help the Financial Times achieve this vision, Ontotext pushed the state of the art in RDF-based graph stores in terms of reliability, scalability and availability. This included support for multiple data centres and an AWS implementation. The experience also taught the company the importance, often undervalued, of developing the user experience aspects of the technology. The semantic database at the FT powers its most important B2B solutions. Peak loads are 50 queries per second (QPS) for reads and 20 QPS for writes, over 184 million statements.

The Financial Times also had the same problem as every other content creator across news, media and publishing. Publishers hold a vast amount of unstructured text, which is expensive and difficult to repurpose for new products and services. Ontotext offers an enterprise-grade semantic repository, GraphDB, that is tightly coupled with Natural Language Processing (NLP) services to enable ontology-aware NLP pipelines and entity disambiguation.

Previously, the Financial Times had basic concept extraction, but the software it was using was expensive, difficult to maintain and poorly integrated with the technical stack. Ontotext's Publishing Platform offered an open, non-proprietary solution that was tightly integrated with the FT's chosen graph database, GraphDB. The FT is now able to identify more than seven million named entities, and these entities all have additional metadata to enable 'added value' products such as ticker price information or affiliations. In addition, there are twenty million labels for people, companies and organisations. The coupling of a semantic database with text analytics pipelines via a specially developed plugin enables a high level of precision and recall, which is essential for a news organisation. The concept extraction service cluster scales horizontally to handle peak load and can process 10 documents per second at 100% reliability.

These achievements are already impressive, but the Financial Times had even greater ambitions. Besides semantically enriching content, Ontotext developed a recommendation service on top of the platform. This further drove consumer engagement, because the content offered was informed by user behaviour in addition to the semantic profile of the content, which provided a more holistic and natural user experience. The result is a 100% reliable production API handling 1.5 to 3 million requests per day. Since going live last year, the system has indexed half a million documents and made 200 million recommendations. This is achieved without caching, as each request is effectively a personalized search request.
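
The idea of combining user behaviour with the semantic profile of content can be illustrated with a minimal sketch. It is not the FT or Ontotext implementation: the function names, the concept identifiers and the cosine-similarity scoring are all assumptions chosen for illustration. Each article is represented by the concepts extracted from it, a reader's profile is accumulated from the articles they have read, and candidate articles are ranked by similarity to that profile.

```python
from collections import Counter
from math import sqrt

# All names and data below are hypothetical, for illustration only.

def cosine(a, b):
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def user_profile(read_articles):
    """Sum the concept weights of every article the user has read."""
    profile = Counter()
    for concepts in read_articles:
        profile.update(concepts)
    return dict(profile)

def recommend(history, candidates, top_n=2):
    """Rank candidate articles by similarity to the user's profile."""
    profile = user_profile(history)
    scored = [(cosine(profile, concepts), doc_id)
              for doc_id, concepts in candidates.items()]
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:top_n] if score > 0]

# Concepts a reader has encountered in two previously read articles.
history = [
    {"company:ACME": 1.0, "topic:mergers": 1.0},
    {"company:ACME": 1.0},
]
# Candidate articles with their extracted concepts.
candidates = {
    "a1": {"company:ACME": 1.0},
    "a2": {"topic:sport": 1.0},
    "a3": {"company:ACME": 1.0, "topic:mergers": 1.0},
}

print(recommend(history, candidates))
```

Because the ranking depends on each reader's own history, every call behaves like a personalized search, which is consistent with the observation above that such requests are hard to cache.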

Semantic technology has previously been dismissed as too academic, with little support for enterprise environments and with tools and interfaces that are unfriendly to anyone less than expert in the Semantic Web. The successful adoption, implementation and production use of semantic technology at the Financial Times demonstrates that it is becoming an essential part of the technology stack in news and publishing. Jem Rayfield, the Head of Solution Architecture, confirms this observation: “…editorial and business [sides] have seen the benefits of the [semantic] approach and they want to follow it.”

Speakers: