Multi-Label Classification as a Service for Academic Journals

Oxford University Press, a department of the University of Oxford, has spent a number of years developing semantic enrichment capabilities to improve the management and usage of our books, journals and dictionary content. This talk will describe a service that tags journal articles with terms from a SKOS taxonomy, using a multi-class, multi-label classification algorithm built on the PoolParty Semantic Classifier. I will outline some of the considerations we had when implementing a semantic service in a publishing production system, and how we overcame the classic challenges of limited training data and overlapping semantic classes by mapping to alternative data sets tagged with different ontologies.

Manually classifying publication content is time-consuming, expensive and prone to error. Subject matter experts can be difficult to find, and it is even more difficult to find two who agree on every classification within a domain. In addition, given the speed at which academic journals publish, a business would need a large team dedicated full-time to classifying each article in each journal volume and issue. Yet the granularity at which we search for, and therefore expect to discover, information is increasing rapidly. Increasingly, the expectations we have of search engine discovery carry over to every aspect of our online experience. As a result, sales models and marketing techniques are becoming more and more refined, which places more demand on operational and technical business units to refine support functions and intelligence management so that our content can be sold, marketed and discovered online at a fine-grained subject level.

To this end, we have deployed a semantic classification system that operates in real time within our journals production process and employs a chain of binary classifiers to produce a multi-label, multi-class classification against a large SKOS taxonomy of academic subjects. The automation currently takes advantage of preexisting subject classifications that sit at the top of our SKOS taxonomy hierarchy, allowing us to automatically sort content against supervised classifiers bound to the corresponding child subjects, but the architecture can scale to handle the top-level classification as well. Scalability applies across the board, from the front end to layered queuing, data validation and performance, which allows the single system to plug seamlessly into multiple production workflows with little-to-no loss in cycle turnaround time.
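
As a rough illustration of the routing idea described above, the following minimal Python sketch assigns child subjects to an article whose top-level subject is already known, using one binary decision per child subject. It is only a sketch under stated assumptions: the subject names, the `SubjectTagger` class and the keyword-based `keyword_classifier` are hypothetical stand-ins for trained supervised classifiers, and none of it reflects the PoolParty API or our production implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a trained binary classifier: returns True when
# the article text should carry the given child subject.
BinaryClassifier = Callable[[str], bool]


def keyword_classifier(keywords: List[str], threshold: int = 1) -> BinaryClassifier:
    """Build a toy binary classifier that fires when enough keywords occur."""
    def classify(text: str) -> bool:
        lowered = text.lower()
        hits = sum(1 for kw in keywords if kw in lowered)
        return hits >= threshold
    return classify


@dataclass
class SubjectTagger:
    # Maps a top-level subject to {child subject: binary classifier}.
    # All subject names used below are illustrative assumptions.
    children: Dict[str, Dict[str, BinaryClassifier]]

    def tag(self, top_level: str, text: str) -> List[str]:
        """Return every child subject whose classifier accepts the text."""
        classifiers = self.children.get(top_level, {})
        # Each child subject is an independent yes/no decision, so an
        # article can receive zero, one or several labels.
        return [label for label, clf in classifiers.items() if clf(text)]


tagger = SubjectTagger(children={
    "Medicine": {
        "Cardiology": keyword_classifier(["heart", "cardiac"]),
        "Oncology": keyword_classifier(["tumour", "cancer"]),
    },
    "Law": {
        "Contract Law": keyword_classifier(["contract", "breach"]),
    },
})

labels = tagger.tag("Medicine", "Cardiac outcomes in cancer patients")
print(labels)  # ['Cardiology', 'Oncology']
```

Because the preexisting top-level subject selects which child classifiers run, only a small part of the taxonomy is evaluated per article. In this simplified sketch the binary decisions are independent of one another; a true classifier chain would additionally feed earlier decisions into later classifiers.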


Interested in this talk?

Register for SEMANTiCS conference