Research & Innovation

Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation

Najmeh Mousavi Nejad, Simon Scerri, Sören Auer

With the omnipresent availability and use of cloud services, software tools, Web portals or services, legal contracts in the form of license agreements or terms and conditions regulating their use are of paramount importance. Often the textual documents describing these regulations comprise many pages and cannot reasonably be assumed to be read and understood by humans. In this work, we describe a method for extracting and clustering relevant parts of such documents, including permissions, obligations, and prohibitions. The clustering is based on semantic similarity, employing a distributional semantics approach on a large word embeddings database. An evaluation shows that it can significantly improve human comprehension and that improved feature-based clustering has the potential to further reduce the time required for EULA digestion. Our implementation is available as a Web service, which can directly be used to process and prepare legal usage contracts.
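As a rough illustration of clustering excerpts by semantic similarity, the sketch below averages word vectors per excerpt and groups excerpts whose cosine similarity exceeds a threshold. It is not the authors' implementation: the three-dimensional `TOY_VECTORS`, the greedy grouping strategy, the `threshold` value, and all function names are assumptions made for this example.

```python
import math

# Hypothetical toy "embeddings"; real word embeddings have hundreds of dimensions.
TOY_VECTORS = {
    "copy": [0.9, 0.1, 0.0],
    "distribute": [0.8, 0.2, 0.1],
    "pay": [0.1, 0.9, 0.0],
    "fee": [0.0, 0.8, 0.2],
}


def embed(excerpt):
    """Average the vectors of the known words in an excerpt."""
    vectors = [TOY_VECTORS[w] for w in excerpt.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(excerpts, threshold=0.8):
    """Greedy clustering: an excerpt joins the first cluster whose
    first member (the seed) is similar enough, else starts a new one."""
    clusters = []
    for excerpt in excerpts:
        for members in clusters:
            if cosine(embed(excerpt), embed(members[0])) >= threshold:
                members.append(excerpt)
                break
        else:
            clusters.append([excerpt])
    return clusters


excerpts = [
    "You may copy the software",
    "You may distribute copies",
    "You must pay a fee",
    "A fee applies",
]
clusters = cluster(excerpts)
```

With these toy vectors, the two permission-like excerpts fall into one cluster and the two payment-related excerpts into another.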

Semantic Annotation of Heterogeneous Data Sources: Towards an Integrated Information Framework for Service Technicians

Sebastian Bader, Jan Oevermann

Service technicians in the domain of industrial maintenance require extensive technical knowledge and experience to complete their tasks. Some of the needed knowledge is made available as document-based technical manuals or service reports from previous deployments. Unfortunately, due to the large amount of data, service technicians spend a considerable amount of working time searching for the correct information. Another challenge is that valuable insights from operation reports are not yet considered because of their insufficient textual quality and ambiguous content. In this work we propose a framework to annotate and integrate these heterogeneous data sources to make them available as information units with Linked Data technologies. We use machine learning to modularize and classify information from technical manuals, together with ontology-based autocompletion to enrich service reports with clearly defined concepts. By combining these two approaches we can provide a unified and structured interface for both manual and automated querying. We verify our approach by measuring precision and recall of information for typical retrieval tasks for service technicians, and show that our framework can provide substantial improvements for service and maintenance processes.

Linked Data Applied: A Field Report from the Netherlands

Jan Voskuil

Linked Data and the Semantic Web have generated interest in the Netherlands from the very beginning. Sporting several renowned research centers and some widely published early application projects, the Netherlands is home to Platform Linked Data Nederland, a grass-roots movement promoting Linked Data technologies which functions as a marketplace for exchanging ideas and experiences.

Specification of Semantic Trajectories and Data Transformations for Analytics: The datAcron Ontology

Georgios Santipantakis, George Vouros, Christos Doulkeridis, Akrivi Vlachou, Gennady Andrienko, Natalia Andrienko, Jose Manuel Cordero, Miguel Garcia Martinez

Motivated by real-life emerging needs in critical domains, this paper proposes a coherent and generic ontology for the representation of semantic trajectories, in association with related events and contextual information. The main contribution of the proposed ontology is the representation of semantic trajectories at varying, interlinked levels of spatio-temporal analysis. The paper presents the ontology in detail, relates it to other well-known ontologies, and demonstrates how exploiting data at varying levels of granularity enables data transformations that support visual analytics tasks in the air-traffic management domain.

Game Character Ontology (GCO) - A vocabulary for extracting and describing game character information from fansites

Owen Sacco

Creating market-competitive video games costs time, effort and resources that small and medium-sized enterprises, especially independent game development studios, often cannot afford. As most of the tasks involved in developing games are labour and creativity intensive, our vision is to reduce software development effort and enhance design creativity by automatically generating novel and semantically enriched content for games from Web sources. In particular, this paper presents a vocabulary that defines detailed properties for describing video game character information extracted from sources such as fansites to create game character models. These character models could then be reused or merged to create new, unconventional game characters.

Investigating the interpretability of hidden layers in deep text mining

Stephan Raaijmakers, Maya Sappelli, Wessel Kraaij

Good Applications for Crummy Entity Linkers? The Case of Corpus Selection in Digital Humanities

Alex Olieman, Kaspar Beelen, Jaap Kamps, Milan van Lange

We investigate the Digital Humanities use case, where scholars spend a considerable amount of time selecting relevant source texts. We developed WideNet, a semantically enhanced search tool which leverages the strengths of (imperfect) entity linking (EL) without getting in the way of its expert users. We evaluate this tool in two historical case studies aiming to collect a set of references to historical periods in parliamentary debates from the last two decades; the first targeted the Dutch Golden Age, and the second World War II.
The case studies conclude with a critical reflection on the utility of WideNet for this kind of research, after which we outline how such a real-world application can help to improve EL technology in general.

LOD-a-lot: A Single-File Enabler for Data Science

Wouter Beek, Javier D. Fernández, Ruben Verborgh

Many Data Scientists make use of Linked Open Data. However, most scientists restrict their analyses to one or two datasets (often DBpedia). One reason for this lack of variety in dataset use has been the complexity and cost of running large-scale triple stores, graph stores or property graphs. With Header Dictionary Triples (HDT) and Linked Data Fragments (LDF), the cost of Linked Data publishing has been significantly reduced. Still, Data Scientists who wish to run large-scale analyses need to query many LDF endpoints and integrate the results. Using recent innovations in data storage, compression and dissemination, we are able to compress (a large subset of) the LOD Cloud into a single file. We call this file LOD-a-lot. Because it is just one file, LOD-a-lot can be easily downloaded and shared. It can be queried locally or through an LDF endpoint. In this paper we identify several categories of use cases that previously required an expensive and complicated setup, but that can now be run over a cheap and simple LOD-a-lot file. LOD-a-lot does not expose the same functionality as a full-blown database suite, mainly offering Triple Pattern Fragments. Despite these limitations, this paper shows that there is a surprisingly wide collection of Data Science use cases that can be performed over a LOD-a-lot file. For these use cases LOD-a-lot significantly reduces the cost and complexity of doing Data Science.
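To make the notion of a triple pattern concrete, the sketch below matches patterns against a small in-memory list of triples, where an unspecified position acts as a variable. It does not use the HDT or LDF software and is not how LOD-a-lot is queried in practice; the `match_pattern` helper and the `ex:` example triples are assumptions made for illustration only.

```python
def match_pattern(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a variable (wildcard)."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Hypothetical example data, not taken from the LOD Cloud.
TRIPLES = [
    ("ex:Amsterdam", "ex:locatedIn", "ex:Netherlands"),
    ("ex:Utrecht", "ex:locatedIn", "ex:Netherlands"),
    ("ex:Amsterdam", "rdf:type", "ex:City"),
]

# Pattern: ?s ex:locatedIn ex:Netherlands
cities_in_nl = match_pattern(TRIPLES, predicate="ex:locatedIn", obj="ex:Netherlands")

# Pattern: ex:Amsterdam ?p ?o
about_amsterdam = match_pattern(TRIPLES, subject="ex:Amsterdam")
```

More complex queries, such as joins across several patterns, would have to be composed on the client side from such single-pattern lookups, which reflects the trade-off against a full database suite that the abstract mentions.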

Supporting virtual integration of Linked Data with just-in-time query recompilation

Alessandro Adamou, Mathieu D'Aquin, Carlo Allocca, Enrico Motta

In virtual data integration, the data reside on their original sources without being copied and transformed on a single platform as in warehousing. Integration must be performed at query execution time and relies on transformations of the original query to many target endpoints.

Towards a Semantic Outlier Detection Framework in Wireless Sensor Networks

Iker Esnaola-Gonzalez, Jesús Bermúdez, Izaskun Fernandez, Santiago Fernandez, Aitor Arnaiz

Outlier detection in the preprocessing phase of Knowledge Discovery in Databases (KDD) processes has been a widely researched topic for many years. However, identifying the potential cause of an outlier remains an unsolved challenge, even though it could be very helpful for determining what actions to take after detecting it. Besides, conventional outlier detection methods might still overlook outliers in certain complex contexts. In this article, Semantic Technologies are used to help overcome these problems by proposing the SemOD (Semantic Outlier Detection) Framework. This framework guides the data scientist towards the detection of certain types of outliers in Wireless Sensor Networks (WSNs). The feasibility of the approach has been tested on outdoor temperature sensors, and results show that the proposed approach is generic enough to be applied to different sensors, while also improving the accuracy of outlier detection and spotting the potential cause of outliers.
