Scaling RML and SPARQL-based Knowledge Graph Construction with Apache Spark

17 Mar 2023 (modified: 24 Mar 2023), ESWC 2023 Workshop KGCW Submission
Keywords: RML, SPARQL, Big data, Semantic Query Optimization, Knowledge Graph Construction, RDF
TL;DR: Convert RML to SPARQL (with extensions); execute SPARQL with Apache Spark
Abstract: Approaches for constructing knowledge graphs from heterogeneous data sources range from ad-hoc scripts to dedicated mapping languages. Two common foundations for this are RML and SPARQL. So far, the two have been treated separately: on the one hand, there are tools dedicated to processing RML; on the other hand, there are tools that extend SPARQL to incorporate additional data sources. In this work, we first show how this gap can be bridged by translating RML into a sequence of SPARQL CONSTRUCT queries, and we introduce the SPARQL extensions this requires. In a subsequent step, we apply techniques that optimize both the SPARQL query workload and the execution times of individual queries, yielding a sequence of queries that is optimized with respect to the order and uniqueness of the generated triples. Finally, we present a SPARQL query execution engine for these queries built on the Apache Spark Big Data framework. Our evaluation on benchmarks shows that our approach achieves RML mapping execution performance that surpasses the current state of the art.
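To make the RML-to-SPARQL idea more concrete, the following is a minimal sketch of how a single, heavily simplified triples map over a CSV source could be turned into one SPARQL CONSTRUCT query. It is not the authors' translation. The `TriplesMap` class, the helper functions, and especially the `SERVICE <urn:x-csv:...>` clause (standing in for a SPARQL extension that binds one solution per CSV row, with one variable per column) are hypothetical placeholders; the paper defines its own extensions, which are not reproduced here. `ENCODE_FOR_URI` is used as an approximation of RML/R2RML's IRI-safe encoding of template values.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class TriplesMap:
    # Hypothetical, simplified stand-in for an RML triples map with a CSV logical source.
    source: str                     # CSV file name of the logical source
    subject_template: str           # e.g. "http://ex.org/person/{id}"
    class_iri: Optional[str] = None # optional rr:class
    # (predicate IRI, CSV column) pairs; objects become plain literals
    predicate_object_maps: List[Tuple[str, str]] = field(default_factory=list)


def var(column: str) -> str:
    """Map a CSV column name to a SPARQL variable (non-word chars become '_')."""
    return "?" + re.sub(r"\W", "_", column)


def template_to_expr(template: str) -> str:
    """Translate a template like 'http://ex.org/p/{id}' into a SPARQL IRI expression."""
    parts = []
    # re.split with one capture group alternates: literal, column, literal, ...
    for i, piece in enumerate(re.split(r"\{([^}]+)\}", template)):
        if i % 2 == 0:
            if piece:  # skip empty literal fragments
                escaped = piece.replace("\\", "\\\\").replace('"', '\\"')
                parts.append(f'"{escaped}"')
        else:
            # Approximation of IRI-safe encoding of the column value
            parts.append(f"ENCODE_FOR_URI(STR({var(piece)}))")
    body = parts[0] if len(parts) == 1 else "CONCAT(" + ", ".join(parts) + ")"
    return f"IRI({body})"


def to_construct(tm: TriplesMap) -> str:
    """Emit one CONSTRUCT query for one triples map."""
    template_lines = []
    # Hypothetical source extension: one solution per CSV row, one variable per column.
    where_lines = [f"  SERVICE <urn:x-csv:{tm.source}> {{ }}"]
    where_lines.append(f"  BIND({template_to_expr(tm.subject_template)} AS ?s)")
    if tm.class_iri:
        template_lines.append(f"  ?s <{RDF_TYPE}> <{tm.class_iri}> .")
    for n, (pred, column) in enumerate(tm.predicate_object_maps):
        template_lines.append(f"  ?s <{pred}> ?o{n} .")
        where_lines.append(f"  BIND(STR({var(column)}) AS ?o{n})")
    return (
        "CONSTRUCT {\n" + "\n".join(template_lines) + "\n}\n"
        "WHERE {\n" + "\n".join(where_lines) + "\n}"
    )


if __name__ == "__main__":
    tm = TriplesMap(
        source="people.csv",
        subject_template="http://ex.org/person/{id}",
        class_iri="http://xmlns.com/foaf/0.1/Person",
        predicate_object_maps=[("http://xmlns.com/foaf/0.1/name", "name")],
    )
    print(to_construct(tm))
```

For this example, the WHERE clause binds `?s` via `BIND(IRI(CONCAT("http://ex.org/person/", ENCODE_FOR_URI(STR(?id)))) AS ?s)`, and the CONSTRUCT template emits an `rdf:type` triple plus one `foaf:name` triple per row. Producing one query per triples map is what makes the later workload-level optimizations (ordering queries, avoiding duplicate triples) possible, since each mapping becomes an ordinary SPARQL query that a query engine can analyze.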
