RML-view-to-CSV: A Proof-of-Concept Implementation for RML Logical Views

29 Feb 2024 (modified: 16 Mar 2024) · ESWC 2024 Workshop KGCW Submission · CC BY 4.0
Keywords: RML Logical View, flattening, joining, mixed content, proof-of-concept
TL;DR: Description and evaluation of a proof-of-concept implementation of the new RML Logical Views module, covering the following features: flattening of nested data, handling of mixed data formats, and extended joining of data sources.
Abstract: Although the work of the W3C Community Group on Knowledge Graph Construction (KGC) on the modular RDF Mapping Language (RML) specification has made great strides, open issues remain, along with corresponding solution proposals. Among these issues are (i) the inability to handle hierarchy in nested data, (ii) limited join functionality, and (iii) the inability to handle mixed data formats. To address these issues, the RML Logical Views module has been proposed. However, validating this module thoroughly yet efficiently requires an implementation that allows short development cycles. In this workshop paper, we propose a proof-of-concept RML Logical Views implementation that is independent of, and complementary to, existing RML mapping engines. Our proof-of-concept covers three important features of the new RML Logical Views module: (i) flattening of nested data, (ii) extended joining of data sources, and (iii) handling of mixed data formats. Our implementation supports one nested source format (JSON) and one tabular source format (CSV), and can be used independently, as a preprocessor, by any RML engine. With this implementation, we successfully executed the available relevant test cases of the RML Logical Views module. Additionally, we measured knowledge graph construction times on GTFS-Madrid-Bench. When we included our implementation in the knowledge graph construction pipeline and replaced referencing object maps with joins in RML Logical Views, we observed considerable reductions in execution time. We conclude that the RML Logical Views specification can be implemented and can address needs that RML could not yet address. The current implementation can already serve as a modular part of a knowledge graph construction process. Although boosting performance was not the aim of our work, our implementation reduces the execution time of GTFS-Madrid-Bench scale 100 by 35% when combined with Morph-KGC or Carml.
Used alone, RMLStreamer times out after two hours on this task; in conjunction with our implementation, it completes the task in 236 seconds. We hope this proof-of-concept inspires the developers of existing RML engines to integrate the RML Logical Views module and benefit from its features.
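To make the three features more concrete, here is a minimal Python sketch of what a view-to-CSV preprocessor does conceptually. It is not the authors' implementation and does not use RML syntax. It flattens a nested JSON source by iterating over an inner array, parses a CSV column that embeds JSON (mixed content), joins the two sources on a shared field, and writes the resulting view as plain CSV that an RML engine could then map. The source data, field names, and helper functions (`flatten`, `read_mixed_csv`, `left_join`, `to_csv`) are hypothetical. The sketch assumes a left outer join and simply drops records whose nested array is empty. Both choices may differ from the join variants and semantics the module actually defines.

```python
import csv
import io
import json

# Hypothetical nested JSON source; field names are illustrative only.
SOURCE = json.loads("""
{"people": [
  {"name": "alice", "items": [{"type": "sword"}, {"type": "shield"}]},
  {"name": "bob",   "items": [{"type": "flower"}]}
]}
""")

# Hypothetical CSV source whose "details" column embeds JSON (mixed content).
CSV_SOURCE = 'name,details\nalice,"{""age"": 30}"\nbob,"{""age"": 25}"\n'


def flatten(people):
    """Yield one flat row per (person, item) pair: a nested iteration.

    Assumption: a person with an empty "items" array yields no rows.
    """
    for p_index, person in enumerate(people):
        for i_index, item in enumerate(person.get("items", [])):
            # Keep positional indices so each row can be traced back
            # to its place in the original hierarchy.
            yield {
                "person_index": p_index,
                "name": person["name"],
                "item_index": i_index,
                "item_type": item["type"],
            }


def read_mixed_csv(text):
    """Read CSV rows and parse the JSON embedded in the "details" column."""
    for row in csv.DictReader(io.StringIO(text)):
        details = json.loads(row["details"])
        yield {"name": row["name"], "age": details["age"]}


def left_join(left_rows, right_rows, key):
    """Left outer join: every left row is kept, enriched by matching right rows."""
    index = {}
    for row in right_rows:  # build a hash index over the right-hand source once
        index.setdefault(row[key], []).append(row)
    for row in left_rows:
        for match in index.get(row[key], [None]):
            merged = dict(row)
            if match is not None:
                merged.update({k: v for k, v in match.items() if k != key})
            yield merged


def to_csv(rows, fieldnames):
    """Serialize the flattened view as CSV; missing fields become empty cells."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    view = left_join(flatten(SOURCE["people"]), read_mixed_csv(CSV_SOURCE), "name")
    print(to_csv(view, ["person_index", "name", "item_index", "item_type", "age"]), end="")
```

When run as a script, it prints a header plus one row per person–item pair, each enriched with the age parsed from the embedded JSON. Indexing the right-hand source in a dictionary keeps the join to a single pass over each input, rather than a nested loop over both.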
Submission Number: 2