LingoDB - Revolutionizing Data Processing with Compiler Technology

LingoDB is a cutting-edge data processing system that uses compiler technology to achieve a high degree of flexibility and extensibility without sacrificing performance. Thanks to declarative sub-operators, it supports a wide range of data processing workflows beyond relational SQL queries. Furthermore, LingoDB can perform cross-domain optimization by interleaving optimization passes from different domains, and its flexibility enables sustainable support for heterogeneous hardware.

Features

JIT Query Compilation

LingoDB builds heavily on the MLIR compiler framework to compile queries into efficient machine code with low compilation latency.
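To illustrate the basic idea behind query compilation (not LingoDB's actual MLIR-based pipeline), the following Python sketch generates source code for one specific query, compiles it once with Python's built-in `compile()`, and then runs the resulting specialized function on the data. The function name `compile_filter_sum` and the `price` column are hypothetical; LingoDB itself lowers queries through MLIR to native machine code.

```python
from typing import Callable, Sequence

def compile_filter_sum(column: str, threshold: int) -> Callable[[Sequence[dict]], int]:
    """Hypothetical example: generate and compile a fused loop for
    SELECT SUM(column) WHERE column > threshold.
    Python's compile()/exec stand in for a real JIT backend such as MLIR/LLVM."""
    # The column name and constant are baked into the generated source,
    # so no expression tree has to be interpreted per row.
    src = (
        "def query(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        v = row[{column!r}]\n"
        f"        if v > {threshold!r}:\n"
        "            total += v\n"
        "    return total\n"
    )
    namespace: dict = {}
    exec(compile(src, "<generated-query>", "exec"), namespace)
    return namespace["query"]

rows = [{"price": 5}, {"price": 12}, {"price": 30}]
query = compile_filter_sum("price", 10)  # compile once ...
print(query(rows))                       # ... run many times; prints 42
```

The compilation cost is paid once per query, after which the specialized loop runs without any per-row dispatch; keeping that one-time cost low is what "low compilation latency" refers to.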

Flexibility

LingoDB uses multiple layers of intermediate representation. This layered approach provides high flexibility, because individual layers can be exchanged.

Extensible Beyond SQL

LingoDB's custom dialects are designed to be combined with other MLIR dialects. Thus, LingoDB can be extended to other data processing domains through the corresponding MLIR dialects.

Query Optimization

LingoDB implements state-of-the-art query optimizations as compiler passes, which allows composing custom optimization pipelines, e.g., for cross-domain optimization.
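A minimal sketch of how optimizations written as passes can be composed into a pipeline. The plan classes (`Scan`, `Project`, `Filter`) and both passes are hypothetical stand-ins written in Python, not LingoDB's dialects or MLIR passes. Each pass is a function from plan to plan, so a pipeline is simply an ordered list of passes.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

# Hypothetical plan nodes (illustration only, not LingoDB's IR).
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Project:
    child: "Plan"
    columns: tuple

@dataclass(frozen=True)
class Filter:  # keeps rows where column > threshold
    child: "Plan"
    column: str
    threshold: int

Plan = Union[Scan, Project, Filter]
Pass = Callable[[Plan], Plan]

def push_filter_below_project(plan: Plan) -> Plan:
    """Rewrite Filter(Project(x)) into Project(Filter(x)) so rows are dropped earlier."""
    if isinstance(plan, Filter):
        child = push_filter_below_project(plan.child)
        if isinstance(child, Project) and plan.column in child.columns:
            return Project(Filter(child.child, plan.column, plan.threshold), child.columns)
        return Filter(child, plan.column, plan.threshold)
    if isinstance(plan, Project):
        return Project(push_filter_below_project(plan.child), plan.columns)
    return plan

def merge_filters(plan: Plan) -> Plan:
    """Merge Filter(Filter(x, c, t1), c, t2) into Filter(x, c, max(t1, t2))."""
    if isinstance(plan, Filter):
        child = merge_filters(plan.child)
        if isinstance(child, Filter) and child.column == plan.column:
            return Filter(child.child, plan.column, max(plan.threshold, child.threshold))
        return Filter(child, plan.column, plan.threshold)
    if isinstance(plan, Project):
        return Project(merge_filters(plan.child), plan.columns)
    return plan

def run_pipeline(plan: Plan, passes: List[Pass]) -> Plan:
    # Apply each pass in order; a custom pipeline is just a different list.
    for p in passes:
        plan = p(plan)
    return plan

plan = Filter(Project(Filter(Scan("orders"), "price", 5), ("price",)), "price", 10)
optimized = run_pipeline(plan, [push_filter_below_project, merge_filters])
print(optimized)
```

The order of passes matters here: running `merge_filters` on its own leaves the plan unchanged, because the projection separates the two filters. Only after `push_filter_below_project` has moved the outer filter down can the two filters be merged into a single `Filter` with threshold 10.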

Apache Arrow

By using Apache Arrow as its in-memory storage format, LingoDB can interoperate directly with many other systems.

Complex SQL

LingoDB can run complex analytical SQL queries and supports all queries of benchmarks such as SSB, TPC-H, TPC-DS, and JOB.

Research

Query Engine Design

Through its flexible design, LingoDB facilitates fundamental research regarding query engine architectures.

Heterogeneous Hardware

Thanks to its layered design with sub-operators and its foundation on MLIR, LingoDB is an ideal research tool for investigating heterogeneous hardware for data processing.

Cross-Domain Execution and Optimization

LingoDB's design allows representing both SQL queries and workloads from other domains, which simplifies research on cross-domain execution and optimization.

Understand LingoDB

Core publications


2023

Declarative Sub-Operators for Universal Data Processing

VLDB 2023 | Michael Jungmair and Jana Giceva | August 28, 2023

Abstract

Data processing systems face the challenge of supporting increasingly diverse workloads efficiently. At the same time, they are already bloated with internal complexity, and it is not clear how new hardware can be supported sustainably. In this paper, we aim to resolve these issues by proposing a unified abstraction layer based on declarative sub-operators in addition to relational operators. By exposing this layer to users, we enable them to express their non-relational workloads declaratively with sub-operators. Furthermore, the proposed sub-operators decouple the semantic implementation of operators from the efficient imperative implementation, reducing the implementation complexity for relational operators. Finally, through fine-grained automatic optimizations, the declarative sub-operators allow for automatic morsel-driven parallelism. We demonstrate the benefits not only by providing a specific set of sub-operators but also by implementing them in a compiling query engine. With thorough evaluation and analysis, we show that we can support a richer set of workloads while keeping development complexity low and being competitive in performance even with specialized systems.
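Morsel-driven parallelism splits the input into small fragments (morsels) that worker threads pick up and process independently, combining their partial results at the end. The Python sketch below shows only that structure under simplifying assumptions: `MORSEL_SIZE`, the even-number filter, and the function names are hypothetical, and a `ThreadPoolExecutor` stands in for a real morsel dispatcher. Because of Python's global interpreter lock, it does not demonstrate actual speedups.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, Sequence

MORSEL_SIZE = 4  # hypothetical; real engines use far larger morsels

def morsels(data: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    # Split the input into consecutive fixed-size fragments.
    for start in range(0, len(data), size):
        yield data[start:start + size]

def process_morsel(morsel: Sequence[int]) -> int:
    # Fused pipeline on one morsel: filter (even values), then a local partial sum.
    return sum(v for v in morsel if v % 2 == 0)

def parallel_sum_even(data: Sequence[int], workers: int = 4) -> int:
    # Morsels are queued as tasks; idle worker threads take the next one.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_morsel, morsels(data, MORSEL_SIZE)))
    # Combine the per-morsel partial aggregates into the final result.
    return sum(partials)

print(parallel_sum_even(list(range(1, 21))))  # prints 110
```

Since each morsel yields an independent partial aggregate, the final result does not depend on which worker processed which morsel or in what order.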

2022

Designing an open framework for query optimization and compilation

VLDB 2022 | Michael Jungmair, André Kohn, and Jana Giceva | September 5, 2022

Abstract

Since its invention, data-centric code generation has been adopted for query compilation by various database systems in academia and industry. These database systems are fast but maximize performance at the expense of developer friendliness, flexibility, and extensibility. Recent advances in the field of compiler construction identified similar issues for domain-specific compilers and introduced a solution with MLIR, a generic infrastructure for domain-specific dialects.
We propose a layered query compilation stack based on MLIR with open intermediate representations that can be combined at each layer. We further propose moving query optimization into the query compiler to benefit from the existing optimization infrastructure and make cross-domain optimization viable. With LingoDB, we demonstrate that the used approach significantly decreases the implementation effort and is highly flexible and extensible. At the same time, LingoDB achieves high performance and low compilation latencies.

Team

Students

Student | Topic | Advisor(s) | Type
Robert Imschweiler | Transforming Data Frame Operations from Python to MLIR | Engelke, Jungmair | B.Sc. Thesis
Florian Drescher | A template-based code generation backend for MLIR | Engelke | Guided Research
Raoul Zebisch | Sub-Operator Placement on GPUs for accelerating analytical queries | Jungmair | M.Sc. Thesis
Pascal Ginter | C-Backend, Index-Nested Loop Joins, Query Plan Visualization | Jungmair | Research Assistant

Let's Work Together. Get in Touch!

Contact us for student theses, collaborations, and research opportunities.