Publications

2025


Query Processing on Heterogeneous Hardware

Scalable Data Management for Future Hardware | Anastasiia Kozar, Janis von Bleichert, Sebastian Breß, Philipp M. Grulich, Clemens Lutz, Tilmann Rabl, Viktor Rosenfeld, Jonas Traub, Steffen Zeuch, Volker Markl

Abstract: In modern processor design, power efficiency has become the primary constraint, prompting manufacturers to develop processors that balance energy consumption with the growing demand for speed. This shift has initiated an era of heterogeneous multi-core computing, characterized by machines utilizing various processors such as GPUs, MICs, and FPGAs. These processors significantly enhance performance due to their computational capabilities and memory bandwidth, essential for optimizing query processing performance. However, executing database queries efficiently across diverse processors presents challenges due to architectural differences, leading to varied performance outcomes for different operator implementations. This chapter explores methodologies for executing database queries on any processor with maximum efficiency without manual adjustments. We propose compiling database queries into optimized code that can adapt continuously to achieve optimal performance across a wide array of processors. Key areas of focus include the use of GPUs in database systems, addressing challenges such as workload distribution and data transfer bottlenecks, and introducing a classification scheme for strategies developed to tackle these issues. Additionally, we examine NVLink 2.0 technology’s potential to improve data transfer efficiency between GPUs and CPUs, enhancing GPU-accelerated query processing. Furthermore, we present a novel adaptive query compilation-based stream processing engine (SPE) that surpasses traditional interpretation-based SPEs by incorporating runtime optimizations and task-based parallelization. This approach allows for dynamic adjustments to data characteristics, significantly improving query execution efficiency and throughput. Through these explorations, we aim to provide insights into current systems and highlight areas for future research, ultimately contributing to the advancement of heterogeneous query processing systems.


Scalable Data Management for Future Hardware

2024


Query compilation without regrets

SIGMOD'24 | Philipp M. Grulich, Aljoscha Lepping, Dwi Prasetyo Adi Nugroho, Bonaventura Del Monte, Varun Pandey, Steffen Zeuch, Volker Markl

Abstract: Engineering high-performance query execution engines is a challenging task. Query compilation provides excellent performance, but at the same time introduces significant system complexity, as it makes the engine hard to build, debug, and maintain. To overcome this complexity, we propose Nautilus, a framework that combines the ease of use of query interpretation and the performance of query compilation. On the one hand, Nautilus provides an interpretation-based operator interface that enables engineers to implement operators using imperative C++ code to ensure a familiar developer experience. On the other hand, Nautilus mitigates the performance drawbacks of interpretation by introducing a novel trace-based, multi-backend JIT compiler that translates operators into efficient code. As a result, Nautilus bridges the gap between compilation and interpretation and provides the best of both worlds, achieving high performance without sacrificing the productivity of engineers.


Proceedings of the ACM on Management of Data, Volume 2, Issue 3

Bridging the Gap: Complex Event Processing on Stream Processing Systems

EDBT'24 | Ariane Ziehn, Philipp M. Grulich, Steffen Zeuch, Volker Markl

Abstract: Analytical Stream Processing (ASP) and Complex Event Processing (CEP) extract knowledge from unbounded data streams. ASP solutions are optimized for scalable cloud environments to handle huge volumes of data in motion. In contrast, CEP solutions are designed for single-machine deployments, limiting their usage for large data volumes and distributed processing. A few hybrid solutions seek to address the lack of support for large-scale CEP by enabling its support in ASP systems and exploiting their data collection and distribution capabilities. However, these hybrid solutions assign the entire pattern workload to a single unary operator, which becomes the bottleneck of the entire execution pipeline. In addition, this composed operator prevents the application from utilizing the highly efficient stream processing optimization capabilities currently available in ASP systems. In this paper, we propose a novel operator mapping that overcomes the drawbacks of current hybrid solutions. In particular, we bridge the gap between CEP and ASP by mapping CEP to ASP operators, enabling the decomposition of the pattern workload into multiple operators. As a result, our mapping enables CEP workloads to piggyback on the scalability and efficiency of cloud-based ASP systems. Our results demonstrate that our proposed mapping outperforms the single-operator solution for semantically equivalent ASP queries by a factor of up to 150x and enables workloads that current CEP solutions do not sustain. As a result, our mapping truly unlocks the benefits of both paradigms in one system by enabling a broad range of CEP functionalities in general-purpose ASP systems.


Proceedings of the 27th International Conference on Extending Database Technology (EDBT), 25th March-28th March, 2024

Benchmarking Stream Join Algorithms on GPUs: A Framework and its Application to the State-of-the-art

EDBT'24 | Dwi PA Nugroho, Philipp M. Grulich, Steffen Zeuch, Clemens Lutz, Stefano Bortoli, Volker Markl

Abstract: Joining streams is an essential and resource-demanding operation in stream processing engines (SPEs). Recent works have shown significant performance benefits by offloading stream-join processing to hardware accelerators, including GPUs. As a result, a wide variety of GPU-accelerated stream join algorithms (SJAs) have emerged. However, existing works evaluate the proposed GPU-accelerated SJAs only in isolation, on different hardware, and not using a common workload. As a result, it is difficult to compare different SJAs and select the best-suited SJA for a particular situation. In this paper, we shed light on the performance characteristics of GPU-accelerated SJAs. To this end, we explore the configuration parameter space of SJAs and investigate the impact of each parameter. We evaluate the performance of well-known SJAs under multiple configurations of the underlying join algorithm, the parallelization strategy, the algorithm progressiveness, and the GPU type. Our results show that each variant of SJA has its strengths and weaknesses and that ill-suited configurations of parameters lead to up to two orders of magnitude difference in throughput. Based on the results, we developed a guideline for selecting SJA variants for different circumstances.


Proceedings of the 27th International Conference on Extending Database Technology (EDBT), 25th March-28th March, 2024

2023


Towards Unifying Query Interpretation and Compilation

CIDR'23 | Philipp M. Grulich, Aljoscha Lepping, Dwi Prasetyo Adi Nugroho, Bonaventura Del Monte, Varun Pandey, Steffen Zeuch, Volker Markl

Abstract: Engineering high-performance query execution engines is a highly challenging task. Query compilation provides excellent performance but at the same time introduces a high system complexity, as it makes the engine hard to build, debug and maintain. In this paper, we discuss two fundamental challenges that hinder the adoption of query compilation.


13th Annual Conference on Innovative Data Systems Research (CIDR 2023). January 8-11, 2023, Amsterdam, The Netherlands.

Survey of window types for aggregation in stream processing systems

VLDB Journal | Juliane Verwiebe, Philipp M. Grulich, Jonas Traub, Volker Markl

Abstract: In this paper, we present the first comprehensive survey of window types for stream processing systems which have been presented in research and commercial systems. We cover publications from the most relevant conferences, journals, and system whitepapers on stream processing, windowing, and window aggregation which have been published over the last 20 years. For each window type, we provide detailed specifications, formal notations, synonyms, and use-case examples. We classify each window type according to categories that have been proposed in literature and describe the out-of-order processing. In addition, we examine academic, commercial, and open-source systems with respect to the window types that they support. Our survey offers a comprehensive overview that may serve as a guideline for the development of stream processing systems, window aggregation techniques, and frameworks that support a variety of window types.


The International Journal on Very Large Data Bases, 2023

2022


Babelfish: Efficient Execution of Polyglot Queries

VLDB'22 | Philipp M. Grulich, Steffen Zeuch, Volker Markl

Abstract: Today's users of data processing systems come from different domains, have different levels of expertise, and prefer different programming languages. As a result, analytical workload requirements shifted from relational to polyglot queries involving user-defined functions (UDFs). Although some data processing systems support polyglot queries, they often embed third-party language runtimes. This embedding induces a high performance overhead, as it causes additional data materialization between execution engines. In this paper, we present Babelfish, a novel data processing engine designed for polyglot queries. Babelfish introduces an intermediate representation that unifies queries from different implementation languages. This enables new, holistic optimizations across operator and language boundaries, e.g., operator fusion and workload specialization. As a result, Babelfish avoids data transfers and enables efficient utilization of hardware resources. Our evaluation shows that Babelfish outperforms state-of-the-art data processing systems by up to one order of magnitude and reaches the performance of handwritten code. With Babelfish, we bridge the performance gap between relational and multi-language UDFs and lay the foundation for the efficient execution of future polyglot workloads.


48th International Conference on Very Large Databases, Sydney, Australia, September 05-09, 2022.

Scotty: General and Efficient Open-source Window Aggregation for Stream Processing Systems

TODS | Jonas Traub, Philipp M. Grulich, Alejandro Rodriguez Cuellar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl

Abstract: Window aggregation is a core operation in data stream processing. Existing aggregation techniques focus on reducing latency, eliminating redundant computations, or minimizing memory usage. However, each technique operates under different assumptions with respect to workload characteristics such as properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions), windowing measures (e.g., time- or count-based), and stream (dis)order. In this paper, we present Scotty, an efficient and general open-source operator for sliding-window aggregation in stream processing systems, such as Apache Flink, Apache Beam, Apache Samza, Apache Kafka, Apache Spark, and Apache Storm. One can easily extend Scotty with user-defined aggregation functions and window types. Scotty implements the concept of general stream slicing and derives workload characteristics from aggregation queries to improve performance without sacrificing its general applicability. We provide an in-depth view of the algorithms of the general stream slicing approach. Our experiments show that Scotty outperforms alternative solutions by up to one order of magnitude.


ACM Transactions on Database Systems, Volume 46, Issue 1, Pages 1–46

2021


An Energy-Efficient Stream Join for the Internet of Things

DaMoN'21 | Adrian Michalke, Philipp M. Grulich, Clemens Lutz, Steffen Zeuch, Volker Markl

Abstract: The Internet of Things (IoT) combines large data centers with (mobile, networked) edge devices that are constrained both in compute power and energy budget. Modern edge devices contribute to query processing by leveraging accelerated processing units with multicore CPUs or GPUs. Therefore, data processing in the IoT presents the challenges of 1) minimizing the energy consumed while sustaining a given query throughput, and 2) processing increasingly complex queries within a given energy budget. In this paper, we investigate how modern edge devices can reduce the energy requirements of stream joins as a common data processing operation. We explore three dimensions to save energy: workload characteristics, computational efficiency, and heterogeneous hardware. Based on our findings, we propose the ecoJoin that 1) reduces energy consumption by 81% at a given join throughput, and 2) enables scaling the throughput by two orders-of-magnitude within a given energy budget.


Proceedings of the 17th International Workshop on Data Management on New Hardware, June 2021, Pages 1–6

Parallelizing Intra-Window Join on Multicores: An Experimental Study

SIGMOD'21 | Shuhao Zhang, Yancan Mao, Jiong He, Philipp M. Grulich, Steffen Zeuch, Bingsheng He, Richard T. B. Ma, Volker Markl

Abstract: The intra-window join (IaWJ), i.e., joining two input streams over a single window, is a core operation in modern stream processing applications. This paper presents the first comprehensive study on parallelizing the IaWJ on modern multicore architectures. In particular, we classify IaWJ algorithms into lazy and eager execution approaches. For each approach, there are further design aspects to consider, including different join methods and partitioning schemes, leading to a large design space. Our results show that none of the algorithms always performs the best, and the choice of the most performant algorithm depends on: (i) workload characteristics, (ii) application requirements, and (iii) hardware architectures. Based on the evaluation results, we propose a decision tree that can guide the selection of an appropriate algorithm.


Proceedings of the 2021 International Conference on Management of Data, China, June 20 - 25, 2021

ExDRa: Exploratory Data Science on Federated Raw Data

SIGMOD'21 | Sebastian Baunsgaard, Matthias Boehm, Ankit Chaudhary, Behrouz Derakhshan, Stefan Geißelsöder, Philipp M. Grulich, Michael Hildebrand, Kevin Innerebner, Volker Markl, Claus Neubauer, Sarah Osterburg, Olga Ovcharenko, Sergey Redyuk, Tobias Rieger, Alireza Rezaei Mahdiraji, Sebastian Benjamin Wrede, Steffen Zeuch

Abstract: Data science workflows are largely exploratory, dealing with under-specified objectives, open-ended problems, and unknown business value. Therefore, little investment is made in systematic acquisition, integration, and pre-processing of data. This lack of infrastructure results in redundant manual effort and computation. Furthermore, central data consolidation is not always technically or economically desirable or even feasible (e.g., due to privacy, and/or data ownership). The ExDRa system aims to provide system infrastructure for this exploratory data science process on federated and heterogeneous, raw data sources. Technical focus areas include (1) ad-hoc and federated data integration on raw data, (2) data organization and reuse of intermediates, and (3) optimization of the data science lifecycle, under awareness of partially accessible data. In this paper, we describe use cases, the overall system architecture, selected features of SystemDS' new federated backend (for federated linear algebra programs, federated parameter servers, and federated data preparation), as well as promising initial results. Beyond existing work on federated learning, ExDRa focuses on enterprise federated ML and related data pre-processing challenges. In this context, federated ML has the potential to create a more fine-grained spectrum of data ownership and thus, even new markets.


Proceedings of the 2021 International Conference on Management of Data, China, June 20 - 25, 2021

2020


Grizzly: Efficient Stream Processing Through Adaptive Query Compilation

SIGMOD'20 | Philipp M. Grulich, Sebastian Breß, Steffen Zeuch, Jonas Traub, Janis von Bleichert, Zongxiong Chen, Tilmann Rabl, Volker Markl

Abstract: Stream Processing Engines (SPEs) execute long-running queries on unbounded data streams. They follow an interpretation-based processing model and do not perform runtime optimizations. This limits the utilization of modern hardware and neglects changing data characteristics at runtime. In this paper, we present Grizzly, a novel adaptive query compilation-based SPE, to enable highly efficient query execution. We extend query compilation and task-based parallelization for the unique requirements of stream processing and apply adaptive compilation to enable runtime re-optimizations. The combination of lightweight statistics gathering with just-in-time compilation enables Grizzly to adjust to changing data characteristics dynamically at runtime. Our experiments show that Grizzly outperforms state-of-the-art SPEs by up to an order of magnitude in throughput.


Proceedings of the 2020 International Conference on Management of Data, Portland, USA, June 14 - 19, 2020

Disco: Efficient Distributed Window Aggregation

EDBT'20 | Lawrence Benson, Philipp M. Grulich, Steffen Zeuch, Volker Markl, Tilmann Rabl

Abstract: Many business applications benefit from fast analysis of online data streams. Modern stream processing engines (SPEs) provide complex window types and user-defined aggregation functions to analyze streams. While SPEs run in central data centers, wireless sensor networks (WSNs) perform distributed aggregations close to the data sources, which is beneficial especially in modern IoT setups. However, WSNs support only basic aggregations and windows. To bridge the gap between complex central aggregations and simple distributed analysis, we propose Disco, a distributed complex window aggregation approach. Disco processes complex window types on multiple independent nodes while efficiently aggregating incoming data streams. Our evaluation shows that Disco’s throughput scales linearly with the number of nodes and that Disco already outperforms a centralized solution in a two-node setup. Furthermore, Disco reduces the network cost significantly compared to the centralized approach. Disco’s tree-like topology handles thousands of nodes per level and scales to support future data-intensive streaming applications.


23rd International Conference on Extending Database Technology, Copenhagen, Denmark, 2020.

The NebulaStream Platform: Data and Application Management for the Internet of Things

CIDR'20 | Steffen Zeuch, Ankit Chaudhary, Bonaventura Del Monte, Haralampos Gavriilidis, Dimitrios Giouroukis, Philipp M. Grulich, Sebastian Breß, Jonas Traub, Volker Markl

Abstract: The Internet of Things (IoT) presents a novel computing architecture for data management: a distributed, highly dynamic, and heterogeneous environment of massive scale. Applications for the IoT introduce new challenges for integrating the concepts of fog and cloud computing as well as sensor networks in one unified environment. In this paper, we highlight these major challenges and outline how existing systems handle them. To address these challenges, we introduce the NebulaStream platform, a general-purpose, end-to-end data management system for the IoT. NebulaStream addresses the heterogeneity and distribution of compute and data, supports diverse data and programming models going beyond relational algebra, deals with potentially unreliable communication, and enables constant evolution under continuous operation. In our evaluation, we demonstrate the effectiveness of our approach by providing early results on partial aspects.


Conference on Innovative Data Systems Research (CIDR), Amsterdam, Netherlands, 2020.

2019


Generating Reproducible Out-of-Order Data Streams

DEBS'19 | Philipp M. Grulich, Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Volker Markl, Tilmann Rabl

Abstract: Evaluating modern stream processing systems in a reproducible manner requires data streams with different data distributions, data rates, and real-world characteristics such as delayed and out-of-order tuples. In this paper, we present an open source stream generator which generates reproducible and deterministic out-of-order streams based on real data files, simulating arbitrary fractions of out-of-order tuples and their respective delays.


ACM International Conference on Distributed and Event-based Systems, Darmstadt, Germany, June 24 - 28, 2019.

Efficient Window Aggregation with General Stream Slicing

EDBT'19 | Jonas Traub, Philipp M. Grulich, Alejandro Rodriguez Cuellar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl

Abstract: Window aggregation is a core operation in data stream processing. Existing aggregation techniques focus on reducing latency, eliminating redundant computations, and minimizing memory usage. However, each technique operates under different assumptions with respect to workload characteristics such as properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions), windowing measures (e.g., time- or count-based), and stream (dis)order. Violating the assumptions of a technique can render it unusable or drastically reduce its performance. In this paper, we present the first general stream slicing technique for window aggregation. General stream slicing automatically adapts to workload characteristics to improve performance without sacrificing its general applicability. As a prerequisite, we identify workload characteristics which affect the performance and applicability of aggregation techniques. Our experiments show that general stream slicing outperforms alternative concepts by up to one order of magnitude.


22nd International Conference on Extending Database Technology, Lisbon, Portugal, March 26-29, 2019.

2018


Collaborative Edge and Cloud Neural Networks for Real-Time Video Processing

VLDB'18 | Philipp M. Grulich, Faisal Nawab

Abstract: The efficient processing of video streams is a key component in many emerging Internet of Things (IoT) and edge applications, such as Virtual and Augmented Reality (V/AR) and self-driving cars. These applications require real-time high-throughput video processing. This can be attained via a collaborative processing model between the edge and the cloud, called an Edge-Cloud model. To this end, many approaches were proposed to optimize the latency and bandwidth consumption of Edge-Cloud video processing, especially for Neural Network (NN)-based methods. In this demonstration, we investigate the efficiency of these NN techniques, how they can be combined, and whether combining them leads to better performance. Our demonstration invites participants to experiment with the various NN techniques, combine them, and observe how the underlying NN changes with different techniques and how these changes affect accuracy, latency, and bandwidth consumption.


Very Large Databases Conference, Rio de Janeiro, Brazil, August 27-31st, 2018.

Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing

EDBT'18 | Philipp M. Grulich, René Saitenmacher, Jonas Traub, Sebastian Breß, Tilmann Rabl, Volker Markl

Abstract: Machine learning techniques for data stream analysis suffer from concept drifts such as changed user preferences, varying weather conditions, or economic changes. These concept drifts cause wrong predictions and lead to incorrect business decisions. Concept drift detection methods such as adaptive windowing (Adwin) allow for adapting to concept drifts on the fly. In this paper, we examine Adwin in detail and point out its throughput bottlenecks. We then introduce several parallelization alternatives to address these bottlenecks. Our optimizations lead to a speedup of two orders of magnitude over the original Adwin implementation. Thus, we explore parallel adaptive windowing to provide scalable concept detection for high-velocity data streams with millions of tuples per second.


21st International Conference on Extending Database Technology, Vienna, Austria, 2018.

Scotty: Efficient Window Aggregation for Out-of-Order Stream Processing

ICDE'18 | Jonas Traub, Philipp Marian Grulich, Alejandro Rodriguez Cuellar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl

Abstract: Computing aggregates over windows is at the core of virtually every stream processing job. Typical stream processing applications involve overlapping windows and, therefore, cause redundant computations. Several techniques prevent this redundancy by sharing partial aggregates among windows. However, these techniques do not support out-of-order processing and session windows. Out-of-order processing is a key requirement to deal with delayed tuples in case of source failures such as temporary sensor outages. Session windows are widely used to separate different periods of user activity from each other. In this paper, we present Scotty, a high throughput operator for window discretization and aggregation. Scotty splits streams into non-overlapping slices and computes partial aggregates per slice. These partial aggregates are shared among all concurrent queries with arbitrary combinations of tumbling, sliding, and session windows. Scotty introduces the first slicing technique which (1) enables stream slicing for session windows in addition to tumbling and sliding windows and (2) processes out-of-order tuples efficiently. Our technique is generally applicable to a broad group of dataflow systems which use a unified batch and stream processing model. Our experiments show that we achieve a throughput an order of magnitude higher than alternative state-of-the-art solutions.


34th International Conference on Data Engineering, Paris, France, April 16-19, 2018.
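
Illustrative note: the slicing idea summarized in the Scotty abstract above (non-overlapping slices, one partial aggregate per slice, partials shared by all concurrent windows) can be sketched in a few lines. This is a minimal sketch under strong assumptions, not Scotty's API or algorithm: the function name `slice_aggregate` is hypothetical, slices have a fixed width (the greatest common divisor of all window lengths and slides), the aggregate is a sum, timestamps are non-negative integers, and session windows and out-of-order handling are left out.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def slice_aggregate(tuples, windows):
    """Sum (timestamp, value) tuples for several time-based windows.

    `windows` is a list of (length, slide) pairs; a tumbling window has
    slide == length. Returns {(length, slide): {window_start: sum}}.
    Hypothetical helper for illustration only.
    """
    # Assumption: a fixed slice width that divides every window length and
    # slide, so every window is an exact union of whole slices.
    width = reduce(gcd, (x for w in windows for x in w))

    # One partial aggregate per non-overlapping slice. Each tuple is added
    # exactly once, regardless of how many windows contain it.
    partials = defaultdict(int)
    for ts, value in tuples:
        partials[ts // width] += value

    if not partials:
        return {w: {} for w in windows}
    max_ts = (max(partials) + 1) * width  # exclusive end of the last slice

    results = {}
    for length, slide in windows:
        per_window = {}
        # Only windows starting at or after timestamp 0 are reported.
        for start in range(0, max_ts, slide):
            first = start // width
            last = (start + length) // width
            # Combine the shared slice partials that make up this window.
            per_window[start] = sum(partials.get(i, 0) for i in range(first, last))
        results[(length, slide)] = per_window
    return results
```

For example, `slice_aggregate([(0, 1), (1, 2), (4, 3), (5, 4), (9, 5)], [(4, 4), (6, 2)])` evaluates a tumbling window of length 4 and a sliding window of length 6 with slide 2 over the same three non-empty slices of width 2.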

2017


I2: Interactive Real-Time Visualization for Streaming Data

EDBT'17 | Jonas Traub, Nikolaas Steenbergen, Philipp M. Grulich, Tilmann Rabl, Volker Markl

Abstract: Developing scalable real-time data analysis programs is a challenging task. Developers need insights from the data to define meaningful analysis flows, which often makes the development a trial and error process. Data visualization techniques can provide insights to aid the development, but the sheer amount of available data frequently makes it impossible to visualize all data points at the same time. We present I2, an interactive development environment that coordinates running cluster applications and corresponding visualizations such that only the currently depicted data points are processed and transferred. To this end, we present an algorithm for the real-time visualization of time series, which is proven to be correct and minimal in terms of transferred data. Moreover, we show how cluster programs can adapt to changed visualization properties at runtime to allow interactive data exploration on data streams.


20th International Conference on Extending Database Technology, Venice, Italy, March 21-24, 2017