
I am currently a Senior Software Engineer at Observe Inc., where I work on query optimization and execution. Previously, I pursued my Ph.D. in the Database Systems and Information Management research group at TU Berlin under the supervision of Prof. Volker Markl. I successfully defended my dissertation, titled “Query Compilation for Modern Data Processing Environments”, on November 8, 2023. The evaluation committee included Volker Markl, Carsten Binnig, Stratos Idreos, and Matthias Böhm.

My research focuses on the intersection of scalable data processing systems, stream processing, and compiler design. During my Ph.D., I worked extensively on the compilation-based execution engine of NebulaStream and contributed to several publications at top-tier conferences such as VLDB, SIGMOD, CIDR, and EDBT.

Before my Ph.D., I earned an M.Sc. in Computer Science from TU Berlin in March 2019, specializing in big data analytics systems. During my master's program, I spent two quarters as an exchange student at UC Santa Cruz, where I collaborated with Faisal Nawab on data management in fog infrastructures. I received my B.Sc. in Applied Computer Science from the Hamburg University of Applied Sciences.


See my full CV for an overview of my work!


Selected Publications

For more information, view my publication page.

Query compilation without regrets

SIGMOD'24 | Philipp M. Grulich, Aljoscha Lepping, Dwi Prasetyo Adi Nugroho, Bonaventura Del Monte, Varun Pandey, Steffen Zeuch, Volker Markl

Abstract: Engineering high-performance query execution engines is a challenging task. Query compilation provides excellent performance, but at the same time introduces significant system complexity, as it makes the engine hard to build, debug, and maintain. To overcome this complexity, we propose Nautilus, a framework that combines the ease of use of query interpretation and the performance of query compilation. On the one hand, Nautilus provides an interpretation-based operator interface that enables engineers to implement operators using imperative C++ code to ensure a familiar developer experience. On the other hand, Nautilus mitigates the performance drawbacks of interpretation by introducing a novel trace-based, multi-backend JIT compiler that translates operators into efficient code. As a result, Nautilus bridges the gap between compilation and interpretation and provides the best of both worlds, achieving high performance without sacrificing the productivity of engineers.


Proceedings of the ACM on Management of Data, Volume 2, Issue 3

Towards Unifying Query Interpretation and Compilation

CIDR'23 | Philipp M. Grulich, Aljoscha Lepping, Dwi Prasetyo Adi Nugroho, Bonaventura Del Monte, Varun Pandey, Steffen Zeuch, Volker Markl

Abstract: Engineering high-performance query execution engines is a highly challenging task. Query compilation provides excellent performance but at the same time introduces a high system complexity, as it makes the engine hard to build, debug and maintain. In this paper, we discuss two fundamental challenges that hinder the adoption of query compilation.


13th Annual Conference on Innovative Data Systems Research (CIDR 2023). January 8-11, 2023, Amsterdam, The Netherlands.

Survey of window types for aggregation in stream processing systems

VLDB Journal | Juliane Verwiebe, Philipp M. Grulich, Jonas Traub, Volker Markl

Abstract: In this paper, we present the first comprehensive survey of window types for stream processing systems which have been presented in research and commercial systems. We cover publications from the most relevant conferences, journals, and system whitepapers on stream processing, windowing, and window aggregation which have been published over the last 20 years. For each window type, we provide detailed specifications, formal notations, synonyms, and use-case examples. We classify each window type according to categories that have been proposed in literature and describe the out-of-order processing. In addition, we examine academic, commercial, and open-source systems with respect to the window types that they support. Our survey offers a comprehensive overview that may serve as a guideline for the development of stream processing systems, window aggregation techniques, and frameworks that support a variety of window types.


The International Journal on Very Large Data Bases, 2023

Babelfish: Efficient Execution of Polyglot Queries

VLDB'22 | Philipp M. Grulich, Steffen Zeuch, Volker Markl

Abstract: Today's users of data processing systems come from different domains, have different levels of expertise, and prefer different programming languages. As a result, analytical workload requirements shifted from relational to polyglot queries involving user-defined functions (UDFs). Although some data processing systems support polyglot queries, they often embed third-party language runtimes. This embedding induces a high performance overhead, as it causes additional data materialization between execution engines. In this paper, we present Babelfish, a novel data processing engine designed for polyglot queries. Babelfish introduces an intermediate representation that unifies queries from different implementation languages. This enables new, holistic optimizations across operator and language boundaries, e.g., operator fusion and workload specialization. As a result, Babelfish avoids data transfers and enables efficient utilization of hardware resources. Our evaluation shows that Babelfish outperforms state-of-the-art data processing systems by up to one order of magnitude and reaches the performance of handwritten code. With Babelfish, we bridge the performance gap between relational and multi-language UDFs and lay the foundation for the efficient execution of future polyglot workloads.


48th International Conference on Very Large Databases, Sydney, Australia, September 5-9, 2022.

Grizzly: Efficient Stream Processing Through Adaptive Query Compilation

SIGMOD'20 | Philipp M. Grulich, Sebastian Breß, Steffen Zeuch, Jonas Traub, Janis von Bleichert, Zongxiong Chen, Tilmann Rabl, Volker Markl

Abstract: Stream Processing Engines (SPEs) execute long-running queries on unbounded data streams. They follow an interpretation-based processing model and do not perform runtime optimizations. This limits the utilization of modern hardware and neglects changing data characteristics at runtime. In this paper, we present Grizzly, a novel adaptive query compilation-based SPE, to enable highly efficient query execution. We extend query compilation and task-based parallelization for the unique requirements of stream processing and apply adaptive compilation to enable runtime re-optimizations. The combination of light-weight statistic gathering with just-in-time compilation enables Grizzly to adjust to changing data-characteristics dynamically at runtime. Our experiments show that Grizzly outperforms state-of-the-art SPEs by up to an order of magnitude in throughput.


Proceedings of the 2020 International Conference on Management of Data, Portland, USA, June 14 - 19, 2020

The NebulaStream Platform: Data and Application Management for the Internet of Things

CIDR'20 | Steffen Zeuch, Ankit Chaudhary, Bonaventura Del Monte, Haralampos Gavriilidis, Dimitrios Giouroukis, Philipp M. Grulich, Sebastian Breß, Jonas Traub, Volker Markl

Abstract: The Internet of Things (IoT) presents a novel computing architecture for data management: a distributed, highly dynamic, and heterogeneous environment of massive scale. Applications for the IoT introduce new challenges for integrating the concepts of fog and cloud computing as well as sensor networks in one unified environment. In this paper, we highlight these major challenges and outline how existing systems handle them. To address these challenges, we introduce the NebulaStream platform, a general purpose, end-to-end data management system for the IoT. NebulaStream addresses the heterogeneity and distribution of compute and data, supports diverse data and programming models going beyond relational algebra, deals with potentially unreliable communication, and enables constant evolution under continuous operation. In our evaluation, we demonstrate the effectiveness of our approach by providing early results on partial aspects.


Conference on Innovative Data Systems, Amsterdam, Netherlands, 2020.

Collaborative Edge and Cloud Neural Networks for Real-Time Video Processing

VLDB'18 | Philipp M. Grulich, Faisal Nawab

Abstract: The efficient processing of video streams is a key component in many emerging Internet of Things (IoT) and edge applications, such as Virtual and Augmented Reality (V/AR) and self-driving cars. These applications require real-time high-throughput video processing. This can be attained via a collaborative processing model between the edge and the cloud, called an Edge-Cloud model. To this end, many approaches were proposed to optimize the latency and bandwidth consumption of Edge-Cloud video processing, especially for Neural Networks (NN)-based methods. In this demonstration, we investigate the efficiency of these NN techniques, how they can be combined, and whether combining them leads to better performance. Our demonstration invites participants to experiment with the various NN techniques, combine them, and observe how the underlying NN changes with different techniques and how these changes affect accuracy, latency, and bandwidth consumption.


Very Large Databases Conference, Rio de Janeiro, Brazil, August 27-31, 2018.
View all Publications

Work Experience

Senior Software Engineer

Observe Inc. | 2023 - Present

Research Associate

Technische Universität Berlin | 2019 - 2023

Research Assistant

German Research Centre for Artificial Intelligence | 2016 - 2019

Software Developer

Seamless Interaction | 2014 - 2015

Apprenticeship as a Computer Science Expert in Software Development

Otto Group | 2010 - 2013

Projects

NebulaStream

Open Source

I am actively involved as lead scientist and engineer in the development of NebulaStream, a general-purpose, end-to-end data management system for the IoT. My work mainly focuses on developing an efficient execution engine for high-performance data processing on heterogeneous devices.