I am currently pursuing my PhD in the Database Systems and Information Management research group at TU Berlin, under the esteemed guidance of Prof. Volker Markl.

My research focuses on stream processing systems and scalable data processing on modern hardware. At present, I am working on the compilation-based execution engine of NebulaStream, and I have contributed to several publications at top-tier venues, including VLDB, SIGMOD, CIDR, and EDBT.

Before my PhD, I earned an M.Sc. in Computer Science from TU Berlin in March 2019, with a specialization in big data analytics systems. During my master's program, I spent two quarters as an exchange student at UC Santa Cruz. There, I collaborated with Faisal Nawab and worked on data management in fog infrastructures. Earlier, I received a B.Sc. in Applied Computer Science from the Hamburg University of Applied Sciences.


See my full CV for an overview of my work!


Selected Publications

For more information, view my publication page.

Towards Unifying Query Interpretation and Compilation

CIDR'23 | Philipp M. Grulich, Aljoscha Lepping, Dwi Prasetyo Adi Nugroho, Bonaventura Del Monte, Varun Pandey, Steffen Zeuch, Volker Markl

Abstract: Engineering high-performance query execution engines is a highly challenging task. Query compilation provides excellent performance but at the same time introduces a high system complexity, as it makes the engine hard to build, debug and maintain. In this paper, we discuss two fundamental challenges that hinder the adoption of query compilation.


13th Annual Conference on Innovative Data Systems Research (CIDR 2023). January 8-11, 2023, Amsterdam, The Netherlands.

Survey of window types for aggregation in stream processing systems

VLDB Journal | Juliane Verwiebe, Philipp M. Grulich, Jonas Traub, Volker Markl

Abstract: In this paper, we present the first comprehensive survey of window types for stream processing systems which have been presented in research and commercial systems. We cover publications from the most relevant conferences, journals, and system whitepapers on stream processing, windowing, and window aggregation which have been published over the last 20 years. For each window type, we provide detailed specifications, formal notations, synonyms, and use-case examples. We classify each window type according to categories that have been proposed in literature and describe the out-of-order processing. In addition, we examine academic, commercial, and open-source systems with respect to the window types that they support. Our survey offers a comprehensive overview that may serve as a guideline for the development of stream processing systems, window aggregation techniques, and frameworks that support a variety of window types.


The International Journal on Very Large Data Bases, 2023

Babelfish: Efficient Execution of Polyglot Queries

VLDB'22 | Philipp M. Grulich, Steffen Zeuch, Volker Markl

Abstract: Today's users of data processing systems come from different domains, have different levels of expertise, and prefer different programming languages. As a result, analytical workload requirements shifted from relational to polyglot queries involving user-defined functions (UDFs). Although some data processing systems support polyglot queries, they often embed third-party language runtimes. This embedding induces a high performance overhead, as it causes additional data materialization between execution engines. In this paper, we present Babelfish, a novel data processing engine designed for polyglot queries. Babelfish introduces an intermediate representation that unifies queries from different implementation languages. This enables new, holistic optimizations across operator and language boundaries, e.g., operator fusion and workload specialization. As a result, Babelfish avoids data transfers and enables efficient utilization of hardware resources. Our evaluation shows that Babelfish outperforms state-of-the-art data processing systems by up to one order of magnitude and reaches the performance of handwritten code. With Babelfish, we bridge the performance gap between relational and multi-language UDFs and lay the foundation for the efficient execution of future polyglot workloads.


48th International Conference on Very Large Databases, Sydney, Australia, September 05-09, 2022.

Grizzly: Efficient Stream Processing Through Adaptive Query Compilation

SIGMOD'20 | Philipp M. Grulich, Sebastian Breß, Steffen Zeuch, Jonas Traub, Janis von Bleichert, Zongxiong Chen, Tilmann Rabl, Volker Markl

Abstract: Stream Processing Engines (SPEs) execute long-running queries on unbounded data streams. They follow an interpretation-based processing model and do not perform runtime optimizations. This limits the utilization of modern hardware and neglects changing data characteristics at runtime. In this paper, we present Grizzly, a novel adaptive query compilation-based SPE, to enable highly efficient query execution. We extend query compilation and task-based parallelization for the unique requirements of stream processing and apply adaptive compilation to enable runtime re-optimizations. The combination of light-weight statistic gathering with just-in-time compilation enables Grizzly to adjust to changing data-characteristics dynamically at runtime. Our experiments show that Grizzly outperforms state-of-the-art SPEs by up to an order of magnitude in throughput.


Proceedings of the 2020 International Conference on Management of Data, Portland, USA, June 14 - 19, 2020

The NebulaStream Platform: Data and Application Management for the Internet of Things

CIDR'20 | Steffen Zeuch, Ankit Chaudhary, Bonaventura Del Monte, Haralampos Gavriilidis, Dimitrios Giouroukis, Philipp M. Grulich, Sebastian Breß, Jonas Traub, Volker Markl

Abstract: The Internet of Things (IoT) presents a novel computing architecture for data management: a distributed, highly dynamic, and heterogeneous environment of massive scale. Applications for the IoT introduce new challenges for integrating the concepts of fog and cloud computing as well as sensor networks in one unified environment. In this paper, we highlight these major challenges and outline how existing systems handle them. To address these challenges, we introduce the NebulaStream platform, a general purpose, end-to-end data management system for the IoT. NebulaStream addresses the heterogeneity and distribution of compute and data, supports diverse data and programming models going beyond relational algebra, deals with potentially unreliable communication, and enables constant evolution under continuous operation. In our evaluation, we demonstrate the effectiveness of our approach by providing early results on partial aspects.


Conference on Innovative Data Systems Research (CIDR 2020), Amsterdam, The Netherlands, 2020.

Collaborative Edge and Cloud Neural Networks for Real-Time Video Processing

VLDB'18 | Philipp M. Grulich, Faisal Nawab

Abstract: The efficient processing of video streams is a key component in many emerging Internet of Things (IoT) and edge applications, such as Virtual and Augmented Reality (V/AR) and self-driving cars. These applications require real-time high-throughput video processing. This can be attained via a collaborative processing model between the edge and the cloud, called an Edge-Cloud model. To this end, many approaches were proposed to optimize the latency and bandwidth consumption of Edge-Cloud video processing, especially for Neural Networks (NN)-based methods. In this demonstration, we investigate the efficiency of these NN techniques, how they can be combined, and whether combining them leads to better performance. Our demonstration invites participants to experiment with the various NN techniques, combine them, and observe how the underlying NN changes with different techniques and how these changes affect accuracy, latency and bandwidth consumption.


Very Large Databases Conference, Rio de Janeiro, Brazil, August 27-31, 2018.

View all Publications

Work Experience

Research Associate

Technische Universität Berlin | 2019 - Present

Research Assistant

German Research Centre for Artificial Intelligence | 2016 - 2019

Software Developer

Seamless Interaction | 2014 - 2015

Apprenticeship as a Computer Science Expert in Software Development

Otto Group | 2010 - 2013

Projects

NebulaStream

Open Source

I am actively involved as lead scientist and engineer in the development of NebulaStream, a general-purpose, end-to-end data management system for the IoT. My work mainly focuses on developing an efficient execution engine for high-performance data processing on heterogeneous devices.