The DEEM Lab is a cross-organisational research group uniting the chair for the management of data science processes at Technische Universität Berlin with external members from multiple universities and industry. The lab is led by Prof. Dr.-Ing. Sebastian Schelter and is part of the Berlin Institute for the Foundations of Learning and Data (BIFOLD).
Our lab conducts fundamental research at the intersection of data management and machine learning (ML), addressing data-related problems in ML applications that have a negative economic, societal or scientific impact. Our goal is to foster the responsible management of data and to lower the technical bar for working with data science technologies.
Our research is accompanied by efficient and scalable open source implementations, many of which are applied in real-world use cases, for example in the Amazon Web Services cloud and in large European e-commerce platforms. The focus areas of the lab are:
Our research contributions have been recognized with an ACM SIGMOD Systems Award, an ACM SIGMOD Best Demo Runner-Up Award and a Best Paper Runner-Up Award from the Table Representation Learning workshop at NeurIPS. We have ongoing collaborations with the University of Amsterdam and CWI in the Netherlands, as well as with the Center for Responsible AI at New York University.
Sebastian Schelter is a Full Professor at the Berlin Institute for the Foundations of Learning and Data (BIFOLD) and Technische Universität Berlin. His research focuses on the intersection of data management and machine learning, with the goal of fostering the responsible management of data and democratising data science technologies.
The research of his group is accompanied by efficient and scalable open source implementations, many of which are applied in real-world use cases, for example in the Amazon Web Services cloud and in large European e-commerce platforms.
In the past, he has been an assistant professor at the University of Amsterdam, a faculty fellow at New York University, a senior applied scientist at Amazon Research and a research intern at Twitter and IBM Almaden in California. His research contributions have been recognized with an ACM SIGMOD Systems Award, an ACM SIGMOD Best Demo Runner-Up Award, and a Best Paper Runner-Up Award from the Table Representation Learning workshop at NeurIPS.
We offer the following courses in the summer semester 2025:
To take one of our courses, please sign up on the corresponding course page on ISIS and attend the first lecture, where we will discuss the details of the formal registration.
If you are interested in writing a bachelor's or master's thesis with us, please check out our list of available topics at theses.tu-berlin.de.
We are looking for a PhD student to conduct research in responsible data engineering. The research will focus on data preparation and data pipelines for complex machine learning (ML) systems. Such ML systems are increasingly used to automate impactful decisions but suffer from many unsolved data management challenges with respect to their correctness, reliability, and compliance with legal regulations.
The goal of the research will be to design and efficiently implement data-centric methods that enable ML systems to guarantee their users control over their personal data (e.g., with respect to the "right to be forgotten" under the GDPR) and to adhere to legal regulations such as the upcoming European AI Act.
This will be achieved via novel declarative methods to create, maintain and assess datasets for ML use cases. These methods will assist non-expert users with data-centric tasks, such as evaluating the robustness of their ML pipelines to data errors, and may leverage the code generation capabilities of large language models. The resulting methods will be accompanied by efficient and scalable implementations and made publicly available as open source libraries. The position also includes teaching tasks.
Requirements
Not required, but nice to have
How to apply: Please send your application with the usual documents by e-mail to Prof. Dr. Sebastian Schelter at schelter [at] tu-berlin [dot] de, quoting the reference number IV-22/25, by 6 June 2025.
We are looking for a PhD student to conduct research in responsible data engineering. The research will be conducted in close collaboration with Prof. Julia Stoyanovich from New York University. Responsible data engineering is emerging as a new discipline at the intersection of data engineering and AI that treats ethics, legal compliance, and inclusivity as central design considerations. The holistic nature of this approach is based on the observation that the decisions we make during data collection and preparation profoundly impact the AI systems we build and deploy.
The goal of this position is to create a new system which helps data engineers to design data preparation pipelines that optimize model performance along a rich set of responsibility objectives, including accuracy, robustness, fairness, and legal compliance. For that, the system will proactively guide data engineers through the selection and evaluation of a large set of data preprocessing, data augmentation and feature selection operations. A reliable, efficient and easy-to-use open source implementation of this system will be created as part of the research project.
This endeavor is technically challenging in multiple ways. First, data preparation and model selection need to be optimized for multiple objectives, in contrast to existing approaches, which focus only on a single objective such as overall prediction accuracy. Second, the system will have to create, rewrite and concurrently execute large numbers of different pipeline variants, which requires an efficient runtime and novel query optimization techniques. Third, the research needs to account for the dramatic changes currently underway in the development practices of AI applications, e.g., AI-assisted programming, tabular foundation models and AI-based data science agents.
Requirements
Not required, but nice to have
How to apply: Please send your application with the usual documents by e-mail to Prof. Dr. Sebastian Schelter at schelter [at] tu-berlin [dot] de, quoting the reference number IV-177/25, by 20 June 2025.
We are currently looking for two student employees to support our research with tasks such as implementing research prototypes (for data validation, machine unlearning, feature stores and recommender systems), collecting, generating and preparing training data, or analysing program code with large language models. The students will also be listed as co-authors on the resulting scientific publications.
Requirements
Please consult the official job ad for details on how to apply.
Email: schelter [at] tu-berlin [dot] de
Technische Universität Berlin
FG Management of Data Science Processes
Sekr. TEL 9-2
Ernst-Reuter-Platz 7
10587 Berlin
Germany
Responsibility under the German Press Law §55 Sect. 2 RStV:
Prof. Dr.-Ing. Sebastian Schelter