• DS3Lab @ ETH Zurich

A Research Group @ ETH Zurich

We are a computer science research group in the Systems Group, Department of Computer Science, ETH Zurich. Our collective dream as a research group is to eliminate the barrier between people and technology: how can we provide state-of-the-art technology to those who can benefit from it without them having to understand all the technical details? Today, our research spans data management systems, modern hardware, machine learning systems and theory, and a range of domain applications such as biology, astronomy, and the social sciences.

Ce Zhang, ce.zhang@inf.ethz.ch.

Data Sciences

How can machine learning enable new sciences and applications that make the world a better place?

Data Systems

How can we build a learning system, over a collection of modern hardware, that runs as efficiently as possible?

Data Services

How can we make machine learning as usable as possible, perhaps with only a couple of clicks on the cloud?

Current Members

David Dao

Shaoduo Gan

Shaoduo has been a PhD student at DS3Lab since Sep. 2017. He is interested in automatic labelling and distributed training.

Nezihe Merve Gürel

Merve is a PhD student at DS3Lab with a research background in Information Theory. Currently, her research focuses on generalization of learning algorithms as well as building rigorous frameworks for efficient data acquisition, denoising and instrument calibration algorithms by incorporating computer systems aspects. Ensuring theoretical guarantees, she aims to advance computation methods used in scientific applications such as astronomy, proteomics and proteogenomics. Previously, she obtained her MSc in Computer and Communication Sciences from EPFL. During her MSc studies, she was hired by the Information Theory Laboratory as a research scholar.

Nora Hollenstein

Nora is a PhD candidate at DS3Lab with a background in Natural Language Processing (NLP). After an MSc in Artificial Intelligence from the University of Edinburgh she worked at IBM for a few years on various Watson projects in Germany and Switzerland. The focus of her work lies in enhancing NLP applications with cognitive data such as eye-tracking and EEG recordings. Her long-term goal is to reduce the amount of manual annotation work for training machine learning systems for NLP by passively supervising these systems with brain activity data.

Bojan Karlas

Bojan is a PhD student at DS3Lab currently working on building a scalable automated machine learning system. His prior ML experience includes building time-series prediction models for his master's thesis at DS3Lab and automated speech processing with recurrent neural networks during his internship at Logitech. He obtained his master's in computer science at EPFL Lausanne and bachelor's in software engineering at Belgrade University. He has 2 years of industry experience as a developer at Microsoft working on distributed database systems.

Susie Rao

With a background in Computational Linguistics and Natural Language Processing, Susie Rao is now working as a Data Science researcher at the Swiss Economic Institute (KOF) and a PhD candidate at the DS3Lab. Her research interests are dynamic network analysis, information extraction, natural language processing and their applicability to various domains such as applied economics, sociology, and biomedicine. She has been working on the following research projects: pattern matching and record linkage in large databases (finalized), information extraction and categorization in legal text such as international investment treaties and court cases (finalized), data analytics and machine learning with Chinese clinical and patient health data (ongoing), topology and endogeneity of firm-product-trade network and discipline connectedness in the Web of Science (ongoing). Susie Rao graduated from the Institute of Computational Linguistics of the University of Zurich.

Johannes Rausch

Johannes Rausch is a PhD student at DS3Lab with a background in Computer Vision and applied Machine Learning. His research interests include visual document parsing systems, information retrieval and approaches to leverage large datasets in weakly-supervised training settings. Before joining ETH, he obtained his MSc in Computational Science and Engineering from Technical University of Munich and an Honors Degree in Technology Management from the Center for Digital Technology and Management in Munich. During his studies, he completed medical image processing projects at Stanford University (RSL) and Harvard University (BIDMC) as a visiting researcher. He previously worked for Siemens as well as the robotics start-up tacterion.

Cedric Renggli

Cedric is a PhD student at DS3Lab. He holds a bachelor's degree from the Bern University of Applied Sciences and received his MSc in Computer Science from ETH Zurich in 2018. Cedric's main research interest lies in all kinds of human interactions in a machine learning ecosystem beyond labeling. This spans from defining engineering principles, such as continuous integration (CI), to comparison-based/preferential optimization algorithms for tuning hyper-parameters and providing efficient methods for model selection. Additionally, Cedric is working on different optimization techniques and systems for distributed machine learning algorithms.

Thomas Lemmin

Thomas is a postdoctoral fellow at the DS3Lab. He is interested in applying machine learning to biomedical research.

Ce Zhang

Ce is an Assistant Professor in Computer Science at ETH Zurich. He believes that by making data—along with the processing of data—easily accessible to non-CS users, we have the potential to make the world a better place. His current research focuses on building data systems to support machine learning and help facilitate other sciences. Before joining ETH, Ce was advised by Christopher Ré. He finished his PhD round-tripping between the University of Wisconsin-Madison and Stanford University, and spent another year as a postdoctoral researcher at Stanford. His PhD work produced DeepDive, a trained data system for automatic knowledge-base construction. He participated in the research efforts that won the SIGMOD Best Paper Award (2014) and SIGMOD Research Highlight Award (2015), and was featured in special issues including the Science magazine (2017), the Communications of the ACM (2017), “Best of VLDB” (2015), and the Nature magazine (2015).

Frances Ann Hubis

Frances joined the DS3Lab as a PhD student in September 2018 to investigate the flow of data between classical and quantum systems. She is interested in both optimization and verification of logical quantum circuits and machine-learning-based error correction for fault-tolerant quantum computing. Frances was enrolled in the Interdisciplinary Science program at ETH and holds a B.Sc. degree with a focus on physical chemistry and information processing, and an M.Sc. degree with a focus on quantum systems and learning theory. She completed research projects on Bell-certified randomness expansion protocols and on scalable variational inference.

Open Positions

ETH Zurich is one of the top universities in Europe and the world. We, as part of the Systems Group, are always looking for top candidates for postdoctoral, PhD, and Master's thesis positions. Specifically, our group is looking for candidates who are excited about systems research related to data management, machine learning, distributed systems, and computer architecture.

If you are interested, send an email to ce.zhang@inf.ethz.ch.


  • Nora Hollenstein and Jonathan Rotsztejn's system ranked first in the relation classification subtask among 28 international teams in SemEval 2018 (Task 7 Subtask 1)! Their system also ranked first and second in two other relation extraction subtasks (Task 7 Subtask 2). The corresponding paper was also selected as the best paper of SemEval 2018 Task 7.

  • Our NIPS paper on decentralized learning was selected for oral presentation (40/3240 submissions). The video is hosted on YouTube and might not be accessible in some countries.

  • space.ml is featured in a News article in Science magazine. GalaxyGAN is selected as the Editor's Choice in Science magazine, the Atlantic, and WIRED Science.

Open Master Theses / 2018


  • Chen Yu, Hanlin Tang, Cedric Renggli, Simon Kassing, Ankit Singla, Dan Alistarh, Ce Zhang, Ji Liu. Distributed Learning over Unreliable Networks. ICML 2019.

  • Marc Fischer, Mislav Balunovic, Dana Drachsler-Cohen, Timon Gehr, Ce Zhang, Martin Vechev. DL2: Training and Querying Neural Networks with Logic. ICML 2019.

  • Cedric Renggli, Bojan Karlas, Bolin Ding, Feng Liu, Kevin Schawinski, Wentao Wu, Ce Zhang. Continuous Integration of Machine Learning Models: A Rigorous Yet Practical Treatment. SysML 2019.

  • Vojislav Dukic, Sangeetha Abdu Jyothi, Bojan Karlas, Muhsen Owaida, Ce Zhang, Ankit Singla. Is advance knowledge of flow sizes a plausible assumption? NSDI 2019.

  • Kaan Kara, Ken Eguro, Ce Zhang, Gustavo Alonso. ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation. PVLDB 2019.

  • Zeke Wang, Kaan Kara, Hantian Zhang, Gustavo Alonso, Onur Mutlu, and Ce Zhang. Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning. VLDB 2019.

  • Nora Hollenstein and Ce Zhang. Entity Recognition at First Sight: Improving NER with Eye Movement Information. NAACL 2019.

  • Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Merve Gurel, Nick Hynes, Bo Li, Ce Zhang, Dawn Song, Costas J. Spanos. Towards Efficient Data Valuation Based on the Shapley Value. AISTATS 2019.

  • Chen Yu, Bojan Karlas, Jie Zhong, Ce Zhang, Ji Liu. AutoML from Service Provider's Perspective: Multi-device, Multi-tenant Model Selection with GP-EI. AISTATS 2019.

  • Zhipeng Zhang, Bin Cui, Wentao Wu, Ce Zhang, Lele Yu, Jiawei Jiang. MLlib*: Fast Training of GLMs using Spark MLlib. ICDE (Industry) 2019.

  • Kaan Kara, Zeke Wang, Gustavo Alonso, Ce Zhang. doppioDB 2.0: Hardware Techniques for Improved Integration of Machine Learning into Databases. VLDB Demo 2019.

  • Cedric Renggli*, Frances Ann Hubis*, Bojan Karlaš, Kevin Schawinski, Wentao Wu, Ce Zhang. Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization. VLDB Demo 2019.

  • Hanlin Tang, Shaoduo Gan, Ce Zhang, Ji Liu. Communication Compression for Decentralized Training. NIPS 2018.

  • T Li, J Zhong, J Liu, W Wu, C Zhang. Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads. VLDB 2018.

  • Yu Liu, Hantian Zhang, Luyuan Zeng, Wentao Wu, Ce Zhang. MLBench: Benchmarking Machine Learning Services Against Human Experts. VLDB 2018.

  • J Jiang, B Cui, C Zhang, F Fu. DimBoost: Boosting Gradient Boosting Tree to Higher Dimensions. SIGMOD 2018.

  • Hanlin Tang, Xiangru Lian, Ming Yan, Ce Zhang, Ji Liu. D2: Decentralized Training over Decentralized Data. ICML 2018.

  • X Lian, W Zhang, C Zhang, J Liu. Asynchronous Decentralized Parallel Stochastic Gradient Descent. ICML 2018.

  • H Guo, K Kara, C Zhang. Layerwise Systematic Scan: Deep Boltzmann Machines and Beyond. AISTATS 2018.

  • Nora Hollenstein, Jonathan Rotsztejn, Marius Tröndle, Andreas Pedroni, Ce Zhang, and Nicolas Langer. ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading. Scientific Data 2018.

  • Maria Barrett, Joachim Bingel, Nora Hollenstein, Marek Rei and Anders Søgaard. Sequence classification with human attention. CoNLL 2018. (Special award for the best paper on research inspired by human language processing)

  • Jonathan Rotsztejn, Nora Hollenstein and Ce Zhang. ETH-DS3Lab at SemEval-2018 Task 7: Effectively Combining Recurrent and Convolutional Neural Networks for Relation Classification and Extraction. SemEval 2018. (SemEval Task 7 Best Paper; Top Ranked System)

  • D Grubic, L Tam, D Alistarh, C Zhang. Synchronous Multi-GPU Deep Learning with Low-Precision Communication: An Experimental Study. EDBT 2018.

  • Ivan Girardi, Pengfei Ji, An-phi Nguyen, Nora Hollenstein, Adam Ivankay, Lorenz Kuhn, Chiara Marchiori and Ce Zhang. Patient Risk Assessment and Warning Symptom Detection Using Deep Attention-Based Neural Networks. LOUHI 2018.

  • H Huang, C Zheng, J Zeng, W Zhou, S Zhu, P Liu, I Molloy, S Chari, C Zhang, Q Guan. A Large-scale Study of Android Malware Development Phenomenon on Public Malware Submission and Scanning Platform. IEEE Transactions on Big Data 2018.

  • Dominic Stark, Barthelemy Launet, Kevin Schawinski, Ce Zhang, Michael Koss, M Dennis Turp, Lia F Sartori, Hantian Zhang, Yiru Chen, Anna K Weigel. PSFGAN: a generative adversarial network system for separating quasar point sources and host galaxy light. Monthly Notices of the Royal Astronomical Society 2018.

  • Lia F Sartori, Kevin Schawinski, Benny Trakhtenbrot, Neven Caplar, Ezequiel Treister, Michael J Koss, C Megan Urry, Ce Zhang. A model for AGN variability on multiple time-scales. Monthly Notices of the Royal Astronomical Society 2018.

  • Sandro Ackermann, Kevin Schawinski, Ce Zhang, Anna K. Weigel, M. Dennis Turp. Using transfer learning to detect galaxy mergers. Monthly Notices of the Royal Astronomical Society 2018.

  • M. Dennis Turp, Kevin Schawinski, Ce Zhang. Exploring galaxy evolution with generative models. Astronomy and Astrophysics 2018.

  • Nino Antulov-Fantulin, Dijana Tolic, Matija Piskorec, Ce Zhang, Irena Vodenska. Inferring short-term volatility indicators from Bitcoin blockchain. ComplexNetworks 2018.

  • Bojan Karlas, Ji Liu, Wentao Wu, Ce Zhang. Ease.ml in Action: Towards Multi-tenant Declarative Learning Services. VLDB (Demo) 2018.

  • X Lian, C Zhang, H Zhang, CJ Hsieh, W Zhang, J Liu. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent. NIPS 2017. (Oral Presentation: 40/3240 submissions).

  • H Zhang, J Li, K Kara, D Alistarh, J Liu, C Zhang. The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning. ICML 2017.

  • L Yu, B Cui, C Zhang, Y Shao. LDA*: A Robust and Large-scale Topic Modeling System. VLDB 2017.

  • Z Zhang, Y Shao, B Cui, C Zhang. An experimental evaluation of simrank-based similarity search algorithms. VLDB 2017.

  • J Jiang, B Cui, C Zhang, L Yu. Heterogeneity-aware distributed parameter servers. SIGMOD 2017.

  • K M Owaida, H Zhang, G Alonso, C Zhang. Scalable Inference of Decision Tree Ensembles: Flexible Design for CPU-FPGA Platforms. FPL 2017.

  • K Kara, D Alistarh, G Alonso, O Mutlu, C Zhang. FPGA-accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-off. FCCM 2017.

  • K Schawinski, C Zhang, H Zhang, L Fowler, GK Santhanam. Generative Adversarial Networks recover features in astrophysical images of galaxies beyond the deconvolution limit. Monthly Notices of the Royal Astronomical Society 2017.

  • J Jiang, J Jiang, B Cui, C Zhang. TencentBoost: A Gradient Boosting Tree System with Parameter Server. ICDE (Industrial Track) 2017.

  • X Li, B Cui, Y Chen, W Wu, C Zhang. MLog: Towards Declarative In-Database Machine Learning. VLDB (Demo) 2017.

  • C Zhang, W Wu, T Li. An Overreaction to the Broken Machine Learning Abstraction: The ease.ml Vision. HILDA 2017.

  • H Huang, C Zheng, J Zeng, W Zhou, S Zhu, P Liu, S Chari, C Zhang. Android malware development on public malware scanning platforms: A large-scale data-driven study. IEEE Big Data 2016.

© 2018 DS3Lab @ ETH Zurich.