DS3Lab @ ETH Zurich

A Research Group @ ETH Zurich

We are a computer science research group in the Systems Group, Department of Computer Science, ETH Zurich. Our collective dream as a research group is to eliminate the barrier between people and technology: how can we provide state-of-the-art technology to those who can benefit from it, without them having to understand all the technical details? Today, our research spans data management systems, modern hardware, and machine learning systems and theory, as well as a range of domain applications such as biology, astronomy, and the social sciences.

Ce Zhang, ce.zhang@inf.ethz.ch.

Data Sciences

How can machine learning enable new sciences and applications that make the world a better place?

Data Systems

How can we build a learning system, over a collection of modern hardware, that runs as efficiently as possible?

Data Services

How can we make machine learning as usable as possible, perhaps with only a couple of clicks in the cloud?

Current Members

David Dao [Personal Homepage]

David is a PhD student at DS3Lab, building future AI systems that aim to advance sustainability and health. Before joining ETH Zurich, he was an autonomous driving engineer at Mercedes-Benz Research in Silicon Valley and a graduate student at the Broad Institute of MIT and Harvard. He is a Global Shaper at the World Economic Forum, and his research has been awarded grants and prizes by NVIDIA, Microsoft, Mercedes-Benz, and the United Nations, and has been featured in The Scientist and MIT Technology Review.

Shaoduo Gan

Shaoduo is a PhD student at DS3Lab. He is broadly interested in the theory and practice of deep learning and reinforcement learning, which, he believes, can lead the way to Artificial General Intelligence. Before joining DS3Lab, he received his bachelor's and master's degrees in computer science from the National University of Defense Technology, China.

Nezihe Merve Gürel [Personal Homepage]

Merve is a PhD student at DS3Lab with a research background in information theory. Currently, her research focuses on the generalization of learning algorithms, as well as on building rigorous frameworks for efficient data acquisition, denoising, and instrument calibration algorithms that incorporate computer systems aspects. By ensuring theoretical guarantees, she aims to advance the computational methods used in scientific applications such as astronomy, proteomics, and proteogenomics. Previously, she obtained her MSc in Computer and Communication Sciences from EPFL, where she worked as a research scholar in the Information Theory Laboratory during her studies.

Nora Hollenstein [Personal Homepage]

Nora is a PhD candidate at DS3Lab with a background in Natural Language Processing (NLP). After an MSc in Artificial Intelligence from the University of Edinburgh, she worked at IBM for a few years on various Watson projects in Germany and Switzerland. The focus of her work lies in enhancing NLP applications with cognitive data such as eye-tracking and EEG recordings. Her long-term goal is to reduce the amount of manual annotation work needed to train machine learning systems for NLP by passively supervising these systems with brain activity data.

Bojan Karlas

BIO

Susie Rao [Personal Homepage]

With a background in Computational Linguistics and Natural Language Processing, Susie Rao is a Data Science researcher at the Swiss Economic Institute (KOF) and a PhD candidate at DS3Lab. Her research interests are dynamic network analysis, information extraction, natural language processing, and their applicability to domains such as applied economics, sociology, and biomedicine. She has worked on the following research projects: pattern matching and record linkage in large databases (completed), information extraction and categorization in legal text such as international investment treaties and court cases (completed), data analytics and machine learning with Chinese clinical and patient health data (ongoing), and the topology and endogeneity of the firm-product-trade network and discipline connectedness in the Web of Science (ongoing). Susie Rao graduated from the Institute of Computational Linguistics of the University of Zurich.

Johannes Rausch

Johannes Rausch is a PhD student at DS3Lab with a background in computer vision and applied machine learning. His research interests include visual document parsing systems, information retrieval, and approaches that leverage large datasets in weakly supervised training settings. Before joining ETH, he obtained his MSc in Computational Science and Engineering from the Technical University of Munich and an Honors Degree in Technology Management from the Center for Digital Technology and Management in Munich. During his studies, he completed medical image processing projects at Stanford University (RSL) and Harvard University (BIDMC) as a visiting researcher. He previously worked for Siemens as well as the robotics start-ups tacterion and Magazino.

Cedric Renggli [Personal Homepage]

Cedric is a PhD student at DS3Lab. He holds a bachelor's degree from the Bern University of Applied Sciences and received his MSc in Computer Science from ETH Zurich in 2018. Cedric's main research interest lies in scientific data management, with the goal of enabling scientists from various domains to store, administer, and analyze their large-scale datasets more efficiently. Additionally, he works on optimization techniques and systems for distributed machine learning algorithms.

Thomas Lemmin

Thomas is a postdoctoral fellow at the DS3Lab. He is interested in applying machine learning to biomedical research.

Ce Zhang [Personal Homepage]

Ce is an Assistant Professor in Computer Science at ETH Zurich. He believes that by making data, along with the processing of data, easily accessible to non-CS users, we have the potential to make the world a better place. His current research focuses on building data systems to support machine learning and to help facilitate other sciences. Before joining ETH, Ce was advised by Christopher Ré. He finished his PhD round-tripping between the University of Wisconsin-Madison and Stanford University, and spent another year as a postdoctoral researcher at Stanford. His PhD work produced DeepDive, a trained data system for automatic knowledge-base construction. He participated in the research efforts that won the SIGMOD Best Paper Award (2014) and the SIGMOD Research Highlight Award (2015), and his work was featured in special issues of Science (2017), Communications of the ACM (2017), "Best of VLDB" (2015), and Nature (2015).

Hantian Zhang

BIO

Open Positions

ETH Zurich is one of the top universities in Europe and the world. As part of the Systems Group, we are always looking for top candidates for postdoctoral researcher and PhD positions, as well as Master's theses. Specifically, our group is looking for candidates who are excited about systems research related to data management, machine learning, distributed systems, and computer architecture.


If you are interested, send an email to ce.zhang@inf.ethz.ch.

News


  • Nora Hollenstein and Jonathan Rotsztejn's system ranked first among 28 international teams in the relation classification subtask of SemEval 2018 (Task 7 Subtask 1)! Their system also ranked first and second in two other relation extraction subtasks (Task 7 Subtask 2). The corresponding paper was selected as the best paper of SemEval 2018 Task 7.

  • Our NIPS paper on decentralized learning was selected for an oral presentation (40 out of 3,240 submissions). The video of the presentation, hosted on YouTube, might not be accessible in some countries.

  • space.ml was featured in a news article in Science [link]. GalaxyGAN was selected as an Editor's Choice in Science [link] and was covered by The Atlantic [link] and WIRED Science [link].

Open Master's Thesis Projects / 2018

Publications

  • T Li, J Zhong, J Liu, W Wu, C Zhang. Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads. VLDB 2018.

  • Yu Liu, Hantian Zhang, Luyuan Zeng, Wentao Wu, Ce Zhang. MLBench: Benchmarking Machine Learning Services Against Human Experts. VLDB 2018.

  • J Jiang, B Cui, C Zhang, F Fu. DimBoost: Boosting Gradient Boosting Decision Tree to Higher Dimensions. SIGMOD 2018.

  • Hanlin Tang, Xiangru Lian, Ming Yan, Ce Zhang, Ji Liu. D2: Decentralized Training over Decentralized Data. ICML 2018.

  • X Lian, W Zhang, C Zhang, J Liu. Asynchronous Decentralized Parallel Stochastic Gradient Descent. ICML 2018.

  • H Guo, K Kara, C Zhang. Layerwise Systematic Scan: Deep Boltzmann Machines and Beyond. AISTATS 2018.

  • Jonathan Rotsztejn, Nora Hollenstein and Ce Zhang. ETH-DS3Lab at SemEval-2018 Task 7: Effectively Combining Recurrent and Convolutional Neural Networks for Relation Classification and Extraction. SemEval 2018. (SemEval Task 7 Best Paper; Top Ranked System)

  • D Grubic, L Tam, D Alistarh, C Zhang. Synchronous Multi-GPU Deep Learning with Low-Precision Communication: An Experimental Study. EDBT 2018.

  • H Huang, C Zheng, J Zeng, W Zhou, S Zhu, P Liu, I Molloy, S Chari, C Zhang, Q Guan. A Large-scale Study of Android Malware Development Phenomenon on Public Malware Submission and Scanning Platform. IEEE Transactions on Big Data 2018.

  • Dominic Stark, Barthelemy Launet, Kevin Schawinski, Ce Zhang, Michael Koss, M Dennis Turp, Lia F Sartori, Hantian Zhang, Yiru Chen, Anna K Weigel. PSFGAN: a generative adversarial network system for separating quasar point sources and host galaxy light. Monthly Notices of the Royal Astronomical Society 2018.

  • Lia F Sartori, Kevin Schawinski, Benny Trakhtenbrot, Neven Caplar, Ezequiel Treister, Michael J Koss, C Megan Urry, Ce Zhang. A model for AGN variability on multiple time-scales. Monthly Notices of the Royal Astronomical Society 2018.

  • Sandro Ackermann, Kevin Schawinski, Ce Zhang, Anna K. Weigel, M. Dennis Turp. Using transfer learning to detect galaxy mergers. Monthly Notices of the Royal Astronomical Society 2018.

  • Bojan Karlas, Ji Liu, Wentao Wu, Ce Zhang. Ease.ml in Action: Towards Multi-tenant Declarative Learning Services. VLDB (Demo) 2018.

  • X Lian, C Zhang, H Zhang, CJ Hsieh, W Zhang, J Liu. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent. NIPS 2017. (Oral Presentation: 40/3240 submissions).

  • H Zhang, J Li, K Kara, D Alistarh, J Liu, C Zhang. The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning. ICML 2017.

  • L Yu, B Cui, C Zhang, Y Shao. LDA*: A Robust and Large-scale Topic Modeling System. VLDB 2017.

  • Z Zhang, Y Shao, B Cui, C Zhang. An Experimental Evaluation of SimRank-based Similarity Search Algorithms. VLDB 2017.

  • J Jiang, B Cui, C Zhang, L Yu. Heterogeneity-aware distributed parameter servers. SIGMOD 2017.

  • M Owaida, H Zhang, G Alonso, C Zhang. Scalable Inference of Decision Tree Ensembles: Flexible Design for CPU-FPGA Platforms. FPL 2017.

  • K Kara, D Alistarh, G Alonso, O Mutlu, C Zhang. FPGA-accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-off. FCCM 2017.

  • K Schawinski, C Zhang, H Zhang, L Fowler, GK Santhanam. Generative Adversarial Networks recover features in astrophysical images of galaxies beyond the deconvolution limit. Monthly Notices of the Royal Astronomical Society 2017.

  • J Jiang, J Jiang, B Cui, C Zhang. TencentBoost: A Gradient Boosting Tree System with Parameter Server. ICDE (Industrial Track) 2017.

  • X Li, B Cui, Y Chen, W Wu, C Zhang. MLog: Towards Declarative In-Database Machine Learning. VLDB (Demo) 2017.

  • C Zhang, W Wu, T Li. An Overreaction to the Broken Machine Learning Abstraction: The ease.ml Vision. HILDA 2017.

  • H Huang, C Zheng, J Zeng, W Zhou, S Zhu, P Liu, S Chari, C Zhang. Android malware development on public malware scanning platforms: A large-scale data-driven study. IEEE Big Data 2016.

© 2018 DS3Lab @ ETH Zurich.