Our collective dream as a research group is to eliminate the barrier between people and technology—how can we provide state-of-the-art technology to those who can benefit from it without them having to understand all the technical details? This is far from a new dream—a 1970 version of this, in Codd’s words, is that “future users of large data banks must be protected from having to know how the data is organized in the machine.” Four decades later, our research goal is to prevent users of machine learning systems from needing to know how machine learning is physically executed.
As of today, we have barely scratched the surface of this goal. In our experience providing a modern machine learning system such as TensorFlow to power users such as astronomers, the number of physical decisions the users need to make significantly slows down their progress. To unleash the full potential of machine learning, we need to make the abstraction much more accessible than it is right now.
Why machine learning? We focus on machine learning for two reasons:
- We believe machine learning is the latest technology that we should bring to our users. If machine learning can be as ubiquitous as modern database systems, it will, in our opinion, enable the next wave of advancement for science and humanity.
- We believe that simply building syntactic sugar for existing machine learning and data systems is not enough to make machine learning accessible. Allowing users who speak only rudimentary Python and SQL to take advantage of machine learning is an open research question: what is the correct level of abstraction to provide to users?
This defines the theme of the research we are doing.
Research Thrust 1 (Application Layer): Data-Driven Sciences. We put great emphasis on working with actual users and helping them with applications that can make a real difference in their daily jobs. We closely collaborate with users from other domains: astronomers, biologists, social scientists, meteorologists, users from the private sector, and even computer scientists without a machine learning background. We work together with our users, learning their domains, their jargon, and their dreams, and try to apply state-of-the-art machine learning to assist their applications. Applications we helped to build have been reported by media such as Science (Editor’s Choice), The Atlantic, and WIRED Science, and have been used to enable cutting-edge domain sciences.
Research Thrust 2 (Physical Layer): Data Systems on Modern Infrastructure. The applications we helped build provide inspiration and motivation to build better, faster, more scalable, and more energy-efficient data systems. Our research focuses on developing new methods to close the gap between state-of-the-art machine learning algorithms and emerging hardware infrastructure such as GPGPUs, FPGAs, and massive data centers with fast interconnects. Focusing on machine learning allows us to develop new methods whose guarantees would often be much weaker in other workloads. Examples include decentralizing communication and dealing with heterogeneity across hundreds of machines, as well as lowering the precision of computation and data representation to further speed up machine learning on modern hardware (e.g., FPGAs).
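To make the idea of lowering the precision of data representation concrete, here is a minimal sketch of one common technique, stochastic rounding onto a small number of evenly spaced levels. It is an illustration only, not a description of our systems: the function name `stochastic_quantize` and its parameters are hypothetical, and the per-batch min/max scaling is an assumption made for simplicity.

```python
import math
import random


def stochastic_quantize(values, bits=4, rng=None):
    """Map each value onto one of 2**bits evenly spaced levels.

    Hypothetical helper for illustration. Each value is rounded up or
    down at random, with the probability of rounding up equal to its
    fractional position between the two neighboring levels, so the
    quantized value equals the original in expectation.
    """
    rng = rng or random.Random()
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; nothing to quantize.
        return list(values)
    levels = 2 ** bits - 1          # number of intervals between levels
    step = (hi - lo) / levels       # width of one interval
    out = []
    for v in values:
        scaled = (v - lo) / step    # position measured in intervals
        base = math.floor(scaled)
        if rng.random() < scaled - base:
            base += 1               # round up with prob. = fractional part
        # Clamp guards against floating-point overshoot at the top level.
        out.append(lo + min(base, levels) * step)
    return out


if __name__ == "__main__":
    data = [i / 10 for i in range(-10, 11)]
    print(stochastic_quantize(data, bits=2, rng=random.Random(0)))
```

Because the rounding is unbiased, averaging many quantized copies recovers the original values, which is one reason training procedures based on averaging noisy updates can tolerate this kind of precision loss better than workloads that need exact answers.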
Research Thrust 3 (Logical Layer): Transparent and Declarative Data Services. The diversity of applications and the sophistication of systems call for a new logical layer to protect users who work in the application layer from needing to know how the physical layer works. Our research focuses on the usability angle: how do we provide a service-like abstraction for machine learning so that our users can work in a more declarative fashion? The first version of our vision attempts to bridge the gap with a declarative DSL for deep learning; in the second version, we are developing new methods for fully automatic model selection and hyperparameter tuning. This iterative development process will continue as we make progress on both applications and systems, and we hope the level of abstraction will keep rising along the way.
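As a rough illustration of what automatic model selection and hyperparameter tuning can look like from the user's side, the sketch below uses plain random search, a simple baseline rather than the methods we are developing. Everything in it is hypothetical: the `random_search` helper, the search space, and `toy_validation_loss`, which stands in for actually training and validating a model.

```python
import math
import random


def random_search(evaluate, space, n_trials=50, seed=0):
    """Return the best (config, score) found by random search.

    `space` maps each hyperparameter name to a sampler that takes a
    random.Random instance; `evaluate` returns a loss (lower is better).
    """
    rng = random.Random(seed)
    best_config, best_score = None, math.inf
    for _ in range(n_trials):
        config = {name: sample(rng) for name, sample in space.items()}
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space: a model family plus a log-uniform learning rate.
space = {
    "model": lambda rng: rng.choice(["linear", "tree"]),
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, 0),
}


def toy_validation_loss(config):
    """Stand-in for training a model and measuring validation loss."""
    penalty = 0.0 if config["model"] == "tree" else 0.5
    return penalty + (math.log10(config["learning_rate"]) + 2) ** 2


if __name__ == "__main__":
    config, loss = random_search(toy_validation_loss, space, n_trials=200)
    print(config, loss)
```

The user states only what may vary (the search space) and how success is measured (the loss); which configurations to try, and in what order, is left to the service.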