The information and data sciences are concerned with the acquisition, storage, communication, processing, and analysis of data. These intellectual activities have a long history, and Caltech has traditionally occupied a position of strength, with faculty spread across applied mathematics, electrical engineering, computer science, mathematics, physics, astronomy, economics, and many other disciplines. In the last decade, there has been a rapid increase in the rate at which data are acquired, with the objective of extracting actionable knowledge in the form of scientific models and predictions, business decisions, and public policies. From a technological perspective, this rapid increase in the availability of data creates numerous challenges in acquisition, storage, and subsequent analysis. More fundamentally, humans cannot deal with such a volume of data directly, and it is increasingly essential that we automate the pipeline of information processing and analysis. All areas of human endeavor are affected: science, medicine, engineering, manufacturing, logistics, the media, and entertainment. The range of scenarios that concern a scientist in this domain is very broad: from situations in which the available data are nearly infinite (big data) to those in which the data are sparse and precious; from situations in which computation is, for all practical purposes, an infinite resource to those in which it is critical to respond rapidly and computation must thus be treated as a precious resource; from situations in which the data are all available at once to those in which they are presented as a stream.
As such, the information and data sciences now draw not just upon traditional areas spanning computer science, applied mathematics, and electrical engineering (signal processing, information and communication theory, control and decision theory, probability and statistics, and algorithms) but also upon a range of contemporary topics such as machine learning, network science, distributed systems, and neuroscience. The result is an area that is new, that is fundamentally different from related areas like computer science and statistics, and that is crucial to modern applications in the physical sciences, social sciences, and engineering.
The Information and Data Sciences (IDS) option is unabashedly mathematical, focusing on the foundations of the information and data sciences and their roots in probability, statistics, linear algebra, and signal processing. These fields all contribute crucial components of data science today. Further, the option takes advantage of the interdisciplinary nature of Caltech by including a required set of application courses in which students learn how data touches science and engineering broadly. The flexibility provided by this sequence allows students to see data science in action in biology, economics, chemistry, and beyond.
In addition to a major, the IDS option offers a minor that focuses on the mathematical foundations of the information and data sciences while recognizing that many students in other majors across campus need to supplement their options with practical training in data science.