The information and data sciences are concerned with the acquisition, storage, communication, processing, and analysis of data. These intellectual activities have a long history, and Caltech has traditionally occupied a position of strength, with faculty spread across applied mathematics, electrical engineering, computer science, mathematics, physics, astronomy, economics, and many other disciplines. In the last decade, there has been a rapid increase in the rate at which data are acquired with the objective of extracting actionable knowledge in the form of scientific models and predictions, business decisions, and public policies. From a technological perspective, this rapid increase in the availability of data creates numerous challenges in acquisition, storage, and subsequent analysis. More fundamentally, humans cannot deal with such a volume of data directly, and it is increasingly essential that we automate the pipeline of information processing and analysis. All areas of human endeavor are affected: science, medicine, engineering, manufacturing, logistics, the media, and entertainment. The range of scenarios that concern a scientist in this domain is very broad: from situations in which the available data are nearly infinite (big data) to those in which the data are sparse and precious; from situations in which computation is, for all practical purposes, an infinite resource to those in which it is critical to respond rapidly and computation must thus be treated as a precious resource; and from situations in which the data are all available at once to those in which they are presented as a stream.
As such, the information and data sciences now draw not just upon traditional areas spanning computer science, applied mathematics, and electrical engineering (signal processing, information and communication theory, control and decision theory, probability and statistics, and algorithms) but also upon a range of contemporary topics such as machine learning, network science, distributed systems, and neuroscience. The result is an area that is new, fundamentally different from related areas like computer science and statistics, and crucial to modern applications in the physical sciences, social sciences, and engineering.
The Information and Data Sciences (IDS) option is unabashedly mathematical, focusing on the foundations of the information and data sciences across their roots in probability, statistics, linear algebra, and signal processing. These fields all contribute crucial components of data science today. Further, the option takes advantage of the interdisciplinary nature of Caltech by including a required set of application courses in which students learn how data touches science and engineering broadly. The flexibility provided by this sequence allows students to see data science in action in biology, economics, chemistry, and beyond.
In addition to a major, the IDS option offers a minor that focuses on the mathematical foundations of the information and data sciences but recognizes the fact that many students in other majors across campus have a need to supplement their options with practical training in data science.
IDS Option Requirements
- Computer Science Fundamentals. CS 1 or CS 1X; CS 2; CS 21 or Ma/CS 6c; and CS 38.
- Mathematical Fundamentals. Ma 2; Ma 3 or Ma/ACM/IDS 140a; Ma 108a; (CS 13 or Ma/CS 6a or Ma 121a); (Ma/CS 6b or Ma 121b). The analytical tracks of Ma 1bc are strongly recommended.
- Scientific Fundamentals. 18 units selected from the following courses: BE/Bi 25, BE 153, Bi 8, Bi 9, Bi 117, Ch 21abc, Ch 24, Ch 25, Ch 41abc, Ph 2abc, or Ph 12abc. Advanced 100+ courses in Bi, Ch, or Ph with a strong scientific component can be used to satisfy this requirement with approval from the option representative, but cannot simultaneously be used to satisfy the “Applications Electives” requirement or the “Advanced Electives” requirement.
- Communication Fundamentals. SEC 10; and one of SEC 11-13.
- Information and Data Science Core Requirements.
  - Linear Algebra: ACM/IDS 104; ACM 106a.
  - Probability: ACM/EE/IDS 116.
  - Statistics: IDS/ACM/CS 157.
  - Machine Learning: CMS/CS/CNS/EE/IDS 155 or CS/CNS/EE 156a.
  - Signal Processing: EE/IDS 111 or ACM/EE/IDS 170.
  - Information Theory: EE/IDS 160.
- Applications Electives. At least 18 units from the following list: Ay 119, BE/Bi 103 ab, BE/Bi 205, Bi/CNS/NB 162, Bi/BE/CS 183, CNS/Bi/EE/CS/NB 186, ME/CS/EE 133b, ME/CS/EE 134, EE/CNS/CS 148, Ec/ACM/CS 112, Ec 122, Ec/PS 124, ESE 136, Fs/Ay 3, Fs/Ph 4, Ge/Ay 117, Ge 165, HPS/Pl/CS 110, SS 228. Other courses that include applications of data science may be substituted with approval from the option representative. Courses used to fulfill this requirement may not also be used to fulfill any requirement above.
- Advanced Electives. At least 54 units from the following list: IDS courses numbered 100 or above, CS/CNS/EE 156ab, ACM 106b, ACM 95/100ab, CMS/ACM/EE 122, CS 115, Ma 112 ab. Courses used to fulfill this requirement may not also be used to fulfill any requirement above.
Courses used to fulfill the “Applications Electives” and “Advanced Electives” requirements cannot be used to fulfill the Institute humanities and social sciences requirements.
Units used to fulfill the Institute Core requirements do not count toward any of the option requirements. Pass/fail grading cannot be elected for courses taken to satisfy option requirements. Passing grades must be earned in a total of 486 units, including all courses used to satisfy the above requirements.
IDS Double Majors
Students interested in simultaneously pursuing a degree in a second option must fulfill all the requirements of the Information and Data Sciences option. Courses may be used to simultaneously fulfill requirements in both options. However, students must have at least 54 units of “Advanced Electives” and 18 units of “Applications Electives” that are not simultaneously used to fulfill a requirement of the second option. Any proposal to replace these courses must be discussed with the option administrator. To enroll in the program, students should meet with the option representative to discuss their plans. In general, approval is contingent on good academic performance by the student and a demonstrated ability to handle the heavier course load.
IDS Typical Course Schedule
Course | Title | 1st term units | 2nd term units | 3rd term units |
Second Year | | | | |
CS 1 | Intro. to Computer Programming | 9 | - | - |
CS 2 | Intro. to Programming Methods | - | 9 | - |
CS 38 | Algorithms | - | - | 9 |
Ma 2 | Differential Equations | 9 | - | - |
Ma 3 | Intro. to Probability and Statistics | - | 9 | - |
Ma/CS 6 ab | Intro. to Discrete Methods | 9 | 9 | - |
ACM/IDS 104 | Applied Linear Algebra | 9 | - | - |
HSS Electives | | 9 | 9 | 9 |
Scientific Fundamentals | | - | 9 | 9 |
Other Electives | | - | - | 9 |
Total | | 45 | 45 | 36 |
Third Year | | | | |
SEC 10 | Technical Seminar Presentations | - | 3 | - |
CMS/CS/CNS/EE/IDS 155 | Machine Learning & Data Mining | - | 12 | - |
One of SEC 11-13 | Written Communication | - | - | 3 |
Ma 108 a | Classical Analysis | 9 | - | - |
EE/IDS 111 | Signal-Processing Systems and Transforms | 9 | - | - |
IDS/ACM/CS 157 | Statistical Inference | - | - | 9 |
ACM/EE/IDS 116 | Intro. to Probability Models | 9 | - | - |
HSS Electives | | 9 | 9 | 9 |
Advanced Electives | | 9 | 9 | 9 |
Applications Electives | | - | 9 | - |
Other Electives | | - | - | 9 |
Total | | 45 | 42 | 39 |
Fourth Year | | | | |
ACM/EE 106 a | Intro. Methods of Computational Math. | 12 | - | - |
EE/IDS 160 | Fundamentals of Information Transmission and Storage | - | 9 | - |
Advanced Electives | | 9 | 9 | 9 |
Applications Electives | | 9 | 9 | - |
HSS Electives | | 9 | 9 | 9 |
Other Electives | | 9 | 9 | 18 |
Total | | 48 | 45 | 36 |
IDS Advising
Starting in the sophomore year, IDS students will be assigned a faculty adviser with whom they should meet regularly, typically once per quarter. Students in the program are advised by faculty interested in the information and data sciences from across the Institute. This includes all the CMS faculty, as well as the following faculty who pursue data science-related research and participate in IDS advising: Mike Alvarez, Justin Bois, Fernando Brandao, Jaksa Cvitanic, Frederick Eberhardt, Babak Hassibi, Jonathan Katz, Victoria Kostina, Kirby Nielsen, Pietro Perona, Antonio Rangel, Vikram Ravi, Mikhail Shapiro, Mark Simons, Matt Thomson, and Zhongwen Zhan. Students seeking an IDS adviser should contact the undergraduate option secretary at [email protected].
IDS Minor Requirements
- Computer Science Fundamentals. CS 1 or CS 1X; CS 2; CS 21 or Ma/CS 6c; and CS 38.
- Mathematics Fundamentals. Ma 3 or Ma/ACM/IDS 140a; (CS 13 or Ma/CS 6a or Ma 121a).
- Information and Data Science Core Requirements.
  - Linear Algebra: ACM/IDS 104.
  - Probability: ACM/EE/IDS 116.
  - Statistics: IDS/ACM/CS 157.
  - Machine Learning: CMS/CS/CNS/EE/IDS 155 or CS/CNS/EE 156a.
  - Signal Processing: EE/IDS 111 or ACM/EE/IDS 170.
- Applications Electives. At least 9 units from the following list: Ay 119, BE/Bi 103 ab, BE/Bi 205, Bi/CNS/NB 162, Bi/BE/CS 183, CNS/Bi/EE/CS/NB 186, ME/CS/EE 133b, ME/CS/EE 134, EE/CNS/CS 148, Ec/ACM/CS 112, Ec 122, Ec/PS 124, ESE 136, Fs/Ay 3, Ge/Ay 117, Ge 165, HPS/Pl/CS 110, SS 228. Other courses that include applications of data science may be substituted with approval from the option representative.
- Advanced Electives. At least 9 units from the following list: IDS courses numbered 100 or above, CS/CNS/EE 156ab, ACM 106b, ACM 95/100ab, CMS/ACM/EE 122, CS 115, Ma 112 ab. Courses used to fulfill this requirement may not also be used to fulfill any requirement above.
Courses used to fulfill the “Applications Electives” and “Advanced Electives” requirements cannot be used to fulfill (i) a requirement for another major or minor; or (ii) the Institute humanities and social sciences requirements. Any replacement of these courses must be discussed with the option administrator.
Pass/fail grading cannot be elected for courses taken to satisfy option requirements. Courses taken as part of the data science minor are counted toward the total 486 units needed for Institute graduation requirements.
Typical course schedule
A typical course sequence is to take CS 1 during the first year; Ma/CS 6a, Ma 3, CS 2, and CS 38 during sophomore year; ACM/EE/IDS 116, ACM/IDS 104, CMS/CS/CNS/EE/IDS 155, and IDS/ACM/CS 157 during junior year; and EE/IDS 111 and the elective courses during senior year.