A look at data science at UC Berkeley
Data Science: More Than a Buzzword
When Foundations of Data Science, or Data 8, first launched in 2015, there were fewer than 100 students in the class. Today, more than 1,600 students are enrolled in what has become a staple on UC Berkeley’s list of must-have undergraduate courses.
Although introduced as a stand-alone course, Data 8 has always been part of a much larger plan developed by staff and faculty across campus. According to associate professor John DeNero, who currently teaches Data 8 and co-created the course, the hope was that once students got through Data 8, they would ask themselves, “What do we take next?”
That’s exactly what happened. The students, DeNero said, were enthusiastic about the course.
“It was clear to many of us that this content was foundational, something every student should know,” DeNero said. “It made sense that a lot of people would take it.”
According to DeNero, data science degrees were previously only offered at the graduate level. UC Berkeley developed a bachelor’s degree program when campus members realized how applicable the content was to any undergraduate education.
As UC Berkeley is located just east of Silicon Valley, the school has a tradition of producing leaders in the field of technology. However, despite the hype surrounding Data 8 and the data science major, faculty and students say there is more to data science than buzz.
It’s about drawing conclusions from data, said Kevin Miao. It’s about “making sense of the world,” said campus senior Margaret Misyutina, a Data 8 undergraduate student instructor, or uGSI. For campus senior and Data 8 uGSI Varun Jadia, data science is about learning to think and asking the right questions.
“To me, data science means using what we know around us to make predictions about the future,” said campus junior Will Furtado, head of pedagogy for Data 8.
Accessibility and Representation: No Outliers in Data Science
For many, Data 8 is the catalyst that initiates a discovery of the world through data.
“Before enrolling in Data 8, I had no idea what data science was,” Misyutina said. “Data 8 was my first introduction to the field and I fell in love with it.”
Misyutina has now been on the Data 8 course staff for eight semesters.
It’s a sentiment that reflects the journey of hundreds of students who take the course as their first exposure to computational and statistical methods.
Jadia took the course in his very first semester at Berkeley as his first coding class.
“That’s what kicked things off for me,” Jadia said.
Since then he has been involved in Data 8 as a mentor.
Aarushi Karandikar, a campus senior majoring in data science and a Data 8 uGSI, arrived at UC Berkeley intending to follow a pre-medical track. However, after she took Data 8 in her first semester, the “trajectory of (her) college career” changed. This is now Karandikar’s sixth semester on the course staff.
As a class that has no prerequisites and makes no assumptions about prior knowledge, Data 8 serves as an accessible introduction to technical concepts that can be applied to any area of a student’s interest. According to the Data 8 website, the course is designed specifically for students without prior knowledge of statistics or computer science.
Although the course is designed for students to bring their own interests and domain knowledge, learning how to apply new computational lenses can be the scariest part, added Miao, who is also an instructor in the Data Scholars program.
Sometimes it’s hard for students from a non-traditional background who are new to the field to “imagine themselves to be data scientists,” Miao noted.
This hesitation is one of the reasons the Data Scholars program was created. The program aims to provide mentorship and community to students from historically underrepresented groups.
Campus senior Carlos Ortiz, who helped develop Data Scholars, said he pursued a teaching position in data science to inspire minority students in the field.
“I want you to feel welcome and I want you to feel supported,” Ortiz said. “After taking Data 8, I was like, ‘I’ve never had Latino teaching assistants; I really want to be one.’”
A unique aspect of Data 8 — and the data science program as a whole — is the student-to-mentor pipeline, DeNero said.
Furtado came to Berkeley without knowing how to code. Now, he says, his favorite types of students to mentor in Data 8 are those without a technical background.
Being a mentor to underrepresented students new to the field has been instrumental in Ortiz’s experience at Berkeley — and a motivating factor in helping students from these groups succeed in data science.
“Mentoring these students is the best part of my week,” Ortiz said.
Ortiz said seeing Data Scholars grow from 30 students to more than 100 in the space of a year makes him feel “on top of the world.”
Ortiz noted that his goal with the program is for every underrepresented student to see themselves represented on the course staff.
“I want everyone to see someone on staff who they can relate to, whether it’s culture, race, ethnicity or transfer status,” Ortiz said. “It all matters.”
Finding Correlations: Data Science as a Bridge Between Majors
UC Berkeley undergraduates often sort majors into two categories: STEM and the humanities or social sciences. One side includes computer science, math and biology. The other includes psychology, art and history.
The promise of data science: bridging the gap between humanities and STEM.
“Data science is a tool that can be used with anything else. For me, it’s econ,” said Alice Chen, a campus senior studying data science and economics and a Data 8 uGSI. “Some people intend to become data science majors, and others see it as a toolkit for their other disciplines.”
Data science can also be used for more fun activities. For example, the summer after taking Data 8, Furtado used what he learned in the course to glean insights from his Spotify listening history data spanning three years.
With the influx of students from all majors enrolling in the class, especially in recent semesters, Furtado said course faculty and staff have focused both on making the course accessible to all and on presenting the many applications of the content.
One course project helps students understand the impacts of population growth. Another studies climate change by analyzing historical temperature and precipitation data. Students also learn how to use screenplay data to predict movie genres.
According to Ortiz, integrating projects from very different disciplines requires a group of students with very different backgrounds and interests, united by the common thread of data. He is currently applying data science techniques in his thesis on understanding the relationship between race and ethnicity, population characteristics, and pollution.
“When you’re talking about data, the same people with the same point of view won’t get you anywhere,” Ortiz said. “But a diverse group of perspectives – that’s where you make a difference.”
Karandikar, who is a human rights minor, is working on a project that aims to identify potential biases in judges’ decisions when granting asylum using data from the US Department of Immigration, a project she describes as “the intersection between data science and human rights.”
UC Berkeley also offers a slew of “connector courses,” designed to help students who have taken Data 8 connect the techniques they’ve learned to other concepts, according to the campus’s data science website. These include everything from “Race, Policing and Data Science” to “Data Science for Genetics and Genomics.”
Linear Growth: Berkeley as a Model
Just three years ago, in 2019, the very first cohort of undergraduates to graduate with the data science degree numbered 82 students. The following year saw the completion of 438 majors and 89 minors. Last year, those numbers rose to 668 majors and 369 minors.
And it’s spreading. The “wild popularity” at Berkeley has inspired many colleges and universities across the country to adopt part of the campus data science curriculum into their own undergraduate programs, DeNero said.
One of the things that makes data science at UC Berkeley so special, Miao added, is that it was one of the first colleges in the nation to create an undergraduate data science major. The campus, he said, has the power to actively “set the tone” for what should be included in data science curricula at all levels.
“Berkeley faculty has a long history of writing textbooks that are used around the world — the modern variation of that is creating a comprehensive course,” DeNero said. “Some universities want to incorporate it and they are more than happy to use our video lectures as the basis of their course and adapt it to suit their curriculum.”
DeNero noted that all Data 8 materials created by UC Berkeley — including the slides, online textbook, lecture videos, assignments, and projects — are freely available online.
Beyond Data 8, the broader data science curriculum also continues to develop new pathways for students. Courses such as Data 102, “Data, Inference, and Decisions,” and Data 101, “Data Engineering,” whose concepts were previously taught only at the graduate level, are being introduced, DeNero said.
Data 8 is also constantly evolving, Furtado said. Professors and course staff work together to improve and diversify the material from semester to semester, with the aim of drawing examples from “as many disciplines as possible,” he added.
Rita Wang, a fourth-year data science and computer science student who has been on staff for the Data 8 course every semester since she took it in her first year, said she learns something new with every iteration of the course.
What’s exciting for DeNero, as the program continues to expand, is the “reach and breadth” involved in creating such courses. This is not the job of a few teachers; rather, it’s a collaborative, campus-wide effort, he said.
“Data science is the place to be,” Ortiz said.
Lydia Sidhom is the principal academic and administrative journalist. Contact her at [email protected] and follow her on Twitter at @SidhomLydia.