IBM’s Open Source Data Science Curriculum

Ana Maria Echeverri, AI Skills Learning and Certifications Lead at IBM, joined the Enrollment Growth University podcast to talk about how higher ed can spin up new programs faster, and about the open source data science curriculum kit IBM is building together with the University of Pennsylvania and the Linux Foundation.

An Overview of IBM’s Open-source Data Science Curriculum Kit

It is no secret that data science is a field of tremendous growth and great job opportunities. And although academic institutions have been responding to that demand, creating bachelor’s and master’s programs, what we see in the market is that there is still a shortage of skills. 

A couple of years ago, IBM started thinking about finding a way to accelerate the creation of data science programs. They decided to work with the University of Pennsylvania to build a curriculum kit that colleges and universities around the world could use to get their programs started without having to build them from scratch.

Professors created the curriculum kit, and IBM launched it to the market as an open-source project so that other universities’ faculty members could contribute by donating additional modules. This approach makes it easy for any academic institution that’s in the process of building data science programs to use these materials as building blocks for its programs.

The initial kit has 15 different modules focused on data science and machine learning. Every module has a slide deck, Jupyter Notebooks, and links to open datasets. It’s all built using Python, so the kit is not tied to any IBM product.

Academic Governance Within an Open-source Curriculum

IBM’s goal is for institutions to be able to spin up new data science programs much faster than they currently do, without having to reinvent the wheel on their own every time. Individual institutions and faculty will still have the flexibility to customize the curriculum enough to have, and feel that they have, academic ownership of their program.

Regarding governance, IBM partners with the Linux Foundation to govern the project. The Linux Foundation is in fact the entity that ensures the project has a technical steering committee driving it.

Furthermore, any academic institution can either take the whole kit or take only the modules that fit its specific needs. Everything has been published under a Creative Commons license with attribution. Hence, whoever takes a module can make changes to it as long as they credit the OpenDS4All project as the source. So, some universities will use the materials as they are. Others will just take the kit as a starting point and make it their own.

Is this Program a Feeder for IBM?

“We at IBM are continuously looking for new talent with data science expertise,” Ana Maria said. “This is one of our really hot fields. Especially those people that are using Python, because Python has become kind of the de facto standard for data science, and that’s where our own needs and our clients’ needs tend to be.”

Schools, their students, upcoming graduates, and alumni can search the IBM career site for data science positions. They can also create notifications so that they’re alerted when new positions are created. Internship opportunities are also posted on university campuses and through platforms used at universities, such as Handshake.

IBM has apprenticeship programs for people without expertise, including a paid program that lasts two years. During that time, IBM provides education, mentorship, and real-world expertise. And at the end of the two years, the participants get a certificate from the Department of Labor indicating that they are now ready to work as junior data scientists.

How Will Data Science Education Change at the University Level?

“In the long term, I think data science, even if it continues existing as a stand-alone program, is going to become embedded in any other program,” Ana Maria said. “Biology, chemistry, all of these fields that have data that are completely relevant to their space will end up embedding some of these data science foundational elements.”

Some modules may be applicable to literally any program. So, you could take the introductory module or the ethics module and embed it into any program. Other modules may require an additional mathematical foundation, such as probability, inferential statistics, and linear algebra, as well as technical skills, such as the ability to code in Python.

Next Steps for Institutions that Want to Try This

Go to the GitHub repository for OpenDS4All. There, you can find guidance for instructors and a sample curriculum along with additional resources. If you want to talk to someone, the best way is also through the GitHub page. There’s an area where you can open an issue, which is the standard way to communicate on GitHub.

“We have a technical steering committee that’s formed from data science faculty from different universities around the United States,” Ana Maria told us. “And we’re constantly monitoring this to see which requests we get, and we’ll be happy to talk to you and provide additional guidance.”

IBM has done a lot of work around defining the skills competencies for the data science role, so the company has a very granular definition of what skills are required and what skills an individual needs to develop in order to become a successful data scientist. So, there are a lot of collaboration opportunities.


This post is based on a podcast interview with Ana Maria Echeverri from IBM. To hear this episode, and many more like it, you can subscribe to Enrollment Growth University.

If you don’t use iTunes, you can listen to every episode here.