Data Pipelines for Science Spring School – Registration now open

21 February 2023

Well-curated and managed data is central to the effective use of AI, in science and elsewhere. How can scientists build the data pipelines they need to accelerate their research with AI?

Machine learning is an important tool for researchers across disciplines. Scientists today have access to more data, from a greater range of sources and at greater speed than ever before, and opportunities to extract insights from this data using AI. But before deploying AI, researchers must have a data pipeline that transforms their data into a state that is suitable for the machine learning algorithms being used.

Following the successful Winter School, we are pleased to open registration for our Spring School. The Spring School will cover topics including automating, testing and publishing pipelines. It will take place in person at the Department of Computer Science and Technology from Wednesday 29th to Thursday 30th March. Over the two days, the course will equip researchers with the skills and techniques to overcome data pipeline challenges.

Over 30 participants from 16 University departments joined the Winter School to learn how to build the data pipelines they need to accelerate their research with AI.

Senior Machine Learning Engineer Dr Ahmad Abu-Khazneh, part of the team delivering the course, said: “From my experience working closely with scientists across disciplines over the last few years, data pipeline issues are often one of the main causes of delayed or stalled research projects that make use of machine learning. The purpose of this school is to equip scientists with best-practice software engineering principles and hands-on approaches that can help them identify and resolve data pipeline issues on their own. Moreover, well-engineered data pipelines that are published and shared correctly can be surprisingly more impactful and valuable to the research community, as reproducible artefacts, than any single result the pipelines were originally implemented to generate.”

PhD student Linying Shang from the Yusuf Hamied Department of Chemistry said that the Winter School has given her the knowledge to manage her data going forward: “The Data Pipeline Winter School talked us through every aspect of the data pipeline, from why it’s important to how to implement and publish it. I’m dealing with a rather small dataset now, but knowing what to consider about feature engineering from the very beginning will be beneficial for ensuring data consistency and making it easier to handle large datasets in the long run.”

The Spring School is open to students and researchers from the University of Cambridge who work with large datasets in their research, are interested in making the transition to data science-led research, but do not have significant expertise in software or data engineering.

Further information and registration details for the Spring School are available here, with registration closing on Friday 10th March.