Registration now open for next Data Pipelines for Science School

09 June 2023

Well-curated and managed data is central to the effective use of AI, in science and elsewhere. How can scientists build the data pipelines they need to accelerate their research with AI?

Machine learning is an important tool for researchers across disciplines. Scientists today have access to more data, from a greater range of sources and at greater speed than ever before, along with new opportunities to extract insights from this data using AI. But before deploying AI, researchers must have a data pipeline that transforms their data into a state that is suitable for the machine learning algorithms being used.

We are pleased to launch registration for our next Data Pipelines School, following the success of our Winter and Spring Schools. The School will cover topics including automating, testing and publishing pipelines. It will take place in person at the Department of Computer Science and Technology on Monday 25th and Tuesday 26th September. Over the two days, the course will equip researchers with the skills and techniques to overcome data pipeline challenges.

Over 70 participants from 30 University Departments have taken part in previous Data Pipelines Schools, learning how to build the data pipelines they need to accelerate their research with AI.

PhD student Linying Shang from the Yusuf Hamied Department of Chemistry said that the Winter School has given her the knowledge to manage her data going forward: “The Data Pipeline Winter School talked us through every aspect of the data pipeline, from why it’s important to how to implement and publish it. I’m dealing with a rather small dataset now, but knowing what to consider about feature engineering from the very beginning will be beneficial for ensuring data consistency and making it easier for handling large dataset in the long run.”

The Data Pipelines School is open to students and researchers from the University of Cambridge who work with large datasets in their research and are interested in making the transition to data science-led research, but do not have significant expertise in software or data engineering.

Further information and registration details for the Data Pipelines School are available here, with registration closing on Friday 8th September.