‘Data Pipelines for Science’ Winter School

How can researchers design and implement data pipelines for scientific research? Join our Winter School on 5-7 December 2022 to learn how to correctly, efficiently and robustly prepare your datasets for machine learning in your scientific projects.

Machine learning is an important tool for researchers across disciplines. Scientists today have access to more data, from a greater range of sources and at greater speed than ever before, and opportunities to extract insights from this data using AI. But before deploying AI, researchers must have a data pipeline that transforms their data into a state that is suitable for the machine learning algorithms being used.

These pipelines are important independent research outputs, as they enable others to easily inspect, reproduce, refine or extend the scientist’s work. However, implementing data pipelines presents numerous software challenges that can be difficult to resolve, or even to identify, for scientists who do not have significant expertise in software engineering concepts and practices.

Such challenges include: how do I ensure the correctness of my pipeline? How do I structure my pipeline in a way that makes it easier for others to reuse and extend? How do I ensure my pipeline is robust enough to deal with different types and volumes of data? How do I document and publish my pipeline? How do I ensure my pipeline adheres to privacy and anonymisation constraints?

Accelerate Science’s 2022 Winter School will help scientists overcome such data pipeline challenges by equipping them with the latest best-practice software techniques. It will consist of a blend of lectures and labs: the lectures will focus on general principles and case studies, and the labs on hands-on exercises in Python. Participants will also have the opportunity to discuss and share data pipeline issues encountered in their own research with the course instructor and cohort, and to relate them to the course content.

The Winter School will take place on 5-7 December at the Intel Teaching Labs at the William Gates Building. If you’re a PhD student or researcher at the University of Cambridge and would like to apply to join the course, please complete the form at https://7kuzlokgqop.typeform.com/to/S2ZdyWFx by Monday 14 November, 17:00 (UK).

For further information, please read the FAQs at this link: https://acceleratescience.github.io/data-engineering-school