How can we… use AI to help formulate greener shampoo?

Aniket Chitre, PhD student, Department of Chemical Engineering and Biotechnology

12 December 2022


Accelerate spark data science residency

Traditional ingredients in everyday products such as shampoo are getting eco-conscious consumers in a lather. However, switching to greener feedstocks, or to chemicals made by a cleaner synthetic route, and changing the formula to match is far from simple.

My research centres on developing models to better predict the properties of formulations, so they could one day be used to design better, more environmentally friendly products that also have the potential to reach shop shelves more quickly.

Formulating a solution

Having developed an interest in machine learning during the process systems engineering and optimisation modules of my undergraduate degree, I decided to combine it with my interest in the chemical sciences for my PhD, resulting in a day-to-day life mixing molecular modelling and experiments.

I’m working on an industrially funded project that uses hybrid modelling, the combination of data-driven and physical models, to accelerate the development of liquid formulations. Predicting a priori the properties of the molecules, or formulations, that industry wants to make is often slow and inefficient. Currently, companies rely on chemists with a lot of domain knowledge, experience and expertise, who manually fiddle with product formulations.

Enabling data to bubble up

Computational property prediction is one way to accelerate this painstaking process, by giving formulation experts a better toolkit with which to make eco-friendly substitutions.

The majority of ingredients in shampoo are derived from petrochemicals. The two main classes of ingredients in consumer products, and the ones I’m investigating, are surfactants and polymers in liquid formulations (PLFs). PLFs are typically used as thickeners, emulsifiers and binders in shampoos, as well as in other products ranging from paints and coatings to agriculture. They are likely to enter the environment as they pass through wastewater treatment plants at the end of their life, damaging the environment and wasting a valuable resource.

I’m investigating three properties for a variety of consumer liquid formulations: phase stability, related to shelf life; turbidity, or cloudiness, related to product appearance; and viscosity, related to texture. Making sustainable refinements to formulations is big business, with PLFs alone worth $125.2 billion to the global economy. I’m currently carrying out high-throughput experiments in Singapore, splitting my time between Cambridge CARES and the Institute of Materials Research and Engineering, A*STAR. I’m focusing on lab automation and building my own high-throughput setup to generate the dataset, which the community currently lacks, on which I will train my property prediction models.

Taking part in the Accelerate Spark Program helped me hone the skills I need to curate a dataset of molecular descriptors for the ingredients. The current state of the art is purely statistical, black-box machine learning models that can be used to optimise formulations. However, these models are specific to the ingredients they are trained on, so they have no deep chemical understanding of molecules. I plan to extend this by incorporating domain knowledge and simulations into the descriptors and models. The goal is to offer formulators a way of pinpointing ingredients that would give them the qualities they are looking for in, for example, a shampoo.

A recipe for success

Having completed the summer school, I’m now taking part in the Accelerate Spark Machine Learning Academy to advance my machine learning skills. In particular, the hands-on experience from the practicals and assignments is invaluable, exposing me to lots of different techniques and to handling datasets of different sizes and complexities. They say machine learning can be as much an art as a science, and this experience transfers directly to tuning the models for my own research.

I have learned a lot about lab automation as well as building hardware from scratch. Combining the experimental and computational fields has also resulted in some interesting side-projects on the way to getting the desired workflow running. While the actual design of greener shampoo will be left to the industrial partners as an extension of my work, I hope to have elucidated the molecular descriptors needed to predict liquid formulation properties more generally by the end of my PhD.

Aniket took part in our Data Science Residency in 2021; you can find details about the course here. Please get in touch by email if you are interested in attending a future Residency.