According to a meta-study by Italo-Austrian scholars, fewer than half of AI models could be reproduced with reasonable effort. A similar study on Machine Learning for Health (ML4H) applications reports similarly poor reproducibility outcomes. So what exactly is reproducibility?
Reproducibility is a broad umbrella term for establishing the scientific validity and verification of research results. Commonly used related terms include reproducibility, replicability, reliability, robustness, and generalisability. The two we hear most often are:
- Reproducibility: Duplication of the results of a prior study using the same elements that study used. That is, one uses the same raw data to build the same analysis files and implements the same statistical analysis (or methodology) to obtain the same results;
- Replication: Duplication of the results of a prior study if the same procedures are followed but new data are collected. It is also known as repeatability.
So one might ask: isn't all this about the robustness of scientific experiments and trust in science? Why are we talking about Machine Learning?
In the simplest sense, Machine Learning projects are experiments, and they are newcomers to the scientific method of inquiry. Artificial Intelligence, and Machine Learning in particular, has only relatively recently progressed from theoretical concepts to practical experiments. There is far more interest and excitement around the ambition and grandeur of these experiments than in taking a pensive step back to design and implement sound methodology and standards for them. The key aspects that need to be part of the methodology within the MLOps pipeline include, but are not limited to: diligent keeping of decision records, model versioning, and change logs of data, hyperparameters, and code.

Ensuring reproducibility in the MLOps cycle is cumbersome and time consuming. The effort can seem thankless, since it doesn't yield the eureka moments data scientists experience during the cooler, more exciting tasks like model building. However, if a machine learning system fails to consistently replicate an intended behavior in an operational environment different from its training/lab context, the consequences can be dramatic, even fatal (think healthcare), to say nothing of the legal and regulatory fallout.
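To make the record-keeping point concrete, here is a minimal sketch (not any particular tool's API; the function name and fields are our own illustration) of what capturing a training run's reproducibility metadata might look like: fixing the random seed, fingerprinting the exact dataset, and saving the hyperparameters and environment details alongside the results.

```python
import hashlib
import json
import platform
import random

def record_run_metadata(data_bytes: bytes, hyperparams: dict, seed: int) -> dict:
    """Capture the minimal facts needed to re-run a training experiment."""
    random.seed(seed)  # fix randomness so the run can be repeated exactly
    return {
        # Fingerprint of the exact dataset used; any change to the data
        # produces a different hash, so silent drift becomes visible.
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "hyperparams": hyperparams,
        "seed": seed,
        # The environment matters too: a different interpreter or library
        # stack can change numerical results.
        "python_version": platform.python_version(),
    }

# Example: log the metadata next to the model artefacts.
data = b"feature1,feature2,label\n0.1,0.2,1\n"
meta = record_run_metadata(data, {"lr": 0.01, "epochs": 10}, seed=42)
print(json.dumps(meta, indent=2))
```

Two runs with the same data, hyperparameters, and seed produce identical metadata records, which is exactly the property a reproducibility audit relies on.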
We believe that setting up standards and automating the methodology that facilitates reproducibility and replication is paramount to making them a default in data scientists' MLOps workflows. At Clearbox AI, we consider reproducibility a fundamental element of our product offering, through our model registry feature.
AI Control Room’s model and data registry acts as a centralised tracking system that stores lineage, versioning, and metadata for your datasets and models. Generated assessments are securely persisted along with the models and datasets they refer to.
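To illustrate the general idea of a registry (this is a toy in-memory sketch of the concept, not AI Control Room's actual implementation; the class and field names are hypothetical), each registered model version can keep a pointer to the exact data it was trained on, its hyperparameters, and a timestamp, giving a lineage trail from result back to inputs.

```python
import datetime
import hashlib

class ModelRegistry:
    """Toy in-memory registry: each model version records its artefact
    hash, dataset fingerprint, hyperparameters, and registration time."""

    def __init__(self):
        self._store = {}  # model name -> list of version entries

    def register(self, name, model_bytes, data_sha256, hyperparams):
        versions = self._store.setdefault(name, [])
        entry = {
            "version": len(versions) + 1,  # versions are append-only
            "model_sha256": hashlib.sha256(model_bytes).hexdigest(),
            "data_sha256": data_sha256,    # lineage: which data built this model
            "hyperparams": hyperparams,
            "registered_at": datetime.datetime.now(
                datetime.timezone.utc
            ).isoformat(),
        }
        versions.append(entry)
        return entry

    def latest(self, name):
        """Return the most recently registered version of a model."""
        return self._store[name][-1]

# Usage: register two versions of the same model and inspect lineage.
reg = ModelRegistry()
reg.register("churn-clf", b"weights-v1", "a3f...e1", {"lr": 0.01})
reg.register("churn-clf", b"weights-v2", "a3f...e1", {"lr": 0.005})
print(reg.latest("churn-clf")["version"])
```

A real registry would persist these entries durably and control access, but the essential reproducibility property is the same: given any version number, you can recover exactly which data and configuration produced it.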
A model registry is undoubtedly a key facilitator of reproducibility and repeatability in Machine Learning, helping to establish robustness and trust in model results. Oftentimes, reproducibility issues are also caused by the unavailability of datasets due to privacy or commercial constraints. In upcoming blog posts we will discuss these issues and explain how synthetic datasets can help overcome them. Stay tuned!
“In God we trust; all others must bring data” - W. Edwards Deming (American Statistician)