If you have ever worked with a Machine Learning (ML) model in a production environment, you might have heard of MLOps. The term describes the practice of optimizing the ML lifecycle by bridging the gap between design, model development, and operations.
Nowadays, MLOps is not just a concept but a much-discussed aspect of machine learning that is growing in importance every day as more and more teams are working on implementing AI solutions for real-world use cases. If done right, it helps teams all around the globe to develop and deploy ML solutions much faster.
MLOps is often described as DevOps for Machine Learning. That is why the easiest way to understand the MLOps concept is to return to its origins and draw a parallel between it and DevOps.
Let’s jump in.
What is DevOps?
If you already know DevOps, feel free to skip this part.
DevOps, short for Development and Operations, is a set of practices that combines software development (Dev), testing, and IT operations (Ops). DevOps aims to turn these separate processes into a continuous pipeline of interconnected steps. So, if you follow the DevOps philosophy, you will shorten the systems development life cycle and provide continuous delivery with high software quality.
The core principles of DevOps are process automation, feedback loops, and the CI/CD concept.
In software engineering, the CI/CD loop refers to the combined practices of continuous integration (CI), continuous delivery (CD), and continuous deployment. Let’s define these terms and check how the loop works.
CI is the practice of automatically integrating and testing code changes from multiple contributors in a single software project. So, CI helps to keep the shared codebase in a working state as it changes.
CD provides automated and consistent code delivery to various environments, for example, testing or staging. When the newest iteration of the code is delivered and passes the automated tests, it is time for continuous deployment, which automatically deploys the updated version into production.
To simplify, CI is a set of practices performed during the coding stage, whereas CD practices are applied whenever the code is ready.
So, the CI/CD loop combines software development, testing, and deployment processes into one workflow. It heavily relies on automation and aims to accelerate software development. With CI/CD, development teams can quickly deploy minor edits or features. Thus, you can develop software in quick iterations while keeping its quality.
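To make the loop concrete, here is a minimal sketch of its logic in plain Python. All function names are illustrative placeholders we made up; in a real project, this logic lives in a CI server's pipeline configuration, not in application code.

```python
# A minimal sketch of the CI/CD loop as plain Python logic.
# Every function here is a stand-in, not a real CI tool's API.

def run_unit_tests() -> bool:
    """CI: every pushed change triggers an automated test run."""
    return True  # stand-in for a real test runner

def build_artifact() -> str:
    """CI: once tests pass, package the application."""
    return "app-1.2.3.tar.gz"  # stand-in for a real build step

def deploy(artifact: str, environment: str) -> None:
    """CD: promote the same artifact through each environment."""
    print(f"deploying {artifact} to {environment}")

def ci_cd_pipeline() -> None:
    if not run_unit_tests():
        raise RuntimeError("CI failed: do not deliver a broken build")
    artifact = build_artifact()
    # Continuous delivery to pre-production environments, then
    # continuous deployment of the same build to production.
    for environment in ("testing", "staging", "production"):
        deploy(artifact, environment)

ci_cd_pipeline()
```

The key idea the sketch captures: one artifact, built once, moves through every environment automatically, and a failing test stops the whole loop.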
Moreover, since programmers spend less time manually building, deploying changes, and doing other routine tasks, they can focus more on customers’ requests to update existing functionality or create new features. With DevOps, when working in short cycles, your users will not have to wait long for a big release to get an upgraded version of an application.
To summarize, with DevOps, you will be able to improve communication and collaboration between your development and operation teams to increase the speed and quality of software development and deployment.
Got it. Now, what is MLOps?
With the basic knowledge of the DevOps concept and its benefits, let’s move on to MLOps. Here things are a bit different: today, ML model development and operations are often entirely separate, and the deployment process is manual, so building and maintaining a solution might take longer than expected. Machine Learning (ML) Operations (Ops) is a set of techniques used to optimize the entire ML lifecycle. Its aim is to bridge the gap between design, model development, and operations.
MLOps focuses on combining all the stages of the ML lifecycle into a single process workflow. Such a goal requires collaboration and communication between many departments in a company. However, if you manage to achieve it, MLOps provides a common understanding of how ML solutions are developed and maintained to all stakeholders. It is similar to what DevOps does for software.
The key MLOps principles are:
- Versioning - keeping track of the versions of data, ML model, code around it, etc.;
- Testing - testing and validating an ML model to check whether it is working in the development environment;
- Automation - trying to automate as many ML lifecycle processes as possible;
- Reproducibility - ensuring identical results given the same input;
- Deployment - deploying the model into production;
- Monitoring - checking the model’s performance on real-world data.
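As a small, concrete illustration of the reproducibility principle above: repeatable runs usually start with pinning random seeds. The `set_seed` helper below is our own illustrative wrapper; `random.seed` is the standard library call.

```python
import random

def set_seed(seed: int) -> None:
    # Pin the pseudo-random generator so every run is repeatable.
    # (Illustrative: a real project would also seed numpy, the DL
    # framework, and control data ordering.)
    random.seed(seed)

set_seed(42)
first_run = [random.random() for _ in range(3)]
set_seed(42)
second_run = [random.random() for _ in range(3)]
assert first_run == second_run  # identical results given the same input
```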
The core practices of MLOps are continuous integration (CI), continuous delivery (CD), continuous training (CT), and continuous monitoring (CM). Let’s cover each of them:
- We have already gone through the definition of CI in DevOps above. In MLOps, we essentially do the same. However, in addition to testing and validating the code, CI includes testing and validating data and ML models;
- CD works with an ML training pipeline that automatically deploys the model into production;
- CT is a property unique to ML solutions that is concerned with automatically retraining and serving the ML model;
- CM is about monitoring production data and measuring model performance using specific metrics (for example, some business metrics).
So, the MLOps loop is pretty similar to the DevOps one with slight adjustments that are ML-specific.
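The ML-specific part of the loop — CM triggering CT — can be sketched as follows. The "model", the metric, and the threshold are all stand-ins we invented for illustration; in practice you would plug in your real training and evaluation code.

```python
# A sketch of continuous monitoring (CM) triggering continuous
# training (CT). Everything here is a hypothetical stand-in.

ACCURACY_THRESHOLD = 0.90  # assumed minimum acceptable metric

def train_model(data):
    """CT: retrain (or fine-tune) the model on fresh data."""
    return {"trained_on": len(data)}  # stand-in for a real model

def measure_accuracy(model, live_data):
    """CM: score the deployed model on recent production data."""
    return 0.87  # stand-in for a real evaluation

def monitoring_step(model, live_data, training_data):
    accuracy = measure_accuracy(model, live_data)
    if accuracy < ACCURACY_THRESHOLD:
        # CM detected degradation, so CT kicks in automatically,
        # folding the new production data into the training set.
        model = train_model(training_data + live_data)
    return model
```

The point of the sketch: once the monitored metric drops below the threshold, retraining happens without a human in the loop.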
Following the MLOps philosophy when developing an ML solution has many benefits. To give a short overview, they include:
- Your team will have more time to develop models. If you choose a reliable MLOps tool with deployment automation, you can deploy ML models without expertise in cloud infrastructure. So, you will be able to spend more time developing the model without setting aside a significant amount of time for model deployment;
- You will need less time to deliver an MVP. As MLOps aims to automate the ML lifecycle processes and uses the CI/CD concept, you can deliver the solution in quick iterations much faster;
- You will obtain better model performance because MLOps helps you overcome the data shift problem faster. We will talk about it a bit later in this post;
- You will build practical training and serving ML pipelines that can be reused multiple times for different projects.
To sum up, with MLOps, you can deploy an ML training pipeline that can automate the retraining and deployment of new models, which is way better than deploying a single model available via an API endpoint.
MLOps Vs. DevOps
As you might have noticed, there are a lot of similarities between MLOps and DevOps concepts. It should not be a surprise because MLOps borrows a lot of principles developed for DevOps.
Both DevOps and MLOps concepts encourage and facilitate collaboration between the development teams, for example, programmers, ML engineers, employees who manage the IT infrastructure, and other stakeholders. Also, both aim to automate the continuous development processes to maximize the speed and efficiency of your engineering team.
However, despite DevOps and MLOps sharing similar principles, you cannot take DevOps tools and straightforwardly use them on an ML project. Unfortunately, the devil is in the details, so MLOps has some ML-specific requirements. Let’s check them out.
The first thing you should keep in mind is the versioning difference between these two concepts. In DevOps, it is pretty straightforward: you use versioning to provide clear documentation of any changes or adjustments made to the software under development. So, 99.9% of the time, it is only about the code. That is why in DevOps, we usually refer to versioning as code versioning.
However, when working on a Machine Learning project, code is not the only thing that might change. In addition to the code, MLOps aims to keep an eye on the versions of the data, hyperparameters, logs, and the ML model itself.
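One lightweight way to picture this broader versioning is to fingerprint data, hyperparameters, and model weights together, so that changing any one of them yields a new version. This is a sketch under our own assumptions, not a standard format; real projects typically rely on dedicated data-versioning and experiment-tracking tools.

```python
import hashlib
import json

def version_fingerprint(data_bytes: bytes,
                        hyperparameters: dict,
                        model_weights: bytes) -> str:
    """Hypothetical joint version: hash data + config + weights."""
    h = hashlib.sha256()
    h.update(data_bytes)  # dataset contents
    h.update(json.dumps(hyperparameters, sort_keys=True).encode())  # config
    h.update(model_weights)  # model weights
    return h.hexdigest()[:12]

v1 = version_fingerprint(b"dataset-v1", {"lr": 0.001}, b"weights")
v2 = version_fingerprint(b"dataset-v2", {"lr": 0.001}, b"weights")
assert v1 != v2  # changing only the data already yields a new version
```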
Second, if you have ever worked on an ML project, you might know that training an ML model requires a lot of computational resources. For most software projects, the solution’s build time is largely irrelevant, so the hardware does not play a significant part. Unfortunately, in ML, the situation is the opposite: training larger ML models can take plenty of time, even on large GPU clusters. Thus, there are more stringent hardware testing and monitoring requirements in MLOps.
Last but not least, DevOps and MLOps have a difference in monitoring approaches. In software development, the characteristics of your solution might not need any changes over time, whereas in Machine Learning, ML models must change to stay competitive. In ML, once you deploy the model into production, it starts working on the data it receives from the real world. Real-life data is constantly changing and adapting as the business environment changes. So, the quality of the model decreases as time proceeds. MLOps provides automated procedures that facilitate continuous monitoring, model retraining, and deployment to minimize this problem. Thus, the model will remain up-to-date and keep its performance on the same level.
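As a toy illustration of such monitoring, a data-shift check on a single feature might look like the sketch below. Real systems use proper statistical tests (for example, Kolmogorov-Smirnov) per feature; the relative tolerance here is an arbitrary value we chose for the example.

```python
# Hypothetical sketch: flag data shift when a production feature's
# mean drifts too far from the training-time mean.

def mean(xs):
    return sum(xs) / len(xs)

def drifted(training_feature, production_feature, tolerance=0.25):
    """Return True when the production mean moves away from the
    training mean by more than `tolerance` (relative)."""
    base = mean(training_feature)
    live = mean(production_feature)
    return abs(live - base) > tolerance * abs(base)

train = [1.0, 1.1, 0.9, 1.0]
prod_ok = [1.05, 0.95, 1.0]
prod_shifted = [2.0, 2.1, 1.9]
assert not drifted(train, prod_ok)
assert drifted(train, prod_shifted)
```

A `drifted(...) == True` signal is exactly the kind of event that would trigger the retraining step of an MLOps pipeline.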
It is no secret that there are many obstacles you might face when developing, deploying, or operating a Machine Learning solution. However, it is always better to know these obstacles in advance so you can prepare a potential solution. So, let’s identify them:
- Lack of automation. Nowadays, many processes in the ML lifecycle are done manually. Therefore, specialists spend tons of time doing something relatively simple, for example, data annotation. The conventional approach to labeling is to manually label each data asset in the dataset. However, the process can be semi-automated using AI-assisted annotation to save time and budget. The same pattern holds for many ML lifecycle stages;
- Ineffective solutions. Unfortunately, many Machine Learning projects never make it into production because they face unsolvable issues in the development stage. This might happen for various reasons, but usually such solutions are simply not good enough to be given a shot in production;
- Lack of communication. To build an effective ML solution that will be successful both in development and in production, your development and operations teams must interact with one another and provide the necessary support, guidance, and expertise when needed. Unfortunately, in real life, these teams are often entirely separate;
- Shifting of job duties. Sometimes data scientists are viewed as universal soldiers who can find and label the data, develop an effective ML model, deploy it into production, and provide network security. However, that is not how it should work. Data scientists are not ML engineers, expert annotators, or information security specialists. They should do what they know best - work with the data and develop complex ML models. In other fields, they might lack the necessary knowledge and competence;
- Potential decrease of model’s accuracy in production due to data shift.
Potential solutions and additional benefits
With MLOps, you can address these issues and gain some additional benefits:
- Process automation. Choosing the right MLOps tool will automate some ML lifecycle processes, such as model retraining and deployment. Moreover, you can use some additional ML automation tools, for instance, Hasty;
- Constant communication. MLOps is impossible without getting everyone on the same page and aligning the development and deployment strategies. So, to succeed under the MLOps system, your departments will have no other option but to communicate and collaborate;
- When working on an ML project in quick iterations, the team is likely to deliver a working product instead of stalling somewhere in the middle, lost in a sea of data, model versions, experiments, hypotheses, etc.;
- No shifting of job duties. With the automated deployment process, Data Scientists will be able to focus on their primary tasks;
- You can create repeatable training and serving ML pipelines that you can then reuse for other ML projects;
- You can focus on feedback. As mentioned above, having more automated processes frees up time developers can spend improving the solution by analyzing and implementing customers’ requests and ideas;
- The data shift problem can be solved using the Data-centric AI and Data Flywheel approaches complementing the MLOps philosophy.
MLOps Vs. Model decay problem
Unfortunately, even if your ML model is in production and serving well, you are far from done. As mentioned above, an ML model might underperform in the production environment because of model decay. Strictly speaking, the term model decay refers to the phenomenon of an ML model’s accuracy decreasing over time.
Unfortunately, there is not much you can do to prevent the decay ahead of time. If it occurs, it occurs. It happens because your model is not operating in a vacuum but in an ever-changing production landscape. The world is in a constant state of change, and the data follows suit. Moreover, the data your model was initially trained on is also likely to differ from real-life data. The chances that you covered all edge cases when initially training your model are low. Thus, you can expect a mismatch between the data the model saw during training and what it sees in production. And you might face model decay as a result.
You might come up with an idea to adapt your model over time to avoid this problem. However, it is tough to fix model decay on the model side as the lack of performance is not the model’s direct fault. It happens because of changes in the data.
Let’s take a look at a simple example. Imagine working on a vision AI project that detects whether a person’s eyes are open or closed. As a training set, you use plenty of images of human eyes, but none with glasses. When you deploy your model, you will find out that it underperforms on the glasses use case. The glasses use case is a data shift, and the resulting drop in performance is model decay.
However, if you have not noticed the model decay and have already moved on to another project, it might take a while before you see that your previous model underperforms. To avoid this, you need a modern solution that automatically detects the decay and addresses the problem. And that is where MLOps can back you up. With a well-rounded MLOps tool, you will be able to set up a CT/CI/CD pipeline that automatically detects the decay, retrains the model, and replaces the production model with the updated version.
How Hasty can help (aka a shameless plug)
For those of you that are looking for an MLOps solution - look no further! Hasty is a vision AI platform that helps you throughout the ML lifecycle. To date, we can help you with:
- Automating up to 90% of all annotation work
- Making quality control 35x faster
- Training models directly on your data using our low-code model builder
- Taking any custom model trained in Hasty and deploying it back to the annotation environment in one click
- Exporting any model you create in commonly used formats
- Hosting any model in our cloud
- Monitoring inferences made in production
- Most importantly, we offer all this through an API for easy integration.
In short, we take care of a lot of the MLOps so you don’t have to. Book a demo if you want to know more.