
2022.02.25

Bringing active learning, model explainability, and Bayesian networks to Hasty with the help of ProFit

Lisa Wäntig

How we plan to use active learning to further improve the automation and effectiveness of machine learning tasks.

Before we go into what we’ll do, we have to start with the problem.

Today's machine learning engineers, data scientists, and software developers struggle with traditionally cumbersome processes like annotating data and tuning models. Many steps in the ML lifecycle require quite a bit of this manual work: from annotating data to doing quality control, data processing depends heavily on manual input. So does model training. Although computation does most of the heavy lifting, organizing and understanding your data takes up considerable time for most data scientists.

In Hasty, we are already reducing the scale of these issues with our labeling and QA automation and our no-code model playground. However, we are always trying to improve on what we have.

Our next big improvement step is adding new and innovative features to all components of Hasty. This project has been made possible by a grant from the ProFit funds for technological innovations by the Investment Bank Berlin, with kind co-support of the European Union.

In total, we are looking at adding three new features. Those are:

Bayesian neural networks and diversity analysis

The goal here is to highlight biases or correct errors in the user's labeled data before it is used to train production-ready models. This has the potential to save huge amounts of time and money, as quality assurance can consume up to 70% of an ML project's budget.

Our stated goal is to empower users to apply Bayesian neural networks to find errors in their labeled data, either while they are labeling or once they have finished. We already offer our AI consensus scoring feature, but we think that adding Bayesian neural networks will let us give all our users better results, with fewer false positives and false negatives.

We base this belief on the fact that Bayesian neural networks have uncertainty estimation built into the paradigm, and techniques such as Monte Carlo sampling can be used to debug data or highlight labeling errors automatically.
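To make this concrete, here is a minimal sketch of one common approximation, Monte Carlo dropout, used to flag potentially mislabeled samples. The helper names and the entropy threshold are illustrative assumptions, not Hasty's actual implementation:

```python
# Sketch: Monte Carlo dropout for flagging suspicious labels (PyTorch).
# Assumes a classifier that contains dropout layers; names and the
# threshold are hypothetical, not Hasty's production code.
import torch

def mc_predict(model, image, n_samples=30):
    """Average predictions over stochastic forward passes with dropout on."""
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()  # keep dropout stochastic at inference time
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(image.unsqueeze(0)), dim=-1).squeeze(0)
            for _ in range(n_samples)
        ])
    mean = probs.mean(dim=0)
    # Predictive entropy: high values = high model uncertainty.
    entropy = -(mean * (mean + 1e-12).log()).sum()
    return mean, entropy.item()

def flag_suspicious_labels(model, dataset, entropy_threshold=1.0):
    """Flag samples where the model disagrees with the label or is very unsure."""
    suspects = []
    for idx, (image, label) in enumerate(dataset):
        mean, entropy = mc_predict(model, image)
        if mean.argmax().item() != label or entropy > entropy_threshold:
            suspects.append((idx, label, entropy))
    return suspects
```

A reviewer can then inspect only the flagged samples instead of re-checking the whole dataset, which is where the QA savings come from.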

Neural network explainability

Next, we want to give users insight into their individually trained neural networks by opening up the black box of AI and adding further explainability features to our Model Playground.

Today, users can see performance metrics after training a model, but these alone cannot tell them whether they can trust specific results. Our new explainability features will provide insights into the model, showing you which parts of the image were important to a prediction. They also help you understand the capabilities and limitations of your models, and could help you define better retraining strategies or signal that additional data collection is needed.
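One widely used technique of this kind is Grad-CAM, sketched below on a torchvision ResNet. This is only an illustration of how such heatmaps can be produced under these assumptions; it is not necessarily how Hasty's explainability features are built:

```python
# Sketch: Grad-CAM heatmap over the last conv block of a ResNet (PyTorch).
import torch
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V2").eval()
activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["feat"] = output

def bwd_hook(module, grad_input, grad_output):
    gradients["feat"] = grad_output[0]

layer = model.layer4[-1]  # last convolutional block
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

def grad_cam(image):
    """image: normalized tensor of shape (3, H, W); returns heatmap in [0, 1]."""
    model.zero_grad()
    logits = model(image.unsqueeze(0))
    logits[0, logits.argmax()].backward()  # gradient of the top class score
    acts, grads = activations["feat"], gradients["feat"]
    weights = grads.mean(dim=(2, 3), keepdim=True)  # pool gradients per channel
    cam = torch.relu((weights * acts).sum(dim=1)).squeeze(0)
    return cam / (cam.max() + 1e-8)  # upsample and overlay on the image to view
```

Overlaying the resulting heatmap on the input image shows which regions pushed the model toward its prediction.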

Continual learning

A major problem with deep learning is that artificial neural networks are very poor at retaining information. Standard neural networks tend to quickly and completely forget previously learned tasks when trained on a new task, a phenomenon called catastrophic forgetting. Essentially, every time you retrain a model, it learns something new but forgets everything else.

This problem often occurs when you add a new task to a model or fine-tune it for an additional use case. Although you might get good results on the new addition, you might also see a degradation in performance on the older use cases or tasks. Many previous works address this problem and attempt to solve it, but most of these attempts focus on image classification in one form or another. While there has been progress in continual learning for image classification, very little work has been done in the context of object recognition, which requires both detection and localization of objects. We aim to change this, meaning all AI assistants in Hasty will be future-proofed against catastrophic forgetting.
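To give a flavor of the kind of technique involved, here is a minimal sketch of elastic weight consolidation (EWC), one well-known way to mitigate catastrophic forgetting. Nothing here is Hasty's actual method; the helper names and the regularization strength are assumptions for illustration:

```python
# Sketch: elastic weight consolidation (EWC) against catastrophic forgetting.
import torch

def fisher_diagonal(model, old_task_loader, loss_fn):
    """Estimate how important each parameter was for the previous task."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in old_task_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2  # squared grads ~ Fisher diag
    return {n: f / len(old_task_loader) for n, f in fisher.items()}

def ewc_penalty(model, old_params, fisher, lam=100.0):
    """Penalize moving parameters that mattered for the previous task."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return (lam / 2) * penalty

# When training on the new task:
#   loss = new_task_loss + ewc_penalty(model, old_params, fisher)
```

The penalty keeps parameters that were important for the old task close to their previous values, so the model can learn the new task without erasing the old one.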

You also have a cold start problem. Any automation offered at the initial labeling stages of a project, in any software you can find, will be based on some type of pre-trained model. But your data is different. Even if we use models that have seen all of COCO or ImageNet, they will likely not perform that well on your data. We solve this in Hasty by training our automation models on your data as you label, but that still means that in the initial stages you will not get much labeling automation, as our models haven't seen enough data yet.

Here, too, continual learning can help. By using it to identify and highlight the images that the model understands least well, we can create the steepest possible learning curve. This means you'll label images in an order that gets you to automated labeling faster, as sketched below.
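A minimal sketch of this uncertainty-based ordering, reusing the hypothetical mc_predict helper from the Bayesian section above:

```python
# Sketch: rank unlabeled images so the least-understood ones come first.
def rank_images_for_labeling(model, unlabeled_images):
    """Return indices of images sorted by descending predictive uncertainty."""
    scored = []
    for idx, image in enumerate(unlabeled_images):
        _, entropy = mc_predict(model, image)  # predictive entropy as the score
        scored.append((entropy, idx))
    scored.sort(reverse=True)  # most uncertain first = steepest learning curve
    return [idx for _, idx in scored]
```

Labeling the most informative images first lets the automation model improve fastest per label added.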

What this means for you as a user

You'll get higher degrees of automation for all data preparation tasks and more insight into what your models do and how they perform. This will enable you to do more with less and to use state-of-the-art machine learning techniques without having to implement the technology yourself. We see this as another step towards enabling teams big and small to get to production faster and with less risk. In short, the best vision AI platform is getting better.

With that said, a reasonable question is when you can expect to make use of these new features. On that front, we have an exciting announcement coming in the next couple of weeks.

If you can’t wait until then, feel free to book a call with us here. We are looking for beta testers, so we would love to show you what we’ve been working on if you're interested.
