In vision AI, labeling automation has improved a lot over the last couple of years, with software getting better at helping you do more with less. However, even with higher degrees of automation, you still have the problem of selecting what data to label. Of course, you want as good a distribution as possible while covering all the edge cases present in your data. But how do you go about it?
From what we’ve seen, this task is hard for people to do well. Most projects end up splitting their larger dataset into segments and then labeling those segments as evenly as possible. This approach works, but it relies on your assumptions about how the model will perceive your data and lacks feedback from the model itself.
In the end, this leads to redundancy in your labeling efforts. Maybe your model already understands scenario A quite well while needing more context for scenario B - but if you don’t know that, chances are you’ll continue labeling data related to scenario A.
Introducing Active Learning
With our new beta release, you can use AI to tell you which images to label next to improve your model. If we go back to our scenario above, that means you’ll remove unnecessary labeling work and ensure that all the work you and your team do actually has a positive impact on your model.
With our new Active Learning feature, an AI model goes through your project and looks for the images that differ most from your already labeled data. By labeling the images that are most different from what you already have, you ensure three things:
- A better-performing model with less data. As the old adage goes, “Work smarter, not harder”. With Active Learning, you are using a machine to prioritize what data to label for maximum impact, meaning you only label what matters for the model to learn.
- Healthy data distribution. With Active Learning, you have a fail-safe against weighting your labeling efforts too heavily toward a certain scenario or class.
- Edge-case coverage. Edge cases are, by their nature, different from the rest of your dataset. Using Active Learning, you’ll ensure that edge cases are prioritized in your labeling efforts.
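To make the idea of "most diverse compared to your already labeled data" concrete, here is a minimal sketch of one common diversity-based selection strategy: greedy farthest-point selection over image embeddings. This is an illustration under stated assumptions, not a description of how Hasty implements Active Learning. The function name `select_most_diverse`, the toy two-dimensional "embeddings", and the use of Euclidean distance are all hypothetical choices made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_most_diverse(labeled, unlabeled, k):
    """Greedy farthest-point selection (hypothetical helper).

    Repeatedly picks the unlabeled item whose nearest neighbour among the
    labeled and already-picked items is farthest away.
    Returns a list of indices into `unlabeled`, in pick order.
    """
    if not labeled:
        raise ValueError("need at least one labeled embedding")

    # Distance from each unlabeled item to its closest labeled item.
    min_dist = [min(euclidean(u, l) for l in labeled) for u in unlabeled]

    chosen = []
    for _ in range(min(k, len(unlabeled))):
        candidates = [i for i in range(len(unlabeled)) if i not in chosen]
        best = max(candidates, key=lambda i: min_dist[i])
        chosen.append(best)
        # A picked item now counts as covered, so items near it lose priority.
        for i, u in enumerate(unlabeled):
            min_dist[i] = min(min_dist[i], euclidean(u, unlabeled[best]))
    return chosen


# Toy data (assumed): two labeled images close together, four unlabeled ones.
labeled = [(0.0, 0.0), (0.1, 0.0)]
unlabeled = [(0.05, 0.0), (5.0, 5.0), (5.1, 5.0), (-4.0, 0.0)]

picks = select_most_diverse(labeled, unlabeled, k=2)
print(picks)  # [2, 3]
```

In this toy example, index 0 is skipped because it sits right next to the labeled data (the "scenario A" case), and index 1 is skipped because it is a near-duplicate of the first pick. That mirrors the redundancy argument above: labeling effort goes to regions the labeled set does not yet cover.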
As an additional benefit, using Active Learning will also give you higher degrees of labeling automation in Hasty itself, as our models will benefit from increased diversity as well.
What results to expect
Our Active Learning feature has been developed with the help of ProFit, a grant for developing state-of-the-art technology in the EU. As part of that, we were required to have our technology verified by a third party. In this case, a German university tested our approach and found that a model trained on images chosen by our Active Learning implementation can perform as well as a model trained on randomly selected images, while using 70% fewer images.
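As a quick back-of-the-envelope illustration of what a 70% reduction means in practice, the snippet below applies that figure to a hypothetical baseline. The 10,000-image baseline is an assumption chosen purely for the arithmetic; it is not a number from the third-party evaluation, and actual savings will depend on your data.

```python
def images_needed(baseline_images, reduction_percent=70):
    """Images to label if the same performance needs `reduction_percent` fewer."""
    return baseline_images * (100 - reduction_percent) // 100


# Hypothetical project: random selection would need 10,000 labeled images.
baseline = 10_000
with_active_learning = images_needed(baseline)
print(with_active_learning)             # 3000
print(baseline - with_active_learning)  # 7000 images you would not need to label
```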
Start using state-of-the-art Active Learning today
So the benefits of Active Learning are real, but it can be quite difficult and costly to develop yourself. However, by making it accessible in Hasty with the click of a button…
…you can now use state-of-the-art Active Learning for any project without spending months doing R&D.
If you are interested in trying it yourself, please take one of the following approaches: