General best practices

This wiki dives deep into the world of visionAI, and it's easy to get lost in the woods. This is why we want to share two lessons we learned while working on hundreds of visionAI projects together with our users here at Hasty.ai.

#1 Spend more time on your data than on the model

Most ML research focuses on coming up with new model architectures and fancy hyperparameters. However, when you go into the real world, you're operating under completely different conditions. The most drastic change is probably that data in the real world is never as clean as in the lab. Consequently, when you're building a model for production, you should really understand the interplay between your model and your data and work on getting the right data instead of chasing after what's SOTA.

Let's make this a bit more tangible: Andrew Ng and his team got stuck at an accuracy of 76%. Then they split up the team for the next two weeks: one group trying to improve the data, the other one working on the model. The data team could increase the model's performance to 93.1%, whereas the other team couldn't improve it at all [source]. Similar results occur often.

#2 Iterate quickly and often

In traditional software development, the word "agile" is so widely used that it has become inflated and part of the typical buzzword bingo. Almost everyone knows it, and many teams worldwide have implemented the concepts successfully.

In ML, however, most teams still follow a linear approach. They collect data, annotate it for weeks if not months, and only then train their first model. More often than not, the first model performs poorly, and the team notices that they should have collected different data, chosen another annotation strategy (e.g., masks instead of bounding boxes), ... The result: 60% of ML projects get killed in the proof-of-concept stage because many resources have been invested, but the results are disappointing.

I don't want to sound like a mediocre business consultant stating the obvious, but the answer to this is a more agile approach. Instead of annotating the whole dataset first, label only a subset of the data and train the first model early in the process. Then you can add more data, and once you have that part right, start tweaking the initial model.
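To make the loop above a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data is synthetic (one number per "image", two classes), the "model" is just a threshold between the two class means, and the names make_pool, train, accuracy, and iterate are hypothetical helpers, not part of Hasty.ai or any library. The only point is the shape of the process: label a small batch, train, evaluate, repeat.

```python
import random


def make_pool(n, seed=0):
    """Synthetic stand-in for a data pool: (feature, label) pairs, two classes."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Class 1 is centered at +2.0, class 0 at -2.0 (illustrative values).
        feature = rng.gauss(2.0 if label else -2.0, 1.0)
        pool.append((feature, label))
    return pool


def train(labeled):
    """Toy 'model': the midpoint between the two class means."""
    means = []
    for cls in (0, 1):
        values = [x for x, y in labeled if y == cls]
        if not values:
            raise ValueError("need at least one labeled example per class")
        means.append(sum(values) / len(values))
    return (means[0] + means[1]) / 2.0


def accuracy(threshold, data):
    """Fraction of examples on the correct side of the threshold."""
    correct = sum((x > threshold) == bool(y) for x, y in data)
    return correct / len(data)


def iterate(pool, holdout, batch_size, rounds):
    """Label one small batch per round and retrain after each batch."""
    labeled, history = [], []
    for r in range(rounds):
        batch = pool[r * batch_size:(r + 1) * batch_size]
        if not batch:
            break  # nothing left to annotate
        labeled.extend(batch)  # stands in for annotating the next batch
        model = train(labeled)  # train early, retrain often
        history.append((len(labeled), accuracy(model, holdout)))
    return history


pool = make_pool(100, seed=0)
holdout = make_pool(200, seed=1)
history = iterate(pool, holdout, batch_size=20, rounds=5)
for n_labeled, acc in history:
    print(f"{n_labeled} labeled -> holdout accuracy {acc:.2f}")
```

In a real project, the per-round numbers are what tell you whether to keep annotating, change the annotation strategy, or collect different data, and you learn that after the first batch instead of after months of labeling.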

A lot of new tools are coming up that aim to enable such a process. Hasty.ai is one of them.

Some more wisdom

Check out our blog to read more practical guides on visionAI in production.


Last modified 2yr ago
