If you have ever worked on a Machine Learning (ML) project, you know that the most vital component of the solution's success is a good data asset. Some lucky organizations can find that in publicly available datasets. For most, though, it means having to create one themselves.
The conventional data annotation approach is often a low-cost, quantity-first approach, requiring in-house or outsourced manual labeling of enormous amounts of data. In many cases, a low-cost approach to labeling can be ineffective and even harmful for your project because:
- Because the budget for annotators is usually low, they might not have the qualifications or expertise you require. This can lead to erroneous labeling. If you are lucky, you catch it early. If you are less lucky, you've paid thousands of dollars for unusable data
- To remedy that, some organizations spend significant time teaching their annotators what to label - which is a hidden, added cost to the company, and it creates a risk of that knowledge leaving the organization with the annotators
- Oftentimes, when paying the bare minimum, you don't really get the QA you would expect either
- This means you have to redo QA in-house, pay the annotation provider to redo it, or find another service provider
- You also have to consider the data security and privacy questions. In AI, data is the most important asset you have. The more people you add to a project, the higher the likelihood of something leaking out
We also see more and more advanced AI applications being built all over the world. With that in mind, we think it's time to reevaluate the annotation workforce.
The importance of expert annotators
If you think any annotator can effectively label any data asset, you might want to reconsider. This might seem obvious, but there are still some in the vision AI space who think any annotator can perform any task with good enough guidance. Sure, back in the day, Computer Vision started with use cases that did not require any hard-to-find expertise. For example, a dog/cat classification task, the ImageNet task, or even the autonomous driving problem. Most human beings could easily label data assets for all of these use cases.
However, since you can now apply Vision AI algorithms in many industries, this pattern is changing. Nowadays, if you are working on a complex Artificial Intelligence (AI) solution for a certain industry, you often find yourself needing a specific expert to assist with annotation. For example, in healthcare, you need a doctor; in agriculture, you need a botanist, and so on. Only with expert help can you create an accurate and comprehensive dataset for your Machine Learning team to work with.
The most important factor here is the complexity of a project. As with all technology, the first use cases that organizations tackle are often the easiest ones. However, as the technology matures and as competition gets fiercer, complexity goes up.
Furthermore, you might need specific specialists in your field. In healthcare, a general practitioner is often not enough. You will need a radiologist or a gynecologist to do the annotation for you.
What happens if you don't have the right experts? A lack of expertise is likely to lead to poor labeling and, as a result, an underperforming model. As most applied AI projects work towards a specific target metric, a badly annotated data asset might even kill your project. As with most things in life, quality tends to beat quantity in data annotation.
Experts are expensive - but you can reduce that cost
So let's say you're interested in working with real experts for your next AI project. You'll quickly find out that they don't come cheap. For a basic annotation task, you might pay $5 per hour per annotator - for an expert, that might be $25 or even $100 per hour. Some quick back-of-the-napkin calculations might tell you that bringing in experts is too costly for your project. Before deciding that though, there are two mitigating factors to consider.
The first factor to take into consideration here is that good experts will outperform lower-skilled annotators in terms of quality, and for complex use cases, speed. So even though the hourly rate is higher, the actual cost might be lower when factoring in an additional cost for QA and reannotation if working with a team that's not up to the task.
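To make that first factor concrete, here is a rough back-of-the-napkin sketch in Python. Every number in it (hourly rates, throughput, rework share, QA effort) is a made-up assumption for illustration, not a measurement, and the function name `cost_per_usable_image` is our own. It only shows how the hourly rate, QA cost, and reannotation combine into a cost per usable image.

```python
def cost_per_usable_image(hourly_rate, images_per_hour,
                          rework_share=0.0,
                          qa_hours_per_image=0.0, qa_hourly_rate=0.0):
    """Rough cost of one usable annotated image (illustrative model only).

    rework_share is the assumed fraction of images that fail review and
    have to be annotated (and reviewed) again.
    """
    annotation_cost = hourly_rate / images_per_hour
    qa_cost = qa_hours_per_image * qa_hourly_rate
    # If a share of images fails review each round, one usable image takes
    # 1 / (1 - rework_share) attempts on average.
    expected_attempts = 1 / (1 - rework_share)
    return (annotation_cost + qa_cost) * expected_attempts


# Hypothetical low-cost team on a complex task: cheap per hour, but slow,
# error-prone, and in need of heavy in-house QA.
generalist = cost_per_usable_image(
    hourly_rate=5, images_per_hour=4, rework_share=0.4,
    qa_hours_per_image=0.1, qa_hourly_rate=50,
)

# Hypothetical expert: five times the hourly rate, but faster on complex
# images, with little rework and light QA.
expert = cost_per_usable_image(
    hourly_rate=25, images_per_hour=10, rework_share=0.05,
    qa_hours_per_image=0.02, qa_hourly_rate=50,
)

print(f"Low-cost team: ${generalist:.2f} per usable image")
print(f"Expert:        ${expert:.2f} per usable image")
```

With these assumed inputs the expert comes out cheaper per usable image despite the higher hourly rate. With different inputs the result can flip, so it is worth plugging in your own numbers.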
The second factor that's important to know is that you can use AI to automate a large chunk of the annotation and QA work for you. For example, we at Hasty have developed AI annotation assistance that both automatically trains a model for you and learns as you label. To give you an example of the automation we can offer, here we are annotating printed circuit boards (PCBs).
Here, we manage to annotate most of the PCB automatically (the model has trained on 90 images)
As you can see, our model picks up 90% of all annotations in the image correctly. What that means for you is that your high-paid experts only need to add or correct the remaining 10%, so their cost per annotated image drops by up to 90%. Suddenly, the cost doesn't look that massive.
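The arithmetic behind that claim can be sketched in a few lines of Python. The hourly rate, minutes per image, and review time below are hypothetical values chosen for illustration, and `expert_cost_per_image` is our own helper name. The sketch also shows that the saving only reaches the full automated share if checking the automatic annotations takes no extra time.

```python
def expert_cost_per_image(hourly_rate, minutes_per_image,
                          automated_share=0.0, review_minutes=0.0):
    """Expert cost for one image when part of the work is automated.

    automated_share: assumed fraction of annotations the model gets right.
    review_minutes: assumed time the expert spends checking the
    automatic annotations.
    """
    manual_minutes = minutes_per_image * (1 - automated_share) + review_minutes
    return hourly_rate * manual_minutes / 60


# Hypothetical expert at $100 per hour who needs 6 minutes per image.
fully_manual = expert_cost_per_image(100, 6)
assisted = expert_cost_per_image(100, 6, automated_share=0.9)
assisted_with_review = expert_cost_per_image(
    100, 6, automated_share=0.9, review_minutes=0.5
)

for name, cost in [
    ("Fully manual", fully_manual),
    ("90% automated", assisted),
    ("90% automated + 30 s review", assisted_with_review),
]:
    saving = 1 - cost / fully_manual
    print(f"{name}: ${cost:.2f} per image ({saving:.0%} saved)")
```

Under these assumptions, the saving is 90% when review time is ignored and somewhat lower once the expert spends time checking the automatic annotations.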
We also (as the only end-to-end software in the world) offer an AI-powered quality control feature called Error Finder. Error Finder reviews your annotations automatically and provides feedback on any cases where the model disagrees with your manual annotations. It looks like this:
By highlighting potential errors and letting you decide what's actually an error, you save up to 95% on quality control without any reduction in data quality
Quality control can account for anywhere from 10% to 50% of the total data creation cost. It also requires expert knowledge, so reviewing only the flagged issues, accepting the correct annotations and rejecting the bad ones, will not only save you time (and reduce strain on your eyes), it will also drastically reduce your annotation budget.
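The general idea of model-assisted quality control can be sketched as follows. To be clear, this is not how Error Finder is implemented. It is a minimal illustration of flagging cases where a confident model disagrees with a manual label, and the `Prediction` class, the `find_potential_errors` function, the 0.8 confidence threshold, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str
    confidence: float  # model confidence in its label, between 0 and 1


def find_potential_errors(manual_labels, predictions, min_confidence=0.8):
    """Return ids of images where a confident model disagrees with the human label."""
    flagged = []
    for image_id, human_label in manual_labels.items():
        prediction = predictions.get(image_id)
        if prediction is None:
            continue  # no model output for this image, nothing to compare
        disagrees = prediction.label != human_label
        if disagrees and prediction.confidence >= min_confidence:
            flagged.append(image_id)
    return flagged


# Hypothetical manual labels and model predictions for three image crops.
manual_labels = {
    "img_001": "capacitor",
    "img_002": "resistor",
    "img_003": "resistor",
}
predictions = {
    "img_001": Prediction("capacitor", 0.97),  # agrees with the human
    "img_002": Prediction("capacitor", 0.91),  # confident disagreement
    "img_003": Prediction("diode", 0.42),      # disagrees, but not confident
}

flagged = find_potential_errors(manual_labels, predictions)
print(flagged)  # ['img_002']
```

Only the flagged cases go back to the expert, who decides whether the human or the model was right. That is where the time saving comes from: the expert reviews a short list of candidates instead of every annotation.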
Want more content like this?
Next Tuesday we are releasing the next part of this series, where we'll go through the considerations you need to make before deciding to in-house or outsource your annotation work. Stay tuned for that.
We also recently released a "Vision AI blueprint" where we go through modern approaches to vision AI development. Check it out if you're interested in learning more about how you can do AI projects in a more agile, data-centric manner.
If you are interested in trying us out, we offer a free two-month trial where you can try and see if AI automation is right for you.
Also, if you have any questions or just want to pick our brains before deciding on how to proceed with your next project, you can email our Head of Product at [email protected]. He always has 15 minutes to help new projects get off the ground - and I promise he will not try to sell you on Hasty before first helping you figure out the basics.