In our previous article, we went through the increasing need for expert annotators and how to reduce the cost of bringing in expensive experts for annotation. Today, we'll look at one of the main questions that our customers often struggle with when starting a project: Should they annotate in-house or outsource the job to an annotation provider?
Nowadays, most labeling tasks are outsourced to annotation service providers because doing so saves employees time and the company money. Also, if we're honest, most of us don't like to annotate and prefer paying someone else to do it. However, if you are working on a complex use case, outsourcing your annotation is risky and might not be the right way to go.
Let's say that you find an excellent annotation provider (if you're looking, we mention our favorites at the end of this article) that has the experts you need. Everything should be fine. You hand over the project to the annotation provider and expect to get quality data back at some defined point down the line. No need to worry, you might think. You're mistaken.
Let's give a simple example. Imagine you are working on detecting a specific type of cancer. To do that, you need to have a lot of images labeled by specialist doctors. After the first week, you might start to see differences in how different annotators label the data. Although you spent considerable time writing excellent documentation for your specialists, none of them seem to be on the same page. Why is that?
In general, as we increase complexity, we increase the level of subjectivity we see in our data. Labeling a cat with a bounding box is easy - deciding what part of an X-ray is cancerous, what stage it is in, and which specific type it is requires a lot more from your annotators. To add more complexity, how you annotate medical images can vary from country to country, from hospital to hospital, even from doctor to doctor. There's not much in the way of set standards to follow.
And what if the doctors who help you are spread all over the globe, have different backgrounds, and do not know each other? In this case, the subjectivity problem will only grow worse, and you will be forced to develop a solution quickly. To newcomers, this might seem surreal, but it is a real example (check the related article).
To solve this subjectivity issue, you need to align your annotation strategy. Essentially, you reduce subjectivity by doing quick iterations - annotating a hundred or so images - and then going through and seeing where there's consensus and where opinions diverge. You have to get everyone together and get them on the same page before doing the same exercise again. Rinse and repeat until your annotators are so aligned that the divergence is minimal.
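To make those iterations concrete, here is a minimal sketch of how you might surface the most contested images after each round. It assumes a hypothetical export where each annotator assigns one class per image; the annotator names, file names, and classes below are illustrative only.

```python
from collections import Counter

# Hypothetical labels from three annotators after a ~100-image iteration
# (in practice this would come from your annotation tool's export).
annotations = {
    "scan_001.png": {"dr_a": "benign", "dr_b": "benign", "dr_c": "benign"},
    "scan_002.png": {"dr_a": "stage_2", "dr_b": "stage_1", "dr_c": "stage_2"},
    "scan_003.png": {"dr_a": "stage_1", "dr_b": "benign", "dr_c": "stage_2"},
}

def agreement(labels):
    """Fraction of annotators who chose the most common label for one image."""
    counts = Counter(labels.values())
    return counts.most_common(1)[0][1] / len(labels)

# Sort images from most to least contested - the low-agreement cases are the
# ones to put in front of the whole group before the next iteration.
for image, labels in sorted(annotations.items(), key=lambda kv: agreement(kv[1])):
    print(f"{image}: {agreement(labels):.0%} agreement, votes: {dict(Counter(labels.values()))}")
```

The exact metric matters less than the habit: after every iteration, the images where opinions diverge most become the agenda for the next alignment session.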
Here's where the first hurdle of working with annotation providers comes in. You, as a company, know what you need from your data. For example, in our cancer use case, it makes sense to say that false positives are acceptable, but false negatives are not. When working on complex use cases, there are often hundreds of these decisions that need to be made and that you need to align on with your team. Collectively, we call all of those decisions an "annotation strategy".
As most people who have worked on an annotation project know, an annotation strategy is a living document that will change considerably during the first weeks and months.
When outsourcing labeling, this becomes harder. Most of the time, you don't have direct access to the team. This means you will have to communicate with a third party - maybe a customer success manager - and hope that this person communicates your strategy to the team correctly.
You will also get less feedback from the annotators themselves, making it harder to notice when there is divergence from or confusion about your strategy.
This is because the power dynamics are different when you outsource work. If you do something in-house, you all work towards the common goal of creating the best possible data asset. When you outsource, that is still true, but you add a client-provider relationship on top of it. This makes it harder for annotators to reach out and ask, for example, "Are we sure we are annotating this correctly?", because they are not only trying to produce the best data asset possible but also managing you as a customer.
And finally, as annotation providers often deliver data in batches, you typically only detect these issues after a significant amount of work has already been carried out. So if the communication between you hasn't been great, you might be surprised by what you get back.
The simple fact is that finding these issues and making the right decisions is a lot easier to do internally, as you have a much better understanding of the use case and the intended business logic. You also have more direct communication, better power dynamics, and can collaborate in real-time.
In most cases, you also have the experts needed to make in-house annotation a viable option. Having expensive in-house talent do your labeling has not always been realistic, but with advancements in AI-assisted annotation and QA, your experts can be a lot more efficient than before (more on that later).
Let's say that you are OK with the issues mentioned above and are still leaning towards outsourcing the annotation job. Before deciding, there's still one more hidden cost you need to be aware of - the training of your new annotation workforce.
As we detailed above, using the expertise and know-how you already have internally will greatly increase your chances of annotating correctly on the first try.
To replicate this expertise externally, especially when working on more complex projects, you will have to transfer your in-house knowledge to your annotation provider. In other words, you need to train the workers to see the data in the same way you do.
We see all too often that the time and expense of this process are heavily underestimated. Many first-time project managers think it will take a week or two to get the outsourced annotation team up to speed. This is seldom the case.
Let's say you are about to start a project where you want to annotate different potatoes on a sorting belt. You want to check three things:
You've prepared proper documentation. You send it over to the outsourced team to review, and everything seems fine. Just to be sure, you do a first, smaller, annotation cycle of 100 images. You get the results back after a week.
Already at this stage, you'll probably see the first issues pop up. In this case, it's likely that you and your annotators are not 100% aligned on how a ripe potato looks.
What is happening here happens in every project. You can write all the documentation you want, but to annotate properly, annotators need to internalize that knowledge and develop, for lack of a better word, a good "feel" for the use case. You already have that "feel" internally, but now you need to communicate it to your annotation provider. In other words, you need to train them.
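One way to catch this kind of misalignment early is to label a small gold subset in-house and compare it against what the provider sends back. A rough sketch, assuming simple single-class labels per image; the file names, classes, and labels are illustrative:

```python
# Gold labels you created in-house vs. the provider's labels on the same pilot images.
gold = {"belt_001.jpg": "ripe", "belt_002.jpg": "unripe", "belt_003.jpg": "damaged"}
provider = {"belt_001.jpg": "ripe", "belt_002.jpg": "ripe", "belt_003.jpg": "damaged"}

# Collect every image where the provider's label differs from yours.
disagreements = {
    img: (gold_label, provider.get(img))
    for img, gold_label in gold.items()
    if provider.get(img) != gold_label
}

print(f"Agreement with gold set: {1 - len(disagreements) / len(gold):.0%}")
for img, (expected, got) in disagreements.items():
    print(f"  {img}: you say '{expected}', the provider says '{got}'")
```

The point is not the code itself but the habit: every batch that comes back gets checked against a reference you control, so misunderstandings surface after 100 images rather than after 10,000.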
Next, let's say that you successfully solved all the issues from the first annotation batch. You do another set of 500 images. You get the results back and once again find yourself scratching your head.
It might be smiling, but how does the fact that it's sprouting affect which class it should go in? These edge cases will require considerable calibration and communication to get right. Photo by Łukasz Rawa.
This process will keep occurring throughout a project. You'll find that what you thought would be a 1-week job turns out to be a constant demand for clarification and training. This is natural. You are an expert in the field; your annotators are not. But you are training them to be.
What you've done is transfer in-house knowledge to a third party. That knowledge might not always seem valuable to you, but there are risks you need to consider.
Firstly, how do you ensure that the money and time you've spent on training up the workforce do not benefit your competition? This question will have to be handled on the contract side before the project even starts.
Secondly, what is the risk of data or knowledge leaking out from the annotation provider? Although most annotators are honest and hard-working, their pay is often low. For some, the incentive of taking your data and selling it, or taking their newly acquired knowledge and switching to the competition, might be a siren call hard to resist. As always with data security, the more people and organizations involved, the higher the risk of leakage.
Thirdly, what can annotators do with the knowledge you've been transferring to them? By giving them access to the data and training them to be experts, you've given a third party two-thirds of the recipe needed to build a copy of your AI solution. What's stopping an enterprising person from going out, building their own data asset, and finding an ML engineer to create something very similar to what you are working on?
In the end, what you will have to decide on is risk appetite versus cost.
Outsourcing annotation work is almost always a lot cheaper than doing it in-house - but never as cheap as the price you're quoted up-front. You will still need to train; you will still need to manage and communicate. There are hidden costs you might not be aware of and risks to consider.
This article outlined some of the issues we've seen with organizations outsourcing their annotation work. To be clear, we are not saying don't do it. We're saying that you need to understand the risks and the hidden costs before you decide.
We haven't yet discussed, though, a recommended approach for making the best use of third-party annotation services. From what we've seen, we would recommend:
Only 13% of vision AI projects make it to production. With Hasty, we boost that number to 100%.
Our comprehensive vision AI platform is the only one you need to go from raw data to a production-ready model. We can help you with:
All the data and models you create always belong to you and can be exported and used outside of Hasty at any given time entirely for free.
You can try Hasty by signing up for free here. If you are looking for additional services like help with ML engineering, we also offer that. Check out our service offerings here to learn more about how we can help.
There are many good annotation companies out there that we would trust with our data. Among the best we've worked with are:
We can offer higher quality data and faster speed at a lower price than anyone else, thanks to a unique combination of workforce and automation.