A lot has been written about the different annotation strategies for computer vision projects, explaining the differences between object detection, instance segmentation, and semantic segmentation. But when it comes to the question of which one is right for you, the answer is always: 'It depends on your use-case.' That is why we created the infographic below to help you find the right approach for your particular use-case.

This infographic is under the 'CC BY-SA' license, so you can share it with everyone you know. Link to the high-resolution image

The different annotation strategies explained 

Whether you're new to the topic or just want to refresh your knowledge, here is a short overview of the three main annotation strategies.

Object Detection 

Object detection is used to locate discrete objects in an image. The annotation is relatively simple: you draw a tight box around the intended object. The benefits are that storing this information and the required computations are relatively light. The drawback is that the 'noise' in the box - the captured 'background' - often interferes with the model learning the object's shape and size. Thus, this method struggles when there is a high level of 'occlusion' (overlapping or obstructed objects) or high variance in an object's shape. That information is often important - think of types of biological cells or dresses.
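To make this concrete, a bounding-box label boils down to a class name plus four numbers. Here is a minimal Python sketch, loosely following the COCO convention of [x, y, width, height]; the field names and values are purely illustrative and not tied to any particular tool:

```python
# A minimal, COCO-style bounding-box annotation (illustrative field names).
# The box is stored as [x_min, y_min, width, height] in pixel coordinates.
annotation = {
    "image_id": 42,
    "category": "car",
    "bbox": [120, 85, 64, 48],  # x, y, w, h
}

def box_area(bbox):
    """Area of an [x, y, w, h] box - everything inside it, object AND background."""
    _, _, w, h = bbox
    return w * h

print(box_area(annotation["bbox"]))  # 3072 pixels, many of which are 'noise'
```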

An example of an object detection label. Training an object detection model takes less computational effort, but the approach has its limitations when objects overlap each other or differ across a class's instances.

The most well-known model architecture for object detection is the famous YOLO v3 by Joseph Redmon. SOTA ('state of the art') at the moment is the Cascade Eff-B7 NAS-FPN architecture.

Semantic Segmentation

This is useful for indicating the shape of something where the count is not important, such as the sky, the road, or the background. The benefit is that you get much richer information on the entire image because you annotate every pixel, so you know exactly where regions are and what shape they have. The challenge with this method is that every pixel needs to be annotated, which makes the process time-consuming and error-prone. Also, it is not possible to differentiate single instances of one class: the final model will only be able to tell whether a pixel belongs to a car or not, but not how many cars are in the image.
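To see this limitation in action, here is a small NumPy sketch (the tiny mask and the class IDs are made up for illustration). A semantic mask stores one class ID per pixel, so you can easily measure how much of the image is 'car', but two touching cars collapse into a single region:

```python
import numpy as np

# Toy semantic mask: 0 = background, 1 = car (values are illustrative).
# Imagine two cars parked bumper to bumper - they form one connected region.
mask = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
])

car_pixels = int((mask == 1).sum())
print(f"'car' pixels: {car_pixels}")  # we know exactly WHERE the cars are...
# ...but nothing in the mask tells us whether this is one car or two.
```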

An example of semantic segmentation. It is impossible to distinguish between different instances of a class, for example, the individual cars or persons.

SOTA on the Cityscapes dataset (which is often used for training in autonomous-driving use-cases, where semantic segmentation is used a lot) is HRNet-OCR.

Instance Segmentation

This is useful for indicating discrete objects such as car 1, car 2, flower a, flower b, or actuator. The benefits are that objects' shapes and attributes are learned far faster - the model needs to be shown fewer examples - and occlusions are handled much better than with object detection. The challenge is that this method comes with a very time-consuming and error-prone annotation process.
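Compare that with a minimal instance-segmentation sketch of the same toy scene as above (again, the format is illustrative, not any specific tool's schema): each object gets its own binary mask, so counting instances becomes trivial:

```python
import numpy as np

# Toy instance annotation: one binary mask per object, plus a class label.
# The same two cars as in the semantic example, now kept apart.
car_1 = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
])
car_2 = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
])
instances = [("car", car_1), ("car", car_2)]

num_cars = sum(1 for label, _ in instances if label == "car")
print(f"cars in image: {num_cars}")  # 2 - instance counts come for free
```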

An example of instance segmentation on an image taken on a trip to San Francisco. Individual persons are recognized as separate instances.

The most widely used model architecture for instance segmentation is Mask R-CNN. SOTA is the Cascade Eff-B7 NAS-FPN architecture (as for object detection).
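If you want to experiment with Mask R-CNN without training anything yourself, torchvision ships a pre-trained implementation. Here is a minimal inference sketch; note that `pretrained=True` is the older torchvision API (newer versions use a `weights=` argument instead), so adjust to your installed version:

```python
import torch
import torchvision

# Pre-trained Mask R-CNN on COCO (argument names vary slightly by torchvision version).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Dummy RGB image tensor with values in [0, 1]; replace with a real image.
image = torch.rand(3, 480, 640)

with torch.no_grad():
    prediction = model([image])[0]

# One entry per detected instance: boxes, class labels, confidence scores, soft masks.
print(prediction["boxes"].shape, prediction["labels"], prediction["masks"].shape)
```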

Some general advice for annotating images

From working on practical computer vision projects daily here at Hasty.ai, we learned quite a lot about how to approach the image annotation process. Generally speaking, there are three tips we can give you along the way (no matter which annotation strategy you use):

  1. Annotate as much data as you can yourself; don't just outsource it to another company or delegate it to the intern. Labeling the data yourself will help you develop a deep understanding of your data and detect issues like data shift early on. Make sure to use an annotation platform with a high level of annotation automation that allows you to annotate fast, so you don't spend weeks or months doing work that can be automated.
  2. Prototype quickly; don't wait until all the data is labeled. Otherwise, you will only be able to identify potential pitfalls at the very end. And let's be honest: even the best ML engineers need to run through a few iterations before deploying a model to production. Thus, when choosing an annotation platform, make sure that it allows you to work in an agile fashion.
  3. Double-check the quality of your data. A model's quality is limited by the quality of the data fed to it, so make sure to have a quality assurance process for your data in place. SOTA annotation platforms provide features that leverage neural networks to do this for you.

We hope that our infographic and this post provide some value and help you get started with your annotation strategy. If you have any questions or comments, reach out to me at [email protected] or join the discussion in our community.

About Hasty.ai

We're a Berlin-based startup building the next-gen annotation tool for computer vision. We're constantly trying out new approaches to improve the algorithms behind our AI-Assistants. They allow you to annotate data 10x faster and provide you with rapid feedback so you can validate and adapt your models as you work.

If you're interested in learning more, check out our website and try the tool (you can start for free 😉).

What content are you interested in?

We just started our blog and want to continue sharing our experience of working on computer vision projects on a daily basis. To make sure that we provide content relevant to you, please fill out this survey. It takes less than 1 minute, we promise!