A guide to finding the right approach for your use-case
A lot has been written about the different annotation strategies for computer vision projects, explaining the differences between object detection, instance segmentation, and semantic segmentation. But when it comes to the question of which one is right for you, the answer is always: 'It depends on your use-case.' That is why we created the infographic below to help you find the right approach for your particular use-case.
This infographic is under the 'CC BY-SA' license, so feel free to share it with everyone you know. Link to the high-resolution image
Whether you're new to the topic or just want to refresh your knowledge, here is a short overview of the three main annotation strategies.
Object detection is used to locate discrete objects in an image. The annotation is relatively simple: you draw a tight box around the intended object. The benefits are that storing this information and the required computations are relatively lightweight. The drawback is that the 'noise' in the box, the captured 'background', often interferes with the model learning the object's shape and size. As a result, this method struggles when there is a high level of occlusion (overlapping or obstructed objects) or high variance in an object's shape, and that information is often important: think of types of biological cells or dresses.
An example of an object detection label. Training an object detection model takes less computational effort, but the approach has its limitations when objects overlap each other or differ strongly across a class's instances.
The most well-known model architecture for object detection is the famous YOLO v3 by Joseph Redmon. SOTA ('state of the art') at the moment is the Cascade Eff-B7 NAS-FPN architecture.
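To make the 'lightweight storage' point concrete, here is a minimal sketch (plain Python, with hypothetical class and field names; the COCO-style [x, y, width, height] convention is one common choice, not the only one) of how a detection label can be represented, along with the standard IoU overlap metric:

```python
from dataclasses import dataclass

# A detection label is just a class name plus four box coordinates,
# which is why storing and processing these labels is so cheap.
@dataclass
class BoxLabel:
    class_name: str   # e.g. "car" or "person"
    x: float          # top-left corner, in pixels
    y: float
    width: float
    height: float

def iou(a: BoxLabel, b: BoxLabel) -> float:
    """Intersection over Union, the standard overlap metric for boxes."""
    x1, y1 = max(a.x, b.x), max(a.y, b.y)
    x2 = min(a.x + a.width, b.x + b.width)
    y2 = min(a.y + a.height, b.y + b.height)
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0

labels = [BoxLabel("car", 10, 20, 100, 60), BoxLabel("car", 90, 40, 80, 50)]
# A high IoU between two ground-truth boxes is exactly the occlusion
# situation where plain object detection starts to struggle.
print(f"{iou(labels[0], labels[1]):.3f}")  # ~0.087
```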
Semantic segmentation is useful for indicating the shape of something where the count is not important, such as the sky, the road, or the background. The benefit is that you get much richer information about the entire image, because you annotate every pixel: the goal is to know exactly where regions are and what shape they have. The challenge is that every pixel needs to be annotated, which makes the process time-consuming and error-prone. Also, it is not possible to differentiate single instances of one class; the final model will only be able to tell whether a pixel belongs to a car or not, but not how many cars are in the image.
An example of semantic segmentation. It is impossible to distinguish between different instances of a class, for example the individual cars or persons.
SOTA on the Cityscapes dataset, which is often used for training in autonomous driving use-cases where semantic segmentation is common, is HRNet-OCR.
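To illustrate the 'every pixel gets a class, but instances are lost' point, here is a minimal sketch (NumPy, with made-up class IDs and a toy mask) of what a semantic segmentation label looks like:

```python
import numpy as np

# A semantic segmentation label is a 2D array with one class ID per pixel.
# The IDs here are made up for illustration: 0 = background, 1 = car.
BACKGROUND, CAR = 0, 1

mask = np.array([
    [0, 0, 1, 1, 0, 1, 1, 0],
    [0, 0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
])

# We can tell exactly WHICH pixels belong to a car...
car_pixels = (mask == CAR).sum()
print(car_pixels)  # 8

# ...but the label itself cannot tell us HOW MANY cars there are:
# the two car blobs above share the same class ID, so "two cars" vs.
# "one oddly shaped car" is indistinguishable without instance labels.
```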
Instance segmentation is useful for indicating discrete objects such as car 1, car 2, flower a, flower b, or an actuator. The benefits are that objects' shapes and attributes are learned far faster, with fewer examples needed, and that occlusions are handled much better than with object detection. The challenge is that the annotation process is very time-consuming and error-prone.
An example of instance segmentation on an image taken on a trip to San Francisco. The individual persons are recognized as separate instances.
The most widely used model architecture for instance segmentation is Mask R-CNN. SOTA is, as for object detection, the Cascade Eff-B7 NAS-FPN.
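To contrast with the semantic mask sketch above, here is a minimal sketch (NumPy, made-up data; the per-instance layout loosely mirrors how Mask R-CNN reports its outputs) of an instance segmentation label, where counting and overlap come for free:

```python
import numpy as np

# Instance segmentation stores one binary mask per object, each with its
# own class label, instead of a single shared class map for the image.
instances = [
    {"class_name": "car", "mask": np.array([[1, 1, 0, 0],
                                            [1, 1, 0, 0]], dtype=bool)},
    {"class_name": "car", "mask": np.array([[0, 0, 1, 1],
                                            [0, 0, 1, 1]], dtype=bool)},
]

# Counting is now trivial: every mask is its own object.
print(sum(inst["class_name"] == "car" for inst in instances))  # 2

# Masks may overlap where objects occlude each other; because each
# instance keeps its own mask, the overlap stays unambiguous.
```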
From working on practical computer vision projects daily here at Hasty.ai, we have learned quite a lot about how to approach the image annotation process. Generally speaking, there are three tips we can give you along the way, no matter which annotation strategy you use:
We hope that our infographic and post provide some value and help you get started with your annotation strategy. If you have any questions or comments, reach out to me at [email protected] or join the discussion in our community.
We're a Berlin-based startup building the next-gen annotation tool for computer vision. We're constantly trying out new things to improve the algorithms behind our AI-Assistants. They allow you to annotate data 10x faster and give you rapid feedback, so you can validate and adapt your models as you work.
If you're interested in learning more, check out our website and try the tool (you can start for free 😉).
We just started our blog and want to continue sharing our experience of working on computer vision projects on a daily basis. To make sure we provide content relevant to you, please fill out this survey. It takes less than a minute, we promise!