As you know, getting the best performance out of your model requires a rich, properly labeled dataset. However, gathering your own dataset from scratch is not always possible, or even necessary, since it demands massive effort and cost. Luckily, there are large publicly available datasets that will save you time and provide high-quality, topic-specific images.
In this article, we will:
Let’s jump in!
Instance Segmentation (IS) in Computer Vision (CV) is the task that combines the main principles behind Object Detection (OD) and Semantic Segmentation (SemSeg). The model receives an image as input and returns a pixel-precise mask, with a label, for each object in the image.
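To make the task concrete, here is a minimal inference sketch using a pretrained Mask R-CNN from torchvision; the input file name is hypothetical, and this is just one of many possible IS architectures:

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a Mask R-CNN pretrained on COCO (torchvision >= 0.13 weights API).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("street.jpg").convert("RGB")  # hypothetical input image
with torch.no_grad():
    predictions = model([to_tensor(image)])

# The model returns, per image, a dict of per-instance labels,
# confidence scores, bounding boxes, and pixel-precise masks.
pred = predictions[0]
for label, score, mask in zip(pred["labels"], pred["scores"], pred["masks"]):
    if score > 0.5:
        print(label.item(), score.item(), mask.shape)  # mask: 1 x H x W
```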
The Instance Segmentation task is used across many industries and domains. Below, we list a few of the most common examples.
1. Medicine
Using Instance Segmentation in medicine can help researchers and medical workers study human anatomical structures, identify and localize tumors and abnormalities in body tissues, and address many other practical and research-oriented tasks.
2. Biology
Instance Segmentation has many uses in biology. For example, one can perform the IS task on bacteria colonies to understand their morphology, distribution, growth, and diversity.
3. Video Segmentation
Instance Segmentation can be used in videos for various purposes, from building self-driving vehicles to analyzing sports matches and teaching robots to perform surgeries.
4. Satellite imagery
Governments and businesses use satellite imagery for a wide variety of tasks, from weather forecasting and environmental assessment to warfare and urban planning. Instance Segmentation makes it possible to detect and label the finest details in an image, for instance, each individual house, tree, or car.
5. Product classification
Shops and stores might utilize Instance Segmentation to review their stock more easily or to improve the customer experience by automatically recognizing items in the shopping cart, to name a few examples.
COCO is a large-scale Object Detection, Segmentation, and Captioning dataset. It was introduced in 2014 in a paper by Lin et al. and has served as a benchmark for state-of-the-art Object Recognition ever since. The dataset is maintained by a team of contributors, and its sponsors include CVDF, Microsoft, Facebook, and Mighty AI.
To date, the COCO dataset contains around 330K images, which encompass:
You can explore the different object categories on COCO’s website. If you select specific classes, you can view all the images that contain all the specified labels, for example, cat and computer mouse.
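You can also run such queries programmatically with the official pycocotools API. A minimal sketch, assuming the standard COCO 2017 annotation file layout (in COCO, the computer mouse category is named simply “mouse”):

```python
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")

# Find images that contain all the specified categories at once.
cat_ids = coco.getCatIds(catNms=["cat", "mouse"])
img_ids = coco.getImgIds(catIds=cat_ids)
print(f"{len(img_ids)} images contain both categories")

# Load the instance annotations (polygon/RLE masks) for one of them.
ann_ids = coco.getAnnIds(imgIds=img_ids[0], catIds=cat_ids, iscrowd=None)
anns = coco.loadAnns(ann_ids)
mask = coco.annToMask(anns[0])  # binary H x W mask for the first instance
```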
The images presented in COCO are mostly non-iconic and provide information about contextual relationships between objects. Such images are harder for the model to train on, since they naturally contain a lot of noise, clutter, occlusions, and other problematic aspects. In their paper, the authors illustrate the difference between iconic and non-iconic images.
Besides Instance Segmentation, you might want to use the COCO dataset to train or evaluate your model for other CV tasks as well, including:
The COCO dataset serves as a benchmark that helps to assess the performance of various CV models. Thus, it becomes easier to compare models against one another and to evaluate each model's improvement over time.
If you want to check out the state-of-the-art models for Instance Segmentation evaluated on the COCO dataset, you can follow this page and track changes.
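If you want to evaluate your own model the same way the benchmark does, pycocotools also ships a COCOeval helper. A minimal sketch, assuming your model's predictions are exported to a hypothetical predictions.json in the standard COCO results format:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")
coco_dt = coco_gt.loadRes("predictions.json")  # hypothetical results file

# "segm" scores the pixel-precise masks (Instance Segmentation);
# use "bbox" instead for Object Detection.
evaluator = COCOeval(coco_gt, coco_dt, iouType="segm")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints mask mAP at the standard IoU thresholds
```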
Cityscapes is a large-scale dataset focusing on the semantic understanding of urban street scenes. It was presented in 2015 in the paper by Cordts et al. and has been extended by various contributors since then.
The dataset consists of 5 000 images with fine annotations and 20 000 images with coarse annotations. The difference between fine and coarse annotations is illustrated below.
The annotations in Cityscapes are divided into 8 categories with 30 classes. For example, group “human” includes classes “person” and “rider,” and group “flat” includes classes “road,” “sidewalk,” and so on.
To ensure sufficient diversity, the images were gathered from 50 cities under varying conditions, such as season, time of day, and weather. The dataset was initially recorded as video, so only frames that met certain criteria (a large number of dynamic objects, varying scene layout and background) were selected.
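For reference, here is a minimal sketch of loading Cityscapes instance masks with torchvision; it assumes you have already downloaded the archives (they require registration on the Cityscapes website) into a local ./cityscapes folder:

```python
from torchvision.datasets import Cityscapes

dataset = Cityscapes(
    root="./cityscapes",
    split="train",           # "train", "val", or "test"
    mode="fine",             # "fine" (5 000 images) or "coarse" (20 000 images)
    target_type="instance",  # per-pixel instance IDs; "semantic" also available
)

image, instance_mask = dataset[0]
print(image.size, instance_mask.size)  # PIL images at the same resolution
```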
The images in Cityscapes contain some metadata that might be of interest to you:
Apart from Instance Segmentation, with Cityscapes, you can train and evaluate your model for the following CV tasks:
The Cityscapes dataset serves as a benchmark that helps to assess the performance of various CV models. Thus, it becomes easier to compare models against one another and to evaluate each model's improvement over time.
If you want to check out the state-of-the-art models for Instance Segmentation evaluated on the Cityscapes dataset, you can follow this page and track changes.
The first PASCAL (Pattern Analysis, Statistical Modeling and Computational Learning) VOC (Visual Object Classes) challenge took place in 2005 and featured two competitions: classification and detection. The results of the challenge were presented in the paper "The 2005 PASCAL Visual Object Classes Challenge" by Everingham et al. Back then, the final dataset contained only 4 classes: bicycles, cars, motorbikes, and people.
Since then, the challenge was held annually until 2012. The competition categories expanded extensively, and data was gathered for the following tasks:
Currently, the training and validation sets comprise 11,530 images containing 27,450 ROI-annotated objects and 6,929 segmentations.
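You can load these splits directly with torchvision. A minimal sketch; note that this loader returns the per-class segmentation masks, while the per-instance masks live in the dataset's SegmentationObject folder:

```python
from torchvision.datasets import VOCSegmentation

# download=True fetches the VOC 2012 archive into ./voc automatically.
dataset = VOCSegmentation(
    root="./voc",
    year="2012",
    image_set="train",  # "train", "val", or "trainval"
    download=True,
)

image, mask = dataset[0]
print(image.size, mask.size)  # PIL images; mask pixels encode the 20 classes
```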
To date, the PASCAL VOC 2012 dataset contains 20 object categories, including:
The PASCAL VOC dataset serves as a benchmark that helps to assess the performance of various CV models. Thus, it becomes easier to compare models against one another and to evaluate each model's improvement over time.
If you want to check out the state-of-the-art models for Instance Segmentation evaluated on the PASCAL VOC dataset, you can follow this page and track changes.
LVIS is a dataset for Large Vocabulary Instance Segmentation. It was presented in the paper “LVIS: A Dataset for Large Vocabulary Instance Segmentation” in 2019 by Gupta et al.
The LVIS dataset uses the COCO 2017 train, validation, and test image sets and adds its own annotations to them. The data split is as follows:
As the authors note, due to the Zipfian distribution of categories in natural images, a handful of categories appears most frequently in a dataset, while a long tail of categories appears rarely, with too few training samples for the model to learn from. Hence, the LVIS dataset aims to provide exhaustive annotations for underrepresented object categories.
The LVIS dataset contains almost 2 million instance annotations and 1,203 object categories, from general and common ones, like cat, book, and bus, to more refined ones, like earring, flower arrangement, and knee pad.
On the LVIS website, you can access the images with specific labels from a specific set (training or validation).
The annotation for LVIS is performed without prior knowledge of the categories that will be labeled. This allows the annotators to naturally uncover the long tail of categories that appear in the images.
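To work with these annotations, the authors provide the lvis-api Python package, whose interface mirrors pycocotools. A minimal sketch, assuming the standard annotation file names from the LVIS download page:

```python
from lvis import LVIS  # pip install lvis

lvis = LVIS("lvis_v1_val.json")

# Each category carries a frequency tag that reflects the long tail:
# "f" (frequent), "c" (common), or "r" (rare).
cats = lvis.load_cats(None)  # None loads all categories
rare = [c["name"] for c in cats if c["frequency"] == "r"]
print(f"{len(rare)} rare categories, e.g. {rare[:3]}")

# Annotations are COCO-like, so masks decode the same way.
ann_ids = lvis.get_ann_ids(img_ids=lvis.get_img_ids()[:1])
anns = lvis.load_anns(ann_ids)
mask = lvis.ann_to_mask(anns[0])  # binary H x W mask for one instance
```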
Apart from Instance Segmentation, you might want to use the LVIS dataset for the Object Detection task as well.
The LVIS dataset serves as a benchmark that helps to assess the performance of various CV models. Thus, it becomes easier to compare models against one another and to evaluate each model's improvement over time.
If you want to check out the state-of-the-art models for Instance Segmentation evaluated on the LVIS dataset, you can follow this page and track changes.
The NYU-Depth V2 dataset consists of video sequences of various indoor scenes recorded with both the RGB and depth cameras of the Microsoft Kinect. It was presented in 2012 in the paper “Indoor Segmentation and Support Inference from RGBD Images” by Silberman et al.
The dataset aimed to enable CV models to explore physical relationships between the objects in the images, possible actions that can be performed with them, and the geometric structure of the scene.
The images contain scenes of offices, stores, and rooms of houses with many occluded and unevenly lit objects. Each object is labeled with a class and an instance number (chair1, chair2, chair3).
Overall, there are:
Notably, the NYU-Depth V2 dataset contains annotations of large planar surfaces, like floors, walls, and table tops. Hence, many objects can be interpreted in relation to those surfaces.
Objects are also classified into structural classes that reflect their physical role in the scene:
The images are divided into the labeled and raw datasets.
The full archive weighs approximately 428 GB, so if you do not want to download the entire dataset as a single file, you can choose individual categories instead.
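The much smaller labeled subset ships as a single MATLAB v7.3 (HDF5) file, which you can read with h5py. A minimal sketch, assuming the official nyu_depth_v2_labeled.mat file and the key names documented in the NYU toolbox:

```python
import h5py

# In h5py, the arrays come out transposed relative to their MATLAB layout.
with h5py.File("nyu_depth_v2_labeled.mat", "r") as f:
    image = f["images"][0]         # first RGB image, channels first
    depth = f["depths"][0]         # per-pixel depth in meters
    labels = f["labels"][0]        # semantic class index per pixel
    instances = f["instances"][0]  # instance number within each class

# A unique object is a (class, instance) pair, e.g. chair1, chair2, chair3.
pairs = set(zip(labels.flatten(), instances.flatten()))
objects = [(c, i) for c, i in pairs if c != 0]  # class 0 marks unlabeled pixels
print(f"{len(objects)} labeled objects in this scene")
```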
Apart from Instance Segmentation, with NYUv2, you can train and evaluate your model for the following CV tasks:
and so on.
The NYUv2 dataset serves as a benchmark that helps to assess the performance of various CV models. Thus, it becomes easier to compare models against one another and to evaluate each model's improvement over time.
If you want to check out the state-of-the-art models for Instance Segmentation evaluated on the NYUv2 dataset, you can follow this page and track changes.
YouTubeVIS is a large-scale dataset for Video Instance Segmentation. It is based on the YouTube-VOS (Video Object Segmentation) dataset and was presented in 2019 in a paper by Yang et al.
Video Instance Segmentation extends the image Instance Segmentation task to the video domain and introduces new subtasks: simultaneous detection, segmentation, and tracking of object instances in videos. The instance masks must be labeled and associated across frames so that the same object keeps its identity.
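To illustrate the frame-level association, here is a minimal sketch of reading YouTubeVIS-style annotations; the JSON layout is COCO-like, and the file name is an assumption based on the dataset's release layout:

```python
import json

# Hypothetical path; the releases ship train/valid/test JSON files.
with open("youtube_vis/train.json") as f:
    data = json.load(f)

video = data["videos"][0]
print(video["id"], len(video["file_names"]), "frames")

# Each annotation keeps one instance identity across the whole video:
# "segmentations" holds one mask per frame, with null entries where
# the object is not visible in that frame.
for ann in data["annotations"]:
    if ann["video_id"] == video["id"]:
        visible = sum(seg is not None for seg in ann["segmentations"])
        print(f"category {ann['category_id']} visible in {visible} frames")
```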
As of the 2022 version, the dataset contains:
Apart from Video Instance Segmentation, you can also train your CV model to perform the Video Semantic Segmentation task on the YouTubeVIS dataset.
The YouTubeVIS dataset serves as a benchmark that helps to assess the performance of various CV models. Thus, it becomes easier to compare models against one another and to evaluate each model's improvement over time.
If you want to check out the state-of-the-art models for Instance Segmentation evaluated on the YouTubeVIS dataset, you can follow this page and track changes.
As you might know, data annotation can be a bottleneck for AI startups, as the conventional labeling approach is both costly and time-consuming. Hasty’s data-centric ML platform addresses this pain and automates 90% of the work needed to build and optimize your dataset for the most advanced use cases, with self-learning assistants that use AI to train AI.
Hasty’s primary focus is the Computer Vision field. Therefore, Hasty is a perfect fit for Instance Segmentation annotation, as it implements all the instruments necessary to support your Instance Segmentation task.
Let’s go through the available options step-by-step. To streamline your Instance Segmentation annotation experience, Hasty offers:
As for the annotation quality control process, Hasty has you covered with its AI Consensus Scoring feature, which has a separate Instance Segmentation review option. With the help of AI CS, you can find missing labels, extra labels, and various artifacts. You can also better understand how a machine sees your data, which might be valuable for your annotation strategy.
When it comes to model building, Hasty’s Model Playground supports many modern neural network architectures. For Instance Segmentation, these are:
As a backbone for these architectures, Hasty offers:
As a Machine Learning metric for the Instance Segmentation case, Hasty implements mask mean Average Precision (mask mAP).
Hasty also has a YouTube channel featuring video tutorials for each vision AI task. Here is one about Instance Segmentation.
How to streamline your Instance Segmentation labeling experience with Hasty.ai
As of today, these are the key options Hasty offers for Instance Segmentation cases. If you want a more detailed overview, please check out the further resources or book a demo to dive deeper into Hasty with our help.