Active Learning in Hasty

As Machine Learning tools find ever wider application, data assets are created at an unprecedented rate. However, most of these assets are raw, and raw data is of little use to ML models. For a machine to interpret and process data assets, they must be annotated.

All Supervised Machine Learning algorithms require labeled data (annotations). With these labels, the model learns to perform a certain task. Unfortunately, creating labeled data can be very time- and resource-consuming, depending on the type of input data. Images and videos, for example, can be challenging to label and can take days or weeks, especially if you need to annotate tens of thousands of them. A self-driving vehicle algorithm might easily require an enormous number of video clips and images depicting scenes from different environments. Labeling each and every data asset can therefore seem like a daunting, somewhat far-fetched task.

What is Active Learning?

This is where Semi-Supervised Machine Learning comes into the mix. In a Semi-Supervised setting, a small amount of labeled data is used in conjunction with a large amount of unlabeled data without compromising the accuracy of the model. Active Learning, which Hasty implements, is one application of Semi-Supervised Machine Learning. Its ultimate goal is to build the best possible model from the least amount of annotated data. To achieve this, heuristics are used to judge which unlabeled samples should be labeled next to yield the highest performance gain for the model.

Active Learning workflow

Before diving into heuristics, let's first understand the basic Active Learning workflow. It is as follows:

  1. You start with an unlabeled pool of data;

  2. You use a sampling technique (heuristic) that takes the unlabeled data as input and returns a ranking. The data asset with the highest rank is chosen for labeling;

  3. This data asset is annotated by a human and added to the training set, and the ML model is trained (or retrained);

  4. Repeat steps 1-3 until you achieve the desired model performance.
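The loop above can be sketched in a few lines. This is a minimal, self-contained illustration, not Hasty's actual implementation: the "model" is a toy nearest-centroid classifier on synthetic 2D points, and the heuristic is margin-based uncertainty (all names and parameters here are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-class data: two Gaussian blobs stand in for real image features.
X = np.vstack([rng.normal(-2, 1, size=(100, 2)), rng.normal(2, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

labeled = [0, 100]                                # a tiny labeled seed, one per class
unlabeled = [i for i in range(200) if i not in labeled]  # step 1: the unlabeled pool

def fit_centroids(X, y):
    # Toy "model": one centroid per class; distance to centroid acts as the classifier.
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

for _ in range(10):                               # steps 1-3, repeated (step 4)
    centroids = fit_centroids(X[labeled], y[labeled])
    d = np.linalg.norm(X[unlabeled][:, None, :] - centroids[None], axis=2)
    # Step 2: rank the pool by uncertainty; the smallest margin between the two
    # class distances means the model is least sure, so that asset ranks highest.
    margin = np.abs(d[:, 0] - d[:, 1])
    query = unlabeled[int(margin.argmin())]
    # Step 3: a human would annotate `query` here; we simply reveal the stored label.
    labeled.append(query)
    unlabeled.remove(query)

print(len(labeled))  # 12 labeled assets after 10 querying rounds
```

Each round spends the annotation effort on the single asset the current model is least certain about, which is the core idea the workflow describes.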

The most crucial step above is ranking the data to find the next asset for labeling. This is done with the help of heuristics.
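As a concrete (and hypothetical) example of such a heuristic, entropy-based uncertainty sampling ranks assets by the entropy of the model's predicted class probabilities: the closer the probabilities are to uniform, the less certain the model is, and the higher the asset ranks.

```python
import numpy as np

def entropy_rank(probabilities):
    """Rank samples by predictive entropy, most uncertain first."""
    p = np.clip(probabilities, 1e-12, 1.0)        # avoid log(0)
    entropy = -(p * np.log(p)).sum(axis=1)        # Shannon entropy per sample
    return np.argsort(-entropy)                   # descending entropy = rank order

# Predicted class probabilities for three unlabeled assets (two classes).
probs = np.array([[0.50, 0.50],    # maximally uncertain -> labeled first
                  [0.90, 0.10],
                  [0.99, 0.01]])   # nearly certain -> labeled last
print(entropy_rank(probs))  # [0 1 2]
```

Other common heuristics follow the same pattern with a different score, e.g. least confidence (1 minus the top probability) or the margin between the two most likely classes.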
