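One of the heuristics used to rank unlabeled data is Entropy. Mathematically, the entropy of a model's prediction for a data sample is defined as:

$$H = -\sum_{i=1}^{C} p_i \log p_i$$

where $C$ is the number of classes and $p_i$ is the probability the model assigns to class $i$.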
Entropy measures how uncertain the model is about a data asset, which serves as a proxy for how "useful" that asset would be to label. In simple words, if an event that is highly likely to occur does occur, it is not surprising at all. Imagine a picture of a dog on a plain white background. It will not be a surprise if the classification algorithm makes a confident, correct prediction, so such an event does not carry much valuable information. On the other hand, if you come across a data asset on which the algorithm is likely to underperform (for example, an image of a dog that looks like a muffin), the model's prediction will be spread across several classes, and you can extract valuable information from this case. So, it is reasonable to label such a "useful" edge case to boost your model's performance.
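As a rough illustration, take a two-class problem (dog vs. muffin) with made-up probabilities that do not come from any real model, and use the natural logarithm. For the dog on a plain white background, suppose the model predicts 0.98 for "dog" and 0.02 for "muffin":

$$H_{\text{easy}} = -(0.98 \ln 0.98 + 0.02 \ln 0.02) \approx 0.098$$

For the dog that looks like a muffin, suppose the model predicts 0.5 for each class:

$$H_{\text{hard}} = -(0.5 \ln 0.5 + 0.5 \ln 0.5) = \ln 2 \approx 0.693$$

The ambiguous image scores roughly seven times higher, which is why it would be ranked ahead of the easy one for labeling.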
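The Entropy heuristic calculates the entropy of the model's predicted class probabilities for each image. Data samples with higher entropy are ranked higher, so the images the model is least certain about are the first ones suggested for annotation. With Entropy, you spend your labeling effort on ambiguous and diverse images rather than on easy ones the model already handles well, which tends to be beneficial for the model's performance.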
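The ranking described above can be sketched in a few lines of Python. This is a minimal illustration, not Hasty's actual implementation: the function names, the file names, and the probability values are all hypothetical, and the predictions are assumed to be already normalized class probabilities (for example, softmax outputs).

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (natural log) of one predicted class distribution."""
    # Terms with p == 0 contribute nothing, so skip them to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_by_entropy(predictions):
    """Return sample names sorted from most to least uncertain, plus the scores."""
    scores = {name: prediction_entropy(p) for name, p in predictions.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


# Hypothetical class probabilities for three unlabeled images.
predictions = {
    "dog_white_background.jpg": [0.98, 0.01, 0.01],
    "dog_or_muffin.jpg": [0.40, 0.35, 0.25],
    "dog_in_park.jpg": [0.70, 0.20, 0.10],
}

ranked, scores = rank_by_entropy(predictions)
for name in ranked:
    print(f"{name}: {scores[name]:.3f}")
```

In this sketch the ambiguous image comes first in the ranking and the plain-background image comes last, matching the intuition that confident predictions carry little new information.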