AI Consensus Scoring

For AI Consensus Scoring to produce useful results, you should first have a mature AI model helping you with annotation automation. If the AI assistant of your choice is not yet mature, running AI CS will give you many false positives.

AI Consensus Scoring (AI CS) is our AI-powered quality assurance feature. In short, it uses a variety of AI models to find potential errors in your dataset. Those errors are then presented to you for review. From there, you can do one of three things:

  • Accept the suggestion
  • Reject the suggestion
  • Or, if the error was correctly identified but the suggestion was less than optimal, edit the annotation manually

We built this feature when it became clear to us how much time annotation teams spend on quality assurance. In some cases, quality assurance accounted for 40% of all time spent on an annotation project. What took time was not necessarily correcting errors, but finding them. There are a few smart solutions out there, like consensus scoring, but they often only give you a sample of potential issues, so annotation teams still have to review every single image by hand.

We wanted to change that by finding all the potential issues for teams, so they can focus on fixing errors instead of spending hours finding them first.

How to access AI Consensus Scoring

You can access AI Consensus Scoring in the burger menu on the left or in the project dashboard.

Creating a run

A run, or a sweep if you prefer, is when we use AI to QA check all annotations in a project. Without going into complex details, we check how confident our model is in the existing annotations made by humans.
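The models behind a run are not described in this article, but the general idea of checking model confidence against human labels can be sketched in a few lines. This is a minimal illustration only, assuming each annotation comes with a model confidence per class. The `Annotation` type, the `find_potential_errors` function, and the 0.5 threshold are hypothetical and not part of the product.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    annotation_id: str
    current_class: str   # label assigned by the human annotator
    class_scores: dict   # hypothetical model confidence per class name


def find_potential_errors(annotations, min_confidence=0.5):
    """Flag annotations where the model prefers a different class with enough confidence."""
    flagged = []
    for ann in annotations:
        # The class the model is most confident in
        predicted = max(ann.class_scores, key=ann.class_scores.get)
        if predicted != ann.current_class and ann.class_scores[predicted] >= min_confidence:
            flagged.append((ann.annotation_id, ann.current_class, predicted))
    return flagged


# Made-up example data
annotations = [
    Annotation("a1", "player", {"player": 0.95, "Referee": 0.05}),
    Annotation("a2", "player", {"player": 0.20, "Referee": 0.80}),
]
potential_errors = find_potential_errors(annotations)
print(potential_errors)
```

Only the second annotation is flagged here, because the model disagrees with the human label and is confident in its alternative.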

To create a new run, click the "Create new run" button.

This will open up the "Create a new run" modal.

In the modal, give the run a name. Pick a name you will recognize later, for example one that includes the date of the run.

Then, select what type of run you want to make. At the moment, you can choose from:

  • Classification - this type of run checks the class of annotations
  • Object - this checks all bounding boxes in a project
  • Instance - this checks all polygons and masks in a project that has an Object class

After deciding on what type of run you want to make, you need to select which data you want to check. You can narrow this down in two ways. First, you can select the specific datasets that you want to check. Second, you can choose to have the model only look at images with a specific image status.
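Conceptually, the two selections act as filters that are applied together. The sketch below illustrates that logic on plain dictionaries. The `select_images` function, the field names, and the status values are assumptions made for the example and do not reflect the product's actual data model.

```python
def select_images(images, dataset_ids=None, image_status=None):
    """Keep images that match the chosen datasets and, optionally, one image status."""
    selected = []
    for image in images:
        if dataset_ids is not None and image["dataset_id"] not in dataset_ids:
            continue  # not in one of the selected datasets
        if image_status is not None and image["status"] != image_status:
            continue  # does not have the requested status
        selected.append(image)
    return selected


# Made-up example data; the status names are hypothetical
images = [
    {"id": 1, "dataset_id": "train", "status": "DONE"},
    {"id": 2, "dataset_id": "train", "status": "NEW"},
    {"id": 3, "dataset_id": "test", "status": "DONE"},
]
to_check = select_images(images, dataset_ids={"train"}, image_status="DONE")
print([image["id"] for image in to_check])
```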

Finally, you have two additional options. Those are:

  • Retrain model - this retrains the model we use from scratch. Its main use is if you've added a substantial amount of new data since the last AI CS run
  • Preview mode - if you switch on this option, you will only see 10% of the potential errors we found, and you will only be charged 10% of the cost. If the results look good, you can unlock the remaining potential errors later.

After you've specified all options, click on "Create" to start the run.

You will then return to the overview, where the run is listed with the status "initialized".

It might take some time until results are ready: anywhere from around 10 minutes for smaller runs to several hours if you are checking tens of thousands of images. When the run has been completed, you will receive a notification in the top right of the screen.


Once the run has been completed, click on it to open the summary screen:

Here you get an overview of how many errors we found, how many of those errors you will be able to see in the preview (if you checked that box), and the error percentage in your project.
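To make these three numbers concrete, here is a small arithmetic sketch. It assumes that the error percentage is the number of potential errors divided by the number of checked annotations, and that the preview share is rounded down. Both are assumptions for illustration, as is the `summarize_run` function itself.

```python
def summarize_run(total_annotations, errors_found, preview_mode=False, preview_percent=10):
    """Compute the headline numbers of a run summary (illustrative only)."""
    if preview_mode:
        # Integer arithmetic avoids floating-point surprises; rounding down is an assumption
        errors_visible = errors_found * preview_percent // 100
    else:
        errors_visible = errors_found
    if total_annotations:
        error_percentage = 100 * errors_found / total_annotations
    else:
        error_percentage = 0.0
    return {
        "errors_found": errors_found,
        "errors_visible": errors_visible,
        "error_percentage": error_percentage,
    }


# Made-up example: 150 potential errors among 2000 checked annotations, preview mode on
summary = summarize_run(total_annotations=2000, errors_found=150, preview_mode=True)
print(summary)
```

With these example numbers, 15 of the 150 potential errors would be visible in the preview, and the error percentage would be 7.5%.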

You will also see some graphs that give you a more detailed overview of what we found. Which graphs you see depends on the type of run you've made. Here, we made a classification run, so we can see errors per class:

By clicking on "Confusion matrix", we can see another graph. It looks like this:

Here we can see where the model became confused. For example, there are 10 instances where the model thinks that annotations with the "player" class should be reclassified as "Referee".
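If you want to reason about such a matrix outside the tool, the underlying idea is simply counting (current class, predicted class) pairs. The following is a minimal sketch with made-up counts, apart from the 10 "player" to "Referee" cases mentioned above. The function names are hypothetical.

```python
from collections import Counter


def confusion_counts(pairs):
    """Count how often each (current_class, predicted_class) pair occurs."""
    return Counter(pairs)


def most_confused(matrix, top=3):
    """Return the most frequent pairs where the prediction differs from the current class."""
    off_diagonal = [(pair, n) for pair, n in matrix.most_common() if pair[0] != pair[1]]
    return off_diagonal[:top]


# Made-up example data: (current class, predicted class)
pairs = (
    [("player", "Referee")] * 10
    + [("player", "player")] * 85
    + [("Referee", "player")] * 2
)
matrix = confusion_counts(pairs)
print(most_confused(matrix))
```

The off-diagonal cells with the highest counts are the class pairs worth reviewing first.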

The next graph we can see is "Histogram".

Here we see how confident the model is in the predictions it made when identifying potential errors. For "Referee", the confidence is quite high.
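A confidence histogram of this kind groups prediction confidences into equal-width bins and counts how many fall into each. The sketch below shows that binning, assuming confidences between 0 and 1. The values and the `confidence_histogram` function are invented for illustration.

```python
def confidence_histogram(confidences, bins=10):
    """Count confidences (between 0 and 1) per equal-width bin."""
    counts = [0] * bins
    for confidence in confidences:
        # A confidence of exactly 1.0 goes into the last bin
        index = min(int(confidence * bins), bins - 1)
        counts[index] += 1
    return counts


# Made-up confidences for predictions of the "Referee" class
referee_confidences = [0.91, 0.97, 0.88, 0.95, 0.62]
histogram = confidence_histogram(referee_confidences, bins=5)
print(histogram)
```

Most of the mass sits in the last bin (0.8 to 1.0), which is what "quite high" confidence looks like in such a graph.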

Results page

By clicking "See results" at the bottom of the screen, or in the left menu, we go to the results page.

Here, we can see all the different potential errors the model found.

Let's go through the layout, starting in the top-left corner. Here you have an opacity control that sets how strongly the annotations are shown on the image.

In the top-right corner, you can adjust how many potential errors you want to see per screen. You can also set up different filters. Click on the filter button to open up the "Filters" side panel.

Filtering can be very helpful when you have a large project with many possible errors.

Underneath the filter button, you have the actual results. Here you can see the potential errors our model found for you.

The result card has two important pieces of information. First, you see what the predicted class is, and how confident the model is in that prediction. Second, you see what the current class is, and how confident the model is in that class.

Please note that different types of runs show different information here.

You also have two action buttons - a checkmark to accept the prediction, or a cross to reject it. By clicking on the checkmark, the class will be updated for the displayed annotation.
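The effect of the two buttons can be summarized in a short sketch: accepting replaces the current class with the predicted one, and rejecting leaves the annotation unchanged. The `review_suggestion` function and the dictionary layout are hypothetical and only illustrate the behavior described above.

```python
def review_suggestion(annotation, predicted_class, accept):
    """Return the annotation as it would look after accepting or rejecting a suggestion."""
    if accept:
        # Accepting updates the class to the model's prediction
        return {**annotation, "class": predicted_class}
    # Rejecting keeps the annotation as it was
    return dict(annotation)


# Made-up example data
annotation = {"id": "a2", "class": "player"}
accepted = review_suggestion(annotation, "Referee", accept=True)
rejected = review_suggestion(annotation, "Referee", accept=False)
print(accepted, rejected)
```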

If you click on the image, you will get some more detail on the annotation and what our model is "seeing".

Here we can get some metadata on when the annotation was created and who created it. We can also see a table that tells us how confident the model was in its predictions for every class in the project.
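A table like this is essentially the model's confidence per class, ordered from most to least likely. As a rough sketch, assuming the confidences are available as a mapping from class name to score (the values and the `confidence_table` function are made up):

```python
def confidence_table(class_scores):
    """Sort classes from highest to lowest model confidence."""
    return sorted(class_scores.items(), key=lambda item: item[1], reverse=True)


# Made-up confidences for a single annotation
class_scores = {"player": 0.15, "Referee": 0.80, "Ball": 0.05}
table = confidence_table(class_scores)
for class_name, confidence in table:
    print(f"{class_name:10s} {confidence:.0%}")
```

The first row is the predicted class. Comparing it with the row for the current class shows how strongly the model disagrees with the existing label.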

If you want to change the class manually, you can do so by using the dropdown at the top.

If we click on the image again, we will go to the annotation environment for that specific image with the annotation pre-selected - making it easier for you to edit it manually. This is not necessary when doing classification, but can be helpful for object detection and instance segmentation runs.

Using AI Consensus Scoring to improve your model

A side-effect of AI CS is that you get a better understanding of how the model processes your images. As an example, when looking at our basketball project, we can see the following:

Here, we see that the model is confused by the black and white uniforms of one of the teams in our dataset. This information can be valuable, as it gives us an insight into how to extend our dataset.

As an example, we might decide that, for our model to be more accurate, we need to add more annotations for teams with black and white uniforms and annotate more referees.

Last updated on Aug 18, 2022
