
Labelbox vs. Hasty

Nowadays, to effectively work with data assets, you need them to be labeled. Unfortunately, in the modern world, most of the data you might have or can acquire for free is raw, unlabeled data. Therefore, if you do not have a nicely prepared dataset for your subsequent Machine Learning (ML) project, you are likely to spend a lot of time working through the data trying to annotate it. 

At first, annotating seems like an easy task. However, in a real-life scenario, you may have to annotate a massive amount of images, texts, audio files, and other data types. You will swiftly understand that a conventional annotation approach is time-consuming, repetitive, and boring. At this point, you might come up with the idea of outsourcing your annotation task. However, outsourcing can be inefficient because such labor-intensive labeling by a third party can be inaccurate and hard to control. This means that outsourcing does not guarantee high-quality annotation as an outcome.

This especially tends to be true in more complex use cases where domain expertise is needed. For example, an advanced agricultural use case might require trained botanists to create the initial ground-truth dataset. You might also require in-house knowledge of how you should label something – one man’s weed is another man’s protected species.

You might decide to start the initial data labeling in-house, but creating an annotation team or department can be pretty costly. An experienced botanist or doctor, for example, does not come cheap. Alternatively, you might choose to outsource the annotation task to a third party, but there is still considerable manual labor involved.

Fortunately, there is an efficient way to make the life of annotators easier and speed up the labeling process – integrating AI into the pipeline and using AI to automate a significant portion of the work.

In today’s post, we’ll look at two of the more commonly used tools on the market that offer AI-based annotation: Labelbox, the market leader in annotation solutions, and ourselves, Hasty, the plucky European underdog.


Labelbox was founded in 2018 and is based in San Francisco, USA. As its official website states, Labelbox provides “Data curation, AI-assisted labeling, model training & diagnostics, and labeling services, all in one platform”. In other words, Labelbox is an end-to-end data platform that helps to create and manage training data.

As for funding, Labelbox has raised a total of $188.9 million over six rounds. They raised their latest funding on January 6th, 2022.

The Labelbox platform is based on four core concepts:

Labelbox promises that they will drastically speed up your annotation process and allow you to manage your annotation tasks effectively.


Hasty is an end-to-end AI platform that lets ML engineers and domain experts deliver computer vision models faster – reducing the time to market for transformative products and services. Our end-to-end solution has built-in feedback loops across the ML pipeline.

The company was founded in January 2019 and is based in Berlin, Germany. During Hasty’s first funding round back in September 2020, we raised a total of 3.7 million dollars. 

In September 2022, Hasty merged with CloudFactory, a large cloud labor platform that connects tech teams to a network of over a million annotators. With the help of Hasty, CloudFactory now provides accelerated image annotation that allows you to train high-quality vision AI models.

Hasty supports:

Besides standard annotation functions, Hasty’s key features are:


Now, let’s define the criteria for evaluating annotation tools for Computer Vision tasks. Hasty and Labelbox will be analyzed based on the following:

  1. Annotation functions: manual and AI-powered tools;
  2. Quality Assurance (QA) and Quality Control (QC) – they must be there to minimize mislabeling;
  3. User experience (UI, UX, product management tools, etc.);
  4. Model experimentation – the opportunity to play around with Deep Learning models;
  5. Inference engine/Model export – whether you can easily get more information about your model and export it;
  6. APIs/Integrations – what do you need to use a tool effectively;
  7. Services – anything additional that is worth mentioning;
  8. Official documentation, availability of valuable tutorials, and simple examples – crucial things to tune in, set everything up, and get started;
  9. Regular updates – using an up-to-date tool is always a benefit;
  10. Pricing.

Labelbox vs. Hasty

Alright, let’s compare Labelbox and Hasty following the criteria.

1. Annotation functions

Annotation functions in Labelbox

Labelbox offers various annotation tools for each supported data type, including image, video, text, audio, document, HTML, DICOM, and tiled imagery.

Check the image below to see what annotation types are supported by Labelbox.

Image source: Labelbox official documentation

As for AI-assisted annotation (AIAA), Labelbox proposes using Model-assisted labeling (MAL). It allows you to import computer-generated labels or labels created outside of Labelbox as pre-labels on an asset. These imported annotations will then appear in the labeling editor.

However, pre-labels still need to be converted into real annotations by a human labeler before you start working with them.

AI-assisted annotation (AIAA) brings AI into the annotation process. AIAA proposes using interactive AI tools to semi-automate or even fully automate the labeling process. For example, you can use a small amount of manually annotated images to train a Deep Learning (DL) model that will predict annotations for you. Then you can check and correct the model’s predictions. This approach is called pre-labeling and can be helpful to get you started.

What Labelbox doesn’t offer is any UI for showing predictions to annotators. This pre-label approach can work well only when you already have an excellent model and want to enhance what you already have. However, it can be hard to use if your model has not yet achieved high accuracy.

In those cases, you might cause more harm than good. By importing annotations from an immature model, your annotators will have to double-check every single annotation. When working with more complex use cases such as Object Detection or Segmentation, annotators often have to edit the predictions as they are seldom of good quality.

Image source: Labelbox official documentation

Annotation functions in Hasty

Hasty focuses on Computer Vision problems. That is why we provide various manual annotation tools for image labeling, including:

For your convenience, we also support semi-automated tools that use pre-trained AI models to speed up annotation. You can use them from the very start of your project. These tools are:

Hasty also supports annotation with the Apple Pencil, an often-requested feature from experts like botanists and doctors who are more comfortable annotating images in a more tactile way.

Where the real difference lies, however, is in automating the annotation process. Hasty’s core idea is to train AI assistants that learn from human annotations and improve continuously. The benefits of this approach are five-fold.

1. You get automation out-of-the-box
When using Hasty, you don’t need to have your own model available. We handle both the training and the inferencing for you. This allows you to focus on creating your ground-truth data without having to wrestle with Python SDKs or build out a production-level data pipeline. Furthermore, this minimizes the need for engineering in the early stages of a project – so your tech team can focus on training models and building software.

2. You get custom models trained on your data
As most ML engineers know, models trained on publicly available data seldom perform well on custom, complex use cases. For that, you need a model trained on your data. This is the approach we use in Hasty. For our AI assistance, we train our models on your data, giving you results no matter the use case. These models are private and accessible only to you and your team.

3. The more you annotate, the more automation you get
We retrain the models every time there is a 20% increase in labeled data. In practice, this means you get a new AI model to help you after 10, 12, 15, 18 labeled images, and so on. With this approach, you get higher and higher degrees of automation as you annotate your data. Most users can automate 50% of their annotation after 200 images or 1,000 annotations (whichever comes first). With 1,000 images or 5,000 annotations, that automation should reach around 75-95%, depending on the complexity of your project.
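As a back-of-envelope illustration, the retraining schedule implied by that 20% trigger can be sketched like this (the helper function is hypothetical; only the 20% figure and the starting point of 10 images come from the workflow described above):

```python
import math

def retrain_milestones(start=10, growth=0.2, limit=200):
    """Dataset sizes at which a new assistant model is trained, assuming
    retraining triggers on every 20% increase in labeled images."""
    sizes, n = [], start
    while n <= limit:
        sizes.append(n)
        n = math.ceil(n * (1 + growth))  # round up to the next whole image
    return sizes

print(retrain_milestones(limit=30))  # [10, 12, 15, 18, 22, 27]
```

Rounding up after each 20% step reproduces the 10, 12, 15, 18 sequence mentioned above.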

An example of Hasty’s AI-assistants

4. You get constant feedback on what models see
In Hasty’s UI, you are in control of the training process. We’ve built it with a human-in-the-loop approach in mind. In practice, that means that you interact with the models we provide and then give feedback on what you want to keep and what you want to change or reject. The flow is easy:

An example of suggested annotations for Instance Segmentation task

The additional benefit is that you see what the model understands and where it struggles. Let’s say we want to detect and classify adults and children in Medieval paintings. When using the Instance Segmentation assistant, you might see the following picture:

We can see that the model correctly detected 2 adults and 1 child but misclassified 2 children as adults. Using this information, you can adapt your annotation strategy on the fly and add more data where the model struggles (in our case, we should annotate more images with the Children class). This early feedback saves you from over-annotating classes that your model already understands and allows you to focus on complex edge cases and classes.
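To make the idea concrete, here is a minimal sketch (with made-up review data) of how you might tally per-class errors from such a feedback session to decide where to add annotations:

```python
from collections import Counter

# Hypothetical reviewer feedback: (true class, class suggested by the assistant).
reviewed = [
    ("adult", "adult"), ("adult", "adult"), ("child", "child"),
    ("child", "adult"), ("child", "adult"),  # two children mistaken for adults
]

# Count errors per true class to see which class needs more training data.
errors = Counter(truth for truth, suggested in reviewed if truth != suggested)
print(errors.most_common(1))  # [('child', 2)] -> annotate more 'child' examples
```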

5. Completely automated labeling
When you have an annotation model that works, we also allow you to label the remaining images in your dataset with the click of a button using the Automated Labeling feature, saving you time and supercharging your annotation efforts.


Labelbox has a lot of good manual annotation tools and supports many different types of annotation. For automation and active learning, they have a Python SDK that you can use to build Labelbox into your pipeline. However, they offer nothing out-of-the-box.

Hasty offers the same possibilities for manual annotation but comes with a more convenient approach to annotation automation. You get custom models trained on your data within the platform as well as a UI to interact with them.

Which one to choose here depends on your needs.

2. Quality Assurance (QA) and Quality Control (QC)

QA and QC in Labelbox

For the most part, QA reviews are done manually. Still, Labelbox gives you some tools to ease the pain of manual quality control.

To fix ground-truth errors in labels made by humans, Labelbox lets you filter the rows where model predictions and ground-truth labels disagree. To start with this option, you first have to upload model predictions and model metrics for your labeled data.

Surface mispredictions on images
The highlighted label seems out of distribution
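The underlying idea of surfacing disagreements is simple; a rough sketch in plain Python (not the Labelbox SDK, and the row ids and labels are invented) could look like:

```python
# Ground-truth labels and uploaded model predictions, keyed by data-row id.
ground_truth = {"row-1": "cat", "row-2": "dog", "row-3": "cat"}
predictions  = {"row-1": "cat", "row-2": "cat", "row-3": "cat"}

# Rows where the model and the human labeler disagree are QA candidates.
suspects = [row for row, label in ground_truth.items()
            if predictions.get(row) != label]
print(suspects)  # ['row-2'] -> worth a manual re-check
```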

QA and QC in Hasty

For Hasty, QA and QC come with automation. There are manual quality control features as well, but the real timesaver here is our AI-powered quality assurance feature called AI Consensus Scoring (AI CS).

AI CS uses a variety of AI models to find potential errors in the dataset. During a run, the model checks how confident it is in the existing annotations. The results are presented in the UI together with suggestions for improvement. Human annotators can then accept, reject, or edit the suggested annotations.
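Conceptually, the check boils down to flagging annotations the model has low confidence in. A toy sketch (the scores and the 0.5 threshold are invented; Hasty’s actual scoring runs inside the platform):

```python
# Model confidence for each existing annotation (hypothetical values).
annotation_confidence = {"ann-1": 0.97, "ann-2": 0.41, "ann-3": 0.88, "ann-4": 0.12}

THRESHOLD = 0.5  # assumed cut-off below which an annotation is a potential error
flagged = sorted(ann for ann, conf in annotation_confidence.items()
                 if conf < THRESHOLD)
print(flagged)  # ['ann-2', 'ann-4'] -> suggested for human review
```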

The main gains here are:

  1. You remove the need for multiple annotators labeling the same image, which saves time and annotation budget and minimizes redundancy.
  2. You reduce the manual supervision work of checking and correcting errors, so you save on your QA/QC budget. For more complex projects, QA/QC checks often take up to 30-50% of the total data creation cost, so it is quite a big gain.
An example of AI CS run results
An example of an error: missing label
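To put the second point in numbers, here is a back-of-envelope estimate (the 40% QA share sits in the 30-50% range quoted above; the 50% reduction from automated checks is an assumption):

```python
def estimated_qa_savings(total_cost, qa_share=0.4, automation_cut=0.5):
    """Estimate the QA/QC budget saved if automated checks cut manual review
    work by `automation_cut` on a project where QA/QC makes up `qa_share`
    of the total data creation cost."""
    return total_cost * qa_share * automation_cut

print(estimated_qa_savings(100_000))  # 20000.0 saved on a $100k project
```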


Both tools offer quality control features, but how they implement them is very different.

Which one to use depends on your attitude towards AI-powered QA/QC. If you feel confident in it, Hasty is the way to go. If you prefer to stick to a conventional approach, Labelbox offers more.

3. User experience and access control

UI/UX in Labelbox

Overall, Labelbox is a convenient tool to work with. It has an excellent, user-friendly, minimalistic interface that is easy to navigate. You can use shortcuts to make your life easier and annotate pretty quickly (unfortunately, the shortcuts are not customizable). Through Labelbox, you can also manage your team and review their work. Moreover, Labelbox provides many statistics on your performance so you can draw valuable insights from them.


In essence, they offer a pretty standard interface where you get both a good annotation environment and excellent data management and team functionality.

For access control, Labelbox allows you to set up user controls on who can access what. In total, five different user roles can be assigned, with 12 additional permissions. The user roles cannot be customized.

Additionally, Labelbox allows you to customize the UI yourself. However, the documentation hasn’t been updated for two years now, so some functionality might be missing compared to the web application.

UI/UX in Hasty

As for Hasty, it offers excellent workflow management, starting with the Project dashboard. The dashboard gives you an overview of your project status.

Project dashboard

You can also manage your data efficiently using the File manager.

File manager

Shortcuts are also available for every functionality in Hasty. From there, the offering is relatively similar to Labelbox regarding UI and UX. The annotation view and other essential features are pretty similar.

For access control, Hasty offers completely customizable user roles and permissions. In total, you can set over 20 different permissions for every role.

Users and roles


Both are fairly good in terms of user experience. However, the big difference is that Hasty allows for the customization of access control, whereas Labelbox does not.

4. Model experimentation

Model experimentation in Labelbox

Under its active learning flag, you can find some functionality for model diagnostics. Here, you can use the Python SDK to send data on how your model performs to Labelbox. Then, you can use their AI to filter the data and see where the model struggles (has the least confident predictions).

Also, if you want to use a third-party service to train your model and track experiment metrics, there are some issues with Labelbox. Their export format does not play too nicely with existing solutions such as Weights & Biases and Neptune.

Model experimentation in Hasty

Hasty offers in-platform model diagnostics with visualizations of what the model sees in the annotation view. Beyond that, through the Model Playground feature, Hasty allows you to create your models on top of the data you have in Hasty. This works similarly to how you would do it yourself if you coded a model, but without the need to code. Instead, you just input values for all critical parameters, and then we handle the rest.

The advantages of this approach are:

1. A source-of-truth for your models
Instead of having many different models everywhere, from local computers to cloud storage, you aggregate all models in one place. This gives you a source-of-truth and increased transparency in the organization.

2. Quicker model iterations
You don’t have to preprocess your data or transfer it to the training environment. You can simply launch experiments and quickly see which parameter configuration works best on your data.

3. Using successful models for annotation
With our Model Playground, you can swap out any and all of our default models with your own. For that, you need two clicks. This allows you to have custom models trained by you, helping in your annotation efforts.

4. Managed MLOps
You do not have to handle all the difficulties that come with MLOps. Hasty takes care of it for you – saving you time and money.

To date, Model Playground supports all types of use cases (all types of Segmentation, Object Detection, Image Classification, Attribute Prediction) and the following architectures:

You can control every aspect of the configuration for every architecture, just as if you had trained the model yourself.


Although Labelbox has some functionality here, Hasty is the clear winner if you want an integrated data and model-building tool.

Suppose you are looking for a dedicated data annotation tool and prefer a third party for model experimentation and building. In that case, Hasty also has a slight advantage, as it supports data exports in formats that most third-party tools can handle, including:

5. Inference engine/Model export


As Labelbox does not support this, it’s a slightly one-sided comparison. 

In Hasty, you can easily export your model when you are happy with its performance and want to move it into production.

You can either export the model (we support ONNX and PyScript exports) or use our infrastructure to serve your model via API.

6. APIs/Integrations

APIs/Integrations in Labelbox

Labelbox offers a GraphQL-based API that seems extensive and covers most functionality that exists in the UI. There’s also a Python SDK; although lighter in functionality than the API, it does offer an integration option.

There is one existing integration we could find that should work out-of-the-box, which is the one for Diffgram.

APIs/Integrations in Hasty

Hasty offers a REST API with 100% coverage of what’s available in the UI. There’s also a Python wrapper for the API.

As with Labelbox, it is relatively straightforward to integrate Hasty into your pipeline.


Both products have similar offerings here with API access and Python libraries. Both allow you to integrate the product in a larger flow.

The main difference is that Labelbox has an existing integration with Diffgram out-of-the-box.

7. Services

Services in Labelbox

With the Pro or Enterprise pricing plans, Labelbox can provide professional labeling teams – handling annotation and quality control for you. Also, Labelbox has a labeling operations team that “can support you throughout building your labeling pipeline, designing your ontologies, and training labeling teams.” 

Moreover, Labelbox is free for academic use, so if you conduct non-commercial research, you might want to check it out.

Hasty + CloudFactory

Similar to Labelbox, Hasty can provide you with a fully managed solution. In our parlance, we call it “Ground-truth-as-a-service.” We combine both workforce and software into one offering – something that enterprise customers often request. Since Hasty has merged with CloudFactory, we also provide access to a wide in-house network of professional annotators. Our labeling team uses automated tools for annotation, and hence is able to deliver results 5x faster than services that rely on manual labeling.

For training and technical support, you can always contact our Customer Support team.

Hasty is also free for academic purposes – just contact [email protected] if you’re interested.


In terms of service offerings, both Hasty and Labelbox are fairly similar. The major difference is that Hasty can also consult you on the ML engineering side, helping you build or debug models. 

8. Official documentation and availability of valuable tutorials


Labelbox’s official documentation seems complete and easy to navigate. It covers all the major aspects of the tool and provides text and video tutorials on working with Labelbox. Moreover, Labelbox has a YouTube channel featuring guides on how to work with the tool. This should make it easy for newcomers to get started with Labelbox.

Hasty’s documentation consists of three major components:

Thus, Hasty has comprehensive documentation with many examples that will answer all of your questions. We also have our YouTube channel with detailed guides on Hasty’s tools.

9. Regular updates


Both Labelbox and Hasty are actively maintained and are steadily developing and improving their functionality. Their documentation and platforms are regularly updated. Moreover, some of the features these platforms offer were added recently and are still in beta. So, you should probably stick around and see how the tools change when these improvements hit release.

10. Pricing

Pricing in Labelbox

Labelbox does not have transparent pricing, so it’s hard to say what it will cost you. They provide a tool that allows you to estimate costs. However, from what we have heard from users switching to Hasty, it tends to be quite expensive. This makes sense – it’s a mature product for mature teams.

As the official website states, “your unit cost decreases as your volume increases”. This means that the deal becomes appealing if you have large volumes of data.

Pricing in Hasty

In comparison, Hasty is entirely transparent with pricing. We charge only for your usage of our automation features: the more automation you use, the more you pay. From our perspective, this is fair, as you only pay when we can help you do something faster and better than a manual workforce.

Hasty is also free to try – including all our AI features.


In short, both Labelbox and Hasty are good choices if you want to create or enhance your data asset. In terms of manual annotation, both tools are pretty similar. However, beyond manual annotation, there are a lot of key differences. 

Labelbox is very good at supporting manual workflows but doesn’t offer anything out-of-the-box for annotation automation. For that, you have to bring your model and send predictions to the app, meaning you need to spend engineering time to make everything work. 

With Hasty, you get automation out-of-the-box without integrating the tool in your data pipeline.

To sum up, for teams that have mature models already, Labelbox might be preferable. For those that are building new data assets, Hasty will give more automation faster.

In terms of quality control, Labelbox is heavily dependent on manual workflows. There are features for gold standards and consensus scoring. If you prefer tried-and-true QA approaches, Labelbox has the upper hand. Hasty, on the other hand, offers AI-powered quality control. This can mean a drastic reduction in time and budget spent on QA.

In terms of inference engine/model export, Labelbox does not support these options, while in Hasty, you can pretty easily export your model and move it into production.

In terms of labeling services, both Labelbox and Hasty provide them. However, the combination of workforce and AI-powered tool under one roof allows Hasty to offer labeling speed and quality otherwise not available. Moreover, the automation of annotation reduces the total project costs for tech teams and businesses.

For user experience, API, access control, and services, both tools are relatively similar. There are only some slight differences:

Both platforms provide extensive documentation and regular updates, too.

The real difference is that Hasty is an end-to-end tool where you can train, export, and deploy models while creating training data. If you are looking for a complete tool for your first project, Hasty might be the answer. Also, if you want to streamline your tool pipeline from 4-5 different ones to just one solution, Hasty can help.

Buy Labelbox if…

Buy Hasty if…

Comparison: Hasty vs. Labelbox


  1. Labelbox official website
  2. Labelbox official documentation
  3. Labelbox Python SDK
  4. Hasty official website
  5. Hasty official documentation
  6. Hasty official API documentation
  7. Vision AI wiki


What is AI-assisted annotation?

AI-assisted annotation (AIAA) brings AI into the annotation process. AIAA proposes using interactive AI tools to semi-automate or even fully automate the labeling process. 

For example, you can use a small amount of manually annotated images to train a Deep Learning (DL) model that will auto-complete the task for you. In this case, you will only have to check and correct the model’s prediction. This approach is called pre-labeling and can be helpful to get you started. However, there’s always a danger in using immature models to annotate – if your model is not working that well, you might end up “scaling” wrong annotations, which can take longer to fix than the time you initially saved.
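A rough time model makes that danger concrete (every number here is invented for illustration): reviewing a correct pre-label is cheap, but fixing a wrong one can cost more than annotating from scratch.

```python
def prelabel_time(n_images, annotate_s=60, review_s=10, fix_s=90, error_rate=0.3):
    """Total seconds spent labeling n_images fully manually vs. reviewing
    pre-labels, where a share `error_rate` of pre-labels must be fixed."""
    manual = n_images * annotate_s
    assisted = n_images * review_s + n_images * error_rate * fix_s
    return manual, assisted

print(prelabel_time(1000))  # (60000, 37000.0) -> still a win at 30% errors
# At roughly 56% errors (10 + 0.56 * 90 ≈ 60 s/image), the savings vanish.
```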

Another approach, which we use in Hasty, is to train the model very early in the annotation process (after ten images) and then leave it to the user to accept or reject its suggestions. This more interactive workflow has some big advantages. For example, the constant feedback on model performance makes it easy to understand what is working and what is not, giving you a great idea of how the model is performing and changing. This, in turn, allows you to adapt your annotation strategy on the fly – understanding what you need more of and what the model already understands. Of course, this human-in-the-loop approach doesn’t scale as nicely at the start as pre-labeling, but with more human input at the beginning, you run less risk of scaling errors while improving the model in quicker iteration steps.

In the long run, the model will improve using both approaches as you will have more training samples as the task proceeds. That means higher and higher degrees of automation and less need for human annotation – essentially, the workflow becomes one of supervision instead.

Compared to manual data labeling, AIAA is highly efficient. Moreover, it reduces human error from fatigue and repetitive work (error rates for more extensive projects tend to be somewhere between 6-12%).

Although some teams still use open-source tools for the more straightforward vision AI use cases (image tagging, for example), as soon as you start using Object Detection or Instance Segmentation, the time savings from automation become invaluable.


Fix your data bottleneck

80% of vision AI teams don’t make it to production because of bad or insufficient data. Hasty removes that risk.