What’s important to note for this comparison, in particular, is that Sagemaker is huge. We have therefore concentrated on the core features on offer.
To make our lives even more complicated, it’s unclear what parts of the Sagemaker ecosystem support vision AI use cases. For example, Sagemaker Clarify is their bias detector, but we couldn’t find any mention of it working with vision AI. Therefore, we left it off this comparison.
As this comparison is quite long, we will start with our summary:
You are an AWS organization already and don’t mind spending engineering time setting up workflows and integrations. Or if you want advanced solutions for model deployment and hosting.
In terms of value offering, Sagemaker has:
They deliver on most of their promises. However, Sagemaker is not the most accessible platform to use. To give an example, if you wanted to get an AWS Sagemaker certification, you would have to study for at least a month.
You prioritize development speed and do not want to spend weeks (or even months) dealing with MLOps. Hasty also comes with built-in feedback loops between data and model so that you can work in a more agile manner. Additionally, Hasty is much easier to use than Sagemaker.
Hasty’s value offering is:
Hasty uses AI to automate much of your Vision AI development process, letting you build and deploy accurate Vision AI models in 1/10th the time required by other tools and thus keep product development on schedule.
Hasty takes an agile, data-centric approach to machine learning, so you work on your model and data in tandem. This allows for rapid feedback on what adjustments need to be made to your data and model and lets you iterate faster towards a working model ready for production.
Although both platforms want to assist users throughout the ML lifecycle, it is different how they go about it. As already mentioned, Sagemaker is built mainly for users that prefer SDKs and APIs. Hasty has a more ease-of-use approach with all functionalities available and accessible through the UI.
Hasty lets you keep your models and data private and secure -- so that your Vision AI-powered product remains differentiated in the market.
Using Hasty, you will have a better idea of what you will pay for up-front than with Sagemaker.
Before we jump into the details, let’s look at the big picture first.
Amazon Web Services is, of course, a leading general-purpose cloud platform. These range from foundational capabilities like storage (S3), compute (EC2), and databases (RDS) to more esoteric offerings like Ground Station (satellites-as-a-service). Sagemaker is their ML service offering with “fully managed infrastructure, tools, and workflows.” It does not cover all their ML offerings, but we think it’s fair to say that all their ML offerings for professional teams can be found under this umbrella term.
Coverage is wide and covers the whole ML lifecycle. However, not all Sagemaker features are compatible with computer vision use cases. Amazon’s bread and butter when it comes to ML are working with tabular data, and that shows in their offerings with some services (Clarify, Canvas) being only available when working with this type of data.
In short, there’s plenty to choose from in terms of features, and there are bucketloads of APIs and SDKs for you to dig into as well.
Today, Sagemaker in particular, and AWS in general, is one of the most commonly used platforms for machine learning.
Hasty is an agile machine learning platform that lets machine learning engineers and domain experts deliver computer vision models faster, due to built-in feedback loops across the ML pipeline, reducing the time to market for applications powered by Vision AI.
The company was founded in January 2019 in Berlin, Germany. During Hasty’s first funding round back in September 2020, it raised a total of 3.7 million dollars.
Like Sagemaker, Hasty has functionality across the ML lifecycle. Unlike Sagemaker, it is built only for vision AI use cases.
Today, Hasty has the following core features:
For annotation, we think there are five different criteria to consider.
What workforce do I want to have to work on my images?
How much control do I want to have over auto-labeling features?
Do I think getting feedback while annotating will be a significant advantage?
Do I want to use customized or my own model(s) for auto-labeling?
Annotation in AWS comes in the form of their AWS Sagemaker GroundTruth feature. There is also an additional service offering called AWS Sagemaker GroundTruth Plus. To separate the two, think of GroundTruth as an annotation functionality in the larger Sagemaker ecosystem and GroundTruth Plus as the premium service they offer to annotate data for you using a professional workforce.
We’ll focus on the original SageMaker functionality and then do a quick summary of what you can get extra with the Plus offering at the end.
So, what is Sagemaker GroundTruth? It is a couple of different UIs and workflows purpose-built for various annotation tasks at its core. For vision AI, they offer support for:
The standard GroundTruth workflow is the following. You connect your S3 bucket. You set up your project. You pick your workforce (Mechanical Turks or your own), then select which annotation UI you want to use above.
This is all fine, but most users today don’t want to annotate complete datasets by hand. They want annotation automation (sometimes called auto-labeling) where annotators can use AI to speed up the labeling process. Here, GroundTruth offers a mixed bag.
They offer automated labeling but only provide these automation features for image classification, semantic segmentation, and object detection—not instance or panoptic segmentation. They take human annotations, train an AI model on those labels, and then automatically label unlabelled images using that model. Data that the model is unsure about gets sent back to human labelers. They label this data, the model retrains, and (hopefully) get better. Then, the process starts over. This approach is, technically, the same we use in Hasty.
The main difference here is control. Amazon doesn’t offer any. They handle everything for you. Everything is hardcoded, so you can’t change the automation. This is quite strange as different projects have different quality requirements and require different thresholds. Another potential issue is the low threshold they have. A 60% IoU (their standard for object detection) means the predicted and actual label can differ by as much as 40%, but they still consider that a good enough prediction for their auto-labeling.
If you are working on a complicated dataset or a dataset requiring exact labeling, the low thresholds set can cause more harm than good. Having to go through and correct imperfectly automatically generated labels often take longer than creating them manually.
On top of what’s outlined above, there’s also GroundTruth Plus. This fully-managed service from AWS sets up and annotates a project for you. There are some exciting claims here - for example, they say that they use more automation than what is available in the GroundTruth product and can reduce labeling costs by 40%.
However, it is unknown how they do so or what they charge for a project. There’s also a claim about using an expert workforce which is intriguing. This can be an alternative if you are looking for a fully-managed solution, but many others have similar value offerings and more transparent methods.
Hasty support the following annotation approaches:
The standard flow is quite similar. You set up your project, upload your images either directly to the tool or by connecting your storage bucket, and then you start annotating.
The strength of Hasty is its auto-labeling features. In Hasty, they are called Label Assistants. This AI-assisted annotation trains automatically on your data without you needing to do anything other than label the first ten images. The models keep improving as you add more data, giving you better and better automation. With 1000 images or 5000 annotations (whatever comes first), most users see 85-95% automation percentages. From there, as you add even more data, you can go up to 100%.
What’s important to note is that all models only train on your data using few-shot learning. So there are no generalized models used. This means Hasty’s AI assistants will work for every use case, also the edge-cases for which no public data exists Having this custom-tailored approach is unique to Hasty. We also don’t charge you for re-training the models behind the assistants.
Hasty also comes with built-in feedback loops. You can press a button and get suggestions from an AI model of possible annotations when you annotate. Then, you can accept suggestions by clicking them individually or pressing Enter to accept all of them.
This approach is essential for getting auto-labeling right, as you don’t have to go back and edit bad suggestions created by an AI but instead can cherry-pick good ones only, which saves a lot of time on editing. It also gives you feedback on how AI models view your data, what it understands, and where it struggles.
This is essential knowledge for your ML engineers and data scientists. First, it gives them feedback on how AI models see the data without training them themselves. With this, they can adjust the labeling strategy on the fly.
Secondly, as you can replace Hasty's default AI models with your own, your ML team will get feedback from the data workers on how their models perform on actual data. This can help you debug and improve the model, allowing you to enhance model performance before it is even in production.
Quality control is probably the most overlooked aspect of machine learning today. However, it is also one of the most time-intensive. For some companies with high-quality requirements, QC can take up 20-30% of the total budget for a project.
To compare Hasty with AWS Sagemaker, here we have to make it visual:
Out of all categories, this is where Hasty is outperforming Sagemaker the most. Still, some will prefer tried-and-tested approaches and might opt for using Sagemaker.
For labels, AWS Sagemaker offers QA features in GroundTruth. They call it image label verification. It is another label job where workers can indicate if the existing annotations are correct or rate label quality. This manual approach is true and tested and increases overhead for labeling projects.
To that, you can add a label adjustment function, tasking workers to correct any annotations they see as erroneous.
Secondly, also to be found in GroundTruth, they offer what they call annotation consolidation. You might be more familiar with the more used term consensus scoring. Essentially, they compare labels made on the same image and then, through the wisdom of the crowd, decide which annotations are most accurate. An excellent addition is that AWS Sagemaker goes beyond most manual consensus scoring features by consolidation labels for you. However, this doesn’t solve the redundancy problem. AWS suggests that 3-5 workers annotate the same data item for most use cases.
You can also create your custom annotation consolidation function if you are inclined.
Today, that is it in terms of quality control of data in Sagemaker.
Automates QA using AI, making the process up to 35x faster and more efficient compared with Sagemaker.
It gives you an overview of common problems in your data so you can adapt your labeling strategy.
Removes the need for built-in redundancies with 3-5 labelers labeling the same image.
It improves the more data you add, making QA automation scalable.
It doesn’t support currently popular QA workflows like manual consensus scoring.
Requires some initial data (100 annotations per class) for automation to work.
In Hasty, you do quality assurance using AI automation. There are manual quality control features, but the real timesaver here is Hasty's AI-powered quality assurance feature called Error Finder.
Their AI Consensus Scoring feature uses a variation of AI models to find potential errors in the dataset. So, an AI model checks if it is confident in the existing annotations made by humans. The results are presented in the UI for further human analysis.
The QA models also get better the more data you add, giving you a higher degree of automation as you scale. This unique approach makes QA scalable as it breaks the correlation between data labeled and the time needed for QA.
You also get an overview of what problems exist and can easily check if you have common problems such as class distribution imbalances.
To decide what to use, the following decision criteria can be of help:
How much do I prioritize ease of use?
How much do I care about feedback loops?
How much interaction do I have between my data workers and my ML team?
So, AWS is not known for beautiful interfaces or even easy-to-use interfaces. That is still true here, but to be fair to AWS, they have collected most Sagemaker functionality into their console and their Studio UI. Sagemaker Studio UI is probably the closest you get to a classic software product. It comes with the following functionality:
All of this sounds quite decent. So if you are OK using AWS Sagemaker and Jupyter notebooks, there’s ease-of-use to be had here. However, as outlined above, very few things are simple in Sagemaker. You will have to write custom code. You will have to build integrations and connect your model to other AWS services. And you shouldn’t expect everything you need out of the box.
Hasty is the complete opposite of Sagemaker in that it prioritizes ease of use more. One of the core philosophies behind Hasty is to minimize the amount of code and effort needed to develop and maintain ML models and data by users through the use of a unified user experience. To give an example, in Hasty, you can:
You can do all of the above without writing a single line of code. However, for those who want it, there’s an API that mirrors all functionality in the UI 1:1.
Furthermore, as we already touched on, Hasty comes with built-in feedback loops.
You can visually see predictions from Hasty's, or your, AI models that allow you to do annotations faster and get insight into how the model performs on your data.
You can see any potential errors in aggregate and every mistake we found using AI. This way, you can find patterns in what’s going wrong with your project and what is working. So once again, you are getting insight - not only automation.
You can visualize how the model performed on your dataset and go beyond the usual ML metrics. Again, this gives you insight into what works and what doesn’t.
This becomes truly powerful because it brings your data workforce and data scientists closer together. In Hasty, data scientists can deploy any model trained in Hasty to the annotation environment in 4 clicks. This gives your data workforce better AI automation and allows them to check how the model is working on actual data.
They can then feedback this information to your data scientists, who can train a better model. And your data scientists can inform your data workers about problems they see with the data when training the model (class imbalances, for example), allowing the data team to prioritize what to label next—all of this without any code being written.
Here, the main concern is about implementation. Both offer granular user control, but AWS can be tricky to set up. To benchmark your expectations, you can look at this article from AWS. If that looks manageable to you and something you would be comfortable with, Sagemaker is a viable option. If you are not convinced, a product like Hasty might be preferable.
As for all of AWS, Sagemaker makes use of IAM. For those of you that don’t know, it’s AWS Identity and Access Management (IAM) that helps an administrator securely control access to AWS resources. IAM administrators control who can be authenticated (signed in) and authorized (have permissions) to use SageMaker resources. IAM is very powerful and very customizable. It is also a bit of a hassle to set up.
For access control, Hasty offers completely customizable user roles and permissions. In total, you can set over 20 different permissions for every role, create and remove roles as you see fit, and create specific service accounts for any automation or integration you want to build yourself. The big difference here is that you can do it directly in the UI, which is a time-saver.
For model building, we propose to have the following decision criteria:
What’s essential for me? Complete control but having to code everything myself or having a managed environment but less control in pre- and post-processing?
Somewhat related, do I prioritize speed or control?
How much engineering support do I require?
So here, we have a lot of different options available for you. We’ll not go into depth into all of them but focus on what you can use for vision AI use cases. This limits the field considerably.
You can use two available products in AWS Sagemaker to build models. Those are:
This new service (in beta) is essentially JupyterLab with built-in integrations for AWS computing services. Meaning you can code up your models in a notebook environment and then run it on AWS hardware. You get some prepackaged ML frameworks to help you develop faster too. This is a valid option if you are looking for notebook development and are an AWS company.
JumpStart is a library of pre-built models applied to your use case. This offering is mainly for beginners but can be worth exploring if you are new to the field and want to perform a use case they support. For computer vision, they only support defect detection of products.
AWS Sagemaker also has support for automated hyperparameter tuning. This is a brute-force approach to machine learning where you create many versions of a model you want to train with different parameters. After running enough training runs, you should find the best possible model parameter setup. This is pretty advanced functionality and is quite challenging to use if you’re not very experienced with Sagemaker, but it can be good to know it exists.
Beyond the services mentioned above, AWS supports integrating the building process and performing it on AWS hardware. For this, they support most of the standard ML frameworks that exist today.
Once again, the core job-to-be-done is the same in Hasty, but how you go about it is different. In Hasty, you have a fully-managed model-building environment that we call Model Playground. Here, you can create model experiments without having to write any code. However, what is a bit different from most no-code environments is that you still have many options for training your model (you can set over 100 parameters, to be exact). Our ML team is quite advanced themselves and thinks that simplifying the model building process to a few clicks severely limits the usefulness of any no-code solution. So we give you control of all the parameters for any supported architecture.
However, we do offer value for teams that want to move quickly. The main advantages of Model Playground are two-fold. First, you do not need to handle MLOps yourself. This, we manage for you. With that, you can focus on finding the correct config and not spending time debugging and updating your training pipeline. Secondly, as the data is already in Hasty, you don’t need to build data loaders or do any data wrangling beforehand. Simply select the information you want to use, and we handle the rest.
To summarize, our aim with Model Playground is to give data scientists the tool they need to quickly run many experiments at once on top of their data to get answers on what works and what doesn’t without spending hours coding and debugging first.
Also, we’re adding many more components and features to it every week. The following prominent features will be notebooks for further customizability and model architectures for video tracking. If you’re interested, you can contact [email protected] to be part of the beta rollout.
For model tracking, the following decision criteria can lead you to make an informed decision:
How much work do I want to put in to get the monitoring I need?
How important is it for me to compare model experiments over time?
How important is it for me to see visualize the predictions made by the experiment on my actual data?
We conclude that the model tracking in Sagemaker is created with tabular data in mind first and foremost. This means that it doesn’t have some functionality you might expect for computer vision. Hasty is the better option if tracking and benchmarking are vital for you.
Important note: Here, we are unsure how much support there is for vision AI use cases as there was not much in terms of demos or tutorials for the field. We decided to give Sagemaker the benefit of the doubt and assume the following features include vision AI-relevant tracking.
The main feature AWS Sagemaker has for experiment tracking is called Experiments. Experiments track everything related to your trained model (inputs, configs, parameters, and results) and then show the results in their UI.
Sagemaker says that it automatically tracks “SageMaker independently executed training, batch transform, and processing jobs as trial components.” So there should be out-of-the-box tracking for anything trained on the Sagemaker platform. We say should as we struggled ourselves with getting this to work when we trialed it for vision use cases.
To us then, it seems like if you want to track what you care about, you have to do so manually. Here, you can set up monitoring from your local environment if you use SageMaker Studio notebooks. This seems to be quite powerful, but we lacked documentation for setting up relatively simple tracking metrics for vision AI like IoU.
We did see that Sagemaker Experiments is fully integrated with Sagemaker Studio (not to be confused with Studio lab), which is their UI. So, in theory, anything trained in their Studio (and maybe Studio lab) UIs should also have tracking built-in. Once again, though, it’s hard to say what you get out of the box and what you have to add yourself.
There’s also Sagemaker Debugger. This feature “captures training metrics in real-time such as data loss during regression and sending alerts when anomalies are detected.” It also has early-stopping functionality so that you stop training when a particular metric is reached.
When it comes to what you get out of the box, they offer hardware metrics (GPU consumption, etc.) and loss metrics.
In Hasty, models are tracked automatically for every model you train. Hasty captures the hardware and ML metrics you select when creating the job. As a specialized computer vision startup, there’s also more out-of-the-box support for CV metrics.
Additionally, Hasty visualizes model predictions made when training so that you see the metrics and the actual results of your model.
Hasty also makes it easy to benchmark experiments against each other so that you can figure out how they compare without having to write code to do so:
A nice touch here is that Hasty gives you plots, so you can see how different experiments developed over time. This gives you additional insight into how your models train and can help you better understand what to experiment with next.
So for deployment, Sagemaker (and the larger AWS ecosystem) is hard to beat in terms of sheer functionality. But there are some considerations for you when deciding:
Do you need all the advanced functionality?
As usual, how do you choose between complete customizability or ease of use?
How important is it for you to use your models for annotation too?
If we have been a bit tough on Sagemaker before, we now come to one of its strengths. AWS Sagemaker has a lot of functionality and covers a lot of potential use cases when it comes to model deployment. This, of course, makes sense. Hosting is AWS’s bread and butter. You can do anything from real-time inferences to batch processing. There’s auto-scaling of models too. They even have an edge manager to deploy models to different environments.
So, to summarize, Sagemaker is good when it comes to getting your models into their environment and then serving them.
It lacks, though, in possibilities to export models out of AWS. Once you have a model running in AWS, you will be locked into that environment as far as we can tell. There’s no easy way for you to take an already trained model, export it, and deploy it somewhere else. To do so, you would have to first retrain the model outside of Sagemaker and then deploy it to another environment.
It is also not that easy to deploy a model in the first place. As always in Sagemaker, you get a lot of possibilities and can use a lot of features, but using them will require some engineering.
Hasty offers similar functionality, albeit not with the same advanced features Sagemaker does. There are two possibilities on how to deploy models in Hasty. The first is to host the model in Hasty's cloud. This is similar to AWS's offering, as you can set up both real-time and batch processing inferencing models. Hasty does have auto-scaling, and you can bring your models to be hosted in Hasty if needed.
It is considerably easier to deploy in Hasty. Here, you can deploy any model in a couple of clicks.
Additionally, Hasty offers the possibility to take any model and connect it back to the annotation environment in a couple of clicks. This allows ML teams to deploy models to production environments and deploy them to speed up the annotation process. Having this in place can be crucial for a project. It creates a data flywheel of better models being created faster by having AI automation, making data labeling even quicker. Rinse and repeat.
In Hasty, you also can export any models in either TorchScript or ONNX formats (for other requests, feel free to contact [email protected]). This option gives you more control and removes lock-in effects as you can take any work you have in Hasty and deploy it to any other environment. Exporting models can benefit companies that want to deploy models in environments without stable internet connections, that need to host them themselves because of regulatory requirements, or simply prefer to own the model hosting process themselves.
For pricing, we see two main decision-making criteria. The first is cost certainty.
In Sagemaker, up-front cost certainty is difficult to come by. In Hasty, you will have a good idea of what to expect up-front.
This is important for most organizations that have to budget a project. With Sagemaker, it will be tough to budget for any deployment and hosting costs until late in the project. In Hasty, that is a much easier task.
The second is cost. Here, we struggle to compare when it comes to model building and hosting as AWS has so many possible configurations for both.
Where there is a significant difference is for anything related to data. Here, AWS is more expensive as they require more manual work. On the annotation side, you pay both if you do manual labeling and when you use automatic labeling. This is in stark contrast to Hasty, where you only pay for automated labeling. The same is true for QA. In short, Sagemaker charges on any work done on your data, Hasty only when you use automation.
There might not be a more difficult job than to explain AWS pricing in tech today. There are so many variables and customization options that it is, essentially, impossible to do a quick write-up of what you can expect in terms of pricing. The good news is that you pay for usage. This usually resonates with engineers. However, project managers often frown upon this as it gives little in terms of pricing certainty.
Given this preamble, we can only say that it is challenging to calculate the cost of Sagemaker up-front. There are too many options and very little guidance in terms of outlined scenarios for anyone but a Sagemaker expert to estimate costs. With that said, there is a pricing calculator that can be used for getting some idea of the cost.
On top of the computational cost, there’s also a potential hidden cost to consider. Sagemaker is very complicated, and you need a lot of time to understand how to set everything up correctly. To that, we suspect that many companies will find themselves having to hire a Sagemaker-certified engineer at some point. These people are not cheap. Most ask for an hourly rate between 60-500$.
There is a generous free offering for new users to Sagemaker, so you can give it a try and see how much computation you use before you have to pay.
Hasty has a more straightforward pricing approach where you pay for one of our plans up-front. The plans are tied to usage, and you can upgrade and downgrade every month to adapt costs to your needs.
Hasty also gives you a generous free plan so that you can test all functionality before making a buying decision. An important point here is that there’s considerably less lock-in in Hasty, so any work you’ve done while trialing the platform can be re-used elsewhere.
For 80% of vision AI teams, data is the bottleneck. Not with us.