Building proofs-of-concept to understand needed resources
The most common question we get at Hasty goes something like this: "How much data do I need to label to get my metric to X accuracy?" This is an impossible question to answer. Because AI models are largely black boxes, there's no way of telling at the outset how much work you have to put in to get the result you are looking for.
This is not really a question of experience either. Internally at Hasty, we've seen hundreds of projects up close. You might think that we would be able to look at similar projects we've seen before and give a very rough benchmark answer to the question above.
To make it more concrete, let's say we've worked with a research institution in Germany that worked on a forestry application to detect wildfires automatically. Now, another research institution in Brazil approaches us with the same use case. If given permission, it would be easy for us to tell them exactly what they need, right?
Unfortunately not. By changing the location, you've changed the data you need to look at. For the sake of argument, let's say that in Germany slash-and-burn agriculture is less common, but in Brazil, it's quite popular. And the Brazilian research institution wants to include those situations in their model.
Furthermore, maybe one team has data from drones, and the other team uses satellite imagery. Now we've changed another variable. And so on and so forth. It's highly unlikely that anyone will have seen a project with the same variables as yours - and if someone has, it's probably an organization in direct competition with you.
With that in mind, how do you know how much you should pay for an AI project?
The answer is to continually work in quick iterations. The further you get into the project, the easier it is to estimate the cost of the next step. With that in mind, the recommendations we give all our new customers are:
- Set aside a small budget for building an initial proof-of-concept - this can be something like €10k
- Spend a month or so building the proof-of-concept and then evaluate the outcome. How does it perform? How far are you from achieving your goal?
- If the results look promising, do another small iteration. Maybe you'll bring in one or two more people, so you spend €20k this time
- Evaluate how this iteration impacted results. Are you moving closer to your goal? How much did you manage to move the metrics you care about?
- At this point, you should have a much better understanding of the work that needs to go in to reach your project goal. Now you can create a budget for the rest of the project's timeline
- Or, maybe you end up far below the results you were looking for. Then, this is the perfect time to have a discussion with stakeholders to decide if the project should be killed or if it makes sense to do another iteration
- Although it might be hard to kill a project you worked on, you've gotten to a point where you can make an informed decision with a minimum spend. This is something most organizations would celebrate and will free you up to focus on the next big idea in the pipeline
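The review step at the end of each iteration can be sketched in a few lines of Python. This is only an illustration of the loop described above, not a tool we ship: the `Iteration` record, the `review` function, the `min_gain` threshold, and the example numbers are all assumptions made up for this sketch. In practice, the "does it look promising?" call is a judgment you make together with your stakeholders.

```python
from dataclasses import dataclass


@dataclass
class Iteration:
    """One budgeted iteration and the metric it reached (hypothetical record)."""
    cost_eur: float
    metric: float  # the metric you care about, scaled between 0 and 1


def review(iterations, target, min_gain=0.02):
    """Suggest a next step after the latest iteration.

    min_gain is an assumed threshold: the smallest metric improvement
    between two iterations that still counts as meaningful progress.
    """
    latest = iterations[-1]
    if latest.metric >= target:
        return "goal reached"
    if len(iterations) < 2:
        # Only the proof-of-concept exists, so there is no trend to judge yet.
        return "run another small iteration"
    gain = latest.metric - iterations[-2].metric
    if gain < min_gain:
        return "discuss with stakeholders: continue or stop"
    return "budget the rest of the project"


# Made-up example: a €10k proof-of-concept followed by a €20k iteration.
history = [Iteration(10_000, 0.55), Iteration(20_000, 0.70)]
print(review(history, target=0.90))
```

The point of the sketch is that every decision uses only numbers you already have: what you spent and how far the metric moved.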
The costs of a vision AI project
So we've discussed prototypes but we still haven't talked about the costs you'll encounter when working on AI. Let's change that.
In general, the costs of developing vision AI solutions can be divided into seven groups:
- Data acquisition. The cost for acquiring the data you need
- Data creation. The cost for annotating the data you need for your project
- Data curation. The cost for doing quality control on your data
- Machine learning engineering. The cost for developing an AI model
- Infrastructure. The cost for building an ML infrastructure. The major part here is engineering hours, but it also includes computing resources and hardware
- Supporting software. The project's budget should also include costs for using APIs, operating systems, MLOps tools, paid databases, and any other additional software. Usually, supporting software is not that expensive, but it would be a mistake to forget about it when planning your project
- App development. If the end goal is to get a model working in some type of application, you also need to consider app development costs
How those costs are split differs from project to project. But as a rough estimate for a "typical" project, based on what we have seen helping with 100+ projects, data creation and curation will make up between 20% and 40% of your costs. Machine learning engineering is another 20%, as is infrastructure development. App development is another 10%. Data acquisition is another 5%, leaving us with 5% of the budget for software costs and unforeseen overheads.
These costs will of course vary quite a bit. Let's say you are doing something where you need vast amounts of data, as in autonomous driving. If so, data creation and curation might take up 70–80% of your budget. On the other hand, if good data is already available or if you have a use case with little variance in your data, the data creation and curation budget might only be 20%.
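To see what the typical split means in euros, here is a small Python sketch. The percentages are the ones quoted above, with data creation and curation taken at 40% (the end of its range at which the shares add up to 100%). The €100k total and the `allocate` helper are assumptions for illustration only.

```python
# Typical split from the text, in percent. Data creation and curation is
# taken at 40%, the upper end of the quoted 20-40% range.
TYPICAL_SPLIT = {
    "data creation and curation": 40,
    "machine learning engineering": 20,
    "infrastructure": 20,
    "app development": 10,
    "data acquisition": 5,
    "software and overheads": 5,
}


def allocate(total_eur, split=TYPICAL_SPLIT):
    """Split a total budget across cost groups according to percentage shares."""
    if sum(split.values()) != 100:
        raise ValueError("shares must sum to 100")
    return {name: total_eur * pct / 100 for name, pct in split.items()}


# Hypothetical total budget of €100k.
budget = allocate(100_000)
for name, amount in budget.items():
    print(f"{name}: €{amount:,.0f}")
```

Swapping in your own shares, for example a much larger data share for a data-hungry use case, shows how quickly the other groups get squeezed.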
The budget distribution also tends to change over the lifetime of a project. At the start, the main cost tends to be machine learning engineering and setting up infrastructure. Over time, as you have the foundations in place, that shifts to spending more on data creation, curation, and app development.
The hidden costs of a vision AI project
To make things even more difficult, there are many potential hidden costs in any AI project. These are:
- Information security. If there is a large amount of data, the system must be protected. That is why, when integrating your project into a company, you must be ready to spend some money and bring everything in line with information security requirements. The major drawback is that sometimes information security expenses are comparable to the cost of a solution itself.
- Project management. As a project grows, the need for someone to manage it increases. Especially if you are working with many different stakeholders or other companies, you'll have a much larger need for communication than initially thought, which means you'll need more people assisting with your project.
- Change of business logic. Over the course of an AI project, many teams get a better understanding of what's possible and how they can improve on their initially scoped-out solution. Although this is generally positive, it means additional work, as you often have to reannotate your data and change your AI models.
- Quality issues. Especially at the start of a project, it's common that the data created will have quality issues. This is fairly normal, as annotators are learning on the job and getting an understanding of your use case. However, it often means that the initially annotated data rarely ends up being used.
- Training costs. Similar to the point above, there's also a hidden cost for the core team to train annotators and supervisors to make sure they label a project correctly. In many projects, this cost is underestimated - the perception being that you can do it in a week or so. In most cases, this is something that needs constant attention for at least a couple of months before the data workforce has the necessary understanding of the use case.
In this article, we have gone through how you can start a project in the best possible way for your budget. We have also detailed the costs, both visible and hidden, that you will encounter during an AI project. In the next article in this series, we will give a more concrete example that illustrates the costs for various stages of the development phase.
Shameless plug time
Do you have a vision AI project coming up and are looking for the right software? Let me give you four bullet points as to why you should consider Hasty.
- We offer the fastest annotation environment that exists. By training AI models on your data, we can automate up to 90% of the annotation work
- We have AI-assisted quality control. As we'll touch on in later posts, quality control can be a massive outlay, but by using AI to do it for you, you can cut your QA costs by 80%
- We have a fully functioning model building environment. That means you can train models using SOTA architectures without any need for setting up local environments or creating complex data pipelines
- All in all, you get a complete end-to-end offering that will drastically reduce your costs for developing AI solutions, and that will make it a lot easier working as a team to build the next great value creator for your company
If you're interested in trying us out, sign up for a free trial here.