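If you have ever tried solving a Classification task using a Machine Learning (ML) algorithm, you might have heard of the popular Accuracy score ML metric. On this page, we will define the Accuracy score, go through its formula and interpretation, cover its drawbacks, work through an example, and calculate Accuracy in Python.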
Let’s jump in.
The Accuracy score is firmly based on the Confusion matrix. So, to better grasp the metric, please check out the Confusion matrix page first.
The most intuitive way to evaluate the performance of any Classification algorithm is to calculate the percentage of its correct predictions. And this is precisely the logic behind the Accuracy score.
To define the term, in Machine Learning, the Accuracy score (or just Accuracy) is a Classification metric that represents the fraction of predictions a model got right. The metric is prevalent as it is easy to calculate and interpret. Also, it measures the model’s performance with a single value.
So, to evaluate a Classification model using the Accuracy score, you need to have the ground truth classes and the model’s predictions for the same samples.
Fortunately, Accuracy is a highly intuitive metric, so you should not experience any challenges in understanding it. The Accuracy score is calculated by dividing the number of correct predictions by the total number of predictions.
The more formal formula is the following one:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
As you can see, Accuracy can be easily described using the Confusion matrix terms such as True Positive, True Negative, False Positive, and False Negative. Still, as described on the Confusion matrix page, these terms are mainly used for the binary Classification tasks.
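So, the Accuracy score algorithm for the binary Classification task is as follows: get the model's predictions, count the True Positives, True Negatives, False Positives, and False Negatives, and divide the sum of True Positives and True Negatives by the sum of all four counts.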
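The binary procedure can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names and the sample labels are made up for the example, and class 1 is assumed to be the positive class.

```python
def binary_confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary task (hypothetical helper)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


def binary_accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Illustrative labels, not taken from the article: 1 = positive, 0 = negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = binary_confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)                    # 3 3 1 1
print(binary_accuracy(tp, tn, fp, fn))   # 0.75
```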
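Yes, it is as simple as that. But what about the multiclass case? There is no separate formula, so we suggest using the basic logic behind the metric to get the result. The Accuracy score algorithm for the multiclass Classification task is as follows: get the model's predictions, count the predictions that match the ground truth class, and divide that count by the total number of predictions.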
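For the multiclass case, the same logic fits in a few lines of plain Python. The function name and the sample labels below are illustrative assumptions, not part of any library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    # A prediction is correct when it equals the ground truth class
    correct = sum(truth == pred for truth, pred in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative labels for three classes (made up for this sketch)
y_true = ["cat", "dog", "bird", "dog", "cat", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "bird"]

print(f"{accuracy(y_true, y_pred):.3f}")  # 0.667 (4 of 6 predictions are correct)
```

Because the function only compares labels for equality, it works unchanged for any number of classes, including the binary case.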
In the Accuracy case, the metric value interpretation is more or less straightforward. If you are getting more correct predictions, it results in a higher Accuracy score. The higher the metric value, the better. The best possible value is 1 (if a model got all the predictions right), and the worst is 0 (if a model did not make a single correct prediction).
From our experience, you should consider Accuracy > 0.9 as an excellent score, Accuracy > 0.7 as a good one, and any other score as a poor one. Still, you can set your own thresholds as your logic and task might vary highly from ours (for example, in medicine, you might need an Accuracy score of 0.99 or higher before calling a job done).
Still, this metric has two massive drawbacks that must be considered when using it. Let’s cover them one by one.
The greatest problem is that Accuracy can be highly misleading if the class distribution in your dataset is skewed. Let’s check out a simple example.
Suppose we want to evaluate the performance of an email spam filter. We have 100 non-spam emails, and our classifier correctly labels 90 of them (True Negative = 90, False Positive = 10). Out of 10 spam emails, the classifier identifies only 5 (True Positive = 5, False Negative = 5). In this case, the Accuracy score will be (5 + 90) / (5 + 90 + 10 + 5) = 95 / 110 ≈ 0.864.
However, if we predict all emails as non-spam, we will get a higher Accuracy (True Negative = 100, False Positive = 0, True Positive = 0, False Negative = 10): (0 + 100) / (0 + 100 + 0 + 10) = 100 / 110 ≈ 0.909.
The second model has a better metric value but does not have any predictive power. So, be very careful and always check whether your data has a class imbalance problem before applying Accuracy.
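The comparison can be reproduced with a short sketch that uses the counts from the spam example. The function names are hypothetical; the spam recall (the share of spam emails that were caught) is added only to make the lack of predictive power visible.

```python
def accuracy_from_counts(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def spam_recall(tp, fn):
    """Share of actual spam emails that the model flagged as spam."""
    return tp / (tp + fn)


# Counts from the example above: the spam filter vs. "everything is non-spam"
spam_filter = dict(tp=5, tn=90, fp=10, fn=5)
all_non_spam = dict(tp=0, tn=100, fp=0, fn=10)

for name, counts in [("spam filter", spam_filter), ("all non-spam", all_non_spam)]:
    acc = accuracy_from_counts(**counts)
    rec = spam_recall(counts["tp"], counts["fn"])
    print(f"{name}: accuracy={acc:.3f}, spam recall={rec:.2f}")
# spam filter: accuracy=0.864, spam recall=0.50
# all non-spam: accuracy=0.909, spam recall=0.00
```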
To be fair, Data Scientists came up with a solution to this problem by developing the Balanced Accuracy metric. Check its page in the sklearn documentation to learn more.
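For a binary task, Balanced Accuracy is the average of the recall obtained on each class. The sketch below applies that definition to the two spam models from the example; the helper name is an assumption made for illustration, and in practice you would more likely call `balanced_accuracy_score` from `sklearn.metrics` on the label arrays.

```python
def balanced_accuracy_from_counts(tp, tn, fp, fn):
    """Mean of the recall on the positive and on the negative class."""
    positive_recall = tp / (tp + fn)  # share of spam that was caught
    negative_recall = tn / (tn + fp)  # share of non-spam that was kept
    return (positive_recall + negative_recall) / 2


# Counts from the spam example
spam_filter = balanced_accuracy_from_counts(tp=5, tn=90, fp=10, fn=5)
all_non_spam = balanced_accuracy_from_counts(tp=0, tn=100, fp=0, fn=10)

print(f"{spam_filter:.2f}")   # 0.70
print(f"{all_non_spam:.2f}")  # 0.50
```

With this metric, the model that labels everything as non-spam no longer looks better than the actual filter.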
The other disadvantage is that Accuracy is not that informative when used as the only metric. For example, it does not tell you what types of errors your model makes.
At a 1% misclassification rate (99% Accuracy), the errors could be False Positives, False Negatives, or a mix of both. Such information is essential when evaluating a model for a specific use case. Take COVID tests as an example: you'd rather have FPs (the test says that a person has COVID, but they actually do not) than FNs (the test says that a person does not have COVID, but they actually do).
Overall, it is not a massive problem as you can solve it in a few lines of code by calculating some other metrics, but you still should keep in mind that relying only on the Accuracy value is a bad idea.
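As a rough illustration, the sketch below compares two models that share the same 99% Accuracy but make different kinds of errors. All counts are hypothetical and chosen only for this example; Precision and Recall are used here as the additional metrics.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Share of positive predictions that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: both models misclassify 10 of 1000 samples
models = {
    "only False Positives": dict(tp=40, tn=950, fp=10, fn=0),
    "only False Negatives": dict(tp=30, tn=960, fp=0, fn=10),
}

for name, c in models.items():
    print(
        f"{name}: accuracy={accuracy(**c):.2f}, "
        f"precision={precision(c['tp'], c['fp']):.2f}, "
        f"recall={recall(c['tp'], c['fn']):.2f}"
    )
# only False Positives: accuracy=0.99, precision=0.80, recall=1.00
# only False Negatives: accuracy=0.99, precision=1.00, recall=0.75
```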
Let’s say we have a binary Classification task. For example, you are trying to determine whether a cat or a dog is on an image. You have a model and want to evaluate its performance using Accuracy. You pass 15 pictures with a cat and 20 images with a dog to the model. From the 15 cat images, the algorithm predicts 9 as dogs, and from the 20 dog images, it predicts 6 as cats. Let’s build a Confusion matrix first (you can check the detailed calculation on the Confusion matrix page): 6 cat images are classified as cats and 9 as dogs, while 14 dog images are classified as dogs and 6 as cats.
Excellent, now let’s calculate the Accuracy score using the formula for the binary Classification task. The correct predictions are the 6 cat images and the 14 dog images assigned to their own class, and the incorrect ones are the remaining 9 + 6 = 15 images: Accuracy = (6 + 14) / (15 + 20) = 20 / 35 ≈ 0.57.
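The same calculation can be checked in code. This sketch rebuilds the label lists from the numbers in the example and counts each (actual, predicted) pair; storing the Confusion matrix in a `Counter` is just one convenient choice for illustration.

```python
from collections import Counter

# Ground truth: 15 cat images followed by 20 dog images
y_true = ["cat"] * 15 + ["dog"] * 20
# Predictions: 9 of the 15 cats are labelled "dog", 6 of the 20 dogs are labelled "cat"
y_pred = ["cat"] * 6 + ["dog"] * 9 + ["dog"] * 14 + ["cat"] * 6

# Confusion matrix as counts of (actual class, predicted class) pairs
confusion = Counter(zip(y_true, y_pred))
for (actual, predicted), count in sorted(confusion.items()):
    print(f"actual={actual}, predicted={predicted}: {count}")

# Correct predictions are the pairs where both classes agree
correct = sum(count for (actual, predicted), count in confusion.items() if actual == predicted)
accuracy = correct / len(y_true)
print(f"{accuracy:.3f}")  # 0.571
```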
Ok, great. Let’s expand the task and add another class, for example, the bird one. You pass 15 pictures with a cat, 20 images with a dog, and 12 pictures with a bird to the model. The predictions are as follows:
Let’s build the matrix.
Let’s use the basic logic behind the Accuracy metric to calculate the value for the multiclass case.
The Accuracy score is widely used in the industry, so the major Machine and Deep Learning libraries have their own implementations of this metric. For this page, we prepared three code blocks featuring calculating Accuracy in Python. In detail, you can check out Accuracy in Scikit-learn, TensorFlow, and PyTorch.
Scikit-learn is the most popular Python library for classical Machine Learning. From our experience, Sklearn is the tool you will likely use the most to calculate Accuracy (especially if you are working with tabular data). Fortunately, you can do it in just a few lines of code.
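```python
# Importing the function
from sklearn.metrics import accuracy_score

# Initializing the arrays (multiclass case)
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]

# Calculating and printing the result: the fraction of correct predictions.
# Passing normalize=False would return the number of correct predictions instead.
print(accuracy_score(y_true, y_pred))  # 0.5
```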
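Beyond the basic functionality, Sklearn has various Accuracy options implemented, such as the `normalize` and `sample_weight` parameters of `accuracy_score` and the related `balanced_accuracy_score` and `top_k_accuracy_score` functions. You should definitely check them out to simplify your workflow.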
In the vision AI field, the Accuracy score algorithm is slightly different. For instance segmentors, semantic segmentors, and object detectors, a prediction is correct if the predicted class equals the ground truth one and the prediction's IoU is above a certain threshold (often, a threshold of 0.5 is used).
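A minimal sketch of this rule for object detection is shown below. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and a single predicted box already matched to a single ground truth box; the function names and sample boxes are hypothetical and not taken from any specific library.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    # Width and height of the overlapping region (zero if the boxes do not overlap)
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_correct(pred_class, pred_box, gt_class, gt_box, iou_threshold=0.5):
    """A prediction counts as correct if the class matches and IoU exceeds the threshold."""
    return pred_class == gt_class and iou(pred_box, gt_box) > iou_threshold


gt_box = (0, 0, 10, 10)
print(is_correct("cat", (2, 0, 12, 10), "cat", gt_box))   # True: IoU = 80 / 120
print(is_correct("dog", (2, 0, 12, 10), "cat", gt_box))   # False: wrong class
print(is_correct("cat", (5, 5, 15, 15), "cat", gt_box))   # False: IoU = 25 / 175
```

Accuracy is then the share of predictions for which `is_correct` returns True.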
```python
# Importing the library
import tensorflow as tf

# Calculating the metric value
m = tf.keras.metrics.Accuracy()
m.update_state([1, 2, 3, 4], [0, 2, 3, 4])

# Printing the result
print('Final result: ', m.result().numpy())
```
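```python
# Install the package first if needed: pip install torchmetrics

# Importing the libraries
import torch
from torchmetrics import Accuracy

# Initializing the input tensors
target = torch.tensor([0, 1, 2, 3])
preds = torch.tensor([0, 2, 1, 3])

# Calculating and printing the result
# (recent torchmetrics versions require the task type and the number of classes)
accuracy = Accuracy(task="multiclass", num_classes=4)
print(accuracy(preds, target))
```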