Grad-CAM

In the modern world, neural networks are a proven instrument for solving different tasks across various industries. With the development of high-level Machine Learning libraries and Deep Learning frameworks, it became easier to train a network and achieve a nice metric value. Still, boldly relying on a model that produces accurate predictions is not the best strategy. The key to success is a bit deeper and lies in understanding why a model makes a specific decision. This is where the Grad-Cam algorithm that makes the difference comes into the mix.

On this page, we will talk a bit about the Grad-Cam concept and its capabilities. Let’s jump in.

Grad-Cam explained

Gradient-weighted Class Activation Mapping (Grad-Cam) is an explainability algorithm Hasty uses to explain the results of Classifier, Tagger, Attributor, and Semantic Segmentation models.

Please check out the D-Rise algorithm documentation page to see how Hasty explains the results of the Instance Segmentation and Object Detection tasks.

As the name suggests, Grad-Cam is built on top of the CAM (Class Activation Mapping) method. It is complex and strongly based on mathematics, so to get a complete overview of the Grad-Cam’s concept, please refer to the original paper.

Additionally, Grad-Cam is not a “black-box“ method as it straightaway interacts with the model. Still, it does not require changes in the model’s architecture and can be applied to a broad spectrum of CNN model families.

Source

Let’s take a high-level look at the Grad-Cam’s concept step by step. The flow behind it is straightforward:

  1. You choose an image as an input, a prediction you want to understand, and the model’s convolutional layer that will be used to take a slice of the features (let’s call it the last layer);
  2. You cut the model at the last layer and add fully-connected layers that will produce a prediction based on the features you got on the last layer;
  3. You pass an image through the model and grab:
    • A prediction;
    • The loss;
    • The output of the last layer.
  4. Concerning the model loss, you find the gradient of the output of the last layer using the backpropagation algorithm;
  5. You find and take the sections of the gradient that straightaway contribute to the prediction;
  6. You reduce, resize, and rescale these sections so that the produced heatmap would be of the same shape as the input image;
  7. That’s it. As a result, you get a heatmap of the most essential parts of an image.

Please note that most of these steps are automated, but Grad-Cam still requires a bit of manual work from you.

Grad-Cam visualized

Source
Source

Learn more about explainability …

Last updated on Sep 08, 2022

Removing the risk from vision AI.

Only 13% of vision AI projects make it to production, with Hasty we boost that number to 100%.