In the modern world, neural networks are a proven instrument for solving different tasks across various industries. With the development of high-level Machine Learning libraries and Deep Learning frameworks, it became easier to train a network and achieve a nice metric value. Still, boldly relying on a model that produces accurate predictions is not the best strategy. The key to success is a bit deeper and lies in understanding why a model makes a specific decision. This is where the D-Rise algorithm that makes the difference comes into the mix.
On this page, we will talk a bit about the D-Rise concept and its capabilities. Let’s jump in.
It is not a secret that the predictions of convolutional neural networks (CNNs) can be interpreted using saliency maps (heatmaps that show which regions of the input are more important for the model’s prediction). Unfortunately, when it comes to evaluating a network trained to solve Object Detection or Instance Segmentation task, such a straightforward method fails to produce high-quality explanations.
This happens because areas important for detecting or segmenting a particular object might not coincide with the object itself. For example:
A network can use contextual information;
Or some object parts might be more important than others.
Detector Randomized Input Sampling for Explanation (D-RISE) is designed for Object Detection and Instance Segmentation explainability purposes. It is a “black-box” method, meaning that D-Rise requires only the input and output of your model. In other words, D-Rise does not need to know your model's architecture, weights, and other parameters to work. Such a concept is a significant strength for an explainability method. With such a philosophy, D-Rise can be effectively applied for various object detectors, for example, one-stage (YOLO, SSD, CornerNet) and two-stage (Faster R-CNN).
So, given the input and output of your model D-Rise will produce a saliency map as a result.
Let’s take a closer look at the D-Rise algorithm step by step:
As mentioned above, the D-Rise concept is based on a so-called similarity score explicitly developed for this algorithm. In short, the similarity score is computed using the similarities between three individual components such as:
Intersection over Union between the ground truth and predicted bounding box;
Cosine similarity between the ground truth and predicted class distribution;
Objectness value of the proposal.
These values are multiplied to calculate the similarity score between any two vectors. As you might notice, such a metric ensures that if any of the factors is close to zero, the similarity value will also be close to zero.
Still, to get a deeper understanding of the similarity score and D-Rise algorithm in general, please check out the initial paper.