DeepLabv3+

DeepLabv3+ is a semantic segmentation architecture that builds on DeepLabv3 by adding a simple yet effective decoder module to enhance segmentation results.

Repeated downsampling in a CNN shrinks the feature map resolution, which lowers prediction accuracy and loses boundary information in semantic segmentation. At the same time, aggregating context around a feature helps segment it better, which atrous convolutions accomplish. DeepLabv3+ addresses both issues.

Downsampling is widely adopted in deep convolutional neural networks (CNNs) to reduce memory consumption while preserving transformation invariance to some degree.

Atrous rate

Atrous convolution (also called dilated convolution) is a tool for controlling the effective field of view of a convolution. It modifies the field of view using a parameter termed the atrous rate. It is a simple yet powerful approach for enlarging the field of view of filters without increasing the computation or the number of parameters.

Atrous (dilated) convolution has a wider field of view than a standard convolution with the same number of parameters.
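To make the field-of-view claim concrete, the short sketch below computes how far a dilated kernel reaches, using the standard relation k_eff = k + (k - 1)(r - 1) for kernel size k and atrous rate r. The function names are illustrative only and do not come from any library.

```python
def effective_kernel_size(kernel_size: int, atrous_rate: int) -> int:
    """Spatial extent of a kernel when (rate - 1) gaps are inserted between taps."""
    return kernel_size + (kernel_size - 1) * (atrous_rate - 1)


def kernel_parameters(kernel_size: int) -> int:
    """Weights in a single 2D kernel; this does not depend on the atrous rate."""
    return kernel_size * kernel_size


# Compare a 3x3 kernel at several atrous rates.
for rate in (1, 2, 6, 12, 18):
    extent = effective_kernel_size(3, rate)
    print(f"rate={rate:>2}: covers {extent}x{extent} pixels "
          f"with {kernel_parameters(3)} weights")
```

The number of weights stays at 9 for every rate, while the covered area grows with the rate.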

DeepLabv3+ uses DeepLabv3 as its encoder and adds a decoder module on top of it, which addresses the previously noted problem of DeepLabv3 consuming too much time to process high-resolution images.
Applying depthwise separable convolution to both the atrous spatial pyramid pooling and decoder modules results in a faster and stronger encoder-decoder network for semantic segmentation.
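As a rough illustration of why depthwise separable convolution is cheaper, the sketch below compares weight counts for a standard convolution and for its depthwise separable counterpart (a per-channel spatial filter followed by a 1x1 pointwise convolution). The channel counts are hypothetical and biases are ignored.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard 2D convolution (no bias)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise convolution plus a 1x1 pointwise convolution (no bias)."""
    depthwise = kernel_size * kernel_size * in_channels  # one spatial filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution mixing channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 256 input and 256 output channels.
standard = standard_conv_params(3, 256, 256)
separable = depthwise_separable_params(3, 256, 256)
print(f"standard: {standard}, separable: {separable}, "
      f"ratio: {standard / separable:.1f}x")
```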

Output stride

Output stride is the ratio of the spatial size of the input image to the spatial size of the output feature map. It specifies how much spatial reduction the input undergoes as it passes through the network.

In Model Playground, the output stride can be set to 8 or 16.

In the architecture below, the encoder is based on an output stride of 16, i.e. the input image is down-sampled by a factor of 16.
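The arithmetic behind output stride can be sketched as follows. The helper name is illustrative, and the rounding assumes "same"-style padding so that sizes round up; other padding schemes may differ by a pixel.

```python
import math


def feature_map_size(input_size: int, output_stride: int) -> int:
    """Spatial size of the encoder output for a given input size and output stride."""
    return math.ceil(input_size / output_stride)


# A smaller output stride keeps a denser (larger) feature map.
for output_stride in (16, 8):
    size = feature_map_size(512, output_stride)
    print(f"512x512 input, output stride {output_stride}: {size}x{size} feature map")
```

An output stride of 8 yields a feature map with four times as many positions as an output stride of 16, which is the source of the precision versus runtime trade-off.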

Architecture

Encoder network

DeepLabv3+ employs Aligned Xception as its main feature extractor (encoder), although with substantial modifications. In particular, all max pooling operations are replaced by depthwise separable convolutions with striding.

Thanks to the encoder-decoder structure in DeepLabv3+, you can arbitrarily control the resolution of the extracted encoder features with atrous convolution to trade off precision and runtime.
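One common way to control encoder resolution is to stop downsampling once the target output stride is reached and to compensate with a growing atrous rate in the layers that follow. The sketch below plans this for a hypothetical list of layer strides; it is a simplified illustration of the idea, not the exact layer layout of Xception or ResNet, and it assumes the target output stride is reached exactly by some prefix of the strides.

```python
from math import prod


def plan_strides_and_dilations(layer_strides, output_stride):
    """Return a (stride, dilation) pair per layer for a target output stride."""
    current_stride = 1
    dilation = 1
    plan = []
    for stride in layer_strides:
        if current_stride == output_stride:
            # Target reached: keep the resolution and grow the atrous rate
            # for the layers that follow instead of downsampling again.
            plan.append((1, dilation))
            dilation *= stride
        else:
            plan.append((stride, dilation))
            current_stride *= stride
    return plan


# Hypothetical backbone: five stride-2 layers, each followed by a stride-1 layer.
HYPOTHETICAL_STRIDES = [2, 1, 2, 1, 2, 1, 2, 1, 2, 1]

for target in (32, 16, 8):
    plan = plan_strides_and_dilations(HYPOTHETICAL_STRIDES, target)
    total_stride = prod(stride for stride, _ in plan)
    print(f"output stride {target}: total stride {total_stride}, plan {plan}")
```

Lowering the target output stride replaces more of the late downsampling steps with dilation, which preserves resolution at the cost of more computation on larger feature maps.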

In Model Playground, we can select either ResNet or EfficientNet as the feature extraction (encoder) network.

Weights

These are the weights used to initialize the model. In Model Playground, ResNet101 COCO weights are used.

Code Implementation

  
Hello, thank you for using the code provided by CloudFactory. Please note that some code blocks might not be 100% complete and ready to be run as is. This is done intentionally as we focus on implementing only the most challenging parts that might be tough to pick up from scratch. View our code block as a LEGO block - you can’t use it as a standalone solution, but you can take it and add it to your system to complement it.

    # Semantic segmentation with a DeepLabv3+ (Xception) model trained on the ADE20K dataset,
    # using PixelLib. Lines starting with "!" are shell commands for a notebook cell.
    !pip3 install tensorflow
    !pip3 install pixellib --upgrade

    from pixellib.semantic import semantic_segmentation

    # Xception model trained on ADE20K for segmenting objects:
    # http://download.tensorflow.org/models/deeplabv3_xception_ade20k_train_2018_05_29.tar.gz

    # Load the pretrained weights and segment a single image.
    segment_image = semantic_segmentation()
    segment_image.load_ade20k_model("deeplabv3_xception65_ade20k.h5")
    segment_image.segmentAsAde20k("path_to_image", output_image_name="path_to_output_image")

PASCAL VOC 2012 test set results with SOTA approaches

On the PASCAL VOC 2012 test set, DeepLabv3+ surpasses various SOTA techniques, including LC, ResNet-DUC-HDC (TuSimple), GCN (Large Kernel Matters), RefineNet, ResNet-38, PSPNet, IDW-CNN, SDN, DIS, and DeepLabv3.

Further Resources

Wiki entry for U-Net

Wiki entry for U-Net++
