Deep Layer Aggregation

Fisher Yu, Dequan Wang, Evan Shelhamer, Trevor Darrell
CVPR 2018 Oral

Abstract

Visual recognition requires rich representations that span levels from low to high, scales from small to large, and resolutions from fine to coarse. Even with the depth of features in a convolutional network, a layer in isolation is not enough: compounding and aggregating these representations improves inference of what and where. Architectural efforts are exploring many dimensions for network backbones, designing deeper or wider architectures, but how to best aggregate layers and blocks across a network deserves further attention. Although skip connections have been incorporated to combine layers, these connections have been “shallow” themselves, and only fuse by simple, one-step operations. We augment standard architectures with deeper aggregation to better fuse information across layers. Our deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy to make networks with better accuracy and fewer parameters. Experiments across architectures and tasks show that deep layer aggregation improves recognition and resolution compared to existing branching and merging schemes.

Method

We propose schemes of deep layer aggregation in contrast to shallow aggregation.

Deep layer aggregation learns to better extract the full spectrum of semantic and spatial information from a network. Iterative connections join neighboring stages to progressively deepen and spatially refine the representation. Hierarchical connections cross stages with trees that span the spectrum of layers to better propagate features and gradients.
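The two connection patterns can be illustrated with a minimal, purely symbolic Python sketch. It is not the authors' implementation: `node`, `iterative_aggregate`, and `hierarchical_aggregate` are hypothetical names, the node only records what it merges instead of applying learned layers, and the balanced binary tree is a simplified stand-in for the hierarchical scheme.

```python
from functools import reduce


def node(*inputs):
    """Hypothetical aggregation node: records its inputs as a string.

    A real node would combine feature maps with learned layers;
    here only the merge order is tracked.
    """
    return "N(" + ", ".join(inputs) + ")"


def iterative_aggregate(stages):
    """Iterative pattern: fold stages left to right, shallowest first,
    so that earlier stages pass through the most aggregation nodes."""
    return reduce(node, stages)


def hierarchical_aggregate(blocks):
    """Hierarchical pattern (simplified assumption): merge blocks in a
    balanced binary tree that spans all of them."""
    if len(blocks) == 1:
        return blocks[0]
    mid = len(blocks) // 2
    return node(hierarchical_aggregate(blocks[:mid]),
                hierarchical_aggregate(blocks[mid:]))


print(iterative_aggregate(["s1", "s2", "s3", "s4"]))
# N(N(N(s1, s2), s3), s4)
print(hierarchical_aggregate(["b1", "b2", "b3", "b4"]))
# N(N(b1, b2), N(b3, b4))
```

In the iterative output the shallowest stage `s1` sits inside three nested nodes, which mirrors the idea that shallow features are refined the most. In the tree output every block reaches the root through a short path.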

Interpolation by iterative deep aggregation. Stages are fused from shallow to deep to make a progressively deeper and higher resolution decoder.
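One simplified reading of this caption can be sketched in plain Python. The sketch assumes each stage has half the resolution of the previous one, uses nearest-neighbor upsampling in place of learned upsampling, and uses an element-wise average in place of a learned aggregation node. The names `upsample2x`, `merge`, and `ida_up` are hypothetical and do not come from the released code.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D map stored as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def merge(a, b):
    """Hypothetical aggregation node: element-wise average of two equal-size maps."""
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def ida_up(stages):
    """Fuse stages ordered shallow (high resolution) to deep (low resolution).

    Each round upsamples every deeper map by 2x and merges it into its
    shallower neighbor, so the list shrinks by one map per round until a
    single map at the shallowest resolution remains.
    """
    layers = list(stages)
    while len(layers) > 1:
        layers = [merge(layers[i], upsample2x(layers[i + 1]))
                  for i in range(len(layers) - 1)]
    return layers[0]


# Three toy stages at resolutions 4x4, 2x2, and 1x1.
stages = [
    [[1.0] * 4 for _ in range(4)],
    [[3.0] * 2 for _ in range(2)],
    [[5.0]],
]
fused = ida_up(stages)
print(len(fused), len(fused[0]))  # 4 4
```

The output keeps the resolution of the shallowest stage while every value has passed through information from the deeper stages.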

Results

DLA achieves a favorable tradeoff between accuracy and both parameter count and computation.

DLA also performs well on fine-grained image classification tasks.

DLA-Up works well on semantic segmentation, where it achieves a good balance between global context and local detail.

We also obtain state-of-the-art results on boundary prediction.

Paper

paper

Code

github.com/ucbdrive/dla

DLA is also supported in several popular packages and has notably been used in several popular models.

Citation

@inproceedings{yu2018deep,
  title={Deep layer aggregation},
  author={Yu, Fisher and Wang, Dequan and Shelhamer, Evan and Darrell, Trevor},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  year={2018}
}

Related


Dilated Residual Networks

CVPR 2017. We show that dilated residual networks (DRNs) outperform their non-dilated counterparts in image classification without increasing the model’s depth or complexity.


Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

NeurIPS 2021 Spotlight. We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation.


Exploring Cross-Image Pixel Contrast for Semantic Segmentation

ICCV 2021 Oral. We propose a pixel-wise contrastive algorithm for semantic segmentation in the fully supervised setting.


Dense Prediction with Attentive Feature Aggregation

arXiv 2021. We propose Attentive Feature Aggregation (AFA) to exploit both spatial and channel information for semantic segmentation and boundary detection.


BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

CVPR 2020 Oral. The largest driving video dataset for heterogeneous multitask learning.


Learning Saliency Propagation for Semi-Supervised Instance Segmentation

CVPR 2020. We propose a ShapeProp module to propagate information between object detection and segmentation supervisions for Semi-Supervised Instance Segmentation.


Deep Mixture of Experts via Shallow Embedding

UAI 2019. We explore a mixture of experts (MoE) approach to deep dynamic routing, which activates certain experts in the network on a per-example basis.


TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning

CVPR 2019. We propose Task-Aware Feature Embedding Networks (TAFE-Nets) to learn how to adapt the image representation to a new task in a meta learning fashion.


SkipNet: Learning Dynamic Routing in Convolutional Networks

ECCV 2018. We introduce SkipNet, a modified residual network that uses a gating network to selectively skip convolutional blocks based on the activations of the previous layer.


Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation

ECCV 2018. We aim to characterize adversarial examples based on spatial context information in semantic segmentation.