Exploring Cross-Image Pixel Contrast for Semantic Segmentation

Wenguan Wang, Tianfei Zhou, Fisher Yu, Jifeng Dai, Ender Konukoglu, Luc Van Gool
ICCV 2021 Oral

Exploring Cross-Image Pixel Contrast for Semantic Segmentation

Abstract

Current semantic segmentation methods focus only on mining “local” context, i.e., dependencies between pixels within individual images, by context-aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization criteria (e.g., IoU-like loss). However, they ignore “global” context of the training data, i.e., rich semantic relations between pixels across different images. Inspired by the recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting. The core idea is to enforce pixel embeddings belonging to a same semantic class to be more similar than embeddings from different classes. It raises a pixel-wise metric learning paradigm for semantic segmentation, by explicitly exploring the structures of labeled pixels, which were rarely explored before. Our method can be effortlessly incorporated into existing segmentation frameworks without extra overhead during testing. We experimentally show that, with famous segmentation models (i.e., DeepLabV3, HRNet, OCR) and backbones (i.e., ResNet, HR-Net), our method brings consistent performance improvements across diverse datasets (i.e., Cityscapes, PASCAL-Context, COCO-Stuff, CamVid). We expect this work will encourage our community to rethink the current de facto training paradigm in fully supervised semantic segmentation.

Paper

Code

paper
github.com/tfzhou/ContrastiveSeg

Citation

@inproceedings{wang2021exploring,
  title = {Exploring Cross-Image Pixel Contrast for Semantic Segmentation},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  author = {Wang, Wenguan and Zhou, Tianfei and Yu, Fisher and Dai, Jifeng and Konukoglu, Ender and Van Gool, Luc},
  year = {2021},
}

Related


Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

NeurIPS 2021 Spotlight We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation.


Dense Prediction with Attentive Feature Aggregation

Dense Prediction with Attentive Feature Aggregation

arXiv 2021 We propose Attentive Feature Aggregation (AFA) to exploit both spatial and channel information for semantic segmentation and boundary detection.


BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

CVPR 2020 Oral The largest driving video dataset for heterogeneous multitask learning.


Learning Saliency Propagation for Semi-Supervised Instance Segmentation

Learning Saliency Propagation for Semi-Supervised Instance Segmentation

CVPR 2020 We propose a ShapeProp module to propagate information between object detection and segmentation supervisions for Semi-Supervised Instance Segmentation.


Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation

Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation

ECCV 2018 We aim to characterize adversarial examples based on spatial context information in semantic segmentation.


Deep Layer Aggregation

Deep Layer Aggregation

CVPR 2018 Oral We augment standard architectures with deeper aggregation to better fuse information across layers.


Dilated Residual Networks

Dilated Residual Networks

CVPR 2017 We show that dilated residual networks (DRNs) outperform their non-dilated counterparts in image classification without increasing the model’s depth or complexity.


FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation

FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation

arXiv 2016 We introduce the first domain adaptive semantic segmentation method, proposing an unsupervised adversarial approach to pixel prediction problems.


Multi-Scale Context Aggregation by Dilated Convolutions

Multi-Scale Context Aggregation by Dilated Convolutions

ICLR 2016 We study dilated convolution in depth. It has become a foundamental network operation.