Multi-Task Perception

Featuring a comprehensive sensor suite and annotations, SHIFT supports a range of mainstream perception tasks.

Domain Adaptation

SHIFT considers the most frequent real-world environmental changes and provides 24 types of domain shifts.

Continual Learning

Containing video sequences captured under continuously shifting environments, SHIFT is the first driving dataset to enable research on continuous test-time learning and adaptation.

What does SHIFT provide?

SHIFT's comprehensive sensor suite and annotations support a range of mainstream perception tasks. The sensor suite (sketched in code after the list) includes:

  • Multi-view RGB camera set: Front, left / right 45°, and left / right 90° views.
  • Stereo RGB cameras: A pair of RGB cameras with a 50 cm horizontal baseline.
  • Depth camera: Dense depth maps with a depth resolution of 1 mm.
  • Optical flow camera: Dense optical flow maps.
  • LiDAR sensor: 128 channels, 1.12M points per second.
  • GNSS / IMU sensor.
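To make the rig layout concrete, here is a minimal Python sketch that encodes the multi-view camera set and the stereo baseline as plain data structures. The view names, the sign convention for yaw, and the constant names are our own illustrative choices, not the dataset's on-disk naming.

from dataclasses import dataclass
from typing import List

@dataclass
class CameraView:
    """One RGB camera of the multi-view rig (names and angles are illustrative)."""
    name: str
    yaw_deg: float  # rotation about the vertical axis, relative to the front view

# The five-view RGB rig described above: front, left/right 45°, left/right 90°.
MULTI_VIEW_RIG: List[CameraView] = [
    CameraView("front", 0.0),
    CameraView("left_45", -45.0),
    CameraView("right_45", 45.0),
    CameraView("left_90", -90.0),
    CameraView("right_90", 90.0),
]

STEREO_BASELINE_M = 0.50  # horizontal gap between the stereo RGB pair, in meters

if __name__ == "__main__":
    for cam in MULTI_VIEW_RIG:
        print(f"{cam.name:>8s}: yaw {cam.yaw_deg:+.0f} deg")
    print(f"stereo baseline: {STEREO_BASELINE_M} m")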

Together, SHIFT supports 12 mainstream perception tasks for autonomous driving, including 2D and 3D object detection, semantic and instance segmentation, 2D and 3D object tracking, depth estimation, optical flow estimation, trajectory forecasting, pose estimation, and point cloud registration.

Why use SHIFT?

Adapting to a continuously evolving environment is a safety-critical challenge inevitably faced by all autonomous-driving systems. However, existing image- and video-based driving datasets fall short of capturing the mutable nature of the real world. Our dataset, SHIFT, is designed to overcome these limitations.


Extensive discrete domain shifts

SHIFT considers the most frequent real-world environmental changes and provides 24 types of domain shifts in 4 main categories (a small configuration sketch follows the list):

  • Weather conditions, including cloudiness, rain, and fog intensity
  • Time of day
  • Density of vehicles and pedestrians
  • Camera orientations
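As a rough illustration of how these discrete shifts can be enumerated and combined, here is a small Python sketch. The category and value names are placeholders chosen for readability; they are not SHIFT's actual label schema, and the value lists are not exhaustive.

# Illustrative enumeration of the four discrete-shift dimensions.
# Names and values are placeholders, not SHIFT's label schema.
DOMAIN_SHIFTS = {
    "weather": ["clear", "cloudy", "overcast", "rainy", "foggy"],
    "time_of_day": ["daytime", "dawn_dusk", "night"],
    "crowdedness": ["sparse", "moderate", "dense"],
    "camera_orientation": ["default", "pitched", "yawed"],
}

def domain_tag(domain: dict) -> str:
    """Render one discrete domain as a readable tag, e.g. for selecting a subset."""
    return "/".join(f"{key}={value}" for key, value in domain.items())

if __name__ == "__main__":
    print(domain_tag({
        "weather": "rainy",
        "time_of_day": "night",
        "crowdedness": "dense",
        "camera_orientation": "default",
    }))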

First dataset with realistic continuous domain shifts

For nearly all real-world driving datasets, the maximum sequence length is under 100 seconds. Given their short length, these sequences are captured under approximately stationary conditions. By collecting video sequences under continuously shifting environmental conditions, we provide the first driving dataset that allows research on continuous test-time learning and adaptation.

[Figure: sequence length comparison]
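To illustrate what a continuous shift means in practice, the toy generator below produces a sequence in which a single environmental parameter (fog intensity) ramps smoothly over time instead of staying fixed. The parameter choice and the schedule are made up for illustration; SHIFT controls such parameters in simulation.

import math
from typing import Iterator, Tuple

def continuous_fog_sequence(num_frames: int = 500, fps: float = 10.0) -> Iterator[Tuple[float, float]]:
    """Yield (time in seconds, fog intensity), with fog ramping smoothly from 0 to 1."""
    for i in range(num_frames):
        t = i / fps
        fog = 0.5 * (1.0 - math.cos(math.pi * i / (num_frames - 1)))
        yield t, fog

if __name__ == "__main__":
    for t, fog in continuous_fog_sequence(num_frames=5, fps=1.0):
        print(f"t={t:4.1f}s  fog_intensity={fog:.2f}")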


We observe that the performance drops on our simulated data under discrete domain shifts follow trends consistent with real-world observations on BDD100K. This confirms the real-world consistency of our dataset.

[Figure: performance-drop trends, SHIFT vs. BDD100K]


Use cases

Multi-task perception

Multi-task learning can significantly counteract domain shifts given a proper task combination. Here, a multi-task model trained on semantic segmentation (S), instance segmentation (I), and depth estimation (D) improves performance under rainy and night conditions.

[Figure: multi-task model performance under domain shift]
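A minimal sketch of this setup, assuming a shared encoder with one head per task and a simple weighted sum of losses; the layer sizes, loss choices, and weights below are placeholders, not the architecture or training recipe used in the paper.

import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Toy shared-encoder model with segmentation (S), instance (I), and depth (D) heads."""
    def __init__(self, num_classes: int = 23):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.sem_head = nn.Conv2d(32, num_classes, 1)  # per-pixel class logits
        self.inst_head = nn.Conv2d(32, 2, 1)           # e.g. center/offset-style targets
        self.depth_head = nn.Conv2d(32, 1, 1)          # per-pixel depth

    def forward(self, x):
        features = self.encoder(x)
        return self.sem_head(features), self.inst_head(features), self.depth_head(features)

def multitask_loss(model, image, sem_gt, inst_gt, depth_gt, weights=(1.0, 1.0, 0.5)):
    """Weighted sum of the three task losses (weights are illustrative)."""
    sem, inst, depth = model(image)
    loss_s = nn.functional.cross_entropy(sem, sem_gt)
    loss_i = nn.functional.l1_loss(inst, inst_gt)
    loss_d = nn.functional.l1_loss(depth, depth_gt)
    return weights[0] * loss_s + weights[1] * loss_i + weights[2] * loss_d

if __name__ == "__main__":
    model = MultiTaskModel()
    image = torch.randn(2, 3, 64, 64)
    sem_gt = torch.randint(0, 23, (2, 64, 64))
    inst_gt = torch.randn(2, 2, 64, 64)
    depth_gt = torch.rand(2, 1, 64, 64)
    loss = multitask_loss(model, image, sem_gt, inst_gt, depth_gt)
    loss.backward()
    print(f"combined loss: {loss.item():.3f}")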


Test-time learning

Test-time adaptation (TTA) can effectively boost a model's performance on continuously shifting domains. However, it is highly sensitive to hyperparameters and suffers from severe catastrophic forgetting. We hope future research will mitigate these issues.
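For concreteness, here is a minimal online adaptation loop in the spirit of entropy-minimization TTA methods such as TENT: only the affine parameters of normalization layers are updated on unlabeled test batches. This is a generic sketch under those assumptions, not the exact protocol evaluated on SHIFT; the learning rate is precisely the kind of hyperparameter such methods are sensitive to, and nothing here prevents forgetting (a common mitigation is periodically resetting to the source weights).

import torch
import torch.nn as nn

def norm_parameters(model: nn.Module):
    """Collect the affine parameters of normalization layers, a common TTA choice."""
    params = []
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm2d, nn.LayerNorm)) and module.weight is not None:
            params += [module.weight, module.bias]
    return params

def entropy(logits: torch.Tensor) -> torch.Tensor:
    """Mean prediction entropy, used as an unsupervised adaptation objective."""
    probs = logits.softmax(dim=1)
    return -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()

def tta_step(model, batch, optimizer):
    """One online adaptation step on an unlabeled test batch."""
    loss = entropy(model(batch))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy classifier standing in for a perception model; .train() keeps BatchNorm adaptive.
    model = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
    ).train()
    optimizer = torch.optim.SGD(norm_parameters(model), lr=1e-3)  # lr is a sensitive choice
    stream = [torch.randn(4, 3, 32, 32) for _ in range(3)]  # stands in for a shifting video stream
    for batch in stream:
        print(f"entropy: {tta_step(model, batch, optimizer):.3f}")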

About

The SHIFT Dataset is made freely available to academic and non-academic entities for research purposes such as academic research, teaching, scientific publications, or personal experimentation. If you use our dataset, we kindly ask you to cite our paper as follows:

@InProceedings{shift2022,
    author    = {Sun, Tao and Segù, Mattia and Postels, Janis and Wang, Yuxuan and Van Gool, Luc and Schiele, Bernt and Tombari, Federico and Yu, Fisher},
    title     = {{SHIFT:} A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2022}
}

You can post questions and comments on our GitHub Discussions.