
Get Started

Images & Masks

Our dataset is organized using the format in the table below. To reduce the number of files and speed up I/O, data of each type are compressed into corresponding HDF5 files.

Inside each HDF5 file, samples are named using the sequence ID and frame ID. The name pattern is <seq>/<frame>_<sensor>_<view>.<ext>. For example, the following code retrieves the front-view RGB image of sequence 001001 at frame 000025.
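A minimal sketch using h5py; the archive name ("img.hdf5"), sensor tag ("img"), and extension ("jpg") are assumptions, so check the downloaded files for the actual names:

```python
import h5py


def load_sample(h5_path, seq, frame, sensor, view, ext):
    """Return the raw bytes of one sample from a SHIFT HDF5 archive.

    Keys inside the archive follow the pattern
    <seq>/<frame>_<sensor>_<view>.<ext>.
    """
    key = f"{seq}/{frame}_{sensor}_{view}.{ext}"
    with h5py.File(h5_path, "r") as f:
        return f[key][()]


# Front-view RGB image of sequence 001001 at frame 000025
# (file and sensor names here are hypothetical):
# raw = load_sample("img.hdf5", "001001", "000025", "img", "front", "jpg")
# The returned bytes can then be decoded with any image library,
# e.g. Pillow: Image.open(io.BytesIO(raw)).
```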


We provide labels for 2D/3D object detection, 2D/3D multiple object tracking (MOT), and instance segmentation, following the Scalabel format. Each label file has the following fields.

- frames[ ]:                           // stores all frames from all sequences
    - frame0001: Frame
    - frame0002: Frame
- config:
    - image_size:                       // all images have the same size
        - width: 1280
        - height: 800
    - categories: ["car", "truck", ...] // define the categories of objects

Here, frames[ ] is a list of Frame objects containing all frames from all sequences. The Frame object is defined as

- name: string                          // e.g., "abcd-1234/00000001_img_center.png"
- videoName: string                     // e.g., "abcd-1234", unique across whole dataset
- attributes                            // for discrete domain shifts
    - timeofday: 
        {"daytime" | "morning/afternoon" | "dawn/dusk" | 
        "sunrise/sunset" | "night"} 
    - weather: 
        {"clear" | "partly cloudy" | "overcast" | "small_rain" | 
         "mid_rain" | "heavy_rain" | "small_fog" | "heavy_fog" } 
    - vehicle_density: 
        {"sparse" | "moderate" | "crowded"} 
    - pedestrian_density: 
        {"sparse" | "moderate" | "crowded"} 
    - scene: {"urban" | "village" | "rural" | "highway"}
- intrinsics                            // intrinsic matrix (only for cameras)
    - focal: [x, y]                     // in pixels
    - center: [x, y]                    // in pixels
- extrinsics:                           // extrinsic matrix
    - location: [x, y, z]               // in meters
    - rotation: [rot_x, rot_y, rot_z]   // in degrees
- timestamp: int                        // time within this video, in ms
- frameIndex: int                       // frame index in this video
- size:
    - width: 1280
    - height: 800
- labels [ ]:
    - id: string                         // for tracking, unique within the current sequence
    - index: int
    - category: string                   // classification
    - attributes
        - truncated: bool                // truncation for 2D bounding box
    - box2d:                             // 2D bounding box
        - x1: float
        - y1: float
        - x2: float
        - y2: float
    - box3d:                              // 3D bounding box
        - alpha: float    
        - orientation: [rot_x, rot_y, rot_z]
        - location: [x, y, z]                   // in meters
        - dimension: [height, width, length]    // in meters
    - rle:                                // mask in RLE, for instance segmentation
        - counts: [int]
        - size: (height, width)          
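Since Scalabel label files are plain JSON, the schema above can be traversed directly. A sketch that collects every 2D bounding box (the file name passed in is up to the caller; field names follow the schema above):

```python
import json


def iter_boxes_2d(label_path):
    """Yield (frame name, category, box2d dict) for every 2D box in a
    Scalabel-format label file."""
    with open(label_path, "r") as f:
        data = json.load(f)
    for frame in data["frames"]:
        # "labels" may be absent or null for frames without annotations.
        for label in frame.get("labels") or []:
            box = label.get("box2d")
            if box is not None:
                yield frame["name"], label["category"], box


# Example: print all car boxes from a (hypothetical) label file.
# for name, category, box in iter_boxes_2d("det_2d.json"):
#     if category == "car":
#         print(name, box["x1"], box["y1"], box["x2"], box["y2"])
```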

How to use SHIFT?

Our dataset aims to foster research in several under-explored fields of autonomous driving that are crucial to safety. We summarize the possible use cases of our dataset as follows:

  • Robustness and generality: Investigating how a perception system's performance degrades under increasing levels of domain shift.
  • Uncertainty estimation and calibration: Assessing and developing uncertainty estimation methods that work under realistic domain shifts.
  • Multi-task perception systems: Studying combinations of tasks and developing multi-task models that effectively counteract domain shifts. Such results could also guide real-world data collection and annotation.
  • Continual learning: Investigating how to utilize domain shifts progressively, e.g., continual domain adaptation and curriculum learning.
  • Test-time learning: Developing and evaluating learning algorithms for continuously changing environments, e.g., test-time adaptation.