
Get Started

Images & Masks

Our dataset is organized using the format in the table below. To reduce the number of files and speed up I/O, data of each type are compressed into corresponding HDF5 files.

Inside each HDF5 file, samples are named using the sequence ID and frame ID. The name pattern is <seq>/<frame>_<sensor>_<view>.<ext>. For example, the following code retrieves the front-view RGB image of sequence 001001 at frame 000025.
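A minimal sketch using h5py; the archive name ("img.hdf5"), sensor tag ("img"), and extension ("jpg") are assumptions, so check the downloaded files for the actual names:

```python
import h5py


def load_sample(h5_path, seq, frame, sensor, view, ext):
    """Return the raw bytes of one sample from a SHIFT HDF5 archive.

    Keys inside the archive follow the pattern
    <seq>/<frame>_<sensor>_<view>.<ext>.
    """
    key = f"{seq}/{frame}_{sensor}_{view}.{ext}"
    with h5py.File(h5_path, "r") as f:
        return f[key][()]


# Front-view RGB image of sequence 001001 at frame 000025
# (file and sensor names here are hypothetical):
# raw = load_sample("img.hdf5", "001001", "000025", "img", "front", "jpg")
# The returned bytes can then be decoded with any image library,
# e.g. Pillow: Image.open(io.BytesIO(raw)).
```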


We provide labels for 2D/3D object detection, 2D/3D multiple object tracking (MOT), and instance segmentation, following the Scalabel format. Each label file has the following fields.

- frames[ ]:                           // stores all frames from all sequences
    - frame0001: Frame
    - frame0002: Frame
- config:
    - image_size:                       // all images have the same size
        - width: 1280
        - height: 800
    - categories: ["car", "truck", ...] // define the categories of objects

Here, frames[ ] is a list of Frame objects containing all frames from all sequences. The Frame object is defined as

- name: string                          // e.g., "abcd-1234/00000001_img_center.png"
- videoName: string                     // e.g., "abcd-1234", unique across whole dataset
- attributes                            // for discrete domain shifts
    - timeofday: 
        {"daytime" | "morning/afternoon" | "dawn/dusk" | 
        "sunrise/sunset" | "night"} 
    - weather: 
        {"clear" | "partly cloudy" | "overcast" | "small_rain" | 
         "mid_rain" | "heavy_rain" | "small_fog" | "heavy_fog" } 
    - vehicle_density: 
        {"sparse" | "moderate" | "crowded"} 
    - pedestrian_density: 
        {"sparse" | "moderate" | "crowded"} 
    - scene: {"urban" | "village" | "rural" | "highway"}
- intrinsics                            // intrinsic matrix (only for cameras)
    - focal: [x, y]                     // in pixels
    - center: [x, y]                    // in pixels
- extrinsics:                           // extrinsic matrix
    - location: [x, y, z]               // in meters
    - rotation: [rot_x, rot_y, rot_z]   // in degrees
- timestamp: int                        // time within this video, in ms
- frameIndex: int                       // frame index in this video
- size:
    - width: 1280
    - height: 800
- labels [ ]:
    - id: string                         // for tracking, unique within the current sequence
    - index: int
    - category: string                   // classification
    - attributes
        - truncated: bool                // truncation for 2D bounding box
    - box2d:                             // 2D bounding box
        - x1: float
        - y1: float
        - x2: float
        - y2: float
    - box3d:                              // 3D bounding box
        - alpha: float    
        - orientation: [rot_x, rot_y, rot_z]
        - location: [x, y, z]                   // in meters
        - dimension: [height, width, length]    // in meters
    - rle:                                // mask in RLE, for instance segmentation
        - counts: [int]
        - size: (height, width)          
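Since Scalabel label files are plain JSON, the schema above can be traversed directly. A sketch that collects every 2D bounding box (the file name passed in is up to the caller; field names follow the schema above):

```python
import json


def iter_boxes_2d(label_path):
    """Yield (frame name, category, box2d dict) for every 2D box in a
    Scalabel-format label file."""
    with open(label_path, "r") as f:
        data = json.load(f)
    for frame in data["frames"]:
        # "labels" may be absent or null for frames without annotations.
        for label in frame.get("labels") or []:
            box = label.get("box2d")
            if box is not None:
                yield frame["name"], label["category"], box


# Example: print all car boxes from a (hypothetical) label file.
# for name, category, box in iter_boxes_2d("det_2d.json"):
#     if category == "car":
#         print(name, box["x1"], box["y1"], box["x2"], box["y2"])
```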

How to use SHIFT?

Our dataset aims to foster research in several under-explored fields of autonomous driving that are crucial to safety. We summarize the possible use cases of our dataset as follows:

  • Robustness and generality: Investigating how a perception system's performance degrades under increasing levels of domain shift.
  • Uncertainty estimation and calibration: Assessing and developing uncertainty estimation methods that work under realistic domain shifts.
  • Multi-task perception systems: Studying combinations of tasks and developing multi-task models that effectively counteract domain shifts. Such results could also guide real-world data collection and annotation.
  • Continual learning: Investigating how to utilize domain shifts progressively, e.g., continual domain adaptation and curriculum learning.
  • Test-time learning: Developing and evaluating learning algorithms for continuously changing environments, e.g., test-time adaptation.