We are hosting multi-object tracking (MOT) and multi-object tracking and segmentation (MOTS) challenges based on BDD100K, the largest open driving video dataset, as part of the ECCV 2022 Self-supervised Learning for Next-Generation Industry-level Autonomous Driving (SSLAD) Workshop.
Participation
Please first submit your results to our eval.ai challenge pages to obtain your performance. Only teams that outperform our baselines for each challenge are considered in the challenge rankings. This page provides more details on our challenges.
Overview
This is a large-scale tracking challenge under the most diverse driving conditions. Understanding the temporal association and shape of objects within videos is one of the fundamental yet challenging tasks for autonomous driving. The BDD100K MOT and MOTS datasets provide diverse driving scenarios with high-quality instance segmentation masks under complicated occlusion and reappearance patterns, which makes them a great testbed for the reliability of tracking and segmentation algorithms in real scenes. The BDD100K dataset also includes 100K raw video sequences, which can be readily used for self-supervised learning. We hope that utilizing large-scale unlabeled video data in self-driving can further boost the performance of MOT & MOTS. In this challenge, we provide two tracks: (1) Main track - standard MOT and MOTS, and (2) Teaser track - self-supervised MOT and MOTS. We encourage participants from both academia and industry.
Challenge Tracks
We introduce two challenge tracks for our BDD100K challenges: standard multi-object tracking and self-supervised tracking. For both tracks, you can use the full 100K raw video sequences, which are mostly unlabeled.
Main Track: Multi-Object Tracking
- Multiple Object Tracking (MOT): Given a video sequence of camera images, predict 2D bounding boxes for each object and their association across frames.
- Multiple Object Tracking and Segmentation (MOTS): In addition to MOT, also predict segmentation masks for each object.
Teaser Track: Self-Supervised Tracking
In this track, we investigate training object trackers without relying on tracking annotations, which can be costly to obtain. Object bounding boxes and masks from the detection and instance segmentation sets are still available, but not those from the tracking set.
- Self-Supervised Multiple Object Tracking (MOT)
- Self-Supervised Multiple Object Tracking and Segmentation (MOTS)
Prizes
All participants will receive certificates with their ranking, if desired. Additionally, the winners of each track will receive the following cash prizes:
- Main Track:
- $5,000 USD, $3,000 USD, and $2,000 USD for the top 3 winners of each challenge.
- Teaser Track:
- $1,000 USD for the winner of each challenge.
Timeline
The challenge starts on August 1st, 2022 and will end at 5 PM GMT on October 10th, 2022. You can use this tool to convert to your local time.
Data
BDD100K was collected across diverse scenarios, covering New York, the San Francisco Bay Area, and other regions in the US. It contains scenes in a wide variety of locations, weather conditions, and times of day, including highways, city streets, residential areas, and rainy/snowy weather. The BDD100K MOT set contains 2,000 fully annotated 40-second sequences at 5 FPS under different weather conditions, times of day, and scene types. We use 1,400/200/400 videos for train/val/test, containing a total of 160K instances and 4M objects. The MOTS set uses a subset of the MOT videos, with 154/32/37 videos for train/val/test, containing 25K instances and 480K object masks. For all challenges, the full 100K raw video sequences at 30 FPS are also available for training.
Baselines
We provide two sets of baselines, one for each track, which serve as examples of how to utilize the BDD100K data.
- Main Track:
- QDTrack [1] for MOT.
- PCAN [2] for MOTS.
- You can also find these baselines in the BDD100K Model Zoo.
- Teaser Track:
- We train the QDTrack baseline model without tracking annotations. We only use single-frame annotations from the detection set and match each object to itself in the same frame. To simulate tracking, we apply different augmentations (Mosaic, MixUp [3], horizontal flip, resize, color jitter) to the key and reference frames. For MOTS, we additionally use the instance segmentation set and add a mask prediction head. A sketch of this key/reference construction is shown below.
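The sketch below illustrates the key/reference pair construction from a single labeled frame. It is not the official baseline code: the reduced augmentation set (brightness jitter plus horizontal flip), the in-memory numpy images, and the [x1, y1, x2, y2] box layout are assumptions made for illustration.

```python
# A simplified sketch of building a (key, reference) pair from one labeled
# frame, as described above. Each object is matched to itself, so instance
# ids are shared between the two differently augmented views.
import random

import numpy as np


def brightness_jitter(img: np.ndarray, strength: float = 0.4) -> np.ndarray:
    """Randomly scale brightness as a stand-in for full color jitter."""
    factor = 1.0 + random.uniform(-strength, strength)
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)


def hflip(img: np.ndarray, boxes: np.ndarray):
    """Horizontally flip an image and its [x1, y1, x2, y2] boxes."""
    w = img.shape[1]
    flipped = img[:, ::-1].copy()
    out = boxes.copy()
    out[:, 0], out[:, 2] = w - boxes[:, 2], w - boxes[:, 0]
    return flipped, out


def make_key_ref_pair(img: np.ndarray, boxes: np.ndarray, ids: np.ndarray):
    """Build a (key, reference) training pair from a single labeled frame."""
    key = {"img": brightness_jitter(img), "boxes": boxes.copy(), "ids": ids.copy()}
    ref_img, ref_boxes = hflip(brightness_jitter(img), boxes)
    ref = {"img": ref_img, "boxes": ref_boxes, "ids": ids.copy()}
    return key, ref
```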
Submission
For submission, please use the following format for each challenge.
MOT Format
To evaluate your algorithms on the BDD100K MOT benchmark, the submission must be in the standard Scalabel format and packaged in one of the following ways:
- A zip file of a folder containing one JSON file per video.
- A zip file of a single JSON file for the entire evaluation set.
The JSON file for each video should contain a list of per-frame result dictionaries with the following structure:
- videoName: str, name of current sequence
- name: str, name of current frame
- frameIndex: int, index of current frame within sequence
- labels []:
- id: str, unique instance id of prediction in current sequence
- category: str, name of the predicted category
- box2d []:
- x1: float
- y1: float
- x2: float
- y2: float
You can find an example result file here.
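Below is a minimal sketch of how a submission in this structure could be written out and zipped in Python. The video name, frame name, box values, and output paths are hypothetical; only the field names come from the format description above.

```python
# A minimal, hypothetical sketch of writing MOT results in the structure above
# and zipping them for submission. Names and paths are placeholders.
import json
import zipfile
from pathlib import Path

out_dir = Path("mot_results")
out_dir.mkdir(exist_ok=True)

# One list of per-frame dictionaries per video.
frames = [
    {
        "videoName": "example_video",          # hypothetical sequence name
        "name": "example_video-0000001.jpg",   # hypothetical frame name
        "frameIndex": 0,
        "labels": [
            {
                "id": "0",                      # unique within the sequence
                "category": "car",
                "box2d": {"x1": 100.0, "y1": 200.0, "x2": 180.0, "y2": 260.0},
            }
        ],
    }
]

with open(out_dir / "example_video.json", "w") as f:
    json.dump(frames, f)

# Package all per-video JSON files into a single zip file for upload.
with zipfile.ZipFile("mot_submission.zip", "w") as zf:
    for json_file in out_dir.glob("*.json"):
        zf.write(json_file, arcname=json_file.name)
```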
MOTS Format
To evaluate your algorithms on the BDD100K MOTS benchmark, the submission must be in the standard Scalabel format and packaged in one of the following ways:
- A zip file of a folder containing one JSON file per video.
- A zip file of a single JSON file for the entire evaluation set.
The JSON file for each video should contain a list of per-frame result dictionaries with the following structure:
- videoName: str, name of current sequence
- name: str, name of current frame
- frameIndex: int, index of current frame within sequence
- labels []:
- id: str, unique instance id of prediction in current sequence
- category: str, name of the predicted category
- rle:
- counts: str
- size: (height, width)
You can find an example result file here.
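Below is a sketch of how the rle field could be filled from a binary instance mask, assuming the COCO-style run-length encoding produced by pycocotools matches the expected Scalabel RLE; the mask and label values are synthetic placeholders.

```python
# A sketch of filling the `rle` field from a binary instance mask, assuming
# COCO-style RLE from pycocotools; the mask below is synthetic.
import numpy as np
from pycocotools import mask as mask_utils

binary_mask = np.zeros((720, 1280), dtype=np.uint8)
binary_mask[300:400, 500:650] = 1  # a hypothetical object region

encoded = mask_utils.encode(np.asfortranarray(binary_mask))
label = {
    "id": "0",
    "category": "car",
    "rle": {
        "counts": encoded["counts"].decode("utf-8"),  # str, as required above
        "size": list(encoded["size"]),                # (height, width)
    },
}
```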
Evaluation Server
You can submit your predictions to our challenge evaluation servers hosted on EvalAI:
- Main Track:
- Teaser Track:
Note that these are separate servers used specifically for the challenges. Submissions to the public MOT and MOTS servers will not be used.
Submission Policy
You can make 3 successful submissions per month (at most 1 per day) to the test set and an unlimited number of submissions to the validation set. You can set the visibility of your submission to public or private. Before the final deadline, please make your final submission public so that it is visible on the public leaderboard.
Evaluation
We provide more details here regarding evaluation.
Super-category
In addition to the evaluation of all 8 classes, we also evaluate results for the 3 super-categories specified below. The super-category evaluation results are provided for reference only.
"HUMAN": ["pedestrian", "rider"],
"VEHICLE": ["car", "bus", "truck", "train"],
"BIKE": ["motorcycle", "bicycle"]
Ignore Regions
After the bounding box matching process in evaluation, we ignore all detected false-positive boxes that have more than 50% overlap with a crowd region (ground-truth boxes with the “Crowd” attribute).
We also ignore object regions annotated as one of the 3 distracting classes (“other person”, “trailer”, and “other vehicle”) using the same strategy as for crowd regions, for simplicity. A sketch of this filtering rule is shown below.
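The following is a minimal sketch of the rule described above. It assumes overlap is measured as the fraction of the detection box covered by an ignore region and uses [x1, y1, x2, y2] boxes; it is an illustration, not the official evaluation code.

```python
# A minimal sketch of the ignore-region filtering described above. It assumes
# "overlap" means the fraction of a false-positive box covered by an ignore
# region; this is an illustration, not the official evaluation implementation.
import numpy as np


def coverage(det: np.ndarray, ignore: np.ndarray) -> float:
    """Fraction of the detection box area covered by the ignore box."""
    ix1, iy1 = max(det[0], ignore[0]), max(det[1], ignore[1])
    ix2, iy2 = min(det[2], ignore[2]), min(det[3], ignore[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    det_area = max(1e-9, (det[2] - det[0]) * (det[3] - det[1]))
    return inter / det_area


def keep_false_positive(det_box, ignore_boxes, thresh: float = 0.5) -> bool:
    """Keep an unmatched detection only if no ignore region covers >50% of it."""
    return all(coverage(np.asarray(det_box), np.asarray(g)) <= thresh
               for g in ignore_boxes)
```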
Pre-training
It is fair game to pre-train your network with ImageNet (ImageNet1K or ImageNet22K). For this challenge, we will only rank the methods that do not use external datasets (except ImageNet). Thus, datasets like COCO and Cityscapes are not allowed.
Metrics
We employ mean Higher Order Tracking Accuracy (mHOTA, the mean of HOTA over the 8 categories) as our primary evaluation metric for ranking. We also report mean Multiple Object Tracking Accuracy (mMOTA) and mean ID F1 score (mIDF1), which were previously used as the main metrics. All metrics are detailed below. Note that, unless stated otherwise, the overall performance is measured over all objects without considering the category. For MOTS, we use the same set of metrics as for MOT; the only difference lies in the computation of the distance matrices: in MOT it is computed using box IoU, while for MOTS mask IoU is used.
- mHOTA (%): mean Higher Order Tracking Accuracy [4] across all 8 categories.
- mMOTA (%): mean Multiple Object Tracking Accuracy [5] across all 8 categories.
- mIDF1 (%): mean ID F1 score [6] across all 8 categories.
- mMOTP (%): mean Multiple Object Tracking Precision [5] across all 8 categories.
- HOTA (%): Higher Order Tracking Accuracy [4]. It balances the evaluation of detection and association into a single unified metric.
- MOTA (%): Multiple Object Tracking Accuracy [5]. It measures the errors from false positives, false negatives, and identity switches.
- IDF1 (%): ID F1 score [6]. The ratio of correctly identified detections over the average number of ground-truths and detections.
- MOTP (%): Multiple Object Tracking Precision [5]. It measures the misalignment between ground-truths and detections.
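For quick reference, the standard MOTA and IDF1 definitions and the per-category averaging used for the "m" variants can be sketched as follows; the variable names are ours and the snippet is only illustrative, not the evaluation code.

```python
# A small sketch of the standard MOTA [5] and IDF1 [6] formulas computed from
# aggregate counts, plus the simple per-category mean used for the "m" metrics.
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / num_gt."""
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


def mean_over_categories(per_category_scores: dict) -> float:
    """mMOTA / mIDF1 / mHOTA: the simple mean over the 8 category scores."""
    return sum(per_category_scores.values()) / len(per_category_scores)
```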
Questions
If you have any questions, please go to the BDD100K discussions board.
Organizers