We are hosting multi-object tracking (MOT) and segmentation (MOTS) challenges based on BDD100K, the largest open driving video dataset, as part of the CVPR 2022 Workshop on Autonomous Driving (WAD).
Participation
Please first test your results on our eval.ai challenge pages (MOT or MOTS) to get your performance. Then tell us your method details through this submission form before the challenge deadline. Only teams that fill in the form will be considered in the challenge ranking. This page provides more details on our challenges.
Overview
This is a large-scale tracking challenge under the most diverse driving conditions. Understanding the temporal association and shape of objects within videos is one of the fundamental yet challenging tasks for autonomous driving. The BDD100K MOT and MOTS datasets provide diverse driving scenarios with high-quality instance segmentation masks under complicated occlusions and reappearing patterns, serving as a great testbed for the reliability of tracking and segmentation algorithms in real scenes. We encourage participants from both academia and industry.
Challenges
- Multiple Object Tracking (MOT): Given a video sequence of camera images, predict 2D bounding boxes for each object and their association across frames.
- Multiple Object Tracking and Segmentation (MOTS): In addition to MOT, also predict segmentation masks for each object.
All participants will receive certificates with their ranking, if desired.
Timeline
The challenge starts on March 21st, 2022 and will end at 5 PM GMT on June 7, 2022. You can use this tool to convert to your local time.
Data
The BDD100K MOT set contains 2,000 fully annotated 40-second sequences under different weather conditions, times of day, and scene types. We use 1,400/200/400 videos for train/val/test, containing a total of 160K instances and 4M objects. The MOTS set uses a subset of the MOT videos, with 154/32/37 videos for train/val/test, containing 25K instances and 480K object masks.
Baselines
We provide two baselines, one for each challenge, which serve as examples of how to use the BDD100K data.
You can also find the baselines in the BDD100K Model Zoo.
Submission
For submission, please use the following format for each challenge.
MOT Format
To evaluate your algorithms on the BDD100K MOT benchmark, the submission must be in the standard Scalabel format, provided as one of the following:
- A zip file of a folder that contains one JSON file per video.
- A zip file of a single JSON file covering the entire evaluation set.
The JSON file for each video should contain a list of per-frame result dictionaries with the following structure:
- videoName: str, name of current sequence
- name: str, name of current frame
- frameIndex: int, index of current frame within sequence
- labels []:
  - id: str, unique instance id of prediction in current sequence
  - category: str, name of the predicted category
  - box2d []:
    - x1: float
    - y1: float
    - x2: float
    - y2: float
You can find an example result file here.
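For illustration, the sketch below builds a per-frame result list for one video and writes it to a JSON file in the structure described above. The video name, frame naming pattern, and label values are hypothetical placeholders; please follow the example result file for the exact naming conventions.

```python
# Illustrative sketch of writing one video's MOT results in the structure above.
# The sequence name, frame naming pattern, and box values are hypothetical.
import json

frames = []
for frame_index in range(200):
    frames.append({
        "videoName": "b1c66a42-6f7d68ca",                          # hypothetical sequence name
        "name": f"b1c66a42-6f7d68ca-{frame_index + 1:07d}.jpg",    # hypothetical frame name
        "frameIndex": frame_index,
        "labels": [
            {
                "id": "0",                                         # unique within the sequence
                "category": "car",
                "box2d": {"x1": 100.0, "y1": 200.0, "x2": 180.0, "y2": 260.0},
            }
        ],
    })

# One JSON file per video; a folder of such files can then be zipped for submission.
with open("b1c66a42-6f7d68ca.json", "w") as f:
    json.dump(frames, f)
```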
MOTS Format
To evaluate your algorithms on the BDD100K MOTS benchmark, the submission must be in the standard Scalabel format, provided as one of the following:
- A zip file of a folder that contains one JSON file per video.
- A zip file of a single JSON file covering the entire evaluation set.
The JSON file for each video should contain a list of per-frame result dictionaries with the following structure:
- videoName: str, name of current sequence
- name: str, name of current frame
- frameIndex: int, index of current frame within sequence
- labels []:
  - id: str, unique instance id of prediction in current sequence
  - category: str, name of the predicted category
  - rle:
    - counts: str
    - size: (height, width)
You can find an example result file here.
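As a sketch of how a single MOTS label entry might be produced, the snippet below encodes a binary instance mask as RLE. It assumes COCO-style RLE encoding via pycocotools; please check the example result file for the exact encoding expected by the server. All mask contents and label values are placeholders.

```python
# Sketch of encoding one predicted instance mask as an RLE label entry,
# assuming COCO-style RLE via pycocotools (verify against the example result file).
import json
import numpy as np
from pycocotools import mask as mask_utils

binary_mask = np.zeros((720, 1280), dtype=np.uint8)   # placeholder mask of shape (H, W)
binary_mask[300:400, 500:650] = 1

rle = mask_utils.encode(np.asfortranarray(binary_mask))
label = {
    "id": "0",
    "category": "car",
    "rle": {
        "counts": rle["counts"].decode("utf-8"),       # bytes -> str so it is JSON-serializable
        "size": rle["size"],                           # [height, width]
    },
}
print(json.dumps(label, indent=2))
```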
Evaluation Server
You can submit your predictions to our challenge evaluation servers hosted on EvalAI (linked in the Participation section above).
Note that these are separate servers used specifically for the challenges. Submissions to the public MOT and MOTS servers will not be used.
Submission Policy
You can make 3 successful submissions per month (at most 1 per day) to the test set and unlimited submissions to the validation set. The leaderboard will be public.
Evaluation
We provide more details here regarding evaluation.
Super-category
In addition to the evaluation of all 8 classes, we also evaluate results for the 3 super-categories specified below. The super-category evaluation results are provided for reference only.
"HUMAN": ["pedestrian", "rider"],
"VEHICLE": ["car", "bus", "truck", "train"],
"BIKE": ["motorcycle", "bicycle"]
Ignore Regions
After the bounding box matching process in evaluation, we ignore all detected false-positive boxes that have more than 50% overlap with a crowd region (ground-truth boxes with the "Crowd" attribute).
We also ignore object regions annotated as one of the 3 distracting classes ("other person", "trailer", and "other vehicle") using the same strategy as for crowd regions, for simplicity.
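The following is a minimal sketch of this rule, assuming "overlap" means the fraction of a false-positive box's area covered by an ignore region; the exact criterion is defined by the official evaluation code, which is authoritative.

```python
# Minimal sketch of the ignore-region rule, assuming "overlap" is measured as the
# fraction of a false-positive box covered by a crowd/ignore box.
def coverage(box, region):
    """Fraction of `box` area covered by `region`; boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(box[0], region[0]), max(box[1], region[1])
    ix2, iy2 = min(box[2], region[2]), min(box[3], region[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])
    return inter / area if area > 0 else 0.0

def filter_false_positives(fp_boxes, ignore_regions, thr=0.5):
    """Drop unmatched predictions that overlap any ignore region by more than thr."""
    return [b for b in fp_boxes
            if all(coverage(b, r) <= thr for r in ignore_regions)]
```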
Pre-training
It is fair game to pre-train your network with ImageNet, but if other datasets are used, please note this in the submission description. We will only rank methods that use no external datasets other than ImageNet.
Metrics
We employ mean Multiple Object Tracking Accuracy (mMOTA, the mean of MOTA over the 8 categories) as our primary evaluation metric for ranking. We also employ the mean ID F1 score (mIDF1) to highlight tracking consistency, which is crucial for object tracking. All metrics are detailed below. Note that, unless mentioned otherwise, the overall performance is measured over all objects without considering the category. For MOTS, we use the same set of metrics as MOT; the only difference lies in the computation of the distance matrices: in MOT they are computed using box IoU, while MOTS uses mask IoU.
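To make the two distance notions concrete, the sketch below shows box IoU (used for MOT) and mask IoU (used for MOTS). The matching itself is performed by the official evaluation toolkit; these functions are only illustrative.

```python
# Sketch of the two distance notions mentioned above: box IoU for MOT and
# mask IoU for MOTS (the actual matching is done by the evaluation toolkit).
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def mask_iou(m1, m2):
    """IoU of two binary masks of the same shape."""
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    return inter / union if union > 0 else 0.0
```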
- mMOTA (%): mean Multiple Object Tracking Accuracy across all 8 categories.
- mIDF1 (%): mean ID F1 score across all 8 categories.
- mMOTP (%): mean Multiple Object Tracking Precision across all 8 categories.
- MOTA (%): Multiple Object Tracking Accuracy [3]. It measures the errors from false positives, false negatives, and identity switches (a formula sketch follows this list).
- IDF1 (%): ID F1 score [4]. The ratio of correctly identified detections over the average number of ground-truths and detections.
- MOTP (%): Multiple Object Tracking Precision [3]. It measures the misalignment between ground-truths and detections.
- FP: Number of False Positives [3].
- FN: Number of False Negatives [3].
- IDSw: Number of Identity Switches [3]. An identity switch is counted when a ground-truth object is matched with an identity that is different from the last known assigned identity.
- MT: Number of Mostly Tracked identities: at least 80 percent of their lifespan is tracked.
- PT: Number of Partially Tracked identities: at least 20 percent and less than 80 percent of their lifespan is tracked.
- ML: Number of Mostly Lost identities: less than 20 percent of their lifespan is tracked.
- FM: Number of FragMentations. Total number of times a trajectory switches from tracked to not tracked.
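The sketch below spells out the standard per-category MOTA [3] and IDF1 [4] formulas referenced above, plus the simple mean used for the "m"-prefixed variants. The official numbers are produced by the BDD100K evaluation toolkit; this is only a reference for how the quantities relate.

```python
# Sketch of the standard MOTA [3] and IDF1 [4] formulas; official scores come
# from the BDD100K evaluation toolkit.
def mota(fp, fn, idsw, num_gt):
    """MOTA = 1 - (FN + FP + IDSw) / number of ground-truth boxes."""
    return 1.0 - (fn + fp + idsw) / num_gt

def idf1(idtp, idfp, idfn):
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), using identity-level TP/FP/FN."""
    return 2 * idtp / (2 * idtp + idfp + idfn)

def mean_metric(per_class_scores):
    """mMOTA / mIDF1 / mMOTP: mean of the per-category scores over the 8 classes."""
    return sum(per_class_scores.values()) / len(per_class_scores)
```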
Questions
If you have any questions, please go to the BDD100K discussions board.
Organizers