Tracking Every Thing in the Wild

Siyuan Li, Martin Danelljan, Henghui Ding, Thomas E. Huang, Fisher Yu
ECCV 2022


Abstract

Current multi-category Multiple Object Tracking (MOT) metrics use class labels to group tracking results for per-class evaluation. Similarly, MOT methods typically associate only objects with the same class predictions. These two prevalent strategies in MOT implicitly assume that classification performance is near-perfect. However, this is far from the case in recent large-scale MOT datasets, which contain large numbers of classes, many of them rare or semantically similar. The resulting inaccurate classification leads to sub-optimal tracking and inadequate benchmarking of trackers. We address these issues by disentangling classification from tracking. We introduce a new metric, Track Every Thing Accuracy (TETA), which breaks tracking measurement into three sub-factors: localization, association, and classification, allowing comprehensive benchmarking of tracking performance even under inaccurate classification. TETA also deals with the challenging problem of incomplete annotations in large-scale tracking datasets. We further introduce a Track Every Thing tracker (TETer) that performs association using Class Exemplar Matching (CEM). Our experiments show that TETA evaluates trackers more comprehensively, and that TETer achieves significant improvements over the state of the art on the challenging large-scale datasets BDD100K and TAO.
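The contrast drawn above, between associating only objects that share a predicted class label and associating through class exemplars, can be illustrated with a small, hypothetical sketch. It is not TETer's implementation: the function names, the two thresholds, the idea of gating candidate pairs by cosine similarity of class-exemplar embeddings before scoring them by instance-appearance similarity, and the greedy one-to-one matcher are all simplifying assumptions made for illustration. The point is only that no hard equality test on class labels appears anywhere, so two semantically similar classes (whose exemplar embeddings lie close together) do not block a match the way a label mismatch would.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(
    track_cls: List[Vector],
    track_inst: List[Vector],
    det_cls: List[Vector],
    det_inst: List[Vector],
    cls_thresh: float = 0.5,   # hypothetical gate on class-exemplar similarity
    inst_thresh: float = 0.3,  # hypothetical minimum appearance similarity
) -> List[Tuple[int, int]]:
    """Return sorted (track_index, detection_index) matches.

    Hypothetical sketch: no predicted class label is compared; class
    information enters only through exemplar-embedding similarity.
    """
    candidates = []
    for t in range(len(track_inst)):
        for d in range(len(det_inst)):
            # Soft class gate: keep pairs whose class exemplars are similar,
            # even if a classifier would have assigned different labels.
            if cosine(track_cls[t], det_cls[d]) < cls_thresh:
                continue
            # Score surviving pairs by instance-appearance similarity.
            score = cosine(track_inst[t], det_inst[d])
            if score >= inst_thresh:
                candidates.append((score, t, d))

    # Greedy one-to-one assignment, highest similarity first.
    candidates.sort(reverse=True)
    used_tracks, used_dets = set(), set()
    matches = []
    for _, t, d in candidates:
        if t in used_tracks or d in used_dets:
            continue
        used_tracks.add(t)
        used_dets.add(d)
        matches.append((t, d))
    return sorted(matches)


if __name__ == "__main__":
    # Two tracks and two detections in arbitrary order (toy embeddings).
    track_cls = [[1.0, 0.0], [0.0, 1.0]]
    track_inst = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    det_cls = [[0.0, 1.0], [0.9, 0.1]]
    det_inst = [[0.1, 1.0, 0.0], [1.0, 0.1, 0.0]]
    print(associate(track_cls, track_inst, det_cls, det_inst))  # [(0, 1), (1, 0)]
```

A greedy matcher keeps the sketch dependency-free; an optimal assignment (for example the Hungarian algorithm) could be swapped in without changing the soft class gate.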

Results

BDD100K val set

Method           | Backbone  | Pretrain    | mMOTA | mIDF1 | TETA | LocA | AssocA | ClsA
QDTrack (CVPR21) | ResNet-50 | ImageNet-1K | 36.6  | 51.6  | 47.8 | 45.9 | 48.5   | 49.2
TETer (Ours)     | ResNet-50 | ImageNet-1K | 39.1  | 53.3  | 50.8 | 47.2 | 52.9   | 52.4

BDD100K test set

Method           | Backbone  | Pretrain    | mMOTA | mIDF1 | TETA | LocA | AssocA | ClsA
QDTrack (CVPR21) | ResNet-50 | ImageNet-1K | 35.7  | 52.3  | 49.2 | 47.2 | 50.9   | 49.2
TETer (Ours)     | ResNet-50 | ImageNet-1K | 37.4  | 53.3  | 50.8 | 47.0 | 53.6   | 50.7
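The mMOTA and mIDF1 columns in the BDD100K tables are class-averaged scores. As a minimal sketch, assuming the standard CLEAR MOT definition MOTA = 1 - (FN + FP + IDSW) / GT evaluated separately for each class and then averaged with equal class weight (the official BDD100K evaluation may differ in details, such as how ignored regions are handled), the averaging looks like this. The class names and counts are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ClassCounts:
    """Per-class error counts accumulated over a whole sequence set."""
    gt: int    # ground-truth boxes of this class
    fn: int    # missed ground-truth boxes (false negatives)
    fp: int    # predicted boxes with no matching ground truth
    idsw: int  # identity switches


def mota(c: ClassCounts) -> float:
    # Standard CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.
    return 1.0 - (c.fn + c.fp + c.idsw) / c.gt


def mean_mota(per_class: Dict[str, ClassCounts]) -> float:
    # Macro average: every class gets equal weight, however rare it is.
    scores = [mota(c) for c in per_class.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy counts (made up for illustration, not BDD100K numbers).
    counts = {
        "car": ClassCounts(gt=100, fn=10, fp=5, idsw=5),
        "bicycle": ClassCounts(gt=10, fn=4, fp=2, idsw=0),
    }
    for name, c in counts.items():
        print(name, round(mota(c), 3))
    print("mMOTA", round(mean_mota(counts), 3))
```

With these toy counts the large class scores 0.8 and the rare class 0.4, so the mean is 0.6 even though most boxes belong to the well-tracked class. Because results are grouped by predicted label, a correctly tracked bicycle that is labeled "car" would count as a false negative for "bicycle" and a false positive for "car", which is how classification errors leak into per-class tracking metrics.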

TAO val set

Method             | Backbone          | Pretrain     | TETA | LocA | AssocA | ClsA
QDTrack (CVPR21)   | ResNet-101        | ImageNet-1K  | 30.0 | 50.5 | 27.4   | 12.1
TETer (Ours)       | ResNet-101        | ImageNet-1K  | 33.3 | 51.6 | 35.0   | 13.2
TETer-HTC (Ours)   | ResNeXt-101-64x4d | ImageNet-1K  | 36.9 | 57.5 | 37.5   | 15.7
TETer-SwinT (Ours) | SwinT             | ImageNet-1K  | 34.6 | 52.1 | 36.7   | 15.0
TETer-SwinS (Ours) | SwinS             | ImageNet-1K  | 36.7 | 54.2 | 38.4   | 17.4
TETer-SwinB (Ours) | SwinB             | ImageNet-22K | 38.8 | 55.6 | 40.1   | 20.8
TETer-SwinL (Ours) | SwinL             | ImageNet-22K | 40.1 | 56.3 | 39.9   | 24.1
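For every TAO val row, the reported TETA agrees, up to rounding, with the arithmetic mean of its three sub-factors, TETA = (LocA + AssocA + ClsA) / 3. The short check below recomputes that mean from the table values. It only verifies the arithmetic of the published numbers; it does not reproduce how the three sub-factors themselves are computed, and the dictionary and function names are illustrative.

```python
from typing import Dict, Tuple

# (LocA, AssocA, ClsA, reported TETA), copied from the TAO val table above.
TAO_VAL: Dict[str, Tuple[float, float, float, float]] = {
    "QDTrack": (50.5, 27.4, 12.1, 30.0),
    "TETer": (51.6, 35.0, 13.2, 33.3),
    "TETer-HTC": (57.5, 37.5, 15.7, 36.9),
    "TETer-SwinT": (52.1, 36.7, 15.0, 34.6),
    "TETer-SwinS": (54.2, 38.4, 17.4, 36.7),
    "TETer-SwinB": (55.6, 40.1, 20.8, 38.8),
    "TETer-SwinL": (56.3, 39.9, 24.1, 40.1),
}


def teta(loc_a: float, assoc_a: float, cls_a: float) -> float:
    # TETA as the unweighted mean of its three sub-factors.
    return (loc_a + assoc_a + cls_a) / 3.0


# Each table entry is rounded to one decimal (error <= 0.05), so the
# recomputed mean and the reported TETA can differ by at most 0.1.
TOLERANCE = 0.1

if __name__ == "__main__":
    for name, (loc_a, assoc_a, cls_a, reported) in TAO_VAL.items():
        recomputed = teta(loc_a, assoc_a, cls_a)
        ok = abs(recomputed - reported) <= TOLERANCE
        print(f"{name:<12} recomputed={recomputed:.2f} reported={reported:.1f} ok={ok}")
```

An equal-weight mean means a gain in any one sub-factor moves TETA by a third of that gain; for example, the SwinL row's higher ClsA offsets its slightly lower AssocA relative to SwinB.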

Visualization on BDD100K and TAO

Fun results on random YouTube videos

Paper

paper

Code

github.com/SysCV/tet

Citation

@inproceedings{tet,
    title = {Tracking Every Thing in the Wild},
    author = {Li, Siyuan and Danelljan, Martin and Ding, Henghui and Huang, Thomas E. and Yu, Fisher},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2022}
} 

Related


CC-3DT: Panoramic 3D Object Tracking via Cross-Camera Fusion


CoRL 2022
We propose a method for panoramic 3D object tracking, called CC-3DT, that associates and models object trajectories both temporally and across views.


Video Mask Transfiner for High-Quality Video Instance Segmentation


ECCV 2022
We introduce the HQ-YTVIS dataset as well as the Tube-Boundary AP metric, which provide training, validation and testing support to facilitate future development of VIS methods aiming at higher mask quality.


Video Mask Transfiner for High-Quality Video Instance Segmentation


ECCV 2022
We propose the Video Mask Transfiner (VMT) method, which leverages fine-grained high-resolution features thanks to a highly efficient video transformer structure.


SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation


CVPR 2022
We introduce the largest synthetic dataset for autonomous driving to study continuous domain adaptation and multi-task perception.


Transforming Model Prediction for Tracking


CVPR 2022
We propose a tracker architecture employing a Transformer-based model prediction module.


Fast Hierarchical Learning for Few-Shot Object Detection


IROS 2022
We pose few-shot detection as a hierarchical learning problem, where the novel classes are treated as the child classes of existing base classes and the background class.


Monocular Quasi-Dense 3D Object Tracking


TPAMI 2022
We combine quasi-dense tracking on 2D images with motion prediction in 3D space to achieve significant advances in 3D object tracking from monocular videos.


Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation


NeurIPS 2021 Spotlight
We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation.


Quasi-Dense Similarity Learning for Multiple Object Tracking


CVPR 2021 Oral
We propose a simple yet effective multi-object tracking method based on quasi-dense similarity learning.


BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning


CVPR 2020 Oral
The largest driving video dataset for heterogeneous multitask learning.