End-to-end Learning of Driving Models from Large-scale Video Datasets

Huazhe Xu, Yang Gao, Fisher Yu, Trevor Darrell
CVPR 2017 Oral


Abstract

Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have been generally limited to in-situ models learned from a single vehicle or a simulation environment. We advocate learning a generic vehicle motion model from large scale crowd-sourced video data, and develop an end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state. Our model incorporates a novel FCN-LSTM architecture, which can be learned from large-scale crowd-sourced vehicle action data, and leverages available scene segmentation side tasks to improve performance under a privileged learning paradigm.
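
Below is a minimal PyTorch sketch of the FCN-LSTM idea described above: a fully convolutional encoder shared between a semantic segmentation side head and an LSTM that predicts a distribution over future discrete egomotion from pooled image features and the previous vehicle state. This is an illustrative approximation, not the released TensorFlow implementation linked below; the ResNet-18 backbone, the 4-way discrete action space, the one-hot encoding of previous motion, and all layer sizes are assumptions.

import torch
import torch.nn as nn
import torchvision.models as models


class FCNLSTM(nn.Module):
    """Illustrative FCN-LSTM: shared visual encoder, segmentation side task,
    temporal model over visual features and previous vehicle state."""

    def __init__(self, num_actions=4, num_seg_classes=19, hidden_size=64):
        super().__init__()
        # Fully convolutional backbone (assumed ResNet-18, avgpool/fc removed).
        backbone = models.resnet18(weights=None)
        self.fcn = nn.Sequential(*list(backbone.children())[:-2])
        # Segmentation side head on the shared features (privileged side task).
        self.seg_head = nn.Conv2d(512, num_seg_classes, kernel_size=1)
        # Pool spatial features, then model the sequence with an LSTM whose
        # input also includes the previous egomotion (here a one-hot vector).
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(input_size=512 + num_actions,
                            hidden_size=hidden_size, batch_first=True)
        # Distribution over future discrete egomotion
        # (e.g. straight / stop / left turn / right turn).
        self.action_head = nn.Linear(hidden_size, num_actions)

    def forward(self, frames, prev_actions):
        # frames: B x T x 3 x H x W, prev_actions: B x T x num_actions
        B, T = frames.shape[:2]
        feats = self.fcn(frames.flatten(0, 1))        # (B*T) x 512 x h x w
        seg_logits = self.seg_head(feats)             # side-task prediction
        visual = self.pool(feats).flatten(1).view(B, T, -1)
        out, _ = self.lstm(torch.cat([visual, prev_actions], dim=-1))
        action_logits = self.action_head(out)         # B x T x num_actions
        return action_logits, seg_logits

In such a setup, the action logits would be trained with a cross-entropy loss against the observed egomotion and the segmentation logits with a cross-entropy loss on the side-task labels; the side head can be discarded at test time, which is the sense in which segmentation acts as privileged information.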

Paper

paper

Code

github.com/gy20073/BDD_Driving_Model

Citation

@inproceedings{xu2017end,
  title={End-to-end learning of driving models from large-scale video datasets},
  author={Xu, Huazhe and Gao, Yang and Yu, Fisher and Darrell, Trevor},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={2174--2182},
  year={2017}
}

Related


Deep Object-Centric Policies for Autonomous Driving

ICRA 2019. We show that object-centric models outperform object-agnostic methods in scenes with other vehicles and pedestrians.


End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

ICCV 2021. We show that an RL coach (Roach) provides more effective supervision for imitation learning agents.


Instance-Aware Predictive Navigation in Multi-Agent Environments

ICRA 2021. A visual model-based RL method that considers multiple hypotheses for future object movement.


Semantic Predictive Control for Explainable and Efficient Policy Learning

ICRA 2019. We propose a driving policy learning framework that predicts feature representations of future visual inputs.