OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving
The 2077AI Foundation, in collaboration with the School of Automotive Studies at Tongji University, CAERI (China Automotive Engineering Research Institute), and Geometrical-PAL, has released a new-generation dataset for autonomous driving. It provides large-scale, up-to-date, and realistic data that offer perspectives not previously available, and it supports a wide range of downstream tasks in autonomous driving research and development.

Abstract
The OmniHD-Scenes dataset combines data from a 128-beam LiDAR, six cameras, and six 4D imaging radars to achieve full environmental perception. The dataset comprises 1501 clips, each approximately 30 s long, totaling more than 450K synchronized frames and more than 5.85 million synchronized sensor data points. We also propose a novel 4D annotation pipeline. To date, we have annotated 200 clips with more than 514K precise 3D bounding boxes; these clips also include semantic segmentation annotations for static scene elements. Additionally, we introduce a novel automated pipeline for generating dense occupancy ground truth, which effectively leverages information from non-key frames. Alongside the proposed dataset, we establish comprehensive evaluation metrics, baseline models, and benchmarks for 3D detection and semantic occupancy prediction. These benchmarks utilize surround-view cameras and 4D imaging radar to explore cost-effective sensor solutions for autonomous driving applications.
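The headline figures above are internally consistent under the simple assumption that every synchronized frame contributes one sample per sensor. The quick check below is a reader's sketch, not part of any official toolkit:

```python
# Quick consistency check on the quoted dataset figures (reader's sketch).
clips = 1501
frames = 450_000          # "more than 450K synchronized frames"
data_points = 5_850_000   # "more than 5.85 million synchronized sensor data points"
sensors = 1 + 6 + 6       # one 128-beam LiDAR + six cameras + six 4D imaging radars

assert frames * sensors == data_points   # 450K frames x 13 sensors = 5.85M samples
print(frames / clips)                    # ~300 frames per ~30 s clip, i.e. roughly 10 Hz
```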

Sensor Deployment Architecture Diagram

Sensor Coordinate System
The contributions of this study can be summarized as follows.
- Novel Dataset: Introduces OmniHD-Scenes, the first multimodal dataset with 4D radar point clouds for 3D object detection, multi-object tracking, and occupancy prediction.
- Extensive Data: Features 1501 clips (450K+ frames, 5.85M data points) covering diverse urban driving scenarios, including challenging conditions like rain and night.
- Advanced Annotation: Presents a novel 4D annotation pipeline that leverages temporal data for efficient and accurate 3D bounding box and semantic segmentation annotation.
- Comprehensive Benchmarks: Establishes benchmarks for 3D object detection and occupancy prediction, including baseline models using various sensor modalities.
Overview

Our OmniHD-Scenes dataset contains data from six 4D imaging radars acquired together with high-resolution cameras and LiDAR, and it covers day, night, and adverse-weather scenarios to reflect more complex operating conditions. The dataset also includes segmentation and occupancy labels, thereby fully supporting multi-view and multi-sensor tasks.
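For illustration, the sketch below shows one way a synchronized sample could be organized in code. The class and field names are hypothetical and are not taken from the official devkit; they simply mirror the sensor suite described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Sample:
    """Hypothetical container for one synchronized OmniHD-Scenes frame."""
    timestamp: float                                             # time of the synchronized frame
    lidar_path: str                                              # 128-beam LiDAR point cloud
    camera_paths: Dict[str, str] = field(default_factory=dict)   # six surround-view cameras
    radar_paths: Dict[str, str] = field(default_factory=dict)    # six 4D imaging radars
    ego_pose: List[float] = field(default_factory=list)          # flattened 4x4 ego-to-global pose
    boxes_3d: List[dict] = field(default_factory=list)           # 3D boxes on annotated clips
```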
Sensor And Controller Specifications

Annotation

MooreData Platform Annotation Pipeline
Since Tesla introduced the concept of 4D annotation, it has become a pivotal component of the data closed-loop process. This annotation technique uses ego poses to establish temporal relationships and represents traffic participants and road information over a period of time using dense point-cloud reconstruction. Compared with traditional 3D annotation methods, the reconstructed map is denser, exhibits stronger global consistency, offers better visual quality, significantly reduces repetitive work, and exploits more prior information to ensure data reliability. Using 4D tools for data generation can substantially lower data production costs and improve data quality. In this study, we implemented semi-automatic 4D annotation using the MooreData platform solution, with the data processed in clips.
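The core of this idea is to use per-frame ego poses to accumulate point clouds from an entire clip into a single dense, globally consistent reconstruction that annotators and auto-labeling models can work on. The snippet below is a minimal sketch of that pose-based accumulation, assuming points already expressed in the ego frame; it is not the MooreData platform's actual implementation, which would additionally handle sensor extrinsics, motion compensation, and dynamic objects.

```python
import numpy as np

def aggregate_clip(points_per_frame, ego_poses):
    """Accumulate per-frame point clouds into one dense clip-level cloud.

    points_per_frame: list of (N_i, 3) arrays in each frame's ego coordinates.
    ego_poses:        list of (4, 4) ego-to-global transforms for the same frames.
    """
    merged = []
    for pts, pose in zip(points_per_frame, ego_poses):
        homo = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 4) homogeneous coordinates
        merged.append((homo @ pose.T)[:, :3])            # transform into the global frame
    return np.vstack(merged)
```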
Visualization Annotation Result
Annotation Result Display - Rain Scene
Annotation Result Display - Night Scene
Annotation Result Display - Day Scene
Occupancy ground truth
Dataset Statistics
The OmniHD-Scenes dataset predominantly covers urban areas and overpass roads with complex traffic patterns, and it spans a range of weather conditions and times of day. Notably, it includes a substantial number of challenging scenarios, such as rainy conditions (33%) and night scenes (28%), offering valuable opportunities for developing and evaluating more robust perception algorithms.
Moreover, a notable feature of OmniHD-Scenes is its incorporation of multi-view 4D radar. Fig. 12 illustrates the per-frame point-cloud quantity distributions for both the LiDAR and 4D-radar data. The LiDAR point-cloud counts per frame are concentrated between 180K and 210K points, whereas those for the 4D radar primarily range from 2K to 4K points. In terms of point-cloud density, therefore, the 4D-radar data are markedly sparse. This sparsity motivates researchers to explore more effective ways of exploiting the unique characteristics of 4D radar, or to integrate it with other modalities for improved perception performance.
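As a rough illustration of how the distributions in Fig. 12 could be reproduced, the sketch below histograms per-frame point counts, assuming each frame's point cloud is already loaded as an (N, D) NumPy array; the function names and binning are illustrative only.

```python
import numpy as np

def per_frame_point_counts(point_clouds):
    """Return the number of points in each frame of a clip."""
    return np.array([pc.shape[0] for pc in point_clouds])

def count_histogram(point_clouds, bins=50):
    """Histogram of per-frame point counts, as plotted per modality in Fig. 12."""
    counts = per_frame_point_counts(point_clouds)
    return np.histogram(counts, bins=bins)

# LiDAR frames cluster around 180K-210K points, while 4D-radar frames hold
# roughly 2K-4K points, i.e. about two orders of magnitude sparser per frame.
```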
3D OD Model Performance
OCC Model Performance
Conclusion And Future Work
In the future, we plan to expand OmniHD-Scenes to facilitate additional tasks and benchmarks, including trajectory prediction, visual grounding, and end-to-end autonomous driving.
Acknowledgement
Thanks to the Intelligent Connected Vehicle Inspection Center (Hunan) of CAERI Co., Ltd. and Shanghai Geometrical Perception and Learning Co., Ltd.