# autoware_bevfusion

## Purpose

The `autoware_bevfusion` package is used for 3D object detection based on lidar or camera-lidar fusion.
## Inner-workings / Algorithms

This package implements a TensorRT-powered inference node for BEVFusion [1]. The sparse convolution backend corresponds to spconv, which Autoware installs automatically in its setup script. If needed, the user can also build and install it manually by following the corresponding instructions.
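To confirm that the TensorRT plugins providing the sparse convolution backend are present, a check such as the following can be used (a minimal sketch; the library path matches the default `plugins_path` parameter listed below):

```bash
# Minimal sketch: verify that the TensorRT plugin library bundled with the package
# (which provides the sparse convolution backend) is installed.
ls "$(ros2 pkg prefix --share autoware_bevfusion)/plugins/libautoware_tensorrt_plugins.so"
```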
## Inputs / Outputs

### Input

| Name | Type | Description |
| ---- | ---- | ----------- |
| `~/input/pointcloud` | `sensor_msgs::msg::PointCloud2` | Input pointcloud topics. |
| `~/input/image*` | `sensor_msgs::msg::Image` | Input image topics. |
| `~/input/camera_info*` | `sensor_msgs::msg::CameraInfo` | Input camera info topics. |
### Output

| Name | Type | Description |
| ---- | ---- | ----------- |
| `~/output/objects` | `autoware_perception_msgs::msg::DetectedObjects` | Detected objects. |
| `debug/cyclic_time_ms` | `tier4_debug_msgs::msg::Float64Stamped` | Cyclic time (ms). |
| `debug/pipeline_latency_ms` | `tier4_debug_msgs::msg::Float64Stamped` | Pipeline latency time (ms). |
| `debug/processing_time/preprocess_ms` | `tier4_debug_msgs::msg::Float64Stamped` | Preprocess time (ms). |
| `debug/processing_time/inference_ms` | `tier4_debug_msgs::msg::Float64Stamped` | Inference time (ms). |
| `debug/processing_time/postprocess_ms` | `tier4_debug_msgs::msg::Float64Stamped` | Postprocess time (ms). |
| `debug/processing_time/total_ms` | `tier4_debug_msgs::msg::Float64Stamped` | Total processing time (ms). |
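The debug topics can be inspected at runtime with the standard ROS 2 CLI. The resolved topic names depend on the namespace and remappings used at launch, so the sketch below first lists them:

```bash
# List the resolved debug topics published by the node, then echo one of them to
# monitor the reported timings. <resolved_debug_topic> is a placeholder; use a name
# printed by the first command.
ros2 topic list | grep -E "debug/(cyclic_time|pipeline_latency|processing_time)"
ros2 topic echo <resolved_debug_topic>
```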
## Parameters

### BEVFusion node

| Name | Type | Description | Default | Range |
| ---- | ---- | ----------- | ------- | ----- |
| sensor_fusion | boolean | Whether to use camera-lidar fusion or not. | True | N/A |
| max_camera_lidar_delay | float | Maximum delay in seconds between the lidar and the latest camera. | 0.0 | ≥0.0 |
| plugins_path | string | Path to the TensorRT plugins. | $(find-pkg-share autoware_bevfusion)/plugins/libautoware_tensorrt_plugins.so | N/A |
| trt_precision | string | Precision of the TensorRT engine. | fp16 | ['fp16', 'fp32'] |
| cloud_capacity | integer | Capacity of the point cloud buffer (should be set to at least the maximum theoretical number of points). | 2000000 | ≥1 |
| onnx_path | string | Path to the ONNX model file. | $(var model_path)/bevfusion.onnx | N/A |
| engine_path | string | Path to the TensorRT engine file. | $(var model_path)/bevfusion.engine | N/A |
| densification_num_past_frames | integer | Number of past frames to be considered as the same input frame. | 1 | ≥0 |
| densification_world_frame_id | string | Name of the frame id in which the world coordinate system is defined. | map | N/A |
| circle_nms_dist_threshold | float | Distance threshold between detections in NMS. | 0.5 | ≥0.0 |
| iou_nms_search_distance_2d | float | Maximum distance used to search for nearby objects in IoU-based NMS. | 10.0 | ≥0.0 |
| iou_nms_threshold | float | IoU score threshold used in NMS. | 0.1 | ≥0.0, ≤1.0 |
| yaw_norm_thresholds | array | Array of thresholds on the norm of the direction vectors; objects whose norm is below the threshold are ignored. | [0.3, 0.3, 0.3, 0.3, 0.0] | N/A |
| score_threshold | float | Confidence score threshold; objects with a score below this threshold are ignored. | 0.2 | ≥0.0 |
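The effective values of these parameters can be checked on a running node with the ROS 2 parameter CLI. The node name below is a placeholder; use `ros2 node list` to find the actual one:

```bash
# Hypothetical node name "/bevfusion"; look up the real one with `ros2 node list`.
ros2 param get /bevfusion sensor_fusion
ros2 param get /bevfusion score_threshold
```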
### BEVFusion model

| Name | Type | Description | Default | Range |
| ---- | ---- | ----------- | ------- | ----- |
| class_names | array | Names of the predicted classes. | ['CAR', 'TRUCK', 'BUS', 'BICYCLE', 'PEDESTRIAN'] | N/A |
| voxels_num | array | Voxel ranges used during inference [min, opt, max]. | [5000, 30000, 60000] | N/A |
| point_cloud_range | array | Range of the pointcloud in meters [min_x, min_y, min_z, max_x, max_y, max_z]. | [-76.8, -76.8, -3.0, 76.8, 76.8, 5.0] | N/A |
| voxel_size | array | Voxel size [x, y, z] in meters. | [0.3, 0.3, 8.0] | N/A |
| num_proposals | integer | Number of object proposals. | 500 | ≥1 |
| out_size_factor | integer | Output size factor used in the network. | 8 | ≥1 |
| max_points_per_voxel | integer | Maximum number of points that a voxel can hold. | 10 | ≥1 |
| d_bound | array | Distance bounds used in the view transform in meters (min, max, and step). | [1.0, 166.2, 1.4] | N/A |
| x_bound | array | x-axis bounds used in the view transform in meters (min, max, and step). | [-122.4, 122.4, 0.68] | N/A |
| y_bound | array | y-axis bounds used in the view transform in meters (min, max, and step). | [-122.4, 122.4, 0.68] | N/A |
| z_bound | array | z-axis bounds used in the view transform in meters (min, max, and step). | [-10.0, 10.0, 20.0] | N/A |
| num_cameras | integer | Number of cameras to use. | 6 | ≥0 |
| raw_image_height | integer | Raw image height in pixels. | 1080 | ≥0 |
| raw_image_width | integer | Raw image width in pixels. | 1440 | ≥0 |
| img_aug_scale_x | float | Raw image scaling (x) before ROI extraction. | 0.489 | ≥0.0, ≤1.0 |
| img_aug_scale_y | float | Raw image scaling (y) before ROI extraction. | 0.489 | ≥0.0, ≤1.0 |
| roi_height | integer | ROI image height (input to the network) in pixels. | 384 | ≥0 |
| roi_width | integer | ROI image width (input to the network) in pixels. | 704 | ≥0 |
| features_height | integer | Image features height (output of the image backbone) in pixels. | 48 | ≥0 |
| features_width | integer | Image features width (output of the image backbone) in pixels. | 88 | ≥0 |
| num_depth_features | integer | Number of depth features used in the view transform. | 118 | ≥0 |
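As a rough consistency check of these defaults (this derivation assumes the usual grid convention for BEV detectors and is not stated explicitly by the package): the lidar range of -76.8 m to 76.8 m per axis spans 153.6 m, which at a 0.3 m voxel size corresponds to a 512×512 voxel grid; dividing by the `out_size_factor` of 8 then yields a 64×64 BEV feature map on which the object proposals are predicted.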
### Detection class remapper

| Name | Type | Description | Default | Range |
| ---- | ---- | ----------- | ------- | ----- |
| allow_remapping_by_area_matrix | array | Whether to allow remapping of classes. The order of the 8x8 matrix classes comes from the ObjectClassification msg. | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | N/A |
| min_area_matrix | array | Minimum area for a specific class to consider class remapping. | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.1, 0.0, 36.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 36.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 36.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | N/A |
| max_area_matrix | array | Maximum area for a specific class to consider class remapping. | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 36.0, 0.0, 999.999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 999.999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 999.999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | N/A |
## The `build_only` option

The `autoware_bevfusion` node has a `build_only` option to build the TensorRT engine file from the specified ONNX file, after which the program exits.

```bash
ros2 launch autoware_bevfusion bevfusion.launch.xml build_only:=true
```
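After the build finishes, the serialized engine should appear at the configured `engine_path`. A quick way to confirm this (assuming `model_path` resolves to the default data directory described in the Trained Models section below):

```bash
# Check that the serialized TensorRT engine was generated. The directory below is
# an assumption based on the default engine_path and the trained-model instructions.
ls "$HOME/autoware_data/bevfusion/bevfusion.engine"
```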
## The `log_level` option

The default logging severity level for `autoware_bevfusion` is `info`. For debugging purposes, the developer may decrease the severity level using the `log_level` parameter:

```bash
ros2 launch autoware_bevfusion bevfusion.launch.xml log_level:=debug
```
## Assumptions / Known limits

This node assumes that the input pointcloud follows the `PointXYZIRC` layout defined in `autoware_point_types`.
## Trained Models

You can download the onnx and config files from the following links. The files need to be placed inside `$(env HOME)/autoware_data/bevfusion`:

- lidar-only model:
- camera-lidar model:
- class remapper:
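A minimal sketch of preparing the data directory (the downloaded file names below are illustrative placeholders; use the actual names of the artifacts from the links above):

```bash
# Create the expected data directory and copy the downloaded files into it.
mkdir -p "$HOME/autoware_data/bevfusion"
cp ~/Downloads/bevfusion.onnx ~/Downloads/detection_class_remapper.param.yaml \
   "$HOME/autoware_data/bevfusion/"
```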
The model was trained on TIER IV's internal database (~35k lidar frames) for 30 epochs.
## Changelog
## References/External links
[1] Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus, and Song Han. "BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation." 2023 International Conference on Robotics and Automation.
## (Optional) Future extensions / Unimplemented parts

Although this node can perform camera-lidar fusion, since it is the first method in Autoware to use both images and lidars for inference, the package structure and its full integration into the Autoware pipeline are left for future work. In the current structure, it can be employed as a lidar-based detector without any changes.