
# autoware_bevfusion

## Purpose

The autoware_bevfusion package is used for 3D object detection based on lidar or camera-lidar fusion.

## Inner-workings / Algorithms

This package implements a TensorRT-powered inference node for BEVFusion [1]. The sparse convolution backend is spconv, which Autoware installs automatically in its setup script. If needed, the user can also build and install it manually by following the instructions in its repository.

## Inputs / Outputs

### Input

| Name                  | Type                            | Description              |
| --------------------- | ------------------------------- | ------------------------ |
| `~/input/pointcloud`  | `sensor_msgs::msg::PointCloud2` | Input pointcloud topic.  |
| `~/input/image*`      | `sensor_msgs::msg::Image`       | Input image topics.      |
| `~/input/camera_info*`| `sensor_msgs::msg::CameraInfo`  | Input camera info topics.|

### Output

| Name                                   | Type                                             | Description                |
| -------------------------------------- | ------------------------------------------------ | -------------------------- |
| `~/output/objects`                     | `autoware_perception_msgs::msg::DetectedObjects` | Detected objects.          |
| `debug/cyclic_time_ms`                 | `tier4_debug_msgs::msg::Float64Stamped`          | Cyclic time (ms).          |
| `debug/pipeline_latency_ms`            | `tier4_debug_msgs::msg::Float64Stamped`          | Pipeline latency (ms).     |
| `debug/processing_time/preprocess_ms`  | `tier4_debug_msgs::msg::Float64Stamped`          | Preprocessing time (ms).   |
| `debug/processing_time/inference_ms`   | `tier4_debug_msgs::msg::Float64Stamped`          | Inference time (ms).       |
| `debug/processing_time/postprocess_ms` | `tier4_debug_msgs::msg::Float64Stamped`          | Postprocessing time (ms).  |
| `debug/processing_time/total_ms`       | `tier4_debug_msgs::msg::Float64Stamped`          | Total processing time (ms).|
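
The debug topics can be used to monitor the node's runtime performance. As a minimal sketch, the following subscriber prints the total processing time; it assumes the node's debug topics resolve under a `/bevfusion` prefix, which depends on your launch configuration (check `ros2 topic list` for the actual names):

```python
import rclpy
from rclpy.node import Node
from tier4_debug_msgs.msg import Float64Stamped


class LatencyMonitor(Node):
    def __init__(self):
        super().__init__('bevfusion_latency_monitor')
        # Hypothetical resolved topic name; adjust to your launch configuration.
        self.create_subscription(
            Float64Stamped,
            '/bevfusion/debug/processing_time/total_ms',
            lambda msg: self.get_logger().info(f'total: {msg.data:.2f} ms'),
            10,
        )


def main():
    rclpy.init()
    rclpy.spin(LatencyMonitor())


if __name__ == '__main__':
    main()
```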

## Parameters

### BEVFusion node

| Name                             | Type    | Description                                                                                               | Default                                                                         | Range            |
| -------------------------------- | ------- | --------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- | ---------------- |
| `sensor_fusion`                  | boolean | Whether to use camera-lidar fusion.                                                                         | true                                                                            | N/A              |
| `max_camera_lidar_delay`         | float   | Maximum delay in seconds between the lidar and the latest camera.                                           | 0.0                                                                             | ≥0.0             |
| `plugins_path`                   | string  | Path to the TensorRT plugins.                                                                               | `$(find-pkg-share autoware_bevfusion)/plugins/libautoware_tensorrt_plugins.so`  | N/A              |
| `trt_precision`                  | string  | Precision of the TensorRT engine.                                                                           | fp16                                                                            | ['fp16', 'fp32'] |
| `cloud_capacity`                 | integer | Capacity of the point cloud buffer; should be set to at least the maximum theoretical number of points.     | 2000000                                                                         | ≥1               |
| `onnx_path`                      | string  | Path to the ONNX model file.                                                                                | `$(var model_path)/bevfusion.onnx`                                              | N/A              |
| `engine_path`                    | string  | Path to the TensorRT engine file.                                                                           | `$(var model_path)/bevfusion.engine`                                            | N/A              |
| `densification_num_past_frames`  | integer | Number of past frames to concatenate with the current input frame.                                          | 1                                                                               | ≥0               |
| `densification_world_frame_id`   | string  | Frame id of the world coordinate system used for pointcloud densification.                                  | map                                                                             | N/A              |
| `circle_nms_dist_threshold`      | float   | Distance threshold between detections in circle NMS.                                                        | 0.5                                                                             | ≥0.0             |
| `iou_nms_search_distance_2d`     | float   | Maximum 2D distance when searching for the nearest objects in IoU-based NMS.                                | 10.0                                                                            | ≥0.0             |
| `iou_nms_threshold`              | float   | IoU score threshold used in NMS.                                                                            | 0.1                                                                             | ≥0.0, ≤1.0       |
| `yaw_norm_thresholds`            | array   | Per-class thresholds on the norm of the predicted direction vector; objects with a norm below the threshold are ignored. | [0.3, 0.3, 0.3, 0.3, 0.0]                                          | N/A              |
| `score_threshold`                | float   | Confidence score threshold; objects with a score below it are ignored.                                      | 0.2                                                                             | ≥0.0             |
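
Several of these parameters interact in post-processing. The following is a minimal sketch of how `score_threshold` and `yaw_norm_thresholds` filter raw detections, based on the descriptions above (not the node's actual implementation):

```python
import math

score_threshold = 0.2
yaw_norm_thresholds = [0.3, 0.3, 0.3, 0.3, 0.0]  # per-class, in class_names order


def keep_detection(score: float, yaw_sin: float, yaw_cos: float, class_id: int) -> bool:
    # Reject low-confidence detections.
    if score < score_threshold:
        return False
    # Reject detections whose predicted direction vector has a small norm,
    # i.e., whose heading is ambiguous (a threshold of 0.0 disables the check).
    return math.hypot(yaw_sin, yaw_cos) >= yaw_norm_thresholds[class_id]
```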

### BEVFusion model

| Name                  | Type    | Description                                                                      | Default                                  | Range      |
| --------------------- | ------- | -------------------------------------------------------------------------------- | ---------------------------------------- | ---------- |
| `class_names`         | array   | Names of the predicted classes.                                                   | ['CAR', 'TRUCK', 'BUS', 'BICYCLE', 'PEDESTRIAN'] | N/A |
| `voxels_num`          | array   | Voxel count bounds used during inference [min, opt, max].                         | [5000, 30000, 60000]                     | N/A        |
| `point_cloud_range`   | array   | Range of the pointcloud in meters [min_x, min_y, min_z, max_x, max_y, max_z].     | [-76.8, -76.8, -3.0, 76.8, 76.8, 5.0]    | N/A        |
| `voxel_size`          | array   | Voxel size [x, y, z] in meters.                                                   | [0.3, 0.3, 8.0]                          | N/A        |
| `num_proposals`       | integer | Number of object proposals.                                                       | 500                                      | ≥1         |
| `out_size_factor`     | integer | Output size factor used in the network.                                           | 8                                        | ≥1         |
| `max_points_per_voxel`| integer | Maximum number of points a voxel can hold.                                        | 10                                       | ≥1         |
| `d_bound`             | array   | Distance bounds used in the view transform in meters (min, max, step).            | [1.0, 166.2, 1.4]                        | N/A        |
| `x_bound`             | array   | x-axis bounds used in the view transform in meters (min, max, step).              | [-122.4, 122.4, 0.68]                    | N/A        |
| `y_bound`             | array   | y-axis bounds used in the view transform in meters (min, max, step).              | [-122.4, 122.4, 0.68]                    | N/A        |
| `z_bound`             | array   | z-axis bounds used in the view transform in meters (min, max, step).              | [-10.0, 10.0, 20.0]                      | N/A        |
| `num_cameras`         | integer | Number of cameras to use.                                                         | 6                                        | ≥0         |
| `raw_image_height`    | integer | Raw image height in pixels.                                                       | 1080                                     | ≥0         |
| `raw_image_width`     | integer | Raw image width in pixels.                                                        | 1440                                     | ≥0         |
| `img_aug_scale_x`     | float   | Raw image scaling along the x-axis before ROI extraction.                         | 0.489                                    | ≥0.0, ≤1.0 |
| `img_aug_scale_y`     | float   | Raw image scaling along the y-axis before ROI extraction.                         | 0.489                                    | ≥0.0, ≤1.0 |
| `roi_height`          | integer | ROI image height (input to the network) in pixels.                                | 384                                      | ≥0         |
| `roi_width`           | integer | ROI image width (input to the network) in pixels.                                 | 704                                      | ≥0         |
| `features_height`     | integer | Image feature height (output of the image backbone) in pixels.                    | 48                                       | ≥0         |
| `features_width`      | integer | Image feature width (output of the image backbone) in pixels.                     | 88                                       | ≥0         |
| `num_depth_features`  | integer | Number of depth features used in the view transform.                              | 118                                      | ≥0         |
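
Many of these defaults are coupled: the number of depth bins follows from `d_bound`, the image feature map size follows from the ROI size and the image backbone's downsampling factor (assumed here to be 8), and the BEV grid follows from `point_cloud_range` and `voxel_size`. A quick sanity check of the default values:

```python
# Consistency checks relating the default values in the table above.
d_min, d_max, d_step = 1.0, 166.2, 1.4
assert round((d_max - d_min) / d_step) == 118   # num_depth_features

roi_height, roi_width = 384, 704
backbone_downsample = 8                          # assumed image backbone stride
assert roi_height // backbone_downsample == 48   # features_height
assert roi_width // backbone_downsample == 88    # features_width

point_cloud_range = [-76.8, -76.8, -3.0, 76.8, 76.8, 5.0]
voxel_size = [0.3, 0.3, 8.0]
bev_cells_x = (point_cloud_range[3] - point_cloud_range[0]) / voxel_size[0]
assert round(bev_cells_x) == 512                 # BEV grid cells along x
assert round(bev_cells_x) % 8 == 0               # compatible with out_size_factor
```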

### Detection class remapper

| Name                             | Type  | Description                                                                                             | Default | Range |
| -------------------------------- | ----- | ------------------------------------------------------------------------------------------------------- | ------- | ----- |
| `allow_remapping_by_area_matrix` | array | Whether to allow remapping between classes. The class order of the 8x8 matrix follows the ObjectClassification msg. | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | N/A |
| `min_area_matrix`                | array | Minimum area for a specific class to be considered for remapping.                                         | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.1, 0.0, 36.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 36.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 36.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | N/A |
| `max_area_matrix`                | array | Maximum area for a specific class to be considered for remapping.                                         | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 36.0, 0.0, 999.999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 999.999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 999.999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | N/A |
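
Each flat array encodes an 8x8 matrix. The sketch below decodes the default allow matrix, assuming rows index the detected class and columns the remapped class (a hedged reading of the parameter; the class order is that of the ObjectClassification msg):

```python
import numpy as np

# Class order from autoware_perception_msgs/msg/ObjectClassification.
classes = ['UNKNOWN', 'CAR', 'TRUCK', 'BUS', 'TRAILER',
           'MOTORCYCLE', 'BICYCLE', 'PEDESTRIAN']

allow = np.array([
    0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 1, 0, 1, 0, 0, 0,
    0, 0, 0, 0, 1, 0, 0, 0,
    0, 0, 0, 0, 1, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0,
]).reshape(8, 8)

for i, j in zip(*np.nonzero(allow)):
    print(f'{classes[i]} -> {classes[j]}')
# CAR -> TRUCK, CAR -> TRAILER, TRUCK -> TRAILER, BUS -> TRAILER
```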

### The `build_only` option

The `autoware_bevfusion` node has a `build_only` option to build the TensorRT engine file from the specified ONNX file, after which the program exits.

```bash
ros2 launch autoware_bevfusion bevfusion.launch.xml build_only:=true
```

### The `log_level` option

The default logging severity level for `autoware_bevfusion` is `info`. For debugging purposes, the developer may decrease the severity level using the `log_level` parameter:

```bash
ros2 launch autoware_bevfusion bevfusion.launch.xml log_level:=debug
```

## Assumptions / Known limits

This node assumes that the input pointcloud follows the `PointXYZIRC` layout defined in `autoware_point_types`.
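
For reference, the expected field layout is sketched below as a `sensor_msgs` field description. The offsets are stated from memory, so treat `autoware_point_types` as the authoritative definition:

```python
from sensor_msgs.msg import PointField

# PointXYZIRC: 16-byte points (x, y, z, intensity, return_type, channel).
# Sketch only; verify against autoware_point_types.
POINT_XYZIRC_FIELDS = [
    PointField(name='x', offset=0, datatype=PointField.FLOAT32, count=1),
    PointField(name='y', offset=4, datatype=PointField.FLOAT32, count=1),
    PointField(name='z', offset=8, datatype=PointField.FLOAT32, count=1),
    PointField(name='intensity', offset=12, datatype=PointField.UINT8, count=1),
    PointField(name='return_type', offset=13, datatype=PointField.UINT8, count=1),
    PointField(name='channel', offset=14, datatype=PointField.UINT16, count=1),
]
```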

## Trained Models

You can download the ONNX and config files from the following links. The files need to be placed inside `$(env HOME)/autoware_data/bevfusion`.
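
A minimal sketch for staging the files once downloaded; `bevfusion.onnx` matches the `onnx_path` default above, while any config file names should be taken from the actual downloads:

```python
from pathlib import Path
import shutil

# Expected location: $(env HOME)/autoware_data/bevfusion
data_dir = Path.home() / 'autoware_data' / 'bevfusion'
data_dir.mkdir(parents=True, exist_ok=True)

# Copy the downloaded model next to any config files it ships with.
shutil.copy('bevfusion.onnx', data_dir / 'bevfusion.onnx')
```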

The model was trained on TIER IV's internal database (~35k lidar frames) for 30 epochs.

## Changelog

## References/External links

[1] Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus, and Song Han. "BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation." 2023 IEEE International Conference on Robotics and Automation (ICRA).

## (Optional) Future extensions / Unimplemented parts

Although this node can perform camera-lidar fusion, it is the first method in Autoware to use both images and lidar data for inference, so the package structure and its full integration into the Autoware pipeline are left for future work. In its current form, it can be employed as a lidar-only detector without any changes.