# autoware_bevfusion

## Purpose

The `autoware_bevfusion` package is used for 3D object detection based on lidar or camera-lidar fusion.
## Inner-workings / Algorithms

This package implements a TensorRT-powered inference node for BEVFusion [1]. The sparse convolution backend corresponds to spconv, which Autoware installs automatically in its setup script. If needed, the user can also build and install it manually by following the corresponding instructions.
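To confirm that the TensorRT plugins providing the sparse convolution backend are present, a check such as the following can be used (a minimal sketch; the library path matches the default `plugins_path` parameter listed below):

```bash
# Minimal sketch: verify that the TensorRT plugin library bundled with the package
# (which provides the sparse convolution backend) is installed.
ls "$(ros2 pkg prefix --share autoware_bevfusion)/plugins/libautoware_tensorrt_plugins.so"
```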
## Inputs / Outputs

### Input

| Name | Type | Description |
| ---- | ---- | ----------- |
| `~/input/pointcloud` | `sensor_msgs::msg::PointCloud2` | Input pointcloud topics. |
| `~/input/image*` | `sensor_msgs::msg::Image` | Input image topics. |
| `~/input/camera_info*` | `sensor_msgs::msg::CameraInfo` | Input camera info topics. |
### Output

| Name | Type | Description |
| ---- | ---- | ----------- |
| `~/output/objects` | `autoware_perception_msgs::msg::DetectedObjects` | Detected objects. |
| `debug/cyclic_time_ms` | `tier4_debug_msgs::msg::Float64Stamped` | Cyclic time (ms). |
| `debug/pipeline_latency_ms` | `tier4_debug_msgs::msg::Float64Stamped` | Pipeline latency time (ms). |
| `debug/processing_time/preprocess_ms` | `tier4_debug_msgs::msg::Float64Stamped` | Preprocess time (ms). |
| `debug/processing_time/inference_ms` | `tier4_debug_msgs::msg::Float64Stamped` | Inference time (ms). |
| `debug/processing_time/postprocess_ms` | `tier4_debug_msgs::msg::Float64Stamped` | Postprocess time (ms). |
| `debug/processing_time/total_ms` | `tier4_debug_msgs::msg::Float64Stamped` | Total processing time (ms). |
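The debug topics can be inspected at runtime with the standard ROS 2 CLI. The resolved topic names depend on the namespace and remappings used at launch, so the sketch below first lists them:

```bash
# List the resolved debug topics published by the node, then echo one of them to
# monitor the reported timings. <resolved_debug_topic> is a placeholder; use a name
# printed by the first command.
ros2 topic list | grep -E "debug/(cyclic_time|pipeline_latency|processing_time)"
ros2 topic echo <resolved_debug_topic>
```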
## Parameters

### BEVFusion node

| Name | Type | Description | Default | Range |
| ---- | ---- | ----------- | ------- | ----- |
| sensor_fusion | boolean | Whether to use camera-lidar fusion or not. | True | N/A |
| max_camera_lidar_delay | float | Maximum delay in seconds between the lidar and the latest camera. | 0.0 | ≥0.0 |
| plugins_path | string | Path to the TensorRT plugins. | $(find-pkg-share autoware_bevfusion)/plugins/libautoware_tensorrt_plugins.so | N/A |
| trt_precision | string | Precision of the TensorRT engine. | fp16 | ['fp16', 'fp32'] |
| cloud_capacity | integer | Capacity of the point cloud buffer (should be set to at least the maximum theoretical number of points). | 2000000 | ≥1 |
| onnx_path | string | Path to the ONNX model file. | $(var model_path)/bevfusion.onnx | N/A |
| engine_path | string | Path to the TensorRT engine file. | $(var model_path)/bevfusion.engine | N/A |
| densification_num_past_frames | integer | Number of past frames to be considered as the same input frame. | 1 | ≥0 |
| densification_world_frame_id | string | Name of the frame id in which the world coordinate system is defined. | map | N/A |
| circle_nms_dist_threshold | float | Distance threshold between detections in NMS. | 0.5 | ≥0.0 |
| iou_nms_search_distance_2d | float | Maximum distance used to search for nearby objects in IoU-based NMS. | 10.0 | ≥0.0 |
| iou_nms_threshold | float | IoU score threshold used in NMS. | 0.1 | ≥0.0, ≤1.0 |
| yaw_norm_thresholds | array | Array of thresholds on the norm of the direction vectors; objects whose norm is below the threshold are ignored. | [0.3, 0.3, 0.3, 0.3, 0.0] | N/A |
| score_threshold | float | Confidence score threshold; objects with a score below this threshold are ignored. | 0.2 | ≥0.0 |
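The effective values of these parameters can be checked on a running node with the ROS 2 parameter CLI. The node name below is a placeholder; use `ros2 node list` to find the actual one:

```bash
# Hypothetical node name "/bevfusion"; look up the real one with `ros2 node list`.
ros2 param get /bevfusion sensor_fusion
ros2 param get /bevfusion score_threshold
```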
### BEVFusion model

| Name | Type | Description | Default | Range |
| ---- | ---- | ----------- | ------- | ----- |
| class_names | array | Names of the predicted classes. | ['CAR', 'TRUCK', 'BUS', 'BICYCLE', 'PEDESTRIAN'] | N/A |
| voxels_num | array | Voxel ranges used during inference [min, opt, max]. | [5000, 30000, 60000] | N/A |
| point_cloud_range | array | Range of the pointcloud in meters [min_x, min_y, min_z, max_x, max_y, max_z]. | [-76.8, -76.8, -3.0, 76.8, 76.8, 5.0] | N/A |
| voxel_size | array | Voxel size [x, y, z] in meters. | [0.3, 0.3, 8.0] | N/A |
| num_proposals | integer | Number of object proposals. | 500 | ≥1 |
| out_size_factor | integer | Output size factor used in the network. | 8 | ≥1 |
| max_points_per_voxel | integer | Maximum number of points that a voxel can hold. | 10 | ≥1 |
| d_bound | array | Distance bounds used in the view transform in meters (min, max, and step). | [1.0, 166.2, 1.4] | N/A |
| x_bound | array | x-axis bounds used in the view transform in meters (min, max, and step). | [-122.4, 122.4, 0.68] | N/A |
| y_bound | array | y-axis bounds used in the view transform in meters (min, max, and step). | [-122.4, 122.4, 0.68] | N/A |
| z_bound | array | z-axis bounds used in the view transform in meters (min, max, and step). | [-10.0, 10.0, 20.0] | N/A |
| num_cameras | integer | Number of cameras to use. | 6 | ≥0 |
| raw_image_height | integer | Raw image height in pixels. | 1080 | ≥0 |
| raw_image_width | integer | Raw image width in pixels. | 1440 | ≥0 |
| img_aug_scale_x | float | Raw image scaling (x) before ROI extraction. | 0.489 | ≥0.0, ≤1.0 |
| img_aug_scale_y | float | Raw image scaling (y) before ROI extraction. | 0.489 | ≥0.0, ≤1.0 |
| roi_height | integer | ROI image height (input to the network) in pixels. | 384 | ≥0 |
| roi_width | integer | ROI image width (input to the network) in pixels. | 704 | ≥0 |
| features_height | integer | Image features height (output of the image backbone) in pixels. | 48 | ≥0 |
| features_width | integer | Image features width (output of the image backbone) in pixels. | 88 | ≥0 |
| num_depth_features | integer | Number of depth features used in the view transform. | 118 | ≥0 |
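As a rough consistency check of these defaults (this derivation assumes the usual grid convention for BEV detectors and is not stated explicitly by the package): the lidar range of -76.8 m to 76.8 m per axis spans 153.6 m, which at a 0.3 m voxel size corresponds to a 512×512 voxel grid; dividing by the `out_size_factor` of 8 then yields a 64×64 BEV feature map on which the object proposals are predicted.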
### Detection class remapper

| Name | Type | Description | Default | Range |
| ---- | ---- | ----------- | ------- | ----- |
| allow_remapping_by_area_matrix | array | Whether to allow remapping of classes. The order of the 8x8 matrix classes comes from the ObjectClassification msg. | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | N/A |
| min_area_matrix | array | Minimum area for a specific class to consider class remapping. | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.1, 0.0, 36.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 36.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 36.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | N/A |
| max_area_matrix | array | Maximum area for a specific class to consider class remapping. | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 36.0, 0.0, 999.999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 999.999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 999.999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | N/A |
## The `build_only` option

The `autoware_bevfusion` node has a `build_only` option to build the TensorRT engine file from the specified ONNX file, after which the program exits.

```bash
ros2 launch autoware_bevfusion bevfusion.launch.xml build_only:=true
```
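After the build finishes, the serialized engine should appear at the configured `engine_path`. A quick way to confirm this (assuming `model_path` resolves to the default data directory described in the Trained Models section below):

```bash
# Check that the serialized TensorRT engine was generated. The directory below is
# an assumption based on the default engine_path and the trained-model instructions.
ls "$HOME/autoware_data/bevfusion/bevfusion.engine"
```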
## The `log_level` option

The default logging severity level for `autoware_bevfusion` is `info`. For debugging purposes, the developer may decrease the severity level using the `log_level` parameter:

```bash
ros2 launch autoware_bevfusion bevfusion.launch.xml log_level:=debug
```
## Assumptions / Known limits

This node assumes that the input pointcloud follows the `PointXYZIRC` layout defined in `autoware_point_types`.
## Trained Models

You can download the onnx and config files from the following links. The files need to be placed inside `$(env HOME)/autoware_data/bevfusion`:

- lidar-only model:
- camera-lidar model:
- class remapper:
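A minimal sketch of preparing the data directory (the downloaded file names below are illustrative placeholders; use the actual names of the artifacts from the links above):

```bash
# Create the expected data directory and copy the downloaded files into it.
mkdir -p "$HOME/autoware_data/bevfusion"
cp ~/Downloads/bevfusion.onnx ~/Downloads/detection_class_remapper.param.yaml \
   "$HOME/autoware_data/bevfusion/"
```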
The model was trained on TIER IV's internal database (~35k lidar frames) for 30 epochs.
## Changelog
## References/External links
[1] Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus, and Song Han. "BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation." 2023 International Conference on Robotics and Automation.
## (Optional) Future extensions / Unimplemented parts

Although this node can perform camera-lidar fusion, since it is the first method in Autoware to use both images and lidars for inference, the package structure and its full integration into the Autoware pipeline are left for future work. In the current structure, it can be employed as a lidar-based detector without any changes.