autoware_image_projection_based_fusion#
Purpose#
The autoware_image_projection_based_fusion
package is designed to enhance obstacle detection accuracy by integrating information from both image-based and LiDAR-based perception. It fuses detected obstacles — such as bounding boxes or segmentation — from 2D images with 3D point clouds or other obstacle representations, including bounding boxes, clusters, or segmentation. This fusion helps to refine obstacle classification and detection in autonomous driving applications.
Fusion algorithms#
The package provides multiple fusion algorithms, each designed for specific use cases. Below are the different fusion methods along with their descriptions and detailed documentation links:
Fusion Name | Description | Detail |
---|---|---|
roi_cluster_fusion | Assigns classification labels to LiDAR-detected clusters by matching them with Regions of Interest (ROIs) from a 2D object detector. | link |
roi_detected_object_fusion | Updates classification labels of detected objects using ROI information from a 2D object detector. | link |
pointpainting_fusion | Augments the point cloud by painting each point with additional information from ROIs of a 2D object detector. The enriched point cloud is then processed by a 3D object detector for improved accuracy. | link |
roi_pointcloud_fusion | Matching pointcloud with ROIs from a 2D object detector to detect unknown-labeled objects. | link |
segmentation_pointcloud_fusion | Filtering pointcloud that are belong to less interesting region which is defined by semantic or instance segmentation by 2D image segmentation. | link |
Inner Workings / Algorithms#
The fusion process operates on two primary types of input data:
- Msg3d: This includes 3D data such as point clouds, bounding boxes, or clusters from LiDAR.
- RoIs (Regions of Interest): These are 2D detections or proposals from camera-based perception modules, such as object detection bounding boxes.
Both inputs come with timestamps, which are crucial for synchronization and fusion. Since sensors operate at different frequencies and may experience network delays, a systematic approach is needed to handle their arrival, align their timestamps, and ensure reliable fusion.
The following steps describe how the node processes these inputs, synchronizes them, and performs multi-sensor fusion.
Step 1: Matching and Creating a Collector#
When a Msg3d or a set of RoIs arrives, its timestamp is checked, and an offset is subtracted to determine the reference timestamp. The node then searches for an existing collector with the same reference timestamp.
- If a matching collector is found, the incoming data is added to it.
- If no matching collector exists, a new collector is created and initialized with the reference timestamp.
Step 2: Triggering the Timer#
Once a collector is created, a countdown timer is started. The timeout duration depends on which message type arrived first and is defined by either msg3d_timeout_sec
for msg3d or rois_timeout_sec
for RoIs.
The collector will attempt to fuse the collected 3D and 2D data either:
- When both Msg3d and RoI data are available, or
- When the timer expires.
If no Msg3d is received before the timer expires, the collector will discard the data without performing fusion.
Step 3: Fusion Process#
The fusion process consists of three main stages:
- Preprocessing – Preparing the input data for fusion.
- Fusion – Aligning and merging RoIs with the 3D point cloud.
- Postprocessing – Refining the fused output based on the algorithm's requirements.
The specific operations performed during these stages may vary depending on the type of fusion being applied.
Step 4: Publishing the Fused Result#
After the fusion process is completed, the fused output is published. The collector is then reset to an idle state, ready to process the next incoming message.
The figure below shows how the input data is fused in different scenarios.
Parameters#
All of the fusion nodes have the common parameters described in the following
Name | Type | Description | Default | Range |
---|---|---|---|---|
rois_timestamp_offsets | array | The timestamp offset between each RoIs and the msg3d in seconds. | [0.059, 0.01, 0.026, 0.042, 0.076, 0.093] | N/A |
rois_timeout_sec | float | Timer's timeout duration in seconds when the collector created by RoIs msg. | 0.5 | ≥0.001 |
msg3d_timeout_sec | float | Timer's timeout duration in seconds when the collector received msg3d. | 0.05 | ≥0.001 |
image_buffer_size | integer | The number of image buffer size for debug. | 15 | ≥1 |
point_project_to_unrectified_image | array | An array of options indicating whether to project point to unrectified image or not. | [False, False, False, False, False, False] | N/A |
filter_scope_min_x | float | Minimum x position to be considered for debug [m]. | -100.0 | N/A |
filter_scope_min_y | float | Minimum y position to be considered for debug [m]. | -100.0 | N/A |
filter_scope_min_z | float | Minimum z position to be considered for debug [m]. | -100.0 | N/A |
filter_scope_max_x | float | Maximum x position to be considered for debug [m]. | 100.0 | N/A |
filter_scope_max_y | float | Maximum y position to be considered for debug [m]. | 100.0 | N/A |
filter_scope_max_z | float | Maximum z position [m]. | 100.0 | N/A |
approximate_camera_projection | array | An array of options indicating whether to use approximated projection for each camera. | [True, True, True, True, True, True] | N/A |
approximation_grid_cell_width | float | The width of grid cell used in approximated projection [pixel]. | 1.0 | N/A |
approximation_grid_cell_height | float | The height of grid cell used in approximated projection [pixel]. | 1.0 | N/A |
debug_mode | boolean | Flag to enable or disable debug message output. | False | N/A |
collector_debug_mode | boolean | Flag to enable or disable collector's debug message output. | False | N/A |
publish_processing_time_detail | boolean | Flag to publish detail message for processing time. | False | N/A |
publish_previous_but_late_output_msg | boolean | Flag to indicate if the current fusion result should be published if its timestamp is earlier than the previous published fusion result. | False | N/A |
rosbag_length | float | This value determine if the rosbag has started from the beginning again. The value should be set smaller than the actual length of the bag. | 10.0 | N/A |
matching_strategy.type | string | Set it to advanced if you want to use more accurate and complicated logic for matching LiDAR and camera; otherwise, set it to naive . |
advanced | ['naive', 'advanced'] |
matching_strategy.threshold | float | A maximum threshold value to synchronize RoIs from multiple cameras in seconds. | 0.05 | ≥0.0 ≤0.1 |
matching_strategy.msg3d_noise_window | float | msg3d noise window in seconds. | 0.001 | ≥0.0 |
matching_strategy.rois_timestamp_noise_window | array | List of camera timestamp noise windows in seconds. The noise values should be specified in the same order as the input_topics. | [0.005, 0.005, 0.005, 0.005, 0.005, 0.005] | N/A |
Parameter Settings#
Timeout#
The order in which RoIs
or the msg3d
message arrives at the fusion node depends on your system and sensor configuration. Since the primary goal is to fuse 2D RoIs
with msg3d
data, msg3d
is essential for processing.
If RoIs
arrive earlier, they must wait until msg3d
is received. You can adjust the waiting time using the rois_timeout_sec
parameter.
If msg3d
arrives first, the fusion process should proceed as quickly as possible, so the waiting time for msg3d
(msg3d_timeout_sec
) should be kept minimal.
RoIs Offsets#
The offset between each camera and the LiDAR is determined by their shutter timing. To ensure accurate fusion, users must understand the timing offset between the RoIs
and msg3d
. Once this offset is known, it should be specified in the parameter rois_timestamp_offsets
.
In the figure below, the LiDAR completes a full scan from the rear in 100 milliseconds. When the LiDAR scan reaches the area where the camera is facing, the camera is triggered, capturing an image with a corresponding timestamp. The rois_timestamp_offsets
can then be calculated by subtracting the LiDAR header timestamp from the camera header timestamp. As a result, the rois_timestamp_offsets
would be [0.059, 0.010, 0.026, 0.042, 0.076, 0.093]
.
To check the header timestamp of the msg3d and RoIs, user can easily run
ros2 echo [topic] --header field
Matching Strategies#
We provide two matching strategies for different scenarios:
Naive Mode#
User should use this mode if the concatenation node from the Autoware point cloud preprocessor is not being used. User should also set an appropriate threshold
value to determine whether the time interval between the msg3d
and RoI
messages is within an acceptable range.
If the interval is less than the match threshold, the messages are considered matched.
-
Example usage:
matching_strategy: type: naive threshold: 0.05
Advanced Mode#
If the concatenation node from the Autoware point cloud preprocessor is being used, enable this mode.
The advanced mode parses diagnostics from the concatenation node to verify whether all point clouds have been successfully concatenated. If concatenation is incomplete, it dynamically adjusts rois_timestamp_offsets
based on diagnostic messages.
Instead of using a fixed threshold, this mode requires setting two parameters:
msg3d_noise_window
(a single value)rois_timestamp_noise_window
(a vector)
These parameters enforce stricter matching between the RoI
messages and msg3d
input.
-
Example usage:
matching_strategy: type: advanced msg3d_noise_window: 0.02 rois_timestamp_noise_window: [0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
Approximate camera projection#
For algorithms like pointpainting_fusion
, the computation required to project points onto an unrectified (raw) image can be substantial. To address this, an option is provided to reduce the computational load. Set the approximate_camera_projection parameter
to true
for each camera (ROIs). If the corresponding point_project_to_unrectified_image
parameter is also set to true, the projections will be pre-cached.
The cached projections are calculated using a grid, with the grid size specified by the approximation_grid_width_size
and approximation_grid_height_size
parameters in the configuration file. The centers of the grid are used as the projected points.
Debug and Diagnostics#
To verify whether the node has successfully fuse the msg3d or rois, the user can examine rqt or the /diagnostics
topic using the following command:
ros2 topic echo /diagnostics
Below is an example output when the fusion is success:
- msg3d has a value of
True
. - Each rois has a value of
True
. - The
fusion_success
isTrue
. - The
level
value is\0
. (diagnostic_msgs::msg::DiagnosticStatus::OK)
header:
stamp:
sec: 1722492015
nanosec: 848508777
frame_id: ''
status:
- level: "\0"
name: 'pointpainting: pointpainting_fusion_status'
message: Fused output is published and include all rois and msg3d
hardware_id: pointpainting_fusion_checker
values:
- key: msg3d/is_fused
value: 'True'
- key: fused_timestamp
value: '1738725170.860273600'
- key: reference_timestamp_min
value: '1738725170.850771904'
- key: reference_timestamp_max
value: '1738725170.870771885'
- key: /rois0/timestamp
value: '1738725170.903310537'
- key: /rois0/is_fused
value: 'True'
- key: /rois1/timestamp
value: '1738725170.934378386'
- key: /rois1/is_fused
value: 'True'
- key: /rois2/timestamp
value: '1738725170.917550087'
- key: /rois2/is_fused
value: 'True'
- key: fusion_success
value: 'True'
Below is an example output when the fusion is failed:
- msg3d has a value of
True
. - Each rois has a value of
False
. - The
fusion_success
isFalse
. - The
level
value is\x02
. (diagnostic_msgs::msg::DiagnosticStatus::ERROR)
header:
stamp:
sec: 1722492015
nanosec: 848508777
frame_id: ''
status:
- level: "\x02"
name: 'pointpainting: pointpainting_fusion_status'
message: Fused output msg is published but misses some ROIs
hardware_id: pointpainting_fusion_checker
values:
- key: msg3d/is_fused
value: 'True'
- key: fused_timestamp
value: '1738725170.860273600'
- key: reference_timestamp_min
value: '1738725170.850771904'
- key: reference_timestamp_max
value: '1738725170.870771885'
- key: /rois0/is_fused
value: 'False'
- key: /rois1/timestamp
value: '1738725170.934378386'
- key: /rois1/is_fused
value: 'True'
- key: /rois2/timestamp
value: '1738725170.917550087'
- key: /rois2/is_fused
value: 'True'
- key: fusion_success
value: 'False'
The build_only
option#
The pointpainting_fusion
node has build_only
option to build the TensorRT engine file from the ONNX file.
Although it is preferred to move all the ROS parameters in .param.yaml
file in Autoware Universe, the build_only
option is not moved to the .param.yaml
file for now, because it may be used as a flag to execute the build as a pre-task. You can execute with the following command:
ros2 launch autoware_image_projection_based_fusion pointpainting_fusion.launch.xml model_name:=pointpainting model_path:=/home/autoware/autoware_data/image_projection_based_fusion model_param_path:=$(ros2 pkg prefix autoware_image_projection_based_fusion --share)/config/pointpainting.param.yaml build_only:=true