autoware_camera_streampetr#
Purpose#
The `autoware_camera_streampetr` package is used for 3D object detection based on images only.
Inner-workings / Algorithms#
This package implements a TensorRT-powered inference node for StreamPETR [1]. It is the first camera-only 3D object detection node in Autoware.
The node is optimized for multi-camera systems in which the camera topics are published sequentially rather than all at once. It takes advantage of this by preprocessing (resizing, cropping, normalizing) each image as it arrives and storing the result in the appropriate GPU buffer, so that the preprocessing delay is minimized.
```text
Topic for image_i arrives
          |
          v
Is the image distorted?
     |         \
    Yes         No
     |          |
     v          |
 Undistort      |
     |          |
     v          v
Load image into GPU memory
          |
          v
Preprocess image (scale & crop ROI & normalize)
          |
          v
Store in the GPU memory binding location for the model input
          |
          |   (Image updates up to this point run in parallel if
          |    multithreading is on, otherwise sequentially in FIFO order.)
          v
Is the image the `anchor_image`?
     |         \
     No         Yes
     |          |
     v          v
  (Wait)   Are all images synced within `max_time_difference`?
                |              \
               Yes              No
                |               |
                v               v
   Perform model forward pass   (Sync failed! Skip prediction)
                |
                |   (If multithreading is on, image updates are temporarily
                |    frozen until this part completes.)
                v
   Postprocess (NMS + ROS 2 format)
                |
                v
   Publish predictions
```
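The anchor-triggered synchronization step in the diagram above is the core of the multi-camera handling: preprocessed images accumulate in their GPU slots until the anchor camera's callback fires, and inference runs only if every camera's latest image lies within `max_camera_time_diff` of the anchor image. The sketch below illustrates that check; it is a simplified illustration, not the node's actual implementation (`CameraSlot` and `all_cameras_synced` are hypothetical names).

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <optional>

#include <rclcpp/time.hpp>

// Hypothetical slot for one camera: holds the stamp of the latest image that has
// already been preprocessed and written to its GPU binding.
struct CameraSlot
{
  std::optional<rclcpp::Time> latest_stamp;
};

// Returns true when every camera has delivered an image and all stamps lie within
// `max_camera_time_diff` seconds of the anchor camera's stamp, i.e. the condition
// checked before the model forward pass in the diagram above.
bool all_cameras_synced(
  const std::array<CameraSlot, 6> & slots,  // 6 matches the default `rois_number`
  std::size_t anchor_camera_id, double max_camera_time_diff)
{
  if (!slots[anchor_camera_id].latest_stamp) {
    return false;
  }
  const rclcpp::Time anchor_stamp = *slots[anchor_camera_id].latest_stamp;
  for (const auto & slot : slots) {
    if (!slot.latest_stamp) {
      return false;  // at least one camera has not delivered an image yet
    }
    if (std::abs((anchor_stamp - *slot.latest_stamp).seconds()) > max_camera_time_diff) {
      return false;  // sync failed: this prediction cycle is skipped
    }
  }
  return true;
}
```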
Inputs / Outputs#
Input#
| Name | Type | Description |
| ---- | ---- | ----------- |
| `~/input/camera*/image` | `sensor_msgs::msg::Image` or `sensor_msgs::msg::CompressedImage` | Input image topics (supports both compressed and uncompressed images). |
| `~/input/camera*/camera_info` | `sensor_msgs::msg::CameraInfo` | Input camera info topics, for camera parameters. |
Output#
| Name | Type | Description | RTX 3090 Latency (ms) |
| ---- | ---- | ----------- | --------------------- |
| `~/output/objects` | `autoware_perception_msgs::msg::DetectedObjects` | Detected objects. | — |
| `latency/preprocess` | `autoware_internal_debug_msgs::msg::Float64Stamped` | Preprocessing time per image (ms). | 3.25 |
| `latency/total` | `autoware_internal_debug_msgs::msg::Float64Stamped` | Total processing time (ms): preprocessing + inference + postprocessing. | 26.04 |
| `latency/inference` | `autoware_internal_debug_msgs::msg::Float64Stamped` | Total inference time (ms). | 22.13 |
| `latency/inference/backbone` | `autoware_internal_debug_msgs::msg::Float64Stamped` | Backbone inference time (ms). | 16.21 |
| `latency/inference/ptshead` | `autoware_internal_debug_msgs::msg::Float64Stamped` | Points head inference time (ms). | 5.45 |
| `latency/inference/pos_embed` | `autoware_internal_debug_msgs::msg::Float64Stamped` | Position embedding inference time (ms). | 0.40 |
| `latency/inference/postprocess` | `autoware_internal_debug_msgs::msg::Float64Stamped` | NMS + filtering + conversion of network predictions to the Autoware format (ms). | 0.40 |
| `latency/cycle_time_ms` | `autoware_internal_debug_msgs::msg::Float64Stamped` | Time between two consecutive predictions (ms). | 110.65 |
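The latency topics above use `autoware_internal_debug_msgs::msg::Float64Stamped`, whose `data` field carries the measured time in milliseconds. As a hedged illustration (not part of this package), a minimal monitor node could watch one of them as follows; the resolved topic name depends on how the StreamPETR node is namespaced, so remap `latency/total` as needed.

```cpp
#include <memory>

#include <autoware_internal_debug_msgs/msg/float64_stamped.hpp>
#include <rclcpp/rclcpp.hpp>

// Hypothetical helper node that logs the total per-cycle latency published by
// the StreamPETR node.
class LatencyMonitor : public rclcpp::Node
{
public:
  LatencyMonitor() : rclcpp::Node("streampetr_latency_monitor")
  {
    using autoware_internal_debug_msgs::msg::Float64Stamped;
    sub_ = create_subscription<Float64Stamped>(
      "latency/total", rclcpp::QoS(10), [this](const Float64Stamped::ConstSharedPtr msg) {
        RCLCPP_INFO(get_logger(), "StreamPETR total latency: %.2f ms", msg->data);
      });
  }

private:
  rclcpp::Subscription<autoware_internal_debug_msgs::msg::Float64Stamped>::SharedPtr sub_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<LatencyMonitor>());
  rclcpp::shutdown();
  return 0;
}
```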
Parameters#
StreamPETR node#
The `autoware_camera_streampetr` node has various parameters for configuration:
Model Parameters#
- `model_params.backbone_path`: Path to the backbone ONNX model
- `model_params.head_path`: Path to the head ONNX model
- `model_params.position_embedding_path`: Path to the position embedding ONNX model
- `model_params.fp16_mode`: Enable FP16 inference mode
- `model_params.use_temporal`: Enable temporal modeling
- `model_params.input_image_height`: Input image height for preprocessing
- `model_params.input_image_width`: Input image width for preprocessing
- `model_params.class_names`: List of detection class names
- `model_params.num_proposals`: Number of object proposals
- `model_params.detection_range`: Detection range for filtering objects
Post-processing Parameters#
- `post_process_params.iou_nms_search_distance_2d`: 2D search distance for IoU NMS
- `post_process_params.circle_nms_dist_threshold`: Distance threshold for circle NMS
- `post_process_params.iou_nms_threshold`: IoU threshold for NMS
- `post_process_params.confidence_threshold`: Confidence threshold for detections
- `post_process_params.yaw_norm_thresholds`: Yaw normalization thresholds
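To make the roles of `confidence_threshold` and `circle_nms_dist_threshold` concrete, the following is a minimal sketch of a typical circle-NMS pass over bird's-eye-view object centers. It is illustrative only and may differ from the node's actual post-processing; the `Detection` struct and `circle_nms` function are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical 2D detection candidate (bird's-eye-view center + confidence score).
struct Detection
{
  float x;
  float y;
  float score;
};

// Keep the highest-scoring detections and suppress any candidate whose center lies
// closer than `dist_threshold` to an already-kept detection.
std::vector<Detection> circle_nms(
  std::vector<Detection> detections, float confidence_threshold, float dist_threshold)
{
  // Drop low-confidence candidates, then sort the rest by descending score.
  detections.erase(
    std::remove_if(
      detections.begin(), detections.end(),
      [&](const Detection & d) { return d.score < confidence_threshold; }),
    detections.end());
  std::sort(
    detections.begin(), detections.end(),
    [](const Detection & a, const Detection & b) { return a.score > b.score; });

  std::vector<Detection> kept;
  for (const auto & d : detections) {
    const bool suppressed = std::any_of(kept.begin(), kept.end(), [&](const Detection & k) {
      return std::hypot(d.x - k.x, d.y - k.y) < dist_threshold;
    });
    if (!suppressed) {
      kept.push_back(d);
    }
  }
  return kept;
}
```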
Node Parameters#
- `max_camera_time_diff`: Maximum allowed time difference between cameras (seconds)
- `rois_number`: Number of camera ROIs/cameras (default: 6)
- `is_compressed_image`: Whether input images are compressed
- `is_distorted_image`: Whether input images are distorted
- `downsample_factor`: If `is_distorted_image` is `true`, factor by which to downsample the image during undistortion. This makes undistortion faster than running it at full scale.
- `multithreading`: Whether to use multithreading for handling image callbacks
- `anchor_camera_id`: ID of the anchor camera used for synchronization (default: 0)
- `debug_mode`: Enable debug mode for timing measurements
- `build_only`: Build the TensorRT engines and exit without running inference
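For illustration only, the sketch below shows how the node-level parameters listed above could be declared with `rclcpp`. The class name is hypothetical, and apart from `rois_number` (6) and `anchor_camera_id` (0), the parameter types and default values shown are placeholders rather than the node's actual defaults.

```cpp
#include <rclcpp/rclcpp.hpp>

// Hypothetical example node: declares the node-level parameters documented above.
class StreamPetrParamsExample : public rclcpp::Node
{
public:
  StreamPetrParamsExample() : rclcpp::Node("camera_streampetr_params_example")
  {
    // Defaults other than rois_number and anchor_camera_id are placeholders.
    max_camera_time_diff_ = declare_parameter<double>("max_camera_time_diff", 0.1);
    rois_number_ = declare_parameter<int64_t>("rois_number", 6);
    is_compressed_image_ = declare_parameter<bool>("is_compressed_image", false);
    is_distorted_image_ = declare_parameter<bool>("is_distorted_image", false);
    downsample_factor_ = declare_parameter<int64_t>("downsample_factor", 1);
    multithreading_ = declare_parameter<bool>("multithreading", false);
    anchor_camera_id_ = declare_parameter<int64_t>("anchor_camera_id", 0);
    debug_mode_ = declare_parameter<bool>("debug_mode", false);
    build_only_ = declare_parameter<bool>("build_only", false);
  }

private:
  double max_camera_time_diff_;
  int64_t rois_number_;
  bool is_compressed_image_;
  bool is_distorted_image_;
  int64_t downsample_factor_;
  bool multithreading_;
  int64_t anchor_camera_id_;
  bool debug_mode_;
  bool build_only_;
};
```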
The `build_only` option#
The `autoware_camera_streampetr` node has a `build_only` option to build the TensorRT engine files from the specified ONNX files, after which the program exits.

```bash
ros2 launch autoware_camera_streampetr tensorrt_stream_petr.launch.xml build_only:=true
```
The `log_level` option#
The default logging severity level for `autoware_camera_streampetr` is `info`. For debugging purposes, the developer may decrease the severity level using the `log_level` parameter:

```bash
ros2 launch autoware_camera_streampetr tensorrt_stream_petr.launch.xml log_level:=debug
```
Assumptions / Known limits#
This node is camera-only and does not require point cloud input. It assumes:

- All cameras are synchronized within the specified `max_camera_time_diff`
- Camera calibration information is available and accurate
- The anchor camera (specified by `anchor_camera_id`) triggers the inference cycle
- Transform information between the camera frames and `base_link` is available via tf
- Transform information between `map` and `base_link` is available via tf, for ego motion compensation
- The input images are undistorted, or `is_distorted_image` is set to `true` so that the node undistorts them
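The two tf-related assumptions above can be verified at runtime before running inference. The following is a minimal sketch using `tf2_ros`; the function name and the `camera_frame` argument are illustrative, not part of the node's API.

```cpp
#include <string>

#include <rclcpp/rclcpp.hpp>
#include <tf2/exceptions.h>
#include <tf2/time.h>
#include <tf2_ros/buffer.h>

// Checks that the transforms the node relies on can be looked up.
bool transforms_available(
  tf2_ros::Buffer & buffer, const std::string & camera_frame, rclcpp::Logger logger)
{
  try {
    // Camera extrinsics: camera frame -> base_link.
    buffer.lookupTransform("base_link", camera_frame, tf2::TimePointZero);
    // Ego motion: map -> base_link (used for ego motion compensation).
    buffer.lookupTransform("map", "base_link", tf2::TimePointZero);
    return true;
  } catch (const tf2::TransformException & ex) {
    RCLCPP_WARN(logger, "Required transform not available: %s", ex.what());
    return false;
  }
}
```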
Trained Models#
You can download the ONNX model files for StreamPETR. The files should be placed in the appropriate model directory as specified in the launch configuration.
Required model files:

- Backbone ONNX model: TODO
- Head ONNX model: TODO
- Position embedding ONNX model: TODO
If you want to train and deploy your own model, you can find the source code for that in AWML.
Changelog#
References/External links#
[1] S. Wang, Y. Liu, T. Wang, Y. Li, and X. Zhang, "Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection," 2023.
(Optional) Future extensions / Unimplemented parts#
- Enable 2D object detection. Because 2D object detection is used as an auxiliary loss during training, the same node can easily support it with minor updates.
- Implement INT8 quantization for the backbone to further reduce inference latency.
- Execute the image backbone on each image as it arrives, to further reduce latency.
- Add velocity to the predictions.