TensorRT Plugins #

Plugins#

Grid Sampler#

OP Name	Attributes	Inputs	Outputs	FP32 Speed	FP16 Speed	INT8 Speed	Half Type	Tensor Format	Test Device
GridSampler2DTRT	interpolation_mode: int padding_mode: int align_corners: int	input: T grid: T	output: T	x1	x2.0	x3.8	nv_half	kLinear, kCHW4	RTX 2080Ti
GridSampler2DTRT2	interpolation_mode: int padding_mode: int align_corners: int	input: T grid: T	output: T	x1	x3.1	x3.8	nv_half2	kLinear, kCHW2, kCHW4	RTX 2080Ti
GridSampler3DTRT	interpolation_mode: int padding_mode: int align_corners: int	input: T grid: T	output: T	x1	x1.3	-	nv_half	kLinear	RTX 2080Ti
GridSampler3DTRT2	interpolation_mode: int padding_mode: int align_corners: int	input: T grid: T	output: T	x1	x2.2	-	nv_half2	kLinear	RTX 2080Ti

Inputs#

input: T[float/half/half2/int8]

Tensor shape: [N, C, H_in, W_in] (4D case) or [N, C, D_in, H_in, W_in] (5D case)

grid: T[float/half/half2/int8]

Tensor shape: [N, 2, H_out, W_out] (4D case) or [N, 3, D_out, H_out, W_out] (5D case)

grid specifies the sampling pixel locations, normalized by the input spatial dimensions. Most of its values should therefore lie in the range [-1, 1]. For example, x = -1, y = -1 is the top-left pixel of input, and x = 1, y = 1 is the bottom-right pixel of input.

Attributes#

interpolation_mode: int

Interpolation mode used to calculate output values. (0: bilinear, 1: nearest, 2: bicubic)

Note: bicubic supports only 4-D input.

padding_mode: int

Padding mode for outside grid values. (0: zeros, 1: border, 2: reflection)

align_corners: int

If align_corners=1, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels. If align_corners=0, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.
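The effect of align_corners can be made concrete with a small sketch. It assumes the plugin follows the same coordinate convention as torch.nn.functional.grid_sample, which is what the description above mirrors; the function name unnormalize is hypothetical and not part of the plugin API.

```python
def unnormalize(coord: float, size: int, align_corners: bool) -> float:
    """Map a normalized grid coordinate in [-1, 1] to a pixel coordinate.

    Assumption: same convention as torch.nn.functional.grid_sample.
    """
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1.0) * size - 1.0) / 2.0


W_in = 4
print(unnormalize(-1.0, W_in, True), unnormalize(1.0, W_in, True))    # 0.0 3.0
print(unnormalize(-1.0, W_in, False), unnormalize(1.0, W_in, False))  # -0.5 3.5
```

With align_corners=0 the extrema land half a pixel outside the first and last pixel centers, so those samples are affected by padding_mode.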

Outputs#

output: T[float/half/half2/int8]

Tensor shape: [N, C, H_out, W_out] (4D case) or [N, C, D_out, H_out, W_out] (5D case)

Multi-scale Deformable Attention#

OP Name	Attributes	Inputs	Outputs	FP32 Speed	FP16 Speed	INT8/FP16 Speed	Half Type	Tensor Format	Test Device
MultiScaleDeformableAttnTRT	-	value: T value_spatial_shapes: T sampling_locations: T attention_weights: T	output: T	x1	x1.3	x3.2	nv_half	kLinear	RTX 2080Ti
MultiScaleDeformableAttnTRT2	-	value: T value_spatial_shapes: T value_level_start_index: T sampling_locations: T attention_weights: T	output: T	x1	x2.0	x2.7	nv_half2	kLinear	RTX 2080Ti

Inputs#

value: T[float/half/half2/int8]

Tensor shape: [N, num_keys, num_heads, channel]

value_spatial_shapes: T[int32]

Spatial shape of each feature map. Tensor shape: [num_levels, 2], where the last dimension holds (h, w).

reference_points: T[float/half2]

The reference points.

Tensor shape: [N, num_queries, 1, points_per_group * 2]

sampling_offsets: T[float/half/half2/int8]

The offset of sampling points.

Tensor shape: [N, num_queries, num_heads, num_levels * num_points * 2]

attention_weights: T[float/half/int8]

The weights of the sampling points used when calculating the attention (before softmax).

Tensor shape: [N, num_queries, num_heads, num_levels * num_points]

Attributes#

-

Outputs#

output: T[float/half/int8]

Tensor shape: [N, num_queries, num_heads, channel]
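As a rough illustration of how value_spatial_shapes relates to num_keys and to the value_level_start_index input listed for MultiScaleDeformableAttnTRT2, here is a minimal sketch. It assumes value is the concatenation of all flattened feature levels in order, as in the usual Deformable DETR layout; the helper name level_layout is hypothetical.

```python
from itertools import accumulate


def level_layout(value_spatial_shapes):
    """Return (num_keys, level_start_index) for a list of (h, w) pairs.

    Assumption: levels are flattened and concatenated in the given order.
    """
    sizes = [h * w for h, w in value_spatial_shapes]
    # Exclusive prefix sum: offset of the first key of each level.
    starts = [0] + list(accumulate(sizes))[:-1]
    return sum(sizes), starts


shapes = [(4, 4), (2, 2), (1, 1)]
num_keys, starts = level_layout(shapes)
print(num_keys, starts)  # 21 [0, 16, 20]
```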

Modulated Deformable Conv2d#

OP Name	Attributes	Inputs	Outputs	FP32 Speed	FP16 Speed	INT8/FP16 Speed	Half Type	Tensor Format	Test Device
ModulatedDeformableConv2dTRT	stride: int[2] padding: int[2] dilation: int[2] groups: int deform_groups: int	input: T offset: T mask: T weight: T bias: T (optional)	output: T	x1	x2.9	x3.7	nv_half	kLinear, kCHW4	RTX 2080Ti
ModulatedDeformableConv2dTRT2	stride: int[2] padding: int[2] dilation: int[2] groups: int deform_groups: int	input: T offset: T mask: T weight: T bias: T (optional)	output: T	x1	x3.5	x3.7	nv_half2	kLinear, kCHW2, kCHW4	RTX 2080Ti

Inputs#

input: T[float/half/half2/int8]

Tensor shape: [N, C_in, H_in, W_in]

offset: T[float/half/half2/int8]

Tensor shape: [N, deform_groups*K_h*K_w*2, H_out, W_out]

mask: T[float/half/half2/int8]

Tensor shape: [N, deform_groups*K_h*K_w, H_out, W_out]

weight: T[float/half/half2/int8]

Tensor shape: [C_out, C_in/groups, K_h, K_w]

bias: T[float/half/half2] (optional)

Tensor shape: [C_out]

Attributes#

stride: int[2]

Same as torch.nn.Conv2d.

padding: int[2]

Same as torch.nn.Conv2d.

dilation: int[2]

Same as torch.nn.Conv2d.

groups: int

Same as torch.nn.Conv2d.

deform_groups: int

Deformable conv2d groups.

Outputs#

output: T[float/half/half2/int8]

Tensor shape: [N, C_out, H_out, W_out]

NOTE: Values (C_in / groups) and (C_in / deform_groups) should be even numbers.
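The tensor shapes above can be derived from the attributes with the standard torch.nn.Conv2d output-size formula. The sketch below is only a shape calculator under that assumption; the function name deform_conv2d_shapes is hypothetical, and the ValueError simply encodes the NOTE above.

```python
def deform_conv2d_shapes(n, c_in, h_in, w_in, c_out, kernel,
                         stride, padding, dilation, groups, deform_groups):
    """Return the expected shapes of offset, mask, weight and output."""
    # Evenness requirement taken from the NOTE in this section.
    if (c_in // groups) % 2 or (c_in // deform_groups) % 2:
        raise ValueError("C_in / groups and C_in / deform_groups should be even")
    k_h, k_w = kernel
    # Same output-size formula as torch.nn.Conv2d.
    h_out = (h_in + 2 * padding[0] - dilation[0] * (k_h - 1) - 1) // stride[0] + 1
    w_out = (w_in + 2 * padding[1] - dilation[1] * (k_w - 1) - 1) // stride[1] + 1
    return {
        "offset": (n, deform_groups * k_h * k_w * 2, h_out, w_out),
        "mask": (n, deform_groups * k_h * k_w, h_out, w_out),
        "weight": (c_out, c_in // groups, k_h, k_w),
        "output": (n, c_out, h_out, w_out),
    }


shapes = deform_conv2d_shapes(1, 64, 32, 32, 128, kernel=(3, 3), stride=(1, 1),
                              padding=(1, 1), dilation=(1, 1),
                              groups=1, deform_groups=1)
print(shapes["offset"], shapes["mask"])  # (1, 18, 32, 32) (1, 9, 32, 32)
```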

Rotate#

OP Name	Attributes	Inputs	Outputs	FP32 Speed	FP16 Speed	INT8/FP16 Speed	Half Type	Tensor Format	Test Device
RotateTRT	interpolation: int	img: T angle: T center: T	output: T	x1	x1.8	x4.4	nv_half	kLinear, kCHW4	RTX 2080Ti
RotateTRT2	interpolation: int	img: T angle: T center: T	output: T	x1	x2.2	x4.4	nv_half2	kLinear, kCHW2, kCHW4	RTX 2080Ti

Inputs#

img: T[float/half/half2/int8]

Tensor shape: [C, H, W]

angle: T[float/half/half2]

Tensor shape: [1]

center: T[float/half/half2]

Tensor shape: [2]

Attributes#

interpolation: int

Interpolation mode used to calculate output values. (0: bilinear, 1: nearest)

Outputs#

output: T[float/half/half2/int8]

Tensor shape: [C, H, W]

Inverse#

OP Name	Attributes	Inputs	Outputs	Tensor Format	Test Device
InverseTRT	-	input: T[float]	output: T[float]	kLinear	RTX 2080Ti

Inputs#

input: T[float]

Tensor shape: [B, C, H, W]

Outputs#

output: T[float]

Tensor shape: [B, C, H, W]

BEV Pool#

OP Name	Attributes	Inputs	Outputs	FP32 Speed	FP16 Speed	INT8 Speed	Half Type	Tensor Format	Test Device
BEVPoolV2TRT	out_height: int out_width: int	depth: T feat: T ranks_depth: T ranks_feat: T ranks_bev: T interval_starts: T interval_lengths: T	output: T	x1	x1.1	x2.1	nv_half	kLinear	RTX 2080Ti
BEVPoolV2TRT2	out_height: int out_width: int	depth: T feat: T ranks_depth: T ranks_feat: T ranks_bev: T interval_starts: T interval_lengths: T	output: T	x1	x1.4	x2.1	nv_half2	kLinear	RTX 2080Ti

Inputs#

depth: T[float/half/half2/int8]

Tensor shape: [Cam, D, H, W]

feat: T[float/half/half2/int8]

Tensor shape: [Cam, H, W, C]

ranks_depth: T[int32]

ranks_feat: T[int32]

ranks_bev: T[int32]

interval_starts: T[int32]

interval_lengths: T[int32]

Attributes#

out_height: int

BEV feature height

out_width: int

BEV feature width

Outputs#

output: T[float/half/half2/int8]

Tensor shape: [1, out_height, out_width, C]
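The rank and interval inputs are not described above, so the following pure-Python reference is only a sketch. It assumes the plugin follows the semantics of BEVDet's bev_pool_v2 operator: every interval groups the points that fall into one BEV cell, and the cell receives the sum of depth * feat over that interval. The function name and the flat-list layout are illustrative assumptions.

```python
def bev_pool_v2_reference(depth, feat, ranks_depth, ranks_feat, ranks_bev,
                          interval_starts, interval_lengths,
                          out_height, out_width, channels):
    """Reference sketch (assumed BEVDet bev_pool_v2 semantics).

    depth: flat list over [Cam, D, H, W]; feat: flat list over [Cam, H, W, C].
    Returns a flat list over [1, out_height, out_width, C].
    """
    out = [0.0] * (out_height * out_width * channels)
    for start, length in zip(interval_starts, interval_lengths):
        bev_cell = ranks_bev[start]  # all points of an interval share one cell
        for c in range(channels):
            acc = 0.0
            for i in range(start, start + length):
                acc += depth[ranks_depth[i]] * feat[ranks_feat[i] * channels + c]
            out[bev_cell * channels + c] = acc
    return out


out = bev_pool_v2_reference(
    depth=[0.5, 0.25, 1.0],
    feat=[1.0, 2.0, 3.0, 4.0],          # two pixels, two channels each
    ranks_depth=[0, 1, 2], ranks_feat=[0, 1, 1], ranks_bev=[0, 0, 3],
    interval_starts=[0, 2], interval_lengths=[2, 1],
    out_height=2, out_width=2, channels=2)
print(out)  # [1.25, 2.0, 0.0, 0.0, 0.0, 0.0, 3.0, 4.0]
```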

Multi-Head Attention#

OP Name	Inputs	Outputs	FP32 Speed NMHA	FP16 Speed NMHA	FP32 Speed FMHA	FP16 Speed FMHA	INT8 Speed FMHA	Half Type	Test Device
QKVTRT	query: T key: T value: T	output: T	x1	x2.0	x4.6	x6.1	x8.2	nv_half	RTX 2080Ti
QKVTRT2	query: T key: T value: T	output: T	x1	x2.1	x4.6	x6.3	x8.2	nv_half2	RTX 2080Ti

Inputs#

query: T[float/half/half2/int8]

Tensor shape: [batch, q_len, channel]

key: T[float/half/half2/int8]

Tensor shape: [batch, kv_len, channel]

value: T[float/half/half2/int8]

Tensor shape: [batch, kv_len, channel]

Attributes#

-

Outputs#

output: T[float/half/half2/int8]

Tensor shape: [batch, q_len, channel]

NOTE: If q_len and kv_len are both multiples of 64, the plugin runs Flash Multi-Head Attention (FMHA); otherwise it falls back to Naive Multi-Head Attention (NMHA).
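The selection rule in the NOTE, together with the computation both paths are expected to produce, can be sketched as follows. This is a hedged illustration, not the plugin's code: the function names are hypothetical, and the 1/sqrt(channel) scaling is an assumption based on standard scaled dot-product attention.

```python
import math


def attention_path(q_len: int, kv_len: int) -> str:
    """Which kernel the NOTE says is used for the given sequence lengths."""
    return "FMHA" if q_len % 64 == 0 and kv_len % 64 == 0 else "NMHA"


def naive_attention(query, key, value):
    """softmax(Q K^T * scale) V for one batch element.

    query: [q_len][channel]; key, value: [kv_len][channel].
    """
    scale = 1.0 / math.sqrt(len(query[0]))  # assumed scaling factor
    output = []
    for q in query:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in key]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        output.append([sum(e / total * v[c] for e, v in zip(exps, value))
                       for c in range(len(value[0]))])
    return output


print(attention_path(128, 256), attention_path(100, 256))  # FMHA NMHA
print(naive_attention([[0.0, 0.0]],
                      [[1.0, 0.0], [0.0, 1.0]],
                      [[2.0, 0.0], [0.0, 4.0]]))  # [[1.0, 2.0]]
```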