Changes in TorchSparse v2.1#
In TorchSparse v2.1, we introduced and optimized the Implicit-GEMM dataflow to improve the computing efficiency of Sparse Convolution.
TF32#
We support TF32 precision in TorchSparse v2.1. Use the following code to activate TF32 mode:
import torchsparse.backends
torchsparse.backends.allow_tf32 = True # True by default on NVIDIA GPUs >= SM80
User API#
The APIs for Sparse Convolution in TorchSparse v2.1 are PyTorch-like and can be easily accessed in torchsparse.nn:
from torchsparse import nn as spnn
SparseConvLayer = spnn.Conv3d(in_channels, out_channels, kernel_size)
The input arguments for spnn.Conv3d are as follows:
| Input Argument | Type | Mandatory | Default Value |
|---|---|---|---|
| in_channels | int | Yes | / |
| out_channels | int | Yes | / |
| kernel_size | int / List[int] / Tuple[int, ...] | No | 3 |
| stride | int / List[int] / Tuple[int, ...] | No | 1 |
| padding | int / List[int] / Tuple[int, ...] | No | 0 |
| dilation | int | No | 1 |
| bias | bool | No | False |
| transposed | bool | No | False |
| generative | bool | No | False |
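Putting it together, spnn.Conv3d drops into standard PyTorch containers. Below is a minimal sketch of a sparse conv block; it assumes the usual companion layers spnn.BatchNorm and spnn.ReLU from torchsparse.nn, and the channel sizes are made up:
import torch
from torchsparse import nn as spnn

# Minimal sketch: channel sizes are illustrative only.
block = torch.nn.Sequential(
    spnn.Conv3d(in_channels=4, out_channels=32, kernel_size=3, stride=1),
    spnn.BatchNorm(32),  # assumed companion layer from torchsparse.nn
    spnn.ReLU(True),
)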
The argument generative is newly introduced in TorchSparse v2.1. If both generative and transposed are True, the conv module will execute generative transposed convolution.
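For example (the channel and kernel sizes here are illustrative, not prescriptive), a generative transposed convolution layer would be constructed as:
from torchsparse import nn as spnn

# Illustrative sizes; with transposed=True and generative=True,
# the module executes generative transposed convolution.
gen_deconv = spnn.Conv3d(
    in_channels=64,
    out_channels=32,
    kernel_size=2,
    stride=2,
    transposed=True,
    generative=True,
)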
Conv_mode & Auto-tune#
Sparse Conv Mode#
We provide three different Sparse Convolution modes in TorchSparse v2.1 to accommodate different workloads and achieve better performance. The conv_mode can be queried and set through:
from torchsparse.nn import functional as F
F.set_conv_mode(0) # Set conv_mode to 0/1/2
conv_mode = F.get_conv_mode()
print(conv_mode)
conv_mode can significantly impact the performance of Sparse Convolution. Generally, conv_mode = 0 (default) is more suitable for workloads with lower sparsity (e.g., detection tasks) and lower compute precision (e.g., FP16), while conv_mode = 1 or 2 might be a better choice for higher compute precision and sparsity. Please refer to the table below for recommended conv_mode settings. However, you are always encouraged to experiment and evaluate the speed differences to find the optimal setting for your specific use case!
| | FP16 | TF32 | FP32 |
|---|---|---|---|
| Segmentation | conv_mode 0 | conv_mode 2 | conv_mode 2 |
| Detection | conv_mode 0 | conv_mode 0 | conv_mode 2 |
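If you prefer to encode these recommendations in code, a small hypothetical helper (the mapping simply mirrors the table above) could look like:
from torchsparse.nn import functional as F

# Hypothetical helper: encodes the recommendations from the table above.
RECOMMENDED_CONV_MODE = {
    ("segmentation", "fp16"): 0,
    ("segmentation", "tf32"): 2,
    ("segmentation", "fp32"): 2,
    ("detection", "fp16"): 0,
    ("detection", "tf32"): 0,
    ("detection", "fp32"): 2,
}

def apply_recommended_conv_mode(task: str, precision: str) -> int:
    mode = RECOMMENDED_CONV_MODE[(task.lower(), precision.lower())]
    F.set_conv_mode(mode)
    return mode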
Sparse Auto-tune (Experimental Feature)#
We also provide a Sparse Auto-tuner in TorchSparse v2.1 to optimize Sparse Conv settings autonomously. To enable auto-tuning, you first need to run the model with a few samples to tune the conv kernel configurations:
import torchsparse

model = ...     # your model to be tuned
dataflow = ...  # an iterator over your data samples, usually a torch.utils.data.DataLoader
save_dir = ...  # path (str) to save tuning results
tune_tag = ...  # tag (str) for this tuning run

torchsparse.tune(
    model=model,
    data_loader=dataflow,
    n_samples=100,  # how many data samples to use in auto-tuning
    collect_fn=lambda data: data,
    enable_fp16=True,
    save_dir=save_dir,
    tune_tag=tune_tag,
    force_retune=True,  # whether to ignore cached tuning results
    tune_with_bwd=False,  # whether to tune for backward kernels
)
collect_fn: processes data before calling model.forward(). That is, the tuner runs model(*collect_fn(data)), where data is yielded by data_loader. The default handles data of the form {'input': SparseTensor, ...}.
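For instance, if your data_loader yields (input, label) tuples instead of a dict (a hypothetical layout), a custom collect_fn could unpack it like this:
# Hypothetical: data_loader yields (sparse_input, labels) tuples and
# the model's forward only consumes the sparse input.
def collect_fn(data):
    sparse_input, _labels = data
    return (sparse_input,)  # the tuner then calls model(*collect_fn(data))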
Sparse Mapping Mode#
We provide two sparse mapping modes in TorchSparse v2.1: hashmap and hashmap_on_the_fly (default). The grid mode from previous TorchSparse versions has been deprecated.
TorchSparse v2.1 relies on hashing to map input and output points. The hashmap_on_the_fly mode fuses operations (e.g., hash table construction and hash queries) together, making it slightly faster than the hashmap mode in most cases.
The sparse mapping modes can be easily set through:
from torchsparse.nn import functional as F
F.set_kmap_mode('hashmap') # Choose 'hashmap' or 'hashmap_on_the_fly'
kmap_mode = F.get_kmap_mode()
print(kmap_mode)
Downsample Mode#
TorchSparse v2.1 is compatible with the two different downsampling modes in SpConv and MinkowskiEngine:
from torchsparse.nn import functional as F
F.set_downsample_mode('spconv') # Choose 'spconv' or 'minkowski'
downsample_mode = F.get_downsample_mode()
print(downsample_mode)
Additional Notes#
Coordinate Order - Batch First#
We changed the coordinate order from [x, y, z, batch] to [batch, x, y, z] in TorchSparse v2.1.
Negative coordinates#
TorchSparse v2.1 allows negative coordinates as inputs:
import torchsparse.tensor as ts_ten

ts_ten.set_allow_negative_coordinates(True)  # False by default
neg_flag = ts_ten.get_allow_negative_coordinates()
print(neg_flag)
Note that allowing negative coordinates can add to the sparse mapping overhead. We recommend leaving this flag as False if you can make sure that there are no negative coordinates in your input tensor.
SparseTensor.dense()#
We designed customized CUDA kernels to efficiently convert a SparseTensor to its dense counterpart. Simply call SparseTensor.dense() to do the conversion.
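A minimal sketch of the conversion (the data is made up; coordinates follow the [batch, x, y, z] order described above, and we assume the constructor accepts a spatial_range describing the dense extent):
import torch
from torchsparse import SparseTensor

# Made-up example: 5 points with 4 feature channels each.
coords = torch.randint(0, 16, (5, 4), dtype=torch.int, device="cuda")
coords[:, 0] = 0  # batch index comes first in v2.1
feats = torch.randn(5, 4, device="cuda")

x = SparseTensor(coords=coords, feats=feats, spatial_range=(1, 16, 16, 16))
dense = x.dense()  # runs the customized CUDA conversion kernels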
Range of coordinates#
TorchSparse v2.1 changes the behavior of downsampling layers. The range of coordinates will shrink by
s
times if we apply a convolution layer with stride =s
.
Originally, the coordinate operation can be understood as:
coords_new = torch.floor(coords_old / s).int() * s
Now it is similar to:
coords_new = torch.floor(coords_old / s).int()
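A quick numeric check of the difference, using stride s = 2 and a coordinate value of 5:
import torch

s = 2
coords_old = torch.tensor([5.0])
print(torch.floor(coords_old / s).int() * s)  # old behavior: tensor([4], dtype=torch.int32)
print(torch.floor(coords_old / s).int())      # new behavior: tensor([2], dtype=torch.int32)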
We also support padding to manipulate the range of coordinates during downsampling. The definition of padding is fully compatible with SpConv, and it will not take effect if downsample_mode = "minkowski". Padding determines the lower and upper bounds of the coordinates.