- Fix OS version in CI to resolve a Windows compiler mismatch problem.
- Add Python 3.13 and CUDA 12.1/12.4/12.6 prebuilts, drop Python 3.7/3.8 prebuilts; change manylinux glibc from 2_17 to 2_28 when CUDA >= 12.4, which requires a recent OS such as Ubuntu 18.10+.
- Fix a CI bug where CPU cumm and spconv used different gcc compilers; they must be the same.
- Use a flag to enable the large-kernel algorithm (it needs time to compile at runtime).
- Add SparseGlobalMaxPool and SparseGlobalAvgPool for training only; libspconv doesn't support them.
- Fix an int8 NVRTC error when using prebuilts.
- Fix int8 kernels when running on Turing GPUs.
- Change version.
- Change version.
- Add int8 quantization support
- Add large-kernel support for implicit GEMM (kv <= 128).
- CI failed because of a temporary PyPI outage; assign a new version and run again.
- Fix overflow when the shape is too large.
- Add prebuilts for CUDA 11.8 (RTX 4090 and H100) and CUDA 11.6.
- Fix small bugs.
- Fix missing .contiguous() on the input feature.
- Add some debug messages if points vanish.
- Fix a CI problem: the main function was too long and caused OOM in the CI VM.
- Fix a build problem.
- Fix an NVRTC problem.
- Add Ampere support: faster FP16, faster TF32, and greatly faster int8 kernels on Ampere GPUs.
- Add pure C++ code generation (libspconv.so) for deployment (or training in another deep learning framework).
- Add NVRTC support for all GEMM kernels. If your GPU architecture isn't compiled into the prebuilt, spconv will use slightly slower NVRTC kernels (10-20us overhead per kernel launch).
- Fix a launch failure in maxpool when there are too many voxels.
- All weight layouts are now KRSC; old spconv 1.x weights are no longer supported.
- Previous GEMM ops in ops.py are now moved to C++ by default (controlled by spconv.constants.SPCONV_CPP_GEMM).
- Drop Python 3.6 support.
- Pascal and Kepler architectures are removed from the CUDA 12 prebuilt.
- Fix a thrust problem by adding -fvisibility=hidden.
- Add full NVRTC support.
- Add support for large spatial shapes and batch sizes: if a large shape is detected, int64 is used instead of int32 when hashing.
- Add sm_37.
- Add FP16 kernels with FP32 accumulators (they run slower, but avoid NaN when the channel size is too large).
- Add the SPCONV_BWD_SPLITK env to control split-K candidates.
- Add FP16 SIMT conv kernels for mixed-precision training on Pascal or older GPUs. WARNING: not optimized for Tesla P100, which has 2x throughput in half precision.
- Fix a wrong arch assert in all kernels for old GPUs so spconv works on sm_50 GPUs.
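As a rough illustration of why the int64 switch above is needed, the following standalone sketch (plain Python, not spconv code; the grid sizes are hypothetical) shows that linearizing a coordinate in a large grid can exceed the int32 range:

```python
# Standalone sketch (not spconv code): hashing a sparse coordinate usually
# starts from its linearized index, bounded by batch_size * prod(spatial_shape).
# For large grids that bound exceeds what int32 can hold, so int64 is required.
INT32_MAX = 2**31 - 1

def max_linear_index(batch_size, spatial_shape):
    """Largest linearized coordinate index for a dense grid of this shape."""
    n = batch_size
    for dim in spatial_shape:
        n *= dim
    return n - 1

# Hypothetical grids: a large one that overflows int32, a typical one that fits.
big = max_linear_index(batch_size=4, spatial_shape=[2048, 2048, 2048])
small = max_linear_index(batch_size=1, spatial_shape=[512, 512, 64])
print(big > INT32_MAX, small <= INT32_MAX)  # True True
```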
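A minimal sketch of using the environment variable above. The value format shown (a comma-separated list of candidate split-K factors) is an assumption; check spconv's documentation for the accepted format:

```python
# Sketch: restrict the split-K candidates spconv considers for backward GEMM.
# The value format below is an assumption (comma-separated split-K factors);
# confirm against spconv's documentation before relying on it.
import os

# Set this before importing spconv, since env-based constants are
# likely read at import time.
os.environ["SPCONV_BWD_SPLITK"] = "1,2,4,8"
```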
- Fix a small bug in spatial_shape.
- Fix a bug in PointToVoxel: we must always return a clone instead of a view.
- Fix a bug in sparse add.
- Fix a serious bug in conv weight init.
- Add more wrong-usage checks.
- Add insert_exist_keys for the hash table.
- Fix a strange compile problem on Windows.
- Fix missing pccm.Class in setup.py
- Add hash table
- Update cumm version.
- Add AddTableMisaligned for sparse tensors with the same shape but different indices.
- Fix a bug already fixed in 2.1.10 but reintroduced in 2.1.12.
- Add some ops from spconv 1.x; see spconv.utils for more details.
- Add some debug tools for users to attach more info in issues.
- Add a method to the voxel generator to get pc_voxel_id, which is usually used in semantic segmentation.
- Fix a bug in the CUDA voxel generator when max_voxels is smaller than the real number of voxels.
- Fixed a bug in Volta kernels (TITAN V, Tesla V100): backward weight kernels used FP16 as the accumulator; we should use FP32.
- Fixed a corner case when the user uses kernel size 1x1 but stride != 1.
- Fixed a corner case when the input feature is non-contiguous in maxpool.
- Fixed a bug in utils.PointToVoxel: we shouldn't get a CUDA stream in CPU code.
- Remove a wrong assert
- Add support for pytorch 1.5
- Fix a bug when the net has an inverse conv and runs inference in eval mode.
- Fix missing -fopenmp in the linker for CPU-only builds.
- Remove stale comment sending in CI.
- Add a CUDA profiling tool.
- Add Python 3.6 support.
- Format all code
- Remove an unnecessary device sync and slightly improve performance.
- Fix a bug in SparseInverseConv3d.
- Fix a bug in the CPU-only package.
- Fix a bug with Python 3.7.
- Add an implicit GEMM algorithm for all kinds of convolution with kernel volume <= 32. This algorithm is very fast with float16.
- Add a PyTorch wrapper for the voxel generator.
- Add CPU support and a CPU-only build.
- Fix a serious bug where SparseSequential did nothing with non-spconv layers.
- Fix a bug of ProxyableClassMeta
- Change the build system from CMake to pccm.
- Move PyTorch Python code to spconv.pytorch.
- Rewrite all C++ code.
- Greatly increase subm indice pair generation speed with two tricks: 1. most subm convs use only kernel size 3, so we can unroll loops for a 100% performance increase; 2. subm indice pairs have the property indicePairs[0, i] = indicePairs[1, kernelVolume - i - 1], which gives another 100% performance increase.
- Add batch GEMM support: a small performance increase but more GPU memory usage. Use algo=spconv.ConvAlgo.Batch to enable it.
- Replace most 'functor's with C++14 dispatch in the C++ code.
- Change gather/scatterAdd kernel parameters to support large point counts.
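The symmetry trick above can be illustrated with a standalone sketch (plain Python, not spconv code; the naive pair builder below is for illustration only). For a submanifold convolution, swapping the input/output roles of the pairs at kernel offset k yields exactly the pairs at the mirrored offset kernelVolume - 1 - k, so only half the offsets need hash-table lookups:

```python
# Standalone sketch (not spconv code): subm indice pairs for a 1D subm conv.
# pairs[k] lists (input_index, output_index) tuples for kernel offset k.
def subm_indice_pairs_1d(coords, kernel_size=3):
    coord_to_idx = {c: i for i, c in enumerate(coords)}
    half = kernel_size // 2
    pairs = [[] for _ in range(kernel_size)]
    for out_i, c in enumerate(coords):
        for k in range(kernel_size):
            nb = c + (k - half)  # neighbor coordinate for kernel offset k
            if nb in coord_to_idx:
                pairs[k].append((coord_to_idx[nb], out_i))
    return pairs

coords = [0, 1, 3, 4, 7]  # active sites of a hypothetical sparse 1D tensor
pairs = subm_indice_pairs_1d(coords)
K = 3
# Swapping input/output roles at offset k gives the pairs at offset K - 1 - k.
for k in range(K):
    swapped = {(o, i) for (i, o) in pairs[k]}
    assert swapped == set(pairs[K - 1 - k])
```

This holds because subm convs keep input and output coordinates identical: if output b sees input a at some offset, then output a sees input b at the opposite offset.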