wayhoww/GPURayTraversal

I cannot recall the background of this repo anymore...
--------------------------

I have made some edits so that this project can be built and run with
Visual Studio 2019 and Visual Studio 2022.

Prerequisite environment variables:

- CUDA_PATH: the CUDA Toolkit directory, e.g. `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5`
- MSVC_PATH: the MSVC toolset directory, e.g. `C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.30.30705`
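
One way to confirm that both variables are visible to a process is to query
them before building. The following is a minimal C++ sketch, not code from
this repository: the helper name missingVariables is hypothetical, and how
the build actually consumes CUDA_PATH and MSVC_PATH is not shown here.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: returns the names from `required` that are unset
// or empty in the environment of the current process.
std::vector<std::string> missingVariables(const std::vector<std::string>& required)
{
    std::vector<std::string> missing;
    for (const std::string& name : required)
    {
        assert(!name.empty());  // an empty variable name is a caller error
        const char* value = std::getenv(name.c_str());
        if (value == nullptr || *value == '\0')
            missing.push_back(name);
    }
    return missing;
}
```

Calling missingVariables({"CUDA_PATH", "MSVC_PATH"}) should return an empty
list once both variables are set.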

It seems that `tesla_persistent_while_while.cu` is the only working kernel on RTX 3090 (sm_86).
--------------------------

Fast GPU Ray Traversal 1.4
--------------------------
Implementation by Tero Karras, Timo Aila, and Samuli Laine
Copyright 2009-2012 NVIDIA Corporation

This package contains full source code for the fast GPU-based ray traversal
routines used in the following paper:

    "Understanding the Efficiency of Ray Traversal on GPUs",
    Timo Aila and Samuli Laine,
    Proc. High-Performance Graphics 2009
    http://www.tml.tkk.fi/~timo/publications/aila2009hpg_paper.pdf

In addition to the original kernels that were optimized for NVIDIA GTX 285,
the package also includes kernels specifically hand-tuned for GTX 480
(GF100/Fermi) and GTX 680 (GK104/Kepler). The results for these GPUs have
been published in the following technical report:

    "Understanding the Efficiency of Ray Traversal on GPUs - Kepler and Fermi Addendum",
    Timo Aila, Samuli Laine, and Tero Karras,
    NVIDIA Technical Report NVR-2012-02,
    http://research.nvidia.com/publication/understanding-efficiency-ray-traversal-gpus-kepler-and-fermi-addendum

The accompanying benchmark application and test scenes aim to replicate the
published results as accurately as possible, although there are slight
differences in the test setup (e.g. BVH builder, CUDA version).
See results.txt for details.

The source code is licensed under New BSD License (see LICENSE), and
hosted by Google Code:

http://code.google.com/p/understanding-the-efficiency-of-ray-traversal-on-gpus/


System requirements
-------------------

- Microsoft Windows XP, Vista, or 7.

- At least 1GB of system memory.

- NVIDIA CUDA-compatible GPU with compute capability 1.2 or higher and at
  least 1.5 gigabytes of RAM. GeForce GTX 480 or GTX 680 is recommended.

- Microsoft Visual Studio 2010. Required even if you do not plan to build
  the source code, as the runtime CUDA compilation mechanism depends on it.


Instructions
------------

1. Install Visual Studio 2010. The Express edition can be downloaded from:
   http://www.microsoft.com/visualstudio/en-us/products/2010-editions/visual-cpp-express

2. Install the latest NVIDIA GPU drivers and CUDA Toolkit.
   http://developer.nvidia.com/object/cuda_archive.html

3. Run rt.exe to start the application in interactive mode. The first
   run executes certain initialization tasks that may take a while to
   complete.

4. If you get an error during initialization, the most probable explanation
   is that the application is unable to launch nvcc.exe contained in the
   CUDA Toolkit. In this case, you should:

   - Set CUDA_BIN_PATH to point to the CUDA Toolkit "bin" directory, e.g.
     "set CUDA_BIN_PATH=C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin".

   - Set CUDA_INC_PATH to point to the CUDA Toolkit "include" directory, e.g.
     "set CUDA_INC_PATH=C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include".

   - Run vcvars32.bat to set up Visual Studio paths, e.g.
     "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\vcvars32.bat".

5. Run benchmark.cmd to measure the performance of all test scenes with
   the same settings that were used in the paper. Expected performance
   numbers for different GPUs are listed in "results.txt". Note that a
   64-bit build is required to benchmark the San Miguel scene.

6. Optional: Build the application yourself.

   - Open rt.sln in Visual Studio 2010.
   - Right-click the "rt" project and select "Set as StartUp Project".
   - Select Release build. Debug build is very slow, especially when sorting
     secondary rays during the benchmark.
   - Build and run.


Test setup
----------

Camera positions:

- Results of each scene are averaged over 5 distinct camera positions.
- Camera positions are specified on the command line using
  "signature strings".
- To generate a signature string, click "Show camera controls" in the
  interactive mode and then "Export camera signature..."

Ray generation:

- Viewport of 1024x768 pixels.
- Primary rays that hit scene geometry generate 32 secondary rays (AO or
  diffuse) distributed according to a cosine-weighted Halton sequence on
  the hemisphere.
- Primary rays that miss the geometry generate dummy secondary rays that
  are ignored by the ray traversal kernel and excluded from the
  rays-per-second figures.
- AO rays are of limited length and terminate immediately after encountering
  an intersection.
- Diffuse interreflection rays are very long and continue the traversal
  until they find the closest intersection.
- See src/rt/ray/RayGen.cpp
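
The secondary-ray distribution described above can be sketched as follows.
This is an illustrative C++ sketch, not the code in RayGen.cpp: the function
names, the choice of Halton bases 2 and 3, and the local frame with +Z as the
surface normal are all assumptions.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Radical inverse of `index` in the given base: the digits of `index` are
// mirrored around the radix point, giving a value in [0, 1).
float halton(unsigned index, unsigned base)
{
    assert(base >= 2);
    float result = 0.0f;
    float digitWeight = 1.0f / static_cast<float>(base);
    while (index > 0)
    {
        result += digitWeight * static_cast<float>(index % base);
        index /= base;
        digitWeight /= static_cast<float>(base);
    }
    return result;
}

// Maps (u, v) in [0, 1)^2 to a unit vector on the +Z hemisphere with
// density proportional to cos(theta): sample the unit disk uniformly,
// then project up onto the hemisphere.
Vec3 cosineSampleHemisphere(float u, float v)
{
    const float kTwoPi = 6.28318530717958647692f;
    float r = std::sqrt(u);
    float phi = kTwoPi * v;
    float z = std::sqrt(1.0f - u);  // cos(theta)
    return { r * std::cos(phi), r * std::sin(phi), z };
}

// i-th secondary ray direction in the local frame of the hit point.
// Bases 2 and 3 are an assumption made for this sketch.
Vec3 secondaryDirection(unsigned i)
{
    return cosineSampleHemisphere(halton(i, 2), halton(i, 3));
}
```

Evaluating secondaryDirection for 32 consecutive indices gives one set of
directions of the size used per primary hit. A real ray generator would also
rotate each direction into the frame of the surface normal.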

Batches and sorting:

- All primary rays are traced in a single launch.
- Secondary rays are divided into batches of 2^20 rays, and each batch
  is traced in a separate launch.
- Primary rays are generated according to 2D Morton order in screen space.
- Each batch of secondary rays is sorted according to 6D Morton order based
  on ray origin and direction vectors.
- The sorting is performed on the CPU and is quite a heavy operation,
  taking roughly one second per batch. It is disabled in the interactive
  mode.
- See src/rt/ray/PixelTable.cpp and src/rt/ray/RayBuffer.cpp
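
The ordering and batching rules above can be illustrated with a short C++
sketch. It is not taken from PixelTable.cpp or RayBuffer.cpp; the function
names and the 16-bit coordinate limit are assumptions. Only the 2D case is
shown. A 6D code interleaves the bits of six quantized values (three for the
origin, three for the direction) in the same manner.

```cpp
#include <cassert>
#include <cstdint>

// Spreads the low 16 bits of `v` so that bit i moves to bit 2*i.
std::uint32_t spreadBits16(std::uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// 2D Morton code of a pixel: x occupies the even bits, y the odd bits.
// Sorting pixels by this key visits them along a Z-order curve.
std::uint32_t morton2D(std::uint32_t x, std::uint32_t y)
{
    assert(x < 65536u && y < 65536u);  // assumed limit for this sketch
    return spreadBits16(x) | (spreadBits16(y) << 1);
}

// Number of kernel launches needed for `numRays` secondary rays when each
// batch holds at most 2^20 rays.
std::uint64_t numBatches(std::uint64_t numRays)
{
    const std::uint64_t kBatchSize = std::uint64_t(1) << 20;
    return (numRays + kBatchSize - 1) / kBatchSize;
}
```

Assuming every pixel of the 1024x768 viewport contributes 32 secondary rays,
numBatches(1024 * 768 * 32) evaluates to 24.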

Acceleration structure:

- AABB-based binary bounding volume hierarchy.
- Built using spatial triangle splits (Stich et al.) to improve quality:
  http://www.nvidia.com/docs/IO/77714/sbvh.pdf
- Triangles are represented using Woop's affine triangle transformation:
  http://www.sven-woop.de/publications/Diplom_SvenWoop_Final.pdf
- Memory layout varies between individual traversal kernels.
- See src/rt/bvh/SplitBVHBuilder.cpp and src/cuda/CudaBVH.cpp
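
For readers unfamiliar with AABB-based hierarchies, the basic operation at
each node is a ray/box overlap test. The following C++ sketch shows the
standard slab test. It is not the code used by the traversal kernels, and the
type and function names are hypothetical. It does not handle the case where a
direction component is zero and the origin lies exactly on a slab boundary.

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 lo, hi; };  // lo = minimum corner, hi = maximum corner

// Slab test. Returns true if the ray origin + t * dir overlaps the box for
// some t in [tMin, tMax]. invDir holds the per-component reciprocal of dir.
bool intersectAABB(const AABB& box, const Vec3& origin, const Vec3& invDir,
                   float tMin, float tMax)
{
    assert(tMin <= tMax);

    // Parametric distances to the two bounding planes on each axis.
    float tx0 = (box.lo.x - origin.x) * invDir.x;
    float tx1 = (box.hi.x - origin.x) * invDir.x;
    float ty0 = (box.lo.y - origin.y) * invDir.y;
    float ty1 = (box.hi.y - origin.y) * invDir.y;
    float tz0 = (box.lo.z - origin.z) * invDir.z;
    float tz1 = (box.hi.z - origin.z) * invDir.z;

    // The ray is inside the box while it is inside all three slabs.
    float tEnter = std::max({ std::min(tx0, tx1), std::min(ty0, ty1),
                              std::min(tz0, tz1), tMin });
    float tExit  = std::min({ std::max(tx0, tx1), std::max(ty0, ty1),
                              std::max(tz0, tz1), tMax });
    return tEnter <= tExit;
}
```

Passing a small tMax models a ray of limited length, such as the AO rays
described under "Ray generation": a box that lies beyond tMax is rejected.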

Ray traversal:

- Performance results are based on the time spent in ray traversal for
  the selected ray type. Ray generation and sorting are excluded from
  the measurements.
- The code for launching the traversal kernels can be found in
  src/cuda/CudaTracer.cpp
- The kernels themselves are located in src/rt/kernels:

  fermi_speculative_while_while
    Hand-tuned to yield the best performance on GTX 480.
    Works on older GPUs as well, but is not optimal.
    
  kepler_dynamic_fetch
    Hand-tuned to yield the best performance on GTX 680.
    Works on older GPUs as well, but is not optimal.

  tesla_persistent_packet
    "Persistent packet" kernel from the paper.

  tesla_persistent_speculative_while_while
    "Persistent speculative while-while" kernel from the paper.
    This is the fastest kernel on GTX 285.

  tesla_persistent_while_while
    "Persistent while-while" kernel from the paper.
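
The measurement rule stated at the top of this section (traversal time only,
dummy rays excluded) amounts to the following arithmetic. This is a C++
sketch with hypothetical names, not the benchmark's own reporting code.

```cpp
#include <cassert>
#include <cstdint>

// Millions of rays per second. Only rays the kernel actually traced are
// counted, and only the time spent in the traversal kernel is used.
double mraysPerSecond(std::uint64_t totalRays, std::uint64_t dummyRays,
                      double traversalSeconds)
{
    assert(dummyRays <= totalRays);
    assert(traversalSeconds > 0.0);
    return static_cast<double>(totalRays - dummyRays) / traversalSeconds / 1.0e6;
}
```

For example, a batch of 2^20 rays containing no dummy rays that is traced in
10 milliseconds corresponds to roughly 105 Mrays/s.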


Version history
---------------

Version 1.4, May 22, 2012
- Include hand-tuned kernels for Kepler-based GPUs.
- Improve fermi_speculative_while_while perf using vmin/vmax PTX instructions.
- Include San Miguel test scene in the package.
- Improve robustness of the BVH builder with degenerate input.
- Switch to New BSD License (previously Apache License 2.0).
- Upgrade to Visual Studio 2010 (previously 2008).
- Fix a CUDA compilation issue with Visual Studio Express.
- General bugfixes and improvements to framework.

Version 1.3, Jul 08, 2011
- Fix compatibility issues with CUDA 4.0.

Version 1.2, Dec 17, 2010
- Fix issues with nvcc path autodetection with CUDA 3.2.

Version 1.1, Dec 01, 2010
- Update the codebase to support GF104 and CUDA 3.2.
- Speed up ray sorting significantly by utilizing all available CPU cores.
- Minor stability improvements.

Version 1.0, Jun 29, 2010
- Initial release.


Known issues
------------

- When using CUDA 3.2 or later, the performance of device-side code drops
  slightly in 64-bit builds. This is because CUDA 3.2 disallows "mixed-bitness
  mode", which we utilize on earlier CUDA versions to get maximum performance.
  With CUDA 3.2, we must always compile device code with the same bitness
  as host code, which generally results in higher register pressure in 64-bit
  builds.
  
  For more information, see:
  http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_3.2_Readiness_Tech_Brief.pdf

- The mesh importer only supports a limited subset of the Wavefront OBJ
  file format. If you have trouble importing a mesh, you may want to try
  enabling WAVEFRONT_DEBUG in src/framework/io/MeshWavefrontIO.cpp.


Acknowledgements
----------------

Anat Grynberg and Greg Ward for the Conference room model.
University of Utah for the Fairy scene.
Marko Dabrovic (www.rna.hr) for the Sibenik cathedral model.
Guillermo M. Leal Llaguno (www.evvisual.com) for the San Miguel model.
