Demonstrates a conjugate gradient solver on the GPU using Multi Block Cooperative Groups.
Added shfl_scan, which uses the *_sync equivalents of the shfl intrinsics added in CUDA 9.0.
Added simpleVoteIntrinsics, which uses the *_sync equivalents of the vote intrinsics _any and _all added in CUDA 9.0.
Demonstrates a GEMM computation using the Warp Matrix Multiply and Accumulate (WMMA) API introduced in CUDA 9, as well as the new Tensor Cores introduced in the Volta chip family.
Demonstrates a matrix multiplication using shared memory through a tiled approach, using the CUDA Driver API.
Demonstrates a matrix multiplication using shared memory through a tiled approach.
Enumerates the properties of the CUDA devices present in the system.
Demonstrates warp aggregated atomics using Cooperative Groups.
Demonstrates runtime compilation of a simple vectorAdd kernel using the NVRTC library.
This is the first release of CUDA Samples on GitHub.
Removed support of Visual Studio 2010 from all samples.
Added Windows OS support to the conjugateGradientMultiDeviceCG sample.
Demonstrates the CUBLAS-XT library, which performs GEMM operations over multiple GPUs.
Demonstrates system-wide atomic instructions.
Demonstrates Peer-To-Peer (P2P) data transfers between pairs of GPUs and computes latency and bandwidth.
Demonstrates a performance comparison of the various memory types involved in the system.
Demonstrates a conjugate gradient solver on the GPU using CUBLAS and CUSPARSE library calls captured and called using the CUDA Graph APIs.
Demonstrates CUDA Graphs creation, instantiation and launch using the Graphs APIs and Stream Capture APIs.
Updated all the samples to support CUDA 10.1.
Demonstrates several important optimization strategies for data-parallel algorithms like reduction.
Measures the memcpy bandwidth of the GPU and the memcpy bandwidth across PCIe.
Demonstrates single and batched decoding of JPEG images using the NVJPEG library.
Demonstrates Inter Process Communication with one process per GPU for computation.
Demonstrates integer GEMM computation using the Warp Matrix Multiply and Accumulate (WMMA) API for integers, employing the Tensor Cores.
Added support of Visual Studio 2019 to all samples supported on Windows.
Demonstrates cuSolverDN's LU, QR and Cholesky factorization.
Demonstrates data exchange between CUDA and EGL Streams.
Demonstrates how to convert and resize NV12 frames to BGR planar frames using CUDA in batch.
Demonstrates the nppiFilterCannyBorder_8u_C1R Canny Edge Detection image filter function.
Demonstrates how to use the NPP FilterBox function to perform a box filter.
Added Windows OS support to the nvJPEG sample.
Demonstrates encoding of JPEG images using the NVJPEG library.
Demonstrates how to perform Vulkan image and CUDA interop.
Demonstrates the Mersenne Twister random number generator GP11213 in cuRAND.
Demonstrates cuSolverSP's LU, QR and Cholesky factorization.
Demonstrates Instantiated CUDA Graph Update with the Jacobi iterative method using different approaches.
Demonstrates CUDA-NvSciBuf/NvSciSync interop.
Demonstrates Inter Process Communication using the cuMemMap APIs.
Demonstrates how the cuMemMap API allows the user to specify the physical properties of their memory while retaining the contiguous nature of their access.
Demonstrates the CUDA Driver and Runtime APIs working together to load the fatbinary of a CUDA kernel.
Demonstrates the CUDA-D3D11 External Resource Interoperability APIs for updating D3D11 buffers from CUDA, and synchronization between D3D11 and CUDA with Keyed Mutexes.
This section describes the release notes for the CUDA Samples on GitHub only.
Samples for CUDA developers that demonstrate features in the CUDA Toolkit.