Parallel Reduction (2)

Shared Memory (3) - Reduction with Shared Memory
References: Professional CUDA C Programming
Contents: Reducing Global Memory Access; Parallel Reduction with Shared Memory; Parallel Reduction with Unrolling; Parallel Reduction with Dynamic Shared Memory; Effective Bandwidth
Continuing from the previous two posts, this post looks at using shared memory to reduce global memory accesses. Shared Memory (1) Shared Memory (2) - Square/Rectangular Shared Memory Reducing Global Memory Ac..
2022. 1. 20.

Warp Branch Divergence (reduction problem)
References: Professional CUDA C Programming
Contents: Parallel Reduction; Neighbored vs Interleaved Approach; Unrolling Loops; Use template parameter in device functions
Divergent Warps (Example: Sum Reduction)
References: Programming Massively Parallel Processors; https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf
Contents: Warp Partioni..
2022. 1. 8.