unrolling (1)

Branch Divergence in Warps (reduction problem)
References: Professional CUDA C Programming
Contents: Parallel Reduction; Neighbored vs Interleaved Approach; Unrolling Loops; Use template parameters in device functions; Divergent Warps (example: Sum Reduction)
Divergent Warps (example: Sum Reduction)
References: Programming Massively Parallel Processors; https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf
Contents: Warp Partioni..
2022. 1. 8.