3 Comments
keesh lauria

CUDA’s explicit management of shared memory has always seemed better to me than the situation on CPUs. In CPU programming we still have to think hard about managing what’s in cache, but we don’t have explicit control, so it’s harder to know whether what we’re trying to do is actually happening. And a small change in the code, or worse, in the compiler, can change performance a lot.
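For readers who haven’t written CUDA: a minimal sketch of what “explicit management” means in practice. The kernel name and the 256-thread block size are illustrative, not from the comment; the point is that `__shared__` storage and `__syncthreads()` barriers put data placement under programmer control rather than a cache replacement policy.

```cuda
// Sketch: each block stages a tile of the input in __shared__ memory,
// so the programmer, not a cache policy, decides what stays on-chip.
__global__ void sumTiles(const float *in, float *out, int n)
{
    __shared__ float tile[256];          // explicitly managed on-chip storage
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                     // tile is now guaranteed resident

    // Tree reduction over the tile; every access after the barrier is a
    // shared-memory access by construction, not a hoped-for cache hit.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];       // one partial sum per block
}
```

On a CPU, the analogous blocking transformation only *encourages* the hardware to keep the tile in cache; here residency is part of the program’s semantics.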

Nicholas Wilt

Yes—I often liken cache-aware CPU code to saying a prayer while burning incense at an altar.

At GTC around 2018, NVIDIA did a fairly public interrogation of the utility of shared memory, and reluctantly (it seemed) concluded that it was needed for more than 40% of CUDA workloads. Subjectively, that seems to be around the time shared memory capacities started to grow dramatically.
