3 Comments
keesh lauria

CUDA’s explicit management of shared memory has always seemed better to me than the situation on CPUs. In CPU programming we still have to think hard about managing what’s in cache, but we don’t have explicit control, so it’s harder to know whether what we’re trying to do is actually happening. And a small change in the code, or worse, in the compiler, can change performance a lot.
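For readers who haven’t written CUDA: a minimal sketch of what “explicit management” means in practice. The kernel name and the 256-thread block size are illustrative, not from the comment; the point is that `__shared__` storage and `__syncthreads()` barriers put data placement under programmer control rather than a cache replacement policy.

```cuda
// Sketch: each block stages a tile of the input in __shared__ memory,
// so the programmer, not a cache policy, decides what stays on-chip.
__global__ void sumTiles(const float *in, float *out, int n)
{
    __shared__ float tile[256];          // explicitly managed on-chip storage
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                     // tile is now guaranteed resident

    // Tree reduction over the tile; every access after the barrier is a
    // shared-memory access by construction, not a hoped-for cache hit.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];       // one partial sum per block
}
```

On a CPU, the analogous blocking transformation only *encourages* the hardware to keep the tile in cache; here residency is part of the program’s semantics.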

Nicholas Wilt

Yes—I often liken cache-aware CPU code to saying a prayer while burning incense at an altar.

At GTC around 2018, NVIDIA did a fairly public interrogation of the utility of shared memory, and reluctantly (it seemed) concluded that it was needed for more than 40% of CUDA workloads. Subjectively, that seems to be around the time shared memory capacities started to grow dramatically.
