Most Popular Posts of 2025

A Look Back, and Forward

Jan 01, 2026

An author can never tell which content will drive engagement. Along with the audience, authors are (or I am) taking in and writing about different perspectives on the same landscape; in the case of this blog, the technological landscape. Who knew that a post about “Why We Need SIMD” would be the most popular of the year, or that a reading list could engender so much engagement? I started this Substack almost exactly a year ago, and the end of the year is as good a time as any to reflect:

It never ceases to amaze me how few people write in with feedback, even on microblogging platforms like Twitter. So we authors are left to consult metrics to determine which topics caught readers’ attention.
Given that I’m best known for my work on CUDA and The CUDA Handbook, I suppose it’s surprising that most of my technical Substack posts involved SIMD instructions like AVX2, not CUDA.
I’m gratified at the number of people who’ve subscribed, especially the paid subscribers. As a professional writer since the late 1980s, I have a unique perspective on the way the Internet has been incredible for content and mostly terrible for content providers, whose pay has plummeted as they struggle to differentiate themselves from high-quality content and be heard through the noise of mediocre content. I expect this problem to get worse, not better, as AI tools improve at mimicking human writers.

With no further ado: this article will reflect on the most popular articles and most popular tweets of 2025, with some bonus content if you read to the end.

Most Popular Articles

The most popular article by far, fueled by a post to Hacker News, was Why We Need SIMD (The Real Reason), which recapitulated the history of SIMD instructions on x86 and spoke to the engineering tradeoffs that have prompted CPU vendors to pursue SIMD instructions as a preferred mechanism to accelerate data-parallel workloads.

The second most popular article was my reading list: Perennial Technical Reads, a walk down memory lane invoking Fred Brooks, Jon Bentley, and others. If I were writing it today, I might have remembered to include Joel Spolsky’s spicy blog along with the other Internet resources mentioned.

The third most popular article was a send-up of a l33t coding question, Third Largest Element, which implemented an AVX2-accelerated solution that no reasonable interviewer could possibly expect a candidate to write in real time.
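
For contrast, the scalar answer an interviewer might reasonably expect can be sketched in a few lines. This is not the code from the article and it uses no AVX2. The name third_largest is hypothetical, and the convention that duplicates count as separate elements is an assumption (some versions of the question ask for the third distinct value instead).

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical scalar baseline: returns the third-largest element of v,
// counting duplicates as separate elements (an assumption), or
// std::nullopt if v has fewer than three elements.
std::optional<int> third_largest(const std::vector<int>& v) {
    if (v.size() < 3) return std::nullopt;

    // Order the first three elements so that a >= b >= c.
    int a = v[0], b = v[1], c = v[2];
    if (a < b) std::swap(a, b);
    if (b < c) std::swap(b, c);
    if (a < b) std::swap(a, b);

    // Single pass: keep the running top three in a, b, c.
    for (std::size_t i = 3; i < v.size(); ++i) {
        const int x = v[i];
        if (x > a)      { c = b; b = a; a = x; }
        else if (x > b) { c = b; b = x; }
        else if (x > c) { c = x; }
    }
    return c;
}
```

The loop makes one pass and keeps only three values of state. That is the structure a SIMD version would have to replicate per lane before merging the lanes at the end.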

Almost as popular: An article with the enticing title This ML Workload Runs 30x Faster w AVX512, an implementation of the float-to-NF4 conversion that uses permute instructions for register-to-register lookups. It also alludes to a cool optimized Binary Sort implementation that I first learned almost 40 years ago.
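
To make the conversion concrete, here is a minimal scalar sketch of float-to-NF4 quantization. It is not the AVX512 code from the article. It assumes the input has already been normalized to [-1, 1], and the 16 table values are the NF4 code points as I recall them from the QLoRA reference implementation, so check them against that source before relying on them. The names kNF4 and float_to_nf4 are hypothetical. Sixteen floats occupy exactly 512 bits, so a table like this fits in a single AVX512 register, which is what makes register-to-register lookups with a permute instruction possible.

```cpp
#include <array>
#include <cstdint>

// NF4 code points in ascending order. ASSUMPTION: values quoted from memory
// of the QLoRA reference implementation; verify before use.
constexpr std::array<float, 16> kNF4 = {
    -1.0f,        -0.69619280f, -0.52507305f, -0.39491749f,
    -0.28444138f, -0.18477343f, -0.09105004f,  0.0f,
     0.07958030f,  0.16093020f,  0.24611230f,  0.33791524f,
     0.44070983f,  0.56261700f,  0.72295684f,  1.0f};

// Hypothetical helper: maps x (assumed normalized to [-1, 1]) to the 4-bit
// index of the nearest NF4 code point. Four fixed steps of binary search
// over the midpoints between adjacent code points.
std::uint8_t float_to_nf4(float x) {
    int lo = 0;
    for (int step = 8; step > 0; step >>= 1) {
        const int cand = lo + step;
        // Midpoint between code points cand-1 and cand: above it, the
        // nearest code point has index >= cand.
        const float mid = 0.5f * (kNF4[cand - 1] + kNF4[cand]);
        if (x > mid) lo = cand;
    }
    return static_cast<std::uint8_t>(lo);
}
```

Each of the four steps is one table lookup plus one compare. In a vectorized version the lookup is the part a permute instruction could serve from a register rather than from memory.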

Understandably, none of my paid posts were as popular:

Book Review: The NVIDIA Way, a review of Tae Kim’s bestselling hagiography of Jensen Huang that is a fun read but an incomplete oral history of NVIDIA,
Implications of Google v. Oracle, an article on the most consequential Supreme Court decision in software engineering history,
Standardized or Proprietary, a reflection on the tradeoffs between standardized APIs designed by committees such as the OpenGL ARB (Architectural Review Board) and proprietary APIs such as Microsoft’s Direct3D and NVIDIA’s CUDA, both of which I had a central hand in designing,
AMD’s GPU Software: A Software Architect’s Take, a spicy summary of some (by no means all!) of the issues in AMD’s GPU software stack,
Finding And Fixing Direct3D and A Missive From The RISC/CISC War, reflections on my time at Microsoft excerpted from an autobiography in progress.

Most of my articles are free, but I’ll keep leavening with paid content to keep things interesting for my most committed subscribers.

Most Popular Tweets

The runaway viral tweet of the year, viewed 1.8M times, was an anecdote about how consumption of soda dropped 90% when NVIDIA started charging $0.25/soda. NOTE: this was a comment on economics and demand elasticity, not NVIDIA’s confidential business practices c. 2002!

Another notable tweet, with >400k views, was a pedestrian observation about NVIDIA’s software stack, one I have made many times: that because CUDA’s driver API is portable across both operating systems and CPU architectures, it enables NVIDIA to meet developers on platforms they have chosen.

A tweet on game developers’ elite status in the software engineering community got quite a few views.

Some posts that didn’t get as much traction as I expected include Don’t Move The Data!, an updated version of a 2017 article on how data movement has become the limiting reagent of all compute. I may keep updating this one, because the goalposts keep moving as packaging and other technologies continue to develop at a breakneck pace.

Conclusion and 2026

I have been blocking accounts to curate my feed to be constructive and technical, and it has mostly worked. At their best, the Internet and platforms like Twitter serve to facilitate wholesome exchanges and opportunities for learning.

If any of the articles mentioned above sounded interesting, I’d encourage you to take a look through the archives. There are articles on C++ programming idioms, CUDA programming practices, API design, the evolving technological landscape, and more.

As a final note, I have all but decided to move the entire CUDA Handbook text, and related content like the 2013 article on Histograms, to the Web site. Everyone seems to pirate the book anyway! AI tools seem accomplished at performing the stultifying task of converting Word text to Markdown, so look for updates as that project moves forward.

The Parallel Programmer
