2 Comments
Caden Parker

It's amazing how modern compilers and superscalar CPUs can take crappy code and still make it run fast. That is a hilarious way to vectorize a loop.

Nicholas Wilt

Yes - I find it particularly impressive that g++ detects the x &= x - 1 loop idiom and translates it to a popcount! The AVX codegen is broken IMHO - it generates way too much memory traffic. But the remedy for the aggrieved developer is straightforward: replace the loop with a proper popcount!
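
For readers who have not seen the idiom, here is a minimal sketch of the two forms being discussed. The function names popcount_loop and popcount_builtin are made up for illustration and are not taken from the article's code; __builtin_popcountll is the GCC/Clang builtin, and whether it compiles to a single hardware instruction depends on the target flags (for example -mpopcnt on x86).

```cpp
#include <cstdint>

// Hypothetical name: the bit-clearing loop idiom.
// x &= x - 1 clears the lowest set bit, so the loop body runs
// once per set bit and the counter ends up as the population count.
int popcount_loop(std::uint64_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        ++count;
    }
    return count;
}

// Hypothetical name: the "proper popcount" replacement, assuming a
// GCC- or Clang-compatible compiler that provides this builtin.
int popcount_builtin(std::uint64_t x) {
    return __builtin_popcountll(x);
}
```

Both functions return the same value for every input; the second simply states the intent directly instead of relying on the compiler to recognize the loop.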
