Sometimes It's Best To Flip A Coin
Not making a decision is, itself, a decision
In computer engineering, we often are required to make decisions that have little bearing on the quality of the final product, but are needed for the sake of consistency. When plausible arguments can be made in favor of a variety of options, it’s all too tempting to hedge your bets and support more than one, but in reality, consistency often is more important than the substance of the decision itself. In such cases, for the sake of simplicity as well as consistency, you might as well make the decision at random. If deciding between two options, flip a coin.
As Amazon’s Bias For Action Leadership Principle says, it is generally better to make some decision than to not make one at all.
But often, we do not.
Consider little-endian and big-endian ordering, a taxonomy introduced by Danny Cohen in Internet Engineering Note 137 (“On Holy Wars and a Plea for Peace”, c. 1980), whose terms literally are named for the satirical episode in Gulliver’s Travels where the Lilliputians were divided as to whether eggs should be eaten “from the big end” or “from the little end.” Cohen’s taxonomy refers to how computers store data in memory that comprises more than one byte. With memory addresses increasing from left to right, the two byte orderings would store a 32-bit word (its bytes numbered 00 through 03, from least to most significant) as follows:
03 02 01 00 (big-endian)
00 01 02 03 (little-endian)

Plausible arguments can be made for either arrangement: words stored as little-endian may be dereferenced as smaller words, and the incoming data is truncated as one would expect. But when writing out the values of, say, 32-bit numbers, the significance of the digits increases right-to-left: no one would argue that 0xff squared is anything other than 0xfe01, but a little-endian machine would store the 0x01 in the first byte of memory and 0xfe in the second. Dumping the memory as bytes, say in a debugger, makes the values look backward if they were stored as little-endian.
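To see which ordering a given machine uses, it’s enough to store a 32-bit value and examine it a byte at a time. Here is a minimal C sketch; the value 0x03020100 is chosen so that each byte’s contents match its significance:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t word = 0x03020100;  /* byte holding N is the Nth-least-significant */
    const unsigned char *p = (const unsigned char *) &word;

    /* Prints "00 01 02 03" on a little-endian machine,
       "03 02 01 00" on a big-endian one. */
    for (int i = 0; i < 4; i++)
        printf("%02x ", p[i]);
    printf("\n");
    return 0;
}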
Across the first few decades of CPU design, there was no consensus as to whether CPUs should be little- or big-endian. In the 1990s, CPUs from vendors such as MIPS could be configured to do either [1]! And in the Intel 80486 (later i486) processor released in 1989, Intel added an instruction BSWAP (“byte swap”) that reverses the byte order of a 32-bit register.
By the late 1980s, the silliness of the divide had become obvious enough that Dave Cutler, the operating system architect Microsoft recruited to build the NT kernel, decided to draw a line in the sand: the NT kernel would support only little-endian CPUs. That decision undoubtedly was colored by the importance of Microsoft’s partnership with Intel, whose x86 CPUs supported only little-endian byte ordering; and it was reinforced by Intel’s subsequent rise to dominance of the CPU industry. Today, it’s safe to say that little-endian byte ordering is more prevalent.
The moral of the story: The industry could have saved a lot of time by just picking one.
In 1997-1998, when Direct3D was competing with OpenGL for developer mindshare in the gaming space on Windows, the team was tasked with building the 3D hardware tests for the Windows Hardware Quality Labs (WHQL). Hardware vendors have a powerful economic incentive to pass these tests, so in practice, the tests specify the standard for hardware behavior. Vendors whose hardware was failing the tests could be granted conditional waivers, as long as they promised that the next generation of hardware would be more compliant; but they had every incentive to get the tests passing at their earliest convenience.
As we designed these tests, we considered the precedent set by OpenGL and were struck by how many details of the specification had been left as “implementation defined.” I’ve always taken this as a natural consequence of having competitors try to collaborate on a specification – no one wants to invest in changing their hardware to behave more like that of their competitors – but it seemed like more specificity might be warranted. Why, we wondered, hadn’t OpenGL’s Architectural Review Board settled more questions with coin tosses, or arm wrestling competitions, or something?
Take, for example, the so-called “rasterization rules” needed to ensure that, when the screen is tessellated by triangles with shared vertices, each pixel is hit exactly one time (needed for important functions like alpha blending to work correctly). The hardware doesn’t have to coordinate anything above the level of a single triangle; the condition can be met by a correctly-defined set of rasterization rules consistently applied to standalone triangles. But there are different ways to solve the problem. To break the tie when a pixel center lands exactly on an edge shared by two triangles, for example, you could always award the pixel under a “top left” rule, or a “bottom right” one – hardware that implemented either rule consistently would satisfy the criterion of only hitting each pixel on the screen one time.
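One way to express such a tie-breaking rule – a sketch only, not how any particular GPU implements it – is with edge functions: a pixel center exactly on a shared edge is claimed only by the triangle for which that edge is a “top” or “left” edge. The sketch below assumes screen coordinates with y increasing downward and triangles wound clockwise:

typedef struct { float x, y; } Vec2;

/* Edge function: positive when p is on the interior side of the directed
   edge a->b, zero when p lies exactly on the edge. */
float edgeFunction(Vec2 a, Vec2 b, Vec2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* Top edge: exactly horizontal with the interior below it (points right).
   Left edge: on the left side of the triangle (points upward on screen). */
int isTopLeft(Vec2 a, Vec2 b)
{
    return (a.y == b.y && b.x > a.x) || (b.y < a.y);
}

/* A pixel center exactly on an edge shared by two triangles is claimed by
   exactly one of them: the one for which it is a top or left edge. */
int covers(Vec2 v0, Vec2 v1, Vec2 v2, Vec2 p)
{
    float e0 = edgeFunction(v0, v1, p);
    float e1 = edgeFunction(v1, v2, p);
    float e2 = edgeFunction(v2, v0, p);
    return (e0 > 0 || (e0 == 0 && isTopLeft(v0, v1)))
        && (e1 > 0 || (e1 == 0 && isTopLeft(v1, v2)))
        && (e2 > 0 || (e2 == 0 && isTopLeft(v2, v0)));
}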
The process I used to make this decision was as unscientific as it was self-serving: I went to the OpenGL team and asked them what rasterization rules their software renderer used. “Top left,” they said. I asked the Direct3D team what rasterization rules their software renderer used. “Top left,” they said. So I asked the WHQL test implementors to enforce a top-left rasterization rule, and within a few years, all 3D hardware complied with this aspect of the specification. If one considers this type of industry intervention benign (if not benevolent), this episode is an instance of Microsoft playing the role of benevolent tyrant to enforce compliance across the diversity of 3D hardware accelerators in the market – fueled not by a coin toss, but by a crisp decision put in place where the industry had previously deemed a vague one sufficient.
In the field of API design, the ordering of function parameters is another area where consistency is more important than the decision itself.
When we first built CUDA, we had to decide whether passback parameters would come first (echoing the assignment operator) or last (echoing the implies operator). We went with the former:
void *dptr;
cudaMalloc( (void **) &dptr, N );

As an aside, OpenGL went the other direction – all passback parameters come last:
GLuint VBO;
glGenBuffers(1, &VBO);

Another, similar convention we adopted was to prepend cu* for driver API functions and cuda* for runtime functions; the driver API went even further and adopted a convention where an entry point’s functional area was incorporated into the function name: cuStreamCreate() creates a stream, for example (and, per the passback ordering convention discussed above, its first parameter passes back the stream). The driver API had cuDeviceGetAttribute(), not cudaGetDeviceProperties().
Once such a decision is made, it is important to stay disciplined and enforce consistency as new families of APIs and other functions are added: consistency builds trust with developers.
[1] The more recently designed, and now more prevalent, ARM architecture also is bi-endian, but is most commonly configured as little-endian.

