For Want Of A Bit
When Intel decided to add SIMD instructions to their x86 platform, they knew they wanted to minimize disruption to end users. To that end, they chose to alias the new MMX registers onto the existing x87 register file, which had been used to hold floating point values. The idea, I think, was to enable existing operating systems (such as Windows 95) to context switch the register state without needing an update. Context switching is the feature that enables multitasking operating systems to give each program the illusion that it is running inside a whole computer; each program gets a “time slice” of CPU time to run, then gets interrupted so another program can run.
With the new instructions, the chip would automatically go into “MMX mode” when an MMX instruction was encountered, then when the program was done executing MMX instructions, it had to execute the EMMS instruction before any x87 instructions could run again. (Though this quirk of implementation meant that MMX precluded the intermixing of floating point and SIMD integer instructions, no one seemed fazed at the prospect.)
Anyway, once MMX-capable hardware became available, Microsoft and Intel found during testing that MMX programs were only working correctly… most of the time. There were unpredictable failures: the programs would periodically lose their minds and disrupt the execution of other programs.
Eventually they realized that, although the operating system was correctly context switching all the x87/MMX register state between programs, one key piece of information was not being context switched, because it hadn’t existed when the Windows 95 context switching code had been written:
The MMX bit.
The mode bit that told the CPU whether the chip was running in x87 mode, or MMX mode.
Without correctly context switching that one bit of state, even programs that did not use MMX at all could be corrupted, because they’d get control after a context switch and the chip would think it was running in MMX mode when it was not. In turn, MMX code could start running incorrectly if the bit got cleared between context switches.
So we had to push an update to a VxD (the kernel mode modules used by Windows 95), and ship it in patches and “in the box” in Windows 98. That hadn’t been the intention, but it was mitigated by the necessary delay in rollouts of software updated to take advantage of the new instructions.
I am not sure whether Intel or Microsoft was more to blame for this oversight. But it was quickly remedied, in two ways:
MMX’s successor SIMD instruction set (SSE, “streaming SIMD extensions”), added new register state instead of aliasing existing register state; and
Starting with SSE-capable machines, Intel added the FXSAVE/FXRSTOR instructions to save and restore register state, effectively shifting the division of labor to effect a correct context switch from entirely on the OS kernel, to chip-specific microcode. The operating system’s main responsibility then became to allocate the correct amount of memory to hold the thread context - a queryable parameter.
It’s not hard to contrive scenarios where a single bit makes all the difference in the world (sign bits come to mind); still, it’s amazing that right under the noses of seasoned CPU and operating system architects, the MMX mode bit managed to launch this sneak attack on platform stability.