Unveiling a Hidden Bug in the Apollo Guidance Computer
By Henry Garner

AI Summary
The Apollo Guidance Computer (AGC) stands as one of the most examined codebases in history, yet a critical bug in its gyro control code remained unnoticed for 57 years. This bug involved a resource lock, LGYRO, that failed to release during an error path, potentially disabling the guidance platform's ability to realign. Using Claude and Allium, an open-source behavioral specification language, we distilled the AGC's 130,000 lines of assembly into 12,500 lines of specs, which led us directly to the defect.
The AGC's source code has been publicly available since 2003, thanks to Ron Burkey and volunteers who transcribed it from MIT's printed listings. In 2016, Chris Garry's GitHub repository brought renewed attention to this historic code. Despite the scrutiny, no formal verification had been published against the flight code until now.
We approached the problem differently by using Allium to create a behavioral specification of the Inertial Measurement Unit (IMU) subsystem. This specification modeled the lifecycle of every shared resource, highlighting a flaw that traditional methods had missed. The AGC managed the IMU through LGYRO, a lock that should be released after torquing the gyroscopes. However, if the IMU was caged during a torque, the lock remained engaged, blocking further gyro operations.
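The failure mode can be illustrated with a toy model. The Python sketch below is neither the AGC's assembly nor Allium syntax. The class, method, and parameter names are hypothetical, and a single boolean stands in for the LGYRO lock. Only the lock name and the cage-during-torque condition come from the description above.

```python
class GyroPlatform:
    """Toy model of the lock lifecycle; not the AGC's actual logic."""

    def __init__(self):
        self.lgyro_held = False  # stands in for the LGYRO lock

    def torque_gyros(self, caged_during_torque=False):
        if self.lgyro_held:
            return "blocked"      # an earlier holder never released the lock
        self.lgyro_held = True    # acquire
        if caged_during_torque:
            return "aborted"      # error path: exits WITHOUT releasing (the leak)
        self.lgyro_held = False   # normal path: release
        return "done"


platform = GyroPlatform()
results = [
    platform.torque_gyros(),                          # normal torque
    platform.torque_gyros(caged_during_torque=True),  # cage during torque
    platform.torque_gyros(),                          # later attempt
]
print(results)  # ['done', 'aborted', 'blocked']
```

The toy returns "blocked" immediately so the example terminates. A real job waiting on the lock would simply never proceed.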
This bug could have manifested during the Apollo 11 mission while Michael Collins orbited the Moon alone. If a cage event had occurred during his star-sighting alignment, the program would have hung, leaving him unable to realign the guidance platform. A hard reset would have cleared the issue, but the situation would have been dire without radio contact.
Margaret Hamilton and her team at MIT Instrumentation Laboratory pioneered many concepts in software engineering, including priority scheduling and error recovery. Their work saved the Apollo 11 landing during the 1202 alarms. However, the most serious bugs were often specification errors, like the one we found.
The defect was a result of BADEND, a termination routine that correctly handled general resources but not the gyro-specific lock. Defensive coding practices hid the problem, but did not eliminate it. Our use of Allium's specification forced us to consider every path through the IMU code, revealing the oversight.
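Considering every path can be made concrete with a small exhaustive walk over a control-flow model. The sketch below is an assumption-laden illustration, not Allium and not the real IMU code. The state names and the FLOW table are invented, and the "exit_badend" state merely represents a termination routine that leaves the gyro lock untouched. The model is acyclic, so the recursion terminates.

```python
# Hypothetical control-flow model: state -> list of (next_state, lock_effect)
FLOW = {
    "start":  [("torque", "acquire")],
    "torque": [("finish", None), ("caged", None)],
    "finish": [("exit_ok", "release")],
    "caged":  [("exit_badend", None)],  # termination path skips the gyro lock
}


def leaking_paths(flow, state="start", held=False, path=()):
    """Return every path that reaches a terminal state with the lock still held."""
    path = path + (state,)
    if state not in flow:  # terminal state: no outgoing transitions
        return [path] if held else []
    leaks = []
    for next_state, effect in flow[state]:
        if effect == "acquire":
            next_held = True
        elif effect == "release":
            next_held = False
        else:
            next_held = held
        leaks += leaking_paths(flow, next_state, next_held, path)
    return leaks


leaks = leaking_paths(FLOW)
print(leaks)  # [('start', 'torque', 'caged', 'exit_badend')]
```

Only the error path is reported. The normal path releases the lock before exiting, which is why testing the common case alone would not expose the leak.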
Modern programming languages offer features that help prevent lock leaks, but such leaks still occur. MITRE catalogues this class of defect as CWE-772 (Missing Release of Resource after Effective Lifetime), which underlines the ongoing challenge of resource management. The bug persisted across multiple missions, even though every Apollo mission returned safely.
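As one example of such a language feature, Python's `with` statement releases a lock on every exit from the block, including exits caused by an exception. The sketch below assumes a `threading.Lock` as a stand-in for LGYRO. The `CagedError` exception and the `torque_gyros` function are hypothetical names used only for illustration.

```python
import threading

lgyro = threading.Lock()  # hypothetical stand-in for the LGYRO lock


class CagedError(Exception):
    """Hypothetical error raised when the IMU is caged mid-torque."""


def torque_gyros(caged_during_torque=False):
    with lgyro:  # released on normal return AND when an exception propagates
        if caged_during_torque:
            raise CagedError("IMU caged during torque")
        return "done"


try:
    torque_gyros(caged_during_torque=True)
except CagedError:
    pass

still_locked = lgyro.locked()
retry = torque_gyros()
print(still_locked, retry)  # False done
```

The protection holds only when every acquisition goes through the scoped construct. A manual `acquire()` with a forgotten `release()` on one branch reintroduces the same leak.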
This discovery prompts us to reflect on what hidden bugs might exist in our own systems. The AGC's story is a reminder of the importance of thorough verification and the potential for overlooked issues in even the most scrutinized code.
Key Concepts
Resource lock: a mechanism used in computing to manage access to a shared resource, ensuring that only one process can use the resource at a time to prevent conflicts.
Behavioral specification: a formal method of defining the expected behavior of a system, often used to verify that the system meets its requirements by modeling its operations and interactions.
Category
Technology
Summarized by Mente