Bricking a Mars probe

Like many of you, I love following space exploration. I’ve always been a space program fanatic. When I was a kid, I knew more details about the Mercury, Gemini and Saturn programs than any other kid or teacher in school.

The recent Curiosity mission got me thinking back to the old Viking program. In 1976, NASA landed two probes on Mars that sent back the first pictures ever taken from the planet’s surface. This was the first time a human-made device had landed on the planet and kept working, so there were a ton of unknowns. I was very young, but I remember waiting for those first pictures from Mars, anxious to find out if the little green men would be friendly.

Here’s the first panorama ever sent from Mars (shot by the Viking 1 lander):

While reading about the Viking missions, I stumbled on a little-known yet fascinating fact about the Viking 1 lander. On November 11, 1982, an over-the-air software update was sent to the lander to correct a battery charging issue. The update contained an error and overwrote the wrong region of memory, bricking the lander: it clobbered the memory reserved for the antenna pointing code, which immediately ended all communications. The mission had already been a huge success, so it wasn’t a big deal in the grand scheme of things, but I bet those responsible for the error had a big “oh crap, we just killed the probe!” moment. I’d love to meet these folks. I bet it’s a great story. I imagine a few NASA engineers in a conference room going through the list of memory addresses in the update, trying to figure out why they never got an acknowledgement from the probe, then finding the mistake and looking at each other in shocked disbelief and fear.
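To make the failure mode concrete, here is a minimal sketch of the kind of ground-side check that catches a patch spilling into memory it shouldn’t touch. This is purely illustrative and is not how the Viking software or NASA’s uplink tooling worked: the memory map, the region name, the addresses and the `check_patch` helper are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int  # first address in the region (inclusive)
    end: int    # one past the last address (exclusive)


# Hypothetical memory map; names and addresses are made up for illustration.
PROTECTED = [
    Region("antenna_pointing", 0x0400, 0x0500),
]


def check_patch(start, length, protected=PROTECTED):
    """Return names of protected regions that [start, start + length) overlaps."""
    end = start + length
    return [r.name for r in protected if start < r.end and r.start < end]


# A patch that stays inside its own area touches nothing protected.
print(check_patch(0x0300, 0x0100))  # []
# The same patch, 16 addresses too long, spills into the antenna region.
print(check_patch(0x0300, 0x0110))  # ['antenna_pointing']
```

The point of the sketch is only that an off-by-a-little address range is easy to produce and cheap to check for before anything gets transmitted.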

I can remember working from home on a production system and accidentally typing in the wrong iptables command resulting in the instant inability to connect to the server. At least I was able to drive 30 minutes to the data center and connect a console to the server and undo my mistake!

Russian oops – Phobos 1

I also learned that a similar mistake happened with the Soviet Phobos 1 Mars orbiter mission. From http://en.wikipedia.org/wiki/Phobos_program:

“Phobos 1 operated nominally until an expected communications session on September 2, 1988 failed to occur. The failure of controllers to regain contact with the spacecraft was traced to an error in the software uploaded on August 29/August 30, which had deactivated the attitude thrusters. By losing its lock on the Sun, the spacecraft could no longer properly orient its solar arrays, thus depleting its batteries.

Software instructions to turn off the probe’s attitude control, normally a fatal operation, were part of a routine used when testing the spacecraft on the ground. Normally this routine would be removed before launch. However, the software was coded in PROMs, and so removing the test code would have required removing and replacing the entire computer. Because of time pressure from the impending launch, engineers decided to leave the command sequence in, though it should never be used. However, a single-character error in constructing an upload sequence resulted in the command executing, with subsequent loss of the spacecraft.”

Oops!

Software Heroes – Galileo

I also stumbled on a fascinating story about the NASA Galileo Jupiter probe launched in 1989. During its long trip to Jupiter, the high-gain antenna failed to open, even after many ingenious attempts to free it by spinning the probe (among other wild tactics). This meant that NASA was stuck using the low-gain antenna. The difference in bandwidth was dramatic: 10bps vs 134kbps (more than 13,000 times slower)! While the crippled probe was hurtling toward Jupiter, software engineers had to do some serious rewriting to salvage as much of the mission as possible. They rewrote the encoding/decoding software to get better compression and optimized the image compression algorithms, among many other improvements. Then they had to push these updates to the probe. Can you imagine the stress? Amazing stuff. More details at https://www.nasa.gov/pdf/546504main_42s_galilieo_rocky_road_jupiter.pdf and at http://www.lpi.usra.edu/publications/newsletters/lpib/lpib76/gal76.html.
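To get a feel for those numbers, here is a quick back-of-the-envelope calculation using the two data rates quoted above. The image size is my own assumption (an uncompressed 800 x 800 frame at 8 bits per pixel), picked only to make the arithmetic concrete, and it ignores protocol overhead and error correction.

```python
LOW_GAIN_BPS = 10         # low-gain antenna rate quoted above
HIGH_GAIN_BPS = 134_000   # planned high-gain antenna rate quoted above


def transfer_seconds(size_bits, rate_bps):
    """Time to send size_bits at rate_bps, ignoring overhead and retries."""
    return size_bits / rate_bps


slowdown = HIGH_GAIN_BPS / LOW_GAIN_BPS

# Assumed image: 800 x 800 pixels, 8 bits per pixel, no compression.
image_bits = 800 * 800 * 8

print(f"slowdown:  {slowdown:,.0f}x")
print(f"high gain: {transfer_seconds(image_bits, HIGH_GAIN_BPS):.1f} seconds")
print(f"low gain:  {transfer_seconds(image_bits, LOW_GAIN_BPS) / 86_400:.1f} days")
```

Under those assumptions a single raw image goes from well under a minute to nearly six days, which is why better compression mattered so much.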

Software developers who write code for spacecraft are my heroes. They work under incredible stress and have to cope with extremely low bandwidth, very limited resources and a hardware platform they can’t touch or upgrade, and they are typically writing very low-level code. There are engineers still occasionally sending updates to the Voyager probes launched in the ’70s! They communicate at 160bps and it takes 16 hours for any instructions to be received. I get impatient when a git pull takes more than 5 seconds!

Lastly, a friend reminded me of yet another mission where engineers saved the day, and this one was fairly recent (2004): the Huygens Titan probe. Read all about it at http://www.thespacereview.com/article/306/1