


Sightseeing trips to the ice
In the 1970s New Zealand’s national carrier, Air New Zealand, began operating “sightseeing” flights to Antarctica. The flights would depart from Auckland, fly south over the remote Balleny Islands towards the ice continent, passing over Capes Hallett and Adare and, from there, down McMurdo Sound. There they would circle for an hour or so, to give passengers a good look at Antarctica, before turning around. They returned to Auckland via Christchurch in the South Island for a refuelling stop: the airline’s fleet of DC-10 wide-bodied jets did not quite have the range to make it back to Auckland comfortably in one go. The round trip took about 12 hours.
At 8 am on 28 November 1979 flight TE-901, the final flight for the season, left Auckland with two hundred and thirty-seven passengers and twenty crew. At the controls was an Air New Zealand veteran, Captain Jim Collins. All was well, though visibility was not great. As he approached Antarctica shortly after midday, Collins reported cloud cover at 5,000 metres, and requested permission from local air traffic control at the McMurdo Sound naval base to drop below the cloudbase to give passengers a better view. McMurdo granted it.
Soon after that exchange, McMurdo Base lost contact with the DC-10. When it had not reappeared on New Zealand air traffic control systems a couple of hours later, officials began to fear the worst.
Over the afternoon, planes were scrambled from McMurdo Base. They traced TE-901’s last known position. They followed its scheduled flightpath. They found nothing.
Hours passed. News of the missing plane began to spread. As the Auckland evening drew in, relatives began to reconcile themselves to the idea of disaster. Air New Zealand issued a grim deduction: the plane could not possibly still be in the air; it would long since have run out of fuel.
Then, at about midnight — the sun does not set in Antarctica in late November — the crew of a US Navy Hercules spotted a dark smear on Ross Island, some forty-five kilometres east of the plane’s scheduled flight path. They investigated. They quickly realised it was the remains of an air crash. The aeroplane’s tail section, but little else, was intact. It bore Air New Zealand’s distinctive koru. Wreckage stretched for hundreds of metres up the lower slopes of Mt. Erebus, a 3,800-metre volcano.
The Hercules radioed back to base. They had found TE-901. There would be no survivors.
A commercial airliner with two hundred and fifty-seven souls on board had flown, at an altitude of 400 metres, into a mountain the size of the Eiger.
At the time it was the fourth worst crash in the history of powered flight. It remains New Zealand’s worst single peacetime loss of life.
The crash recovery, at such an inhospitable location, was harrowing. Many on the rescue teams were permanently traumatised. The New Zealand Government created a special civilian award for gallantry for those who went down to the ice.
The government initiated an investigation. The cause of the crash remained baffling. The plane’s captain, Jim Collins, was an experienced and conscientious pilot. He had reported no trouble. The plane’s telemetry indicated it was functioning perfectly. The flight recorders suggested the cockpit had been harmonious, though there was some discussion about visibility. But photographs recovered from passengers’ cameras indicated good visibility below the plane, and that visibility had been the basis on which Collins was granted permission to descend below the route’s minimum safe altitude of just under 5,000 metres. That altitude had been set a good kilometre above the highest point of the surrounding terrain, the crater of the active volcano Mt. Erebus, on Ross Island to the east of the Sound.
Yet, despite all this, the airliner had been nearly fifty kilometres off course and was flying lower than 500 metres above sea level.
Someone, or something, had plainly gone very wrong. Quickly — the consensus, after 45 years, is far too quickly — the question turned to who. As is so often the case in “human error” investigations, the first person in the frame was the operator: Jim Collins. Captain Collins, as is also so often the case, was in no position to argue.
Whether or not Collins was at fault, there is a deeper point. It ought to be a well-understood operating risk in any complex operation that people make mistakes. Human error is not an exception to the operation of a system but an inevitability. Yet, when failure happens, our instinctive response is to find the wrongdoer.
It need not be that way.
Moonshot
When President Kennedy fired the starting gun on the space race in 1961 — at which point the Reds had a lap head start — the Apollo Programme was on an absurdly tight schedule. The farthest NASA had then got was Alan Shepard’s suborbital Mercury flight in May 1961. That lasted fifteen minutes and reached an altitude of less than two hundred kilometres above Earth. It did not even reach orbit. Now Kennedy promised that NASA would have a man on the moon by the end of the decade.
The moon is nearly four hundred thousand kilometres from Earth. The mission would take ten days. The challenge NASA faced was, literally, an order of magnitude greater than its greatest achievement to date.
Then, in January 1967, disaster struck. A terrible fire during a launchpad test killed three astronauts in NASA’s Apollo programme. The immediate cause of the accident was a stray spark from exposed electrical wiring, which ignited in the pure oxygen environment of the sealed capsule. The crew did not stand a chance. They were bolted in: it would have taken them a minute and a half to open the door.
Flight director Gene Kranz shut down the blame game before it could get started with his famous “tough and competent” speech.
Spaceflight will never tolerate carelessness, incapacity, and neglect. Somewhere, somehow, we screwed up. It could have been in design, build, or test. Whatever it was, we should have caught it. We were too gung-ho about the schedule and we locked out all of the problems we saw each day in our work.
[...] Nothing we did had any shelf life. Not one of us stood up and said, ‘Dammit, stop!’ I don’t know what Thompson’s committee will find as the cause, but I know what I find. We are the cause! We were not ready! We did not do our job. We were rolling the dice, hoping that things would come together by launch day, when in our hearts we knew it would take a miracle.
[...] From this day forward, Flight Control will be known by two words: ‘Tough’ and ‘Competent.’ Tough means we are forever accountable for what we do or what we fail to do. We will never again compromise our responsibilities. Every time we walk into Mission Control we will know what we stand for. Competent means we will never take anything for granted. We will never be found short in our knowledge and in our skills.
Here are two divergent responses to disaster. The first, and often instinctive, is to find a single root cause in an individual—to isolate a “bad apple”. The second is exemplified by Kranz. He shifted the focus immediately from who to blame to what to fix: How can we redesign the system—the spacecraft, the procedures, the culture—to ensure this never happens again?
The Erebus disaster would, unfortunately, take the first path.
Investigation
Within a few months of the crash, New Zealand’s chief air accidents investigator, Ron Chippindale, issued a preliminary report. It was thorough — something like 90 pages — and pulled few punches. Chippindale attributed the crash to pilot error, more or less on the legal principle of res ipsa loquitur — “the thing speaks for itself”:
The probable cause of this accident was the decision of the captain to continue the flight at low level toward an area of poor surface and horizon definition when the crew was not certain of their position and the subsequent inability to detect the rising terrain which intercepted the aircraft’s flight path.
If you can’t see a mountain in front of you and you fly into it, that’s on you.
But Jim Collins was a fastidious pilot. He had been briefed on the route. He had studied it diligently. His family had watched him, the night before departure, marking up the route on his own personal atlas, which he took with him on the plane. And besides: it made no sense. Why would an experienced pilot drop through the published minimum safe altitude if he didn’t know where he was and couldn’t see where he was going?
One of Mr. Chippindale’s other observations — made in passing, and seemingly of little consequence — caught the attention of the pilots’ association:
The flight planned route entered in the company’s base computer was varied after the crew’s briefing in that the position for McMurdo on the computer printout used at the briefing was incorrect by over 2 degrees of longitude and was subsequently corrected prior to this flight.
[...]
Some diagrams and maps issued at the route qualification briefing could have been misleading in that they depicted a track which passed to the true west of Ross Island over a sea level ice shelf, whereas the flight planned track passed to the east over high ground reaching to 12450 feet [about 3,800 metres] above mean sea-level.
That “high ground” was Mt. Erebus. But even at McMurdo Sound’s extreme southern latitude, “two degrees of longitude” is no small difference: it is about 50 kilometres. The route Jim Collins had been briefed on would take him well to the west of Mt. Erebus: if he was going by his maps, he would have been over McMurdo Sound. But still: even from 50 kilometres, mountains the size of Erebus are hard to miss. If visibility was poor, the plane should not have been at a dangerously low altitude. As far as Ron Chippindale was concerned, the operating cause of the disaster remained that Jim Collins flew his plane into a mountain.
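As a rough sanity check on that figure, here is a back-of-the-envelope sketch. It assumes a spherical Earth and puts the McMurdo waypoint at roughly 77.5° south; both are my approximations rather than figures from the report. The east–west distance spanned by a longitude error shrinks with the cosine of the latitude:

$$
\Delta d \;\approx\; \frac{40{,}000\ \text{km}}{360^{\circ}} \times \cos(77.5^{\circ}) \times 2^{\circ} \;\approx\; 111 \times 0.22 \times 2 \;\approx\; 48\ \text{km},
$$

which squares with the 43 to 50 kilometre figures quoted here and below.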
Air New Zealand quickly swung in behind this narrative. The pilots’ union did not.
Another thing to bear in mind: the Antarctic flights were unlike ordinary commercial flights for a few reasons. Firstly, being round trips, they had no destination airport. Planes navigate via “waypoints”, and the final one is usually at the destination airport. Needless to say, you have to hit a runway waypoint exactly: near enough is not good enough. But TE-901 was not aiming at a landing strip, just a marker it would fly around before heading home.
Secondly, it was a sightseeing trip: the point of the journey was to have a look around, so the pilots might be expected to take a little more, well, latitude than one might expect on a normal commercial route, especially if visibility was patchy.
And thirdly, sightseeing meant being closer to the ground than the DC-10’s normal cruising altitude of 10,000 metres. There is no chance of hitting any mountain when you are 10 kilometres in the air, but it is not much good for sightseeing, especially when there is a layer of cloud below. Pilots on the Antarctic route had standing permission to descend to a “minimum safe altitude” of 4,900 metres — still well above Mt. Erebus’s summit — and it emerged in later evidence that they regularly dropped well below even that to give passengers a view of the Sound as they approached.
That typo
There is a little frisson here: the “error” mentioned in Mr. Chippindale’s report had been coded into the computer coordinates and the airline’s briefing notes for more than a year. The previous Air New Zealand flights had all taken this route, incorrect though it was, flying down McMurdo Sound some 43 kilometres west of the approved flightpath.
It was, in fact, a better route for sightseeing, precisely because you could fly low over the sea ice. The approved route went right over the top of Mt. Erebus, obliging the pilots to stay much higher and to contend with the small matter of flying over the crater of an active volcano!
It is possible — no-one knows — that Air New Zealand knew of the error, and kept quiet about it on account of its somewhat difficult relationship with air traffic control at McMurdo Station, which was not used to commercial flights and did not like having to “babysit” them. Had McMurdo known about the error, it might have withdrawn permission for Air New Zealand to fly the route.
There is some controversy as to what prompted the action, but in the hours before TE-901’s departure, an Air New Zealand flight controller noticed the “error” in the intended flight co-ordinates — they had been miskeyed as “164°48’ east” instead of “166°48’ east” — and corrected them back to the approved civil aviation route.
It is possible that the real error was this correction to the original error, and the ideal route was down McMurdo Sound all along. As far as Captain Jim Collins was concerned, that was where he was meant to be.
In any case, the correction was never communicated to Captain Collins or his crew. The DC-10’s navigation system would take the plane directly over Mt. Erebus, and not 43 kilometres to the west.
At first this non-communication of such a large correction seems outrageous, but it is an oversight of exactly the same nature as the one apparently made in the cockpit. The controller must have assumed it was an insignificant change. It would have been, as long as the plane stayed at its permitted altitude. Seeing as it realigned the plane with its approved flight path, the controller may also have assumed his action simply conformed to the route Captain Collins was expecting to fly.
And — had he even taken his thought process that far — he might have figured that, if all else fails, Collins would be able to see that the flightpath had been shifted. Mt. Erebus is nearly four kilometres high: it is usually quite hard to miss.
Visual meteorological conditions
On 28 November 1979 there was a heavy layer of cloud over McMurdo Sound at the programmed altitude. Captain Collins sought from McMurdo Air Traffic Control, and was granted, permission to descend below the cloud base to give the passengers a better view. On his co-ordinates he, and the local air traffic controllers at McMurdo Station, believed he was flying over McMurdo Sound, so there was little risk. McMurdo ATC had a radar that could “let down” the plane to 400 metres, so it approved the descent as long as Collins maintained “visual meteorological conditions” — weather conditions good enough to maintain separation from other aircraft and obstacles by sight. Collins confirmed that he did. Passenger photographs recovered from the crash site confirm there was good visibility across the ground. As it happened, McMurdo’s radar never managed to lock onto the DC-10.
Had it been clear, he would have seen Mt. Erebus ahead. But it was heavily overcast, and when a snow-covered landscape blends into a flat white sky the horizon disappears in an unusual condition called a “sector whiteout”. This is different from the whiteout that you might experience when skiing, in which you lose all sense of up and down. It is less obvious, and more insidious.
In a “sector” whiteout, clouds above clear air reflect light, preventing snow-covered features from casting shadows. The sky, snow, and horizon blend together. There is no contrast to reveal slope. Rather than obscuring your vision, it affords an apparently clear view. A colossal mountain is indistinguishable from flat ice. Captain Jim Collins expected to see flat ice. He looked out his window, and that is what he saw. He had no idea he was flying straight into a mountain until the ground proximity warning sounded. By then it was too late.
So a highly unlikely combination of factors — an unrecognised programming error, its non-communication, a cloud layer and the visual conditions it created — contrived to create a disaster. Had any of the factors not been present TE-901 would have returned safely.
Normal accidents and system glitches
The Erebus crash was a textbook example of what Charles Perrow called a “normal accident”: a catastrophic failure mode of a highly complex, tightly-coupled system. It is literally a textbook case: it appears in Perrow’s book. Perrow called them “normal” accidents because they arise in the normal course of expected operating conditions. No major error, oversight or sabotage is needed to cause them: just a confluence of unusual circumstances and the sort of ineradicable misapprehensions that are part of the human condition and from which we all suffer from time to time.
Normal accidents are not, in the final analysis, really “failures” as such: rather, they are unwelcome operating modes that arise as a function of sheer system complexity. When non-adjacent parts of the system that aren’t meant to interact nevertheless do, there is potential for non-linear reactions. The Erebus disaster is a classic case.
Catastrophic failures are all too easy to recognise when they do happen.
But what happens when the malfunctioning complex system is not a tightly coupled power station or commercial aviation network, but one of bureaucracy and law? Here human judgment, and human misapprehension, are just as much in play. What happens when the failure is not a sudden explosion or a plane crash on a distant mountain, but a series of loosely coupled misapprehensions, misalignments of interest and clouded judgments? When they happen not in seconds, but over years? When each error is not recognised as a failure but, rather, as a success? What if errors are taken as validations of a system that is in fact malfunctioning?
These would not look like “accidents” at all: they would present — much later — like latent defects. In the meantime they may have been built upon and integrated into the system’s foundations. Those afflicted by these everyday misapprehensions might not be blamed or vilified, but rewarded for carrying out their appointed function. These calumnies may only unravel years later. When they do, there will be the same hue and cry, and those held responsible will be eviscerated in just the same way.
These we might call “system glitches” — failures that don’t look like failures, and so are allowed to repeat, without a fixed end point.
What might one of these look like? The best recent example is the Post Office Horizon IT scandal: the prosecution, over fifteen years, of literally hundreds of subpostmasters for non-existent fraud. The prosecutions were wrought by a sprawling, incomprehensible “complex system” — a combination of software, software vendors, investigators, in-house lawyers, Post Office executives, externally appointed solicitors and barristers, and the court system — that seemed to be working properly. It identified fraudulent behaviour, successfully extracted compensation for it and punished those the system held responsible.
Subpostmasters tend to be “pillar-of-the-community” types, often having taken the role out of a sense of public duty. They tend, notably, to lack criminal records, or any motive for petty fraud.
But it is only when we step back to ask a bigger question that the possibility of a system glitch emerges: can it really be true that nine hundred generally upstanding individuals should start independently defrauding the Post Office in strikingly similar ways, just as the Post Office introduced a state-of-the-art accounting system designed to detect fraud?
Catastrophic accidents typically only happen once. The lesson is learned, systems are updated, protocols introduced, and the complex system adapts. They tend, therefore, to be highly unusual events. The system gradually gets safer.
The Post Office Horizon IT scandal shows us that this need not be true for system glitches. They may not be recognised. They can happen over and over again. A false prosecution that looks like a fair one will not prompt any change in behaviour. If the same circumstances come up, we should expect the same outcome.
Hunter becomes the hunted
This is the most insidious system effect of all: the self-justifying hunt for another scapegoat to explain how the system made scapegoats. Once revealed, the system glitch becomes obvious. Of course these public-spirited subpostmasters weren’t siphoning away money! That is absurd! Now, we must make an example of those villainous few who made it their business to prosecute them! With this new resolve, the system reconfigures itself to find some new “bad apples” to replace the subpostmasters in the public stocks. Commissions of enquiry are convened. King’s Counsel appointed. The shellacking is broadcast on the internet for all the world to see. It is a modern-day public flogging.
But is this not to commit the same category error the system made in the first place? Aren’t we looking for simplistic, linear causes of a complex, non-linear failure?
The lawyers who prosecuted subpostmasters were, in their own way, acting within their expected function in the system.
They were presented with what appeared to be clear evidence of theft, from a respected institution, backed by what they were told was infallible technology. Their role in the system was to prosecute apparent crime; that is what they did. The system incentivised success, not scepticism. They were operating with their own “local” information, unable to see the wider pattern that would have revealed the truth.
This is not to excuse malicious or knowingly dishonest conduct. But it is to doubt that many of the hundreds of people involved in prosecutions over fifteen years were malicious or knowingly dishonest. Most were, like the rest of us, beset by misapprehension, cognitive bias and ordinary human fallibility. That is all it took. They were unsighted components in a sprawling, malfunctioning system. Now that the failure has manifested itself, the system turns on those same components and treats them the way they once treated the subpostmasters. We find this a satisfying kind of retributive justice, but it misdirects attention from a system that keeps glitching.
What if the real villain is not a person, but the pathological dynamics of the complex system?
Human organisations as complex systems
Complex system /ˈkɒmplɛks ˈsɪstəm/ (n.)
A self-organising system of autonomous individuals, components and subsystems, which interact in non-linear ways and whose behaviour cannot be reliably predicted from the behaviour of the individual parts. The rules, boundaries and components of a complex system are typically not well defined, may themselves be complex systems, and may change unexpectedly.
The thing about human organisations is that they are complex systems. The Post Office Horizon IT scandal wasn’t caused by one bad actor — nor even a lot of them — but by the unlikely conjunctions and interactions of actors doing, generally, what they were there to do, and were expected to do, only mediated by the peculiar motivations, incentives and information gaps that cross-cut any component’s interactions with the rest of the system.
In this way the system itself contrived to distribute a latent failure across multiple institutions and people. Each component operated within its own domain, according to its own incentives and with its own limited information. The Post Office needed to boost its results. Fujitsu needed to demonstrate Horizon’s reliability. Fujitsu employees knew there might be bugs but were disincentivised by their employer from flagging or escalating them. Isolated subpostmasters could not see that others were experiencing the same impossible discrepancies. Lawyers presumed an institution as august as the Post Office had well-founded concerns, and played their role in optimising the success of what they took to be thoroughly justified prosecutions.
The individuals interacting with this “system” — and their subsystems — broadly did not have the information needed to assemble apparently innocuous local warning signs into a coherent picture of wide-scale injustice. Now that we do have that information — thanks to the intervention of other complex systems, like the fourth estate — it is hard to put ourselves back in the purblind position that the actors were in at the time.
This is not to excuse, but to explain: many are understandably anxious to isolate and punish a villain, but that is to make exactly the same category error that this glitching system has made: to look for a single root cause of a complex and unpredictable interaction. The right reaction, on seeing in-house lawyers squirming under cross-examination at the Post Office Horizon Inquiry, was to think, “there but for the grace of God go I”.
This does not mean there should be no consequences: it just means they should be targeted at the people who are meant to be accountable for these systems. There is little to be gained from blaming operators, even when they are still here to speak for themselves. Accountability belongs with those empowered to design, maintain and change the systems we operate. For the Post Office, that was Paula Vennells and the Post Office executive. For Erebus it was Morrie Davis and Air New Zealand’s executive. In both cases they notably baulked at accepting a responsibility that was clearly theirs. In stark contrast, Gene Kranz and the NASA flight control team accepted their responsibility and acted on it. They are rightly held up as exemplars of how accountability is meant to work.
Air New Zealand’s executive massaged the picture to deflect its accountability. Responsibility lay not with a pilot following his brief, even if he was mistaken, but with airline management for allowing a system to persist in which a simple misapprehension could result in disaster. Human error is, in any system, inevitable. Likewise, final responsibility for the Post Office scandal lies not with those pursuing the prosecutions, however short-sightedly, but with those who set their incentives, encouraged that short-sightedness, discouraged internal challenge and ushered in those outcomes regardless of their plausibility.
Complex systems have a “mind” of their own. Especially those that comprise human agents who, literally, have minds of their own. We can try to harness them, but the opportunity for chaos, swamping our best laid plans to remain in control, is never far away. Those who commission systems remain responsible when, as they will, they play up.
We tell ourselves we are in control: all too often, the gods have other ideas.
Memoriam
The Erebus disaster could have been something that bound a small nation together. Instead, it became a story about blame, anger and recrimination. A quest for the truth that went badly off course.
— Michael Wright, White Silence, episode 6.
The Erebus recriminations did not stop with the Chippindale report. The uproar was such that the government established a Royal Commission of Inquiry, chaired by a High Court judge, Mr. Justice Peter Mahon. Mahon rejected Chippindale’s report and exonerated Collins and his crew, finding that the airline’s administrative mistakes — especially that typo and the failure to communicate its correction to the crew — were the true operating causes of the crash.
The Judge was unusually scathing about the corporation’s dissembling after the crash, theatrically describing the totality of Air New Zealand’s evidence as an “orchestrated litany of lies”. That meaty phrase has become part of the New Zealand canon.
The mountain continued to claim its victims. Air New Zealand’s combative chief executive Morrie Davis abruptly retired in the wake of Mr. Justice Mahon’s report in, he claimed, “an attempt to remove a focus point from this current controversy and hasten the company’s recovery.”
The Royal Commissioner himself would be next in line. The airline sought judicial review of Mr Justice Mahon’s findings — especially his description of the corporation’s evidence as “an orchestrated litany of lies”. New Zealand’s Court of Appeal ruled that as the airline had not been offered an opportunity to respond to the allegations before they were published there had been a “breach of natural justice”. Rather anaemically, the Privy Council then upheld that ruling. Consequently, Mr. Justice Mahon felt forced to retire. Though he had been sharply criticised by the courts, he remained markedly popular with the New Zealand public until his death in 1986.
Even forty years after the disaster Air New Zealand was still too wounded to allow its then outgoing chief executive, Christopher Luxon — now the nation’s Prime Minister — to contribute to an excellent Radio New Zealand podcast about the episode.
In forty years, no-one took the “Apollo programme” approach: to accept that the system had failed, and that everyone responsible for the system had some share of the responsibility for the accident. The emphasis on apportioning blame to individuals was misplaced and counterproductive: the damage had already been done. What was required was to explain what had happened, to give a full account, to learn from it, and to honour the memories of the dead.
For a small country in the middle of the Southern Ocean, New Zealand has had its fair share of tragedies. By and large, it is good at remembering them and honouring those who fell. There are national memorials to the sinking of the ferry Wahine, to the country’s major earthquakes and to its occasional tragedies of human conflict. But, even forty-five years later, the only national memorial to New Zealand’s greatest peacetime tragedy is decades of unseemly wrangling about who was responsible, and some memorable judicial phraseology.
In 2017, Jacinda Ardern’s New Zealand Government committed to building a National Erebus Memorial. Eight years later, though a former Air New Zealand Chief Executive now holds Ardern’s old job, it has yet to be built. Perhaps that is the signature of that same old system glitch, still running through the Erebus affair. Perhaps it’s just a sign of our times. But in that same period of eight years, Gene Kranz’s Apollo Programme put a man on the moon.
Without a concrete memorial we have only Peter Mahon’s words, but they have an enduring resonance. Mahon understood the system dynamics at play, and turned a beautiful phrase. The closing paragraph of his Report of the Royal Commission, which came to him as he surveyed the crash site in Antarctica, captures perfectly the idea of the “normal accident”:
By a navigational error for which the air crew was not responsible, and about which they were uninformed, an aircraft had flown not into McMurdo Sound but into Lewis Bay, and there the elements of nature had so combined, at a fatal coincidence of time and place, to translate an administrative blunder in Auckland into an awesome disaster in Antarctica. Much has been written and said about the weather hazards of Antarctica, and how they may combine to create a spectacular but hostile terrain, but for my purposes the most definitive illustration of these hidden perils was the wreckage which lay on the mountain side below, showing how the forces of nature, if given the chance, can sometimes defeat the flawless technology of man. For the ultimate key to the tragedy lay here, in the white silence of Lewis Bay, the place to which the airliner had been unerringly guided by its micro-electronic navigation system, only to be destroyed, in clear air and without warning, by a malevolent trick of the polar light.
Resources