
Well, if the primary is known not to be in a good state, you might as well fail over and hope that the issue was a fried disk or a cosmic bit flip or something.

The real safety feature is the 4 hour lead time before manual processing becomes necessary.

One of the key safety controls in aviation is “if this breaks for any reason, what do we do”, not so much “how do we stop this breaking in the first place”.



I'm no aviation safety controls expert but it seems to me that there are two types of controls that should be in place:

1. Process controls: What do we do when this breaks for any reason.

2. Engineering controls: What can we do to keep this from breaking in the first place?

Both of them seem to be somewhat essential for a truly safe system.


It's very hard to ensure you capture every single possible failure mode. Yes, the engineering control is important, but it's not the most critical. What to do if it does fail (for any reason) is the truly critical control, because it covers the possibility that you don't know every way something might fail and have therefore missed some way to prevent a failure.


One or more of three results can come from the engineering exercise of trying to keep something from breaking in the first place:

1. You could know the solution, but it would be too heavy.

2. You could know the solution, but it would include more parts, each of which would need the same process on it, and the process might fail the same way.

3. You miss something and it fails anyway, so your "what if this fails" path better be well rehearsed and executed.

Real engineering is facing the tradeoffs head on, not hand waving them away.


The engineering controls don't independently make systems safe, they make things more reliable and cost-effective, and hopefully reduce the number of times the process controls kick in.

The process controls do however independently make things safe.

The reason for this is that there are 'unknown unknowns'. We accept that our knowledge and skills are imperfect. Some failures could have been eliminated with the proper engineering controls, but we, as imperfect beings and organisations, did not implement those controls because we never identified the failure mode.

There are also known errors, where the cost of implementing engineering controls may simply outweigh the benefits when adequate process controls are in place.


Everyone uses slightly different terminology and groups things differently but this will give you the gist.

https://en.m.wikipedia.org/wiki/Hierarchy_of_hazard_controls


It was in a bad state, but in a very inane way: a flight plan in its processing queue was faulty. The system itself was mostly fine. It was just not well-written enough to distinguish an input error from an internal error, and thus didn't just skip the faulty flight plan.
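
To illustrate the distinction, here is a minimal Python sketch of a queue processor that skips an item it cannot handle but still stops on anything else. All names here (`InputError`, `process_queue`, `parse_plan`) are hypothetical and invented for illustration; this is not how the actual system is written, just the general pattern.

```python
class InputError(Exception):
    """Hypothetical error type: one queue item could not be processed."""


def parse_plan(plan):
    # Hypothetical handler: rejects an item it cannot interpret.
    if "route" not in plan:
        raise InputError("missing route")
    return plan["route"].split()


def process_queue(items, handler):
    """Process every item; set aside the ones that raise InputError.

    Any other exception is treated as an internal error and propagates,
    so the caller can still stop or fail over.
    """
    processed, rejected = [], []
    for item in items:
        try:
            processed.append(handler(item))
        except InputError as exc:
            # Input problem: record it for manual review and keep going.
            rejected.append((item, str(exc)))
    return processed, rejected


done, skipped = process_queue([{"route": "A B"}, {}, {"route": "C"}], parse_plan)
```

The design choice is that only errors explicitly classified as input problems are skipped. Everything unexpected still halts processing, which keeps the conservative behaviour for genuinely unknown failures.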


At the risk of nitpicking: "a flight plan in its processing queue was faulty" isn't true; the flight plan was fine. The system just couldn't process it.

I mention this only because the Daily Mail headline pissed me off with its usual bullshit foreigner fear mongering crap.


Indeed, that intention is quite transparent in this case. Anyways, I suspect that invalid input exists that would have made the system react in a similar way.



