> The software and system are not properly tested.
Followed by a suggestion to do fuzz testing.
* Automatically generating valid flight paths is somewhat hard (and you'd have to know which ones are valid, because the system, apparently, is also designed to reject some paths). It's also possible that such a generator would produce valid but improbable flight paths. There's probably an astronomical number of possible flight paths, which makes exhaustive testing impossible, so there's no guarantee that a "weird" path would have been found. The points through which the paths go are somewhat dynamic (new airports aren't added every day, but over the lifespan of such a system a few will probably be added). More realistically, some points on flight paths may be removed. Does the fuzzing have to account for the possibility of new / removed points? (A rough sketch of what such a generator might look like follows this list.)
* This particular functionality is probably buried deep inside other code with no direct or easy way to extricate it from its surroundings, and so would be very difficult to feed into a fuzzer. Which leads to the question of how much fuzzing should be done and at what level. Add to this that some testing methodologies insist on divorcing testing from development so as not to create an incentive for testers to automatically okay the output of development (as they would effectively be okaying their own work). This is not very common in places like the Web, but it is common in e.g. medical equipment (it's actually in the guidelines). So, if the developer simply didn't understand what the specification told them to do, it's possible that external testing wasn't capable of reaching the problematic code path, or was severely limited in its ability to hit it.
* In my experience with formats and standards like these, it's often the case that the standard captures a lot of impossible or unrealistic cases, hopefully a superset of what's actually needed in practice. Flagging every way in which a program doesn't match the specification becomes useless or even counter-productive, because developers get overloaded with bug reports, most of which aren't really relevant. It's hard to identify the cases that are rare but plausible. The fact that the testers didn't find this defect in time is really just a function of how much time they had. And, really, the time we have to test any program covers only a tiny fraction of what exhaustive testing would require. So you need to rely on heuristics and gut feeling.
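For concreteness, here's a minimal sketch of what such a generator might look like with Hypothesis. The waypoint names, `parse_flight_plan`, and the rejection behaviour are all made up for illustration; the point is just that the waypoint set is ordinary data the generator samples from, so added or removed points are a data change rather than a new harness.

```python
# Minimal sketch, not the real system: a Hypothesis generator that samples
# flight plans from whatever waypoint set is currently known.
from hypothesis import given, strategies as st

class InvalidPlanError(Exception):
    """Stand-in for the system's documented 'plan rejected' error."""

def parse_flight_plan(plan: str) -> list:
    """Stand-in for the translation step under test."""
    points = plan.split("-")
    if len(points) < 2 or any(not p for p in points):
        raise InvalidPlanError(plan)
    return points

# Hypothetical waypoint identifiers; in a real harness this would be loaded
# from the current navigation database, so new/removed points are just data.
KNOWN_WAYPOINTS = ["EGLL", "LFPG", "EDDF", "KJFK", "OMDB"]

flight_plans = st.lists(st.sampled_from(KNOWN_WAYPOINTS), min_size=2, max_size=30)

@given(flight_plans)
def test_translator_never_crashes(plan):
    # Deliberately weak property: the plan may be accepted or rejected,
    # but rejection must be the documented error, never an unhandled crash.
    try:
        parse_flight_plan("-".join(plan))
    except InvalidPlanError:
        pass  # clean rejection is acceptable
```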
None of this really argues against fuzz testing; even with completely bogus/malformed flight plans, it shouldn't be possible for a dead letter to take down the entire system. And, since it's translating between an upstream and downstream format (and all the validation is done when ingesting the upstream), you probably want to be sure anything that is valid upstream is also valid downstream.
It's true that fuzz testing is easiest when you can do it at the unit level (fuzz this function implementing a core algorithm, say), but doing whole-system fuzz tests is perfectly fine too.
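As a sketch of what a whole-system pass could look like (none of these names come from the actual system; `handle_message` stands for whatever top-level entrypoint ingests a raw message), even a crude mutate-a-known-good-message loop is something:

```python
# Crude whole-system fuzz loop, assuming some top-level handle_message()
# entrypoint and a documented RejectedMessage outcome; both are hypothetical.
import random

class RejectedMessage(Exception):
    """The only acceptable failure mode for a bad input."""

SEED_MESSAGE = b"PLAN EGLL EDDF KJFK"  # placeholder for a known-good message

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Flip, insert or delete a handful of bytes in the seed message."""
    buf = bytearray(data)
    for _ in range(rng.randint(1, 8)):
        op = rng.choice(("flip", "insert", "delete"))
        if op == "flip" and buf:
            buf[rng.randrange(len(buf))] = rng.randrange(256)
        elif op == "insert":
            buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
        elif op == "delete" and buf:
            del buf[rng.randrange(len(buf))]
    return bytes(buf)

def fuzz(handle_message, iterations: int = 100_000, seed: int = 0) -> None:
    rng = random.Random(seed)
    for i in range(iterations):
        msg = mutate(SEED_MESSAGE, rng)
        try:
            handle_message(msg)        # drive the whole pipeline end to end
        except RejectedMessage:
            pass                       # expected: bad input is refused
        except Exception as exc:       # anything else would have been an outage
            raise AssertionError(f"iteration {i}: {msg!r} broke the pipeline") from exc
```

A coverage-guided fuzzer (AFL, libFuzzer, atheris, etc.) will do far better than a blind loop like this, but even this exercises the "message my system considers invalid" path.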
This is not against the principle of fuzz testing. This is to say that the author doesn't really know the reality of testing and is very quick to point fingers. It's easy to say in retrospect that this particular aspect should've been tested; it's basically impossible to find such defects proactively.
Easy for me to say in retrospect, but IMO this is a textbook example of where you should reach for fuzz testing; it's basically protocol parsing: you have a well-known text format upstream, and you need to ensure your system can parse all well-formed protocol messages and, at the very least, not crash if a given message is invalid in your own system.
Similarly with a message queue, handling dead letters is textbook stuff, and you must have system tests to verify that poison pills do not break your queue.
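As a sketch of the textbook shape (all names hypothetical, not from the system in the article): the consumer treats a message it can't process as that message's problem, parks it, and keeps consuming, and a system test asserts exactly that.

```python
# Minimal dead-letter handling sketch; queue, translate and publish are
# whatever the real system uses, passed in here so the test can fake them.
import logging

log = logging.getLogger("consumer")

def consume(queue, dead_letters, translate, publish):
    for msg in queue:
        try:
            publish(translate(msg))
        except Exception:
            # Poison pill: record it for humans, park it, keep going.
            log.exception("could not process message; dead-lettering it")
            dead_letters.append(msg)

def test_poison_pill_does_not_stop_consumption():
    def translate(msg):
        if msg == "BAD":
            raise ValueError(msg)  # simulate the indigestible flight plan
        return msg.upper()

    dead, out = [], []
    consume(["good", "BAD", "also good"],
            dead_letters=dead, translate=translate, publish=out.append)
    assert out == ["GOOD", "ALSO GOOD"]
    assert dead == ["BAD"]
```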
I did not think the author was setting unreasonable expectations for the a priori testing regime. These are common best practices.
This all sounds like exactly the stuff that fuzzing or property-based testing is good for.
And if the functionality is "buried deep inside other code with no direct or easy way to extricate it from its surrounding", making it hard to test, then that's just a further symptom of badly designed software in this case.
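In that spirit, if the translation step is written as a pure function of the message and the current navigation data, it can be lifted out and hammered directly by a fuzzer or property-based test. A rough sketch of that shape (every name here is hypothetical):

```python
# Hypothetical "extracted" translation step: no queue handle, no globals.
# Everything it needs comes in as arguments, so it is trivially fuzzable.
from dataclasses import dataclass

class PlanRejected(Exception):
    """The documented 'we refuse this plan' outcome."""

@dataclass(frozen=True)
class DownstreamPlan:
    waypoints: tuple

def translate_plan(upstream: str, known_waypoints: frozenset) -> DownstreamPlan:
    points = tuple(p for p in upstream.split() if p)
    if len(points) < 2:
        raise PlanRejected("too few waypoints")
    unknown = [p for p in points if p not in known_waypoints]
    if unknown:
        raise PlanRejected(f"unknown waypoints: {unknown}")
    return DownstreamPlan(points)

# A unit-level fuzzer or property-based test can now call translate_plan()
# directly, with whatever waypoint set is current, and assert it either
# returns a DownstreamPlan or raises PlanRejected - nothing else.
```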