Data structure choice no laughing matter
Earlier this year, I casually mentioned that data structures are serious affairs, like hypertension or texting-while-driving: people die when they're done badly. A friend assumed I was jesting. It's time to begin to explain how serious I was. It can only be a beginning, because the topic deserves at least a book's-worth of explanation. Here's my start: Care in datatyping
October 4, 2016
Earlier this year, I casually mentioned that data structures are serious affairs, like hypertension or texting-while-driving: people die when they're done badly. A friend assumed I was jesting. It's time to begin to explain how serious I was.
It can only be a beginning, because the topic deserves at least a book's-worth of explanation. Here's my start:
Care in datatyping
Focus on typing. Good data structures are appropriate types, ones that facilitate correct computations. Bad data structures introduce noise and confusion.
The Therac-25 was a piece of medical equipment that famously killed at least three patients. While careful review revealed at least a dozen systematic errors that combined unhappily, at its heart was a variable, Class3, intended to be used as a boolean with values "test mode" and "operating mode". Operations on it were, "assign zero", and "increment as an octet". Do you see the problem? That second operation could be implemented as, "assign non-zero"; "increment as an octet", though, sometimes overflows, and has the effect of "assign zero". I classify this as a type error: the programmer used byte arithmetic on what should have been a boolean.
More recently, Toyota settled wrongful death claims that resulted, in part, from inconsistently typed global variables and inappropriately cast values.
Thousands of less dramatic typing confusions result in "... names that break computers", working software that mysteriously fails in August, and a host of more mundane mysteries. Data errors regularly lead to people being falsely arrested, evicted, denied credit, diagnosed, and so on, and plenty of these cases result directly from poorly-specified data structures. Y2K apparently cost $100 billion, and it was just a more-or-less deliberate conflation of "integers from 0 to 99" with "calendar year less 1900".
While type errors are far from the only problems possible in data structure, they make the point that good data definitions can be a matter of life and death.
About the Author
You May Also Like