In March of 1988 I went to work for one of the weekly nag rags after a year of freelancing. This particular rag was affiliated with, and partly owned by, one of the Thoroughbred database and computing companies. The main idea was for me to write articles about Thoroughbred pedigrees and other topics using statistics generated by their omniscient computer.
I use the term “omniscient computer” (OC) somewhat sarcastically. It did not take many days of meeting people at the new job and seeing how things actually ran there for me to decide that I trusted their computer-generated statistics about as far as I could throw their mainframe (yes, they still had a mainframe back then).
Aside from the HHT (more about him later), the main problem was that a computer is only as good as its programmers. The programmers there sucked, at least from my point of view. I’m sure that they were totally adequate as programmers. The problem was that they knew diddly squat about racing and/or pedigrees and fundamentally did not give a shit about either one.
Looking back on it now, I am surprised that I was not fired (and did not quit) within the first month or two at that job. It was that bad. It is even more surprising that I lasted almost five years there, especially considering that I did not exactly cooperate with their master plan for statistics generated by their OC.
About halfway through my tenure there I came across a programmer who actually seemed to know what he was doing. His name was Keith, and I started using some of the statistics the OC generated from his programs (his were the only programs there I trusted enough to use).
I remember one of those projects with Keith quite well. I wanted to analyze the 20 most popular sires in the third generation of all foals sold as yearlings in 1980-1984 (sound familiar?). Keith wrote the program. The OC spit out the results. I was looking over the results. They looked perfectly credible. They looked pretty random, but that was more or less what I was expecting. They looked like a CRAPSHOOT (also more or less what I was expecting).
The numbers for Native Dancer at P2 in the third generation caught my eye. Native Dancer was the broodmare sire of Northern Dancer. The latter had dozens of yearlings sold for $1,000,000+ in 1980-1984. So the prices for Native Dancer at P2 should have been astronomically high. They were a little high, but not astronomically so.
So I went to Keith and explained the problem as I saw it. He said he would look into it and get back to me. He did so the next day with the explanation: the prices and results were not matching up correctly. They were off by one. The price for one hip was being matched with the result and pedigree for the next hip.
So Keith fixed the program and reran it. This time the results looked even more credible. Native Dancer was appropriately astronomical at P2 in the third generation for both prices and results. The results still looked like a crapshoot, but at least now it was a more understandable crapshoot. I scribbled the article, and it was published without further problems.
That was a classic example of computer error. Keith was by far the best programmer I ever encountered, but even the best can make mistakes. And when the computer makes an error (whether because of the program or some other reason), the amount of error can be WHOPPING. All you have to do is misalign prices and pedigrees by one nag, and POOF!!!, all of your results are XXXXed up.
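The kind of misalignment Keith found can be sketched in a few lines of Python. This is a hypothetical illustration, not Keith's program: the hip numbers, prices, and outcomes are made up, and the function names are invented for the example. It shows how walking two lists in parallel, one record out of step, turns a sensible result into nonsense, while joining on the hip number keeps each price with its own horse.

```python
from statistics import mean

# Made-up sale records: (hip number, price in dollars)
sales = [(101, 1_500_000), (102, 40_000), (103, 25_000), (104, 900_000), (105, 30_000)]
# Made-up race records: hip number -> did this yearling become a stakes winner?
stakes_winner = {101: True, 102: False, 103: False, 104: True, 105: False}


def join_by_hip(sales, stakes_winner):
    """Correct: match each price to the result for the SAME hip number."""
    return [(price, stakes_winner[hip]) for hip, price in sales if hip in stakes_winner]


def join_off_by_one(sales, stakes_winner):
    """Buggy: pair two parallel lists but start the results one hip late,
    so each price is matched with the NEXT hip's result (and the last hip drops out)."""
    outcomes = [stakes_winner[hip] for hip, _ in sales]
    return [(price, outcome) for (_, price), outcome in zip(sales, outcomes[1:])]


def avg_price_of_stakes_winners(pairs):
    """Average sale price of the yearlings flagged as stakes winners."""
    prices = [price for price, won in pairs if won]
    return mean(prices) if prices else 0.0


print(avg_price_of_stakes_winners(join_by_hip(sales, stakes_winner)))      # 1200000
print(avg_price_of_stakes_winners(join_off_by_one(sales, stakes_winner)))  # 25000
```

Nothing crashes in the buggy version; it simply produces a different, perfectly plausible-looking number. Joining on an identifier rather than on position is one common guard against this.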
Here is another example of an even more pernicious type of computer error. I had a friend and fellow slave at this place whom I called the MPD (for multi-peckered dog, as in XXXXX, you are luckier than a dog with two peckers). The MPD was showing me a printout from the OC one day. It was a sire report on some South American nag, maybe Mat-Boy (ARG). According to the report, this nag had five stakes winners from 17 foals (almost 30%).
The MPD was raving about this sire with 30% stakes winners from foals. Not as politely or as diplomatically as I should have, I explained the facts of life to the MPD. The facts of life are that just because a report from the OC says that a sire has five stakes winners from 17 foals does NOT mean that is a FACT. In this case the nag stood in some South American country. His number of stakes winners was (probably) complete and correct. His number of foals was nowhere near correct. If he had five stakes winners, he probably had at least 100 foals of racing age. The OC listed only the 17 foals in its database. The number of foals NOT in its database was a big unknown.
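The arithmetic behind the Mat-Boy example is simple, but it shows why the denominator matters as much as the numerator. In this minimal Python sketch the 100-foal figure is only the rough guess from the paragraph above, not a verified count, and the function name is hypothetical.

```python
def stakes_winner_pct(stakes_winners: int, foals: int) -> float:
    """Stakes winners as a percentage of foals.
    Only meaningful if BOTH counts cover the same, complete population."""
    if foals <= 0:
        raise ValueError("foals must be positive")
    return 100.0 * stakes_winners / foals


# What the printout implied: 5 stakes winners from the 17 foals in the database.
reported = stakes_winner_pct(5, 17)
# Using the rough guess of at least 100 foals of racing age instead.
plausible = stakes_winner_pct(5, 100)

print(f"{reported:.1f}%")   # 29.4%
print(f"{plausible:.1f}%")  # 5.0%
```

Because the stakes winners were counted completely but the foals were not, the true figure could only be at or below the second number, never near the first.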
The point of this story is that people will believe ANYTHING if they see it on a computer printout. It may be true that computers do not lie, but computers do not know how complete or incomplete their database is either. So if you work with computers, it behooves you to know what is complete and what is not in the database. A certain amount of skepticism usually does no harm, especially when you encounter something that seems improbable.
Human error occurs more frequently than computer error but is not likely to be as WHOPPING in amount (unless the human error was made by a computer programmer). That is one big reason why I still prefer to do all these statistics myself (aside from the fact that I am not a computer programmer and do not have access to computer programmers anymore). I use the computer as an instrument for data mining, but not for data management or number crunching. I know I make human errors along the way, but those errors are minuscule in the grand scheme of things (especially compared to the possible scale of computer errors).
In order to recognize computer error when you see it, it helps if you know how the results SHOULD look (for example, that an AEI of 2.26 for 65,196 average mares is NOT within the realm of probability). I fear that is what is being lost these days, as evidenced by that article in the Bland-Horse. If you don’t know how the results SHOULD look, you will accept anything the computer tells you (no matter how erroneous) and publish it as if it were the truth and not give a damn.
You need to have a “feel” for the data. And the best way to acquire that “feel,” needless to say, is by mining it all yourself. If you let the computer do all the data mining, you have absolutely no “feel” for what the results should be, and you are right back to swallowing whatever the printout says.
That “feel” sometimes boils down to intuition that something just is not kosher. “This is right/That is wrong/You just keep nagging me all day long. . . . ”
As I have probably stated before, I am a big fan of Hank Williams Sr. (author of the lyrics above). I am going to quote from the liner notes of a Hank Williams album (yes, an album, not a CD) I bought about 25 years ago:
“A Tennessee farmer once asked, ‘Hank, how do you “make” all those songs?’ Though the choice of words might be improved upon, in one way the question was well put. Hank Williams did more than just write a song on paper and then sing it. He ‘made’ the song from his own creative genius, then he made it a lasting standard in American music with his own style of delivery. When reference was made to Hank’s rural manners or his unglamorous use of English grammar he was known to have commented, ‘You’ve had to surveyed a lot of farm land over the back side of a mule to be a good country singer.’ Hank surveyed his share of acreage in south Alabama and it evidently didn’t hurt his career.”
I propose a corollary to the quote above: you have to have looked at a few million pedigrees and crunched a few million numbers to know anything at all about pedigrees and statistics.