The Back Side of a Mule

In March of 1988 I went to work for one of the weekly nag rags after a year of freelancing. This particular rag was affiliated with and partly owned by one of the Thoroughbred database and computing companies. The main idea was for me to write articles about Thoroughbred pedigrees and other topics using statistics generated by their omniscient computer.

I use the term “omniscient computer” (OC) somewhat sarcastically. It did not take too many days of meeting people at the new job and seeing how things actually ran there for me to decide that I trusted their computer-generated statistics about as far as I could throw their mainframe (yes, they still had a mainframe back then).

Aside from the HHT (more about him later), the main problem was that a computer is only as good as its programmers. The programmers there sucked, at least from my point of view. I’m sure that they were totally adequate as programmers. The problem was that they knew diddly squat about racing and/or pedigrees and fundamentally did not give a shit about either one.

Looking back on it now, I am surprised that I was not fired (or decided to quit myself) within the first month or two at that job. It was that bad. I am surprised that I lasted almost five years there, especially considering that I did not exactly cooperate with their master plan for statistics generated by their OC.

About halfway through my tenure there I came across a programmer who actually seemed to know what he was doing. Keith was his name, and I started using some of the statistics the OC generated using his programming (his was the only programming there I actually trusted enough to use).

I remember one of those projects with Keith quite well. I wanted to analyze the 20 most popular sires in the third generation of all foals sold as yearlings in 1980-1984 (sound familiar?). Keith wrote the program. The OC spit out the results. I was looking over the results. They looked perfectly credible. They looked pretty random, but that was more or less what I was expecting. They looked like a CRAPSHOOT (also more or less what I was expecting).

The numbers for Native Dancer at P2 in the third generation caught my eye. Native Dancer was the broodmare sire of Northern Dancer. The latter had dozens of yearlings sold for $1,000,000+ in 1980-1984. So the prices for Native Dancer at P2 should have been astronomically high. They were a little high, but not astronomically so.

So I went and talked to Keith and explained the problem as I perceived it to him. He said he would look into it and get back to me. He did so the next day and explained the problem. The prices and results were not matching up correctly. They were off by one. The price for one hip was matching up with the result and pedigree for the next hip.

So Keith fixed the program and reran it. This time the results looked even more credible. Native Dancer was appropriately astronomical at P2 in the third generation for both prices and results.  The results still looked like a crapshoot, but at least now it was a more understandable crapshoot. I scribbled the article, and it was published without further problems.

That was a classic example of computer error. Keith was by far the best programmer I ever encountered, but even the best can make mistakes. And when the computer makes an error (whether because of the program or some other reason), the amount of error can be WHOPPING. All you have to do is misalign prices and pedigrees by one nag, and POOF!!!, all of your results are XXXXed up.
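The misalignment Keith found can be sketched in a few lines of Python. The hip numbers, sires, and prices below are invented purely for illustration; the point is how a one-row shift silently corrupts every pairing:

```python
# Hypothetical sale data: two tables keyed by hip number.
prices = {1: 1_200_000, 2: 45_000, 3: 300_000}  # hip -> sale price
pedigrees = {1: "Northern Dancer", 2: "Mr. Prospector", 3: "Seattle Slew"}

# Correct join: the price for hip N matches the pedigree for hip N.
correct = {pedigrees[hip]: prices[hip] for hip in prices}

# Buggy join: the price for hip N lands on the pedigree for hip N+1 --
# the off-by-one error described above.
buggy = {pedigrees[hip + 1]: prices[hip]
         for hip in prices if hip + 1 in pedigrees}

print(correct)  # Northern Dancer correctly carries the $1.2M price
print(buggy)    # Mr. Prospector wrongly inherits it
```

Note that both joins run without any error message; only someone who knows what the Northern Dancer numbers *should* look like would catch the bug.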

Here is another example of an even more pernicious type of computer error. I had a friend and fellow slave at this place I called the MPD (for multi-peckered dog, as in XXXXX, you are luckier than a dog with two peckers). The MPD was showing me a printout from the OC one day. It was a sire report on some South American nag, maybe Mat-Boy (ARG). This nag had five stakes winners from 17 foals (almost 30%) according to the report from the OC.

The MPD was raving about this sire with 30% stakes winners from foals. Not as politely or as diplomatically as I should have, I explained the facts of life to the MPD. The facts of life are that just because a report from the OC says that a sire has five stakes winners from 17 foals does NOT mean that is a FACT. In this case the nag stood in some South American country. His number of stakes winners was complete and correct (probably). His number of foals was nowhere near correct. If he had five stakes winners, he probably had at least 100 foals of racing age. The OC listed the 17 foals in its database. The number of foals NOT in its database in this case was a big unknown.

The point of this story is that people will believe ANYTHING if they see it on a computer printout. It may be true that computers do not lie, but computers do not know how complete or incomplete their database is either. So if you work with computers, it behooves you to know what is complete and what is not in the database. A certain amount of skepticism usually does no harm, especially when you encounter something that seems improbable.
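The Mat-Boy arithmetic above can be sketched as a quick sanity check. The 100-foal figure is the author's estimate from the story, not a database fact:

```python
# Hypothetical sire-report sanity check, using the numbers from the
# Mat-Boy (ARG) story above.
stakes_winners = 5
foals_in_database = 17        # what the OC happened to have on file
estimated_total_foals = 100   # rough estimate of actual foals of racing age

reported_pct = stakes_winners / foals_in_database * 100
plausible_pct = stakes_winners / estimated_total_foals * 100

print(f"reported:  {reported_pct:.0f}%")   # ~29% -- looks sensational
print(f"plausible: {plausible_pct:.0f}%")  # 5% -- merely ordinary
```

The computer did not lie about the 17 foals it knew; the error lay in treating an incomplete denominator as if it were complete.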

Human error occurs more frequently than computer error but is not likely to be as WHOPPING in amount (unless the human error was made by a computer programmer). That is one big reason why I still prefer to do all these statistics myself (aside from the fact that I am not a computer programmer and do not have access to computer programmers anymore). I use the computer as an instrument for data mining, but not for data management or number crunching. I know I make human errors along the way, but those errors are minuscule in the grand scheme of things (especially compared to the possible scale of computer errors).

In order to recognize computer error when you see it, it helps if you know how the results SHOULD look (that an AEI of 2.26 for 65,196 average mares is NOT within the realms of probability, for example). I fear that is what is being lost these days, as evidenced by that article in the Bland-Horse. If you don’t know how the results SHOULD look, you will accept anything the computer tells you (no matter how erroneous) and publish it as if it were the truth and not give a damn.
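That kind of plausibility check can even be written down. The sketch below assumes only that AEI is indexed so the breed-wide average is 1.00, which means a huge sample of average mares should sit close to 1.00; the threshold numbers are arbitrary illustrations, not industry standards:

```python
# Hypothetical "feel for the data" filter: flag an aggregate AEI that
# strays too far from the breed average of 1.00 given a very large sample.
def flag_implausible(aei, n, tolerance=0.25, large_sample=10_000):
    """Return True if an average AEI of `aei` over `n` mares
    deviates implausibly far from the breed-wide mean of 1.00."""
    return n >= large_sample and abs(aei - 1.0) > tolerance

print(flag_implausible(2.26, 65_196))  # True -- should raise eyebrows
print(flag_implausible(1.10, 65_196))  # False -- unremarkable
```

A filter like this is no substitute for experience, but it captures the same instinct: big samples of ordinary horses do not average out to extraordinary numbers.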

You need to have a “feel” for the data. And the best way to acquire that “feel,” needless to say, is by mining it all yourself. If you let the computer do all the data mining, you have absolutely no “feel” for what the results should be. You could end up accepting anything the computer tells you (no matter how erroneous) and publishing it as if it were the truth and not giving a damn.

That “feel” sometimes boils down to intuition that something just is not kosher. “This is right/That is wrong/You just keep nagging me all day long. . . . ”

As I have probably stated before, I am a big fan of Hank Williams Sr. (author of the lyrics above). I am going to quote from the liner notes of a Hank Williams album (yes, an album, not a CD) I bought about 25 years ago:

“A Tennessee farmer once asked, ‘Hank, how do you “make” all those songs?’ Though the choice of words might be improved upon, in one way the question was well put. Hank Williams did more than just write a song on paper and then sing it. He ‘made’ the song from his own creative genius, then he made it a lasting standard in American music with his own style of delivery. When reference was made to Hank’s rural manners or his unglamorous use of English grammar he was known to have commented, ‘You’ve had to surveyed a lot of farm land over the back side of a mule to be a good country singer.’ Hank surveyed his share of acreage in south Alabama and it evidently didn’t hurt his career.”

I propose a corollary to that quote above. You have to have looked at a few million pedigrees and crunched a few million numbers to know anything at all about pedigrees and statistics.


3 Responses to The Back Side of a Mule

  1. Lakotasblaze says:

    “Garbage in. Garbage out.” That’s what they taught me way back when. If you don’t put the right data in (to find the answer to your question or experiment) and put that data in correctly, then your results are garbage. Just because it was analyzed by the computer does not make it true. So Keith accidentally matched up the wrong race results to the wrong horses, which made the results ODD and turned out to be due to error??

    I don’t think finding results out of the norm is “intuition.” I think it is an experienced eye, someone with a knowledge of typical statistical results in a particular area or topic. In other scientific communities like medicine, research is published and held up to the scrutiny of other experts in the field. Research methods & statistical analysis are carefully examined. If there is some fantastic, out-of-the-norm new result in a particular study, the question is then, “WHY?” Did somebody discover something new, or is it an error of some kind?

    As a personal example, I had a work-study job working in the lab of a biology professor who was studying DNA. I mainly washed his test tubes & beakers, but I also counted drops for him in his research. I counted the # of drops from a large glass thing dripping into a beaker. I have no idea of the purpose of my data collection at all; I had no idea what I was doing. Dr. So and So probably told me at one point, but I don’t remember. I probably didn’t know what he was talking about. Brainy, research scientific types are different kinds of people from most of us, IMO.

    The point to my story is that if I lost count or spaced out or whatever, he could tell something was not right; the results (number of drops) were way off. He would ask me about it and say, “Hmmmm, let’s do it again.” Repeating that part of the experiment eliminated my human error. So I can identify with a small human error throwing things off, but an experienced researcher knows what the norms are and can see if something is fishy. Peer review by others one way or another is a good thing. I used to read a journal from the Centers for Disease Control. They published the results of research in the particular field of infectious disease I studied. It was done to educate people who worked in this health field and develop new research topics.

    Sounds like the TB industry could use some peer review and information sharing to advance the sport. It would be interesting to study the long-held beliefs of trainers & breeders to see how they play out by numbers. I think a great deal could be learned from that. Are there any journals or publications on TB research? Are they presented at meetings or conferences? It also sounds like you have spent a lot of time on the back of your mule down there in KY.

    • ddink55 says:

      Thank you for the comments. Call it intuition. Call it experience. Call it whatever you like. Whatever you call it, it is what was missing from that “B-H” article. The problem, as I perceive it, is that there are very few people in the TB industry who actually understand statistics. Also very few people in the TB industry who care about them. A very small pool from which to draw for peer review. “Market Watch” (a “B-H” newsletter) is more statistically oriented than most, but it generally just puts me to sleep. So no, there are no journals or publications devoted exclusively to TB research (aside from the supremely boring “Market Watch”). Nor any meetings or conferences. A few blogs here and there. Incidentally, by “back side of a mule,” I think Hank was referring to plowing from behind a mule (not riding on a mule). At least that is what it seemed to mean to me (because of the reference to “acres”). Could be wrong of course.

  2. Pingback: Thoroughbred Times | Boojum's Bonanza
