Statistics 202

I reckon I am about halfway through my three-year subscription to The Bland-Horse. Sometimes I wonder why I wasted that money in the first place. Other times they do something that makes the subscription seem worthwhile. Not because they did something particularly good. Au contraire. In this case they attempted something statistical and in doing so revealed just how inept they really are in that department. Which makes me laugh, and God knows I NEED all the LAUGHS I can get these days. Which makes the price of the subscription occasionally seem worthwhile.

The article in question is “Class Action” in the issue of October 23, page 3030. My first reaction was amusement because the topics they chose to study are topics that I myself have covered. Racing class of dams I covered in three parts this past summer. Birth rank and age of mares I covered this spring in “Bloodstock in the Bluegrass” (The B-H chose to study only the former). The postbarren effect I covered about 20 years ago.

My second reaction was one of amusement as well, amusement because The B-H acted as if it were some great revelation that the racing class of mares correlates pretty closely with producing class, as if this has not been known for 80 years or so, since the days of “Uncle Joe” Estes. Ditto for birth rank. They acted as if it were some great revelation that first foals are generally below average, that second through fourth foals are generally best, and that results generally tail off after the fourth foal. Numerous studies have been made on birth rank dating back to “Uncle Joe” Estes 80 years ago, and they all show essentially the same thing. It is not exactly a great revelation.

I read the article and scanned through its results and was satisfied that their results were essentially the same as mine (at least in relative terms, though not in absolute terms; more about that later). I did find some problems with the statistics in this article–problems which did not really affect its overall conclusions–but problems nonetheless.

I made several efforts to email the author of this article (to be denoted by the initials EH). No response. Finally I emailed a friend who has the misfortune to slave at that rag. She informed me that EH does not answer ANY of his emails. Nothing personal. OK. So I decided to write this post (nothing personal, merely a professional evaluation of the problems I perceive in this article).

The first problem is in Table 1 on racing class of dams. It starts by showing the number of mares in each category and the collective AEI (Average Earnings Index, 1.00 being average) for those mares. The AEIs range from 20.31 for G1 winners to 0.16 for runners (mares who started but did not win, includes both placed and unplaced mares). No surprises there.

The problem is that it lists 11,529 unraced mares as having a collective AEI of 0.90. AEI is based on earnings. Unraced mares did not start. Unraced mares have no earnings. Unraced mares by definition have an AEI of 0.00. Or if you prefer, you can say that AEI is not applicable to unraced mares.

I figured that was probably just a typo (“9” being next to “0” on the keyboard). Then I started looking more closely at those numbers, and they did not make sense to me. This table listed all 65,196 mares as having a collective AEI of 2.26.

That 2.26 figure struck me as being WAY TOO HIGH. Only the very best stallions have AEIs of 2.26 or higher. I flipped back to the list of 2010 leading sires on page 3020 of that same issue. Sure enough, only nine of the top 70 stallions listed there have an AEI higher than 2.26, and Candy Ride is right at 2.26. That figure of 2.26 is higher than the AEIs for Giant’s Causeway (1.95), Elusive Quality (1.91), Dynaformer (2.19), etc., etc. No way in Hades that these 65,196 mares were collectively better racehorses than all the runners by Giant’s Causeway, Elusive Quality, Dynaformer, etc., etc.

I began to have my doubts that these particular numbers in Table 1 even added up correctly. So I got out my trusty calculator and examined the internal evidence and crunched some numbers:

Class      Mares    Mares’ AEI    Mares x Mares’ AEI
G1SW         462         20.31                 9,383
G2SW         391         11.39                 4,453
G3SW         594          8.74                 5,192
SW         4,911          4.44                21,805
GSP          633          3.96                 2,507
SP         4,403          2.34                10,303
Wnr       30,141          1.24                37,375
Rnr       12,132          0.16                 1,941
Unraced   11,529          0.90                10,376
Totals    65,196          1.58               103,335

If you take the numbers in Table 1 at face value, the correct AEI for all 65,196 mares is 1.58 (the total of all AEIs, 103,335, divided by the total number of mares, 65,196). Even that is too high because it takes that 0.90 for 11,529 unraced mares at face value. I believe that unraced foals are ignored in the calculation of a stallion’s AEI. If you do the same for unraced mares here, the answer is 92,959 (103,335 minus 10,376) divided by 53,667 (65,196 minus 11,529), or 1.73.
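For anyone who wants to check the arithmetic themselves, here is a short Python sketch. The numbers are copied straight from Table 1 as printed; the weighted-average logic (and the choice to drop unraced mares from the second calculation) is mine, mirroring the reasoning above.

```python
# Recompute the collective AEI implied by Table 1 (numbers as printed).
# Each tuple is (class, number of mares, collective AEI for that group).
table1 = [
    ("G1SW",      462, 20.31),
    ("G2SW",      391, 11.39),
    ("G3SW",      594,  8.74),
    ("SW",      4_911,  4.44),
    ("GSP",       633,  3.96),
    ("SP",      4_403,  2.34),
    ("Wnr",    30_141,  1.24),
    ("Rnr",    12_132,  0.16),
    ("Unraced", 11_529, 0.90),
]

# Overall AEI = sum of (mares x group AEI) divided by total mares.
mares = sum(n for _, n, _ in table1)
weighted = sum(n * aei for _, n, aei in table1)
print(mares, round(weighted / mares, 2))   # 65196 1.58

# Same calculation excluding the 11,529 unraced mares,
# on the assumption that unraced foals are ignored in AEI.
raced = [row for row in table1 if row[0] != "Unraced"]
m2 = sum(n for _, n, _ in raced)
w2 = sum(n * aei for _, n, aei in raced)
print(m2, round(w2 / m2, 2))               # 53667 1.73
```

Either way, nothing close to the 2.26 the table claims.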

Either 1.58 or 1.73 is much closer to the truth than 2.26, but both numbers are still too high. These are supposed to be average mares. Not that average mares should be 1.00 on the nose. There is SOME selectivity in deciding which mares go to stud and which do not (depending on how much overproduction was in the population in question). So all mares at stud should have a collective AEI of 1.2 or 1.3 or so (just an estimate), but not 1.58 or 1.73 and certainly not 2.26.

Which leads to the biggest overall problem with this study. It is not based on average mares. It is based on an elite (above-average) population of mares. Table 1 lists 18,580 total stakes winners from 407,812 total foals (4.56%). That 4.56% tells you right there that this is NOT an average population of mares. It is an ELITE (above-average) population of mares.

Historically the percentage of stakes winners from foals has fluctuated from 2.5% to 3.5% and generally centers around 3.0% (at least for all foals born in North America). I think The B-H used to run a table of breed norms. Can’t seem to find one now. Let’s go over to the competition then. The most recent one I can find is from the Thoroughbred Times of January 2, 2010, page 28. That chart shows that of all North American-bred named foals of 1992-2001, 3.4% became stakes winners (exact numbers not given). That 3.4% is a little high, but it is within the historical range of 2.5% to 3.5%.

Ditto for graded winners. Table 1 shows 4,652 total graded winners from 407,812 total foals (1.14%). Historically this figure is much closer to 0.6%. Thoroughbred Times shows 0.6% on the nose.
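The percentages themselves are simple division, using the counts reported in Table 1:

```python
# Rates implied by Table 1, versus the historical norms cited above.
foals = 407_812
stakes_winners = 18_580
graded_winners = 4_652

print(round(100 * stakes_winners / foals, 2))  # 4.56 -- vs. roughly 3.0% historically
print(round(100 * graded_winners / foals, 2))  # 1.14 -- vs. roughly 0.6% historically
```

Both rates come out at well over half again the historical norms, which is the red flag.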

What gives? What gives is that this article is based on an elite (above-average) population of some sort. Here is the article’s own language explaining how the sample population was chosen.

“The population used in the study consisted of 65,196 mares that produced a foal in North America in 1998, 1999, or 2000. Their entire produce records (foals before and after) were then used for the study–a total of 407,812 foals–more than enough to get an overall picture of measuring class.”

Seems reasonable enough. The only sampling error I perceive is a minor one. Say you have a filly born in 1996 whose first foal was born in 2000 (hence all of her foals fall within the sample). That filly of 1996 could conceivably (no pun intended) still be producing foals in the 2020s. Therefore, all of her foals after 2008 are missing from the sample. Those foals generally have below-average results, as the study itself shows. Removing those foals from the sample increases the overall results of the sample.

So that is a slight bias, but I do not think it is nearly enough to account for the difference in results. The only other possibility is that mares with foals born in 1998, 1999, or 2000 were above-average mares in the first place. I reject that hypothesis out of hand. Populations such as this do not change that much from year to year.

Another reason why I believe this sample must be elite in some fashion is the last column of Table 1, which shows AEI for the progeny of the different groups of mares. Those AEIs range from 2.60 for G1-winning mares to 1.02 for placed or unplaced mares. All of the groups are above 1.00 (the theoretical average). The fact that not one of the groups is below 1.00 is pretty strong evidence that this sample is above average in the first place. If this sample WERE average, its overall AEI should be right around 1.00. It is actually listed as 1.17. That is not nearly close enough to 1.00 for this entire sample to be considered average.

I strongly suspect that there is some sort of sampling error in determining these 407,812 foals in the first place. That is the ONLY explanation for their results of 4.56% stakes winners from foals and 1.14% graded winners from foals (way above the historical numbers). I admit that I cannot pin down the exact nature of the sampling error. Suffice it to say that the absolute results of this study should not be taken as any sort of breed norms. The relative results of this study appear to me to be okay.

Perhaps it would have been better to have selected a different sample in the first place. For racing class of mares, select all North American-bred foals of 2000-2004, for example. If you take all the foals from five complete crops, that is a pretty good sample and one that is not subject to any sort of sampling error.

Of course you could not have done birth rank with all foals of 2000-2004. Perhaps it would have been better to have selected a different sample for that study, say all mares with foals of 1990-1994 and their entire produce records (foals both before and after 1990-1994). Even that might not have been far enough back in time. If a mare had a first foal in 1994 and hence was part of the sample, you would still miss some of her later produce (if she continued to produce into her late teens or 20s). Determining the sample is a tough decision at times and also a critical one.


As for the postbarren effect, the biggest problem with this study is its lack of definitions. The study fails to make clear what it did or did not include as postbarren. All it says is, “In the study, however, the overall stats delivered after a barren or missed year are generally down across the board.” What constitutes a “missed” year? The study does not say.

If it includes “no report” as postbarren, that could be a big problem. No report could mean ANYTHING (including a foal by a non-Thoroughbred stallion). No report means exactly what it says. No foal, no reason given to The Jockey Club. Owners of better mares tend to report the reasons why their mares did not produce a foal in a given year. Owners of cheaper mares tend NOT to report the same reasons. So including “no report” as postbarren (if that is what the study actually did) definitely hurts the postbarren results.

To give an example, turn to page 3063 of the same issue. Look at the stakes shell on Spend a Buck Handicap (G3) winner Mad Flatter. His dam, Miss Pangea, produced foals in 2002, 2003, and Mad Flatter in 2005, but nothing at all is listed for 2004. Presumably 2004 was a “no report” (especially since she IS listed as barren for 2009 and 2010). So was Mad Flatter counted as postbarren (if he fell within the sample, which in this case he did not) or not? I suspect that he would have been counted as postbarren, and I believe that is a mistake, because “no report” could mean ANYTHING. Perhaps it would have been better to have counted only “barren” and “not bred” as postbarren. It could make a big difference in the ensuing statistics.

Turning to the internal evidence, the study lists 117,665 foals as being produced after a barren or “missed” year (whatever that means). That is almost 29% of all 407,812 foals, which seems very high to me (too high to have included “barren” and “not bred” ONLY). Which leads me to conclude that this particular study most probably included just about anything nonproductive as “missed” (including “no reports”) and could have seriously skewed the statistics in doing so (particularly by including “no reports”).
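Again, the share is easy to verify from the study’s own counts:

```python
# Share of all foals in the study listed as produced after a barren or "missed" year.
foals = 407_812
postbarren = 117_665

print(round(100 * postbarren / foals, 1))  # 28.9 -- nearly 29% of all foals
```

For comparison, a definition limited to “barren” and “not bred” alone should produce a noticeably smaller share than that.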

All of the above is intended to be a practical discussion of the many pitfalls of statistics, which can include all sorts of errors: computer errors, human errors, and sampling errors. This particular study appears to include examples of all three. For all of the reasons above I would give this particular study a C– at best if I had to grade it. Next time out I will discuss a particular example of computer error.
