Almost all my statistical posts are based on an assumption that the relationship between prices (Price Index) and results (PPI) is pretty close to linear.

I feel pretty certain that if your prices are above 1.00 and your results are below 1.00, that is a poor result. Conversely, if your prices are below 1.00 and your results are above 1.00, that is a good result.

I also feel pretty certain that prices and results below 1.00 correlate well. I am not so certain of the correlation between prices and results above 1.00.

My assumption has been that if you have a Price Index of 2.00, your PPI (results) should be around 2.00 as well. And the corollaries to this are that if your results are below 2.00, you underperformed, and if your results are above 2.00, you overperformed.

I have not really proved that this relationship is that close to perfect. It would take a lot of mathematics (correlation and regression, feeding a large amount of data into a computer) to establish the exact degree of correlation and to figure out the equation that describes the relationship between the two factors more exactly.

For example, it could be that a Price Index of 2.00 should produce a result of 1.90 or thereabouts (instead of 2.00, as I have so far assumed).

—————————————————————————————-

The paragraphs above are from a previous post (a sidebar, actually) called "Without a Coon?????" Just for grins, I decided to explore this idea a bit further by examining the results of all sales foals of 2003-2007 by their price ranges. Be forewarned that the following post is going to be even more mathematical than usual.

Below is the distribution of foals by price ranges.

Price Range | Foals | Average Price | Maverage | Price Index
$100-9,999 | 27,364 | $3,895 | 38.95 | 0.24
$10,000-19,999 | 11,023 | $13,576 | 115.76 | 0.71
$20,000-29,999 | 7,031 | $23,130 | 151.72 | 0.93
$30,000-49,999 | 7,181 | $37,067 | 191.91 | 1.18
$50,000-99,999 | 8,579 | $68,435 | 268.83 | 1.65
$100,000-199,999 | 5,359 | $134,410 | 364.81 | 2.24
$200,000-499,999 | 3,280 | $273,136 | 518.55 | 3.18
$500,000-999,999 | 664 | $650,369 | 794.85 | 4.87
$1,000,000+ | 233 | $2,087,339 | 1,105.62 | 6.78
Totals | 70,714 | $54,140 | 163.11 | 1.00
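This post does not define the Price Index column. The numbers suggest it is each range's maverage divided by the overall maverage of 163.11 from the Totals row. Here is a minimal Python sketch that checks that reading against the table, with the figures typed in by hand. The variable names are my own, and the division rule is an inference from the numbers rather than a stated formula.

```python
# Maverage by price range, copied from the table above (dicts keep insertion order).
maverage = {
    "$100-9,999": 38.95,
    "$10,000-19,999": 115.76,
    "$20,000-29,999": 151.72,
    "$30,000-49,999": 191.91,
    "$50,000-99,999": 268.83,
    "$100,000-199,999": 364.81,
    "$200,000-499,999": 518.55,
    "$500,000-999,999": 794.85,
    "$1,000,000+": 1105.62,
}
published_index = [0.24, 0.71, 0.93, 1.18, 1.65, 2.24, 3.18, 4.87, 6.78]

OVERALL_MAVERAGE = 163.11  # maverage of all 70,714 foals (Totals row)

# Assumption: Price Index = range maverage / overall maverage, rounded to 2 places.
computed_index = [round(m / OVERALL_MAVERAGE, 2) for m in maverage.values()]

for (label, m), pub, calc in zip(maverage.items(), published_index, computed_index):
    print(f"{label:>17}  maverage {m:8.2f}  published {pub:.2f}  computed {calc:.2f}")
```

Every computed value matches the published Price Index to two decimals. That supports this reading of the column but does not prove it.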

No surprises here. The main purpose of classifying the foals thusly was to see how the results from the individual price ranges stacked up against their prices. Results are listed below. APPPSW stands for average Performance Points per stakes winner and is a measure of the quality of the stakes winners involved (618 being the average for all 2,415 stakes winners).

Price Range | Foals | Stakes Winners | % Stakes Winners | APPPSW | PPI (Result)
$100-9,999 | 27,364 | 389 | 1.42 | 468 | 0.32
$10,000-19,999 | 11,023 | 305 | 2.77 | 502 | 0.66
$20,000-29,999 | 7,031 | 231 | 3.29 | 568 | 0.89
$30,000-49,999 | 7,181 | 345 | 4.80 | 566 | 1.29
$50,000-99,999 | 8,579 | 486 | 5.66 | 714 | 1.92
$100,000-199,999 | 5,359 | 324 | 6.05 | 734 | 2.11
$200,000-499,999 | 3,280 | 247 | 7.53 | 742 | 2.66
$500,000-999,999 | 664 | 61 | 9.19 | 654 | 2.86
$1,000,000+ | 233 | 27 | 11.59 | 839 | 4.60
Totals | 70,714 | 2,415 | 3.42 | 618 | 1.00
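The % column is simply stakes winners divided by foals. The PPI formula is not given here, but a rough proxy tracks it closely. That proxy is the range's stakes-winner rate relative to the overall 3.42%, multiplied by its APPPSW relative to 618. It comes within about 0.015 of every published PPI. The Python sketch below assumes that proxy. It is my own approximation, not the formula behind the published PPI, and the small gaps suggest the real calculation differs in some detail.

```python
# (price range, foals, stakes winners, APPPSW, published PPI), copied from the table above
rows = [
    ("$100-9,999", 27364, 389, 468, 0.32),
    ("$10,000-19,999", 11023, 305, 502, 0.66),
    ("$20,000-29,999", 7031, 231, 568, 0.89),
    ("$30,000-49,999", 7181, 345, 566, 1.29),
    ("$50,000-99,999", 8579, 486, 714, 1.92),
    ("$100,000-199,999", 5359, 324, 734, 2.11),
    ("$200,000-499,999", 3280, 247, 742, 2.66),
    ("$500,000-999,999", 664, 61, 654, 2.86),
    ("$1,000,000+", 233, 27, 839, 4.60),
]
TOTAL_FOALS = 70714
TOTAL_STAKES_WINNERS = 2415
AVERAGE_APPPSW = 618  # average points per stakes winner over all ranges

overall_rate = TOTAL_STAKES_WINNERS / TOTAL_FOALS  # about 3.42%

sw_percentages = []
ppi_proxies = []
for label, foals, winners, apppsw, published_ppi in rows:
    rate = winners / foals
    sw_percentages.append(round(100 * rate, 2))
    # Hypothetical proxy, NOT the published PPI formula:
    # (relative stakes-winner rate) x (relative stakes-winner quality)
    proxy = (rate / overall_rate) * (apppsw / AVERAGE_APPPSW)
    ppi_proxies.append(proxy)
    print(f"{label:>17}  SW% {100 * rate:5.2f}  proxy {proxy:4.2f}  published PPI {published_ppi:4.2f}")
```

The proxy shows why PPI climbs faster than either column alone: both the strike rate and the quality of the stakes winners rise with price, and the two effects multiply.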

No surprises here either. As the price ranges increase, the percentage of foals that became stakes winners also increases, from 1.42% for the lowest group to 11.59% for the highest group. The average Performance Points per stakes winner also increases, from 468 for the lowest group to 839 for the highest group (with dips at $30,000-49,999 and $500,000-999,999 along the way). Most importantly, the PPI (result) increases from 0.32 for the lowest group to 4.60 for the highest group. All of this merely confirms that the markets are more or less rational, at least in the macro sense (over a large body of data).

Now let us compare prices with results.

Price Range | Foals | Price Index | PPI (Result)
$100-9,999 | 27,364 | 0.24 | 0.32
$10,000-19,999 | 11,023 | 0.71 | 0.66
$20,000-29,999 | 7,031 | 0.93 | 0.89
$30,000-49,999 | 7,181 | 1.18 | 1.29
$50,000-99,999 | 8,579 | 1.65 | 1.92
$100,000-199,999 | 5,359 | 2.24 | 2.11
$200,000-499,999 | 3,280 | 3.18 | 2.66
$500,000-999,999 | 664 | 4.87 | 2.86
$1,000,000+ | 233 | 6.78 | 4.60
Totals | 70,714 | 1.00 | 1.00

The first thing I notice is that the lowest three groups (up to $29,999) are all below 1.00 for both prices and results. The highest six groups ($30,000+) are all above 1.00 for both prices and results. So $30,000 is the price at which you start to receive above-average (1.00+) results for your money.

The most interesting thing is how prices stack up versus results. The lowest three groups are not too far off (0.24 to 0.32, 0.71 to 0.66, and 0.93 to 0.89). The middle three groups are a bit more erratic (1.18 to 1.29, 1.65 to 1.92, and 2.24 to 2.11). The highest three groups start to show some significant separation between prices and results, with the former higher than the latter (3.18 to 2.66, 4.87 to 2.86, and 6.78 to 4.60).
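Subtracting each Price Index from its PPI turns the comparison above into a single number per range. A positive number means the range returned more than its prices implied under the X = Y assumption. Here is a small Python sketch with the nine pairs typed in from the table. The "margin" name is my own shorthand, not a term used elsewhere in these posts.

```python
ranges = [
    "$100-9,999", "$10,000-19,999", "$20,000-29,999",
    "$30,000-49,999", "$50,000-99,999", "$100,000-199,999",
    "$200,000-499,999", "$500,000-999,999", "$1,000,000+",
]
price_index = [0.24, 0.71, 0.93, 1.18, 1.65, 2.24, 3.18, 4.87, 6.78]  # X
ppi = [0.32, 0.66, 0.89, 1.29, 1.92, 2.11, 2.66, 2.86, 4.60]          # Y

# Margin = result minus price; positive means the range beat its prices (if X = Y holds).
margins = {r: round(y - x, 2) for r, x, y in zip(ranges, price_index, ppi)}
for r, m in margins.items():
    print(f"{r:>17}  {m:+.2f}")

best = max(margins, key=margins.get)
print("Largest positive margin:", best, f"{margins[best]:+.2f}")
```

Only three ranges come out positive: $100-9,999, $30,000-49,999 and $50,000-99,999. The largest margin, +0.27, belongs to $50,000-99,999.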

At this point I got out my old college stats book and refreshed my memory on correlation and regression. I suspect that in order to do a proper statistical analysis of these data you would need to treat them as 70,714 individual pieces (one for each foal). That is obviously beyond my available computing resources.

Just for grins, though, I decided to treat them as only nine individual pieces of data (one for each price range). X is the Price Index (price) and Y is the PPI (result). I lined up the nine pairs of numbers (as shown above), ran them through the regression formulas, and came up with an equation (Y’ = 0.47 + 0.6X) to describe the relationship between X and Y. (I hope I calculated correctly!!!!) The results are shown below.

X (Price) | Y (Result) | Y’ (Predicted Result From Equation)
0.24 | 0.32 | 0.61
0.71 | 0.66 | 0.90
0.93 | 0.89 | 1.03
1.18 | 1.29 | 1.18
1.65 | 1.92 | 1.46
2.24 | 2.11 | 1.81
3.18 | 2.66 | 2.38
4.87 | 2.86 | 3.39
6.78 | 4.60 | 4.54
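An ordinary least-squares fit of the nine (X, Y) pairs takes only a few lines of Python, so the arithmetic is easy to check. This sketch uses the textbook formulas: slope = Sxy / Sxx, and intercept = mean Y minus slope times mean X. It also reports the correlation coefficient r. The function name is my own.

```python
import math

x = [0.24, 0.71, 0.93, 1.18, 1.65, 2.24, 3.18, 4.87, 6.78]  # Price Index
y = [0.32, 0.66, 0.89, 1.29, 1.92, 2.11, 2.66, 2.86, 4.60]  # PPI (result)


def least_squares(xs, ys):
    """Ordinary least-squares fit of ys = a + b * xs; returns (a, b, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    syy = sum((yi - mean_y) ** 2 for yi in ys)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    b = sxy / sxx                    # slope
    a = mean_y - b * mean_x          # intercept
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
    return a, b, r


a, b, r = least_squares(x, y)
print(f"Y' = {a:.3f} + {b:.3f}X   (r = {r:.3f})")
```

The fit gives an intercept of about 0.479, a slope of about 0.597 and r of about 0.97. So Y’ = 0.47 + 0.6X matches the least-squares line to within rounding of the intercept.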

I was basically interested to see if the regression equation did a better job of predicting results from prices than a simple assumption that prices equal results (X = Y). It did not in five of the nine price ranges examined. Only at the highest three ranges ($200,000+) did the equation do a better job of prediction than assuming that X equals Y.

At $30,000-49,999 both predicted 1.18, but the actual result was 1.29, so I have to call that a tie. In the other five ranges the assumption that X = Y did a better job of prediction than the regression equation.

In short, X (the assumption that X = Y) matched Y better than Y’ (the regression equation) did in the lowest six ranges, except for the tie at $30,000-49,999. At the highest three ranges (beginning at $200,000), however, X starts to overestimate Y pretty severely.
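The range-by-range scoring above can be tallied mechanically. This sketch uses the published (rounded) Y’ column so that the $30,000-49,999 tie comes out exactly as described. Recomputing Y’ from the unrounded equation would give 1.178 there and tip that one case toward X = Y.

```python
x = [0.24, 0.71, 0.93, 1.18, 1.65, 2.24, 3.18, 4.87, 6.78]       # Price Index
y = [0.32, 0.66, 0.89, 1.29, 1.92, 2.11, 2.66, 2.86, 4.60]       # actual PPI
y_pred = [0.61, 0.90, 1.03, 1.18, 1.46, 1.81, 2.38, 3.39, 4.54]  # Y' column, as published

wins = {"X = Y": 0, "regression": 0, "tie": 0}
for xi, yi, pi in zip(x, y, y_pred):
    err_identity = abs(xi - yi)     # error if results simply equal prices
    err_regression = abs(pi - yi)   # error of the regression prediction
    if err_identity < err_regression:
        wins["X = Y"] += 1
    elif err_regression < err_identity:
        wins["regression"] += 1
    else:
        wins["tie"] += 1

print(wins)
```

The tally is 5 ranges for X = Y, 3 for the regression equation and 1 tie.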

Therefore, I conclude that assuming X equals Y works just fine up to about $200,000, beyond which the regression equation does a better job of estimating Y from X. Only 4,177 of the 70,714 foals (about 5.9%) sold for $200,000+. (An average of $200,000 corresponds to a Price Index of 2.74.)

So I do not feel too bad about that, especially considering that almost all of the subpopulations I have examined contain a wide variety of prices, both high and low, and have Price Indexes (X values) that cluster around 1.00, well below 2.74.

In terms of which price range produced the best results from its prices, I would have to say that the big winner is $50,000-99,999. It had a Price Index of 1.65 and a PPI (result) of 1.92, the largest margin of result over price (+0.27) of any range, and regression predicted only 1.46. Not bad at all.

I should caution you to take that result with a grain of salt, however. That price range contained the three best stakes winners of the entire group of 70,714 foals and 2,415 stakes winners (Zenyatta, Curlin, and English Channel). Without those three, this range would have had a PPI of 1.72, still better than 1.65 but much closer to it.

And as for the $1,000,000+ group, the regression equation just about nailed it. It predicted a result of 4.54. The actual result was 4.60.

Of course it is also possible that if I had the computing power to calculate all this as 70,714 individual pieces of data, regression analysis might have yielded a better and more accurate equation. I am not sure about this point. Any professional mathematicians out there care to enlighten me??????
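A partial answer can be sketched from the nine-row table alone. Suppose every foal is given its range's Price Index, and a range's PPI is the average of its foals' individual PPIs (both assumptions on my part). Then a least-squares regression over all 70,714 foals reduces exactly to a regression on the nine rows, each counted as many times as it has foals. The Python sketch below does that weighted fit. Because real prices vary within each range, a true foal-by-foal regression would still differ somewhat. The function name is hypothetical.

```python
foals = [27364, 11023, 7031, 7181, 8579, 5359, 3280, 664, 233]  # weights
x = [0.24, 0.71, 0.93, 1.18, 1.65, 2.24, 3.18, 4.87, 6.78]      # Price Index
y = [0.32, 0.66, 0.89, 1.29, 1.92, 2.11, 2.66, 2.86, 4.60]      # PPI (result)


def weighted_least_squares(xs, ys, ws):
    """Least-squares fit of ys = a + b * xs, counting each point ws[i] times."""
    total_w = sum(ws)
    mean_x = sum(w * xi for w, xi in zip(ws, xs)) / total_w
    mean_y = sum(w * yi for w, yi in zip(ws, ys)) / total_w
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(ws, xs))
    sxy = sum(w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(ws, xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a_w, b_w = weighted_least_squares(x, y, foals)
print(f"Foal-weighted fit: Y' = {a_w:.2f} + {b_w:.2f}X")

# With equal weights the same function reproduces the unweighted nine-point fit.
a_eq, b_eq = weighted_least_squares(x, y, [1] * len(x))
print(f"Unweighted fit:    Y' = {a_eq:.2f} + {b_eq:.2f}X")
```

The foal-weighted line comes out at roughly Y’ = 0.20 + 0.80X. The 27,364 cheapest foals dominate that fit, pulling the line toward the low end, where X and Y are close. The result is very close to the equation in the last comment below, though how that equation was derived is not stated.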

David

If you want to send me an Excel sheet with the data for the 70,714 foals (I think there may be a 65,000-row limit in Excel, though) with their price and PPI, I’d be happy to run it through our symbolic regression program and send you back the best model to describe the relationship. It uses cloud computing to generate it, so it shouldn’t take long to find a model with decent correlation.

Byron.

Thank you for the offer. I do not know if it would be worth the effort. I do not have Excel. I do have a spreadsheet function in my word processing program that is limited to 65,536 units, but I do not know if that function has the same capabilities as Excel. I could send you the 2,415 stakes winners with their prices and Performance Points. The remaining 68,299 could be summarized by using the maverage for the whole group (38.95 for the 26,975 foals sold for $100-9,999 who were not stakes winners, for example). All non-stakes winners would have a result (PPI) of zero. Would that work????? Would that be worth the effort????

I should add, it might be more interesting to add sex as a variable (1 for colts, 2 for fillies) as it could have an influence on the outcome given that high prices tend to be saturated with colts.

Byron,

Don’t bother responding to my last comment (unless you still want to). I won’t be needing to analyze 70,000+ pairs of data. The equation that satisfies almost all of my requirements is Y’ = 0.20 + 0.8X. Thanks for the input.