Tuesday, February 2, 2010

Projecting career rushing yardage

A blog post by a Dallas Morning News reporter in December claimed that LaDainian Tomlinson "remains on pace to match" Emmitt Smith's all-time rushing record, but anyone who watched Tomlinson trudge along at 3.27 yards per carry this season knows that Tomlinson's chances of approaching Smith's record are almost nil. Bill Nichols' claim was based on the fact that Tomlinson had played 137 career games and had 12,321 career rushing yards, which was just 26 yards fewer than Smith had through the same number of games. Of course, Nichols ignored the two most important factors that go into determining the likelihood of a player ever reaching Smith's total of 18,355 yards, those being the player's age and his recent level of production.

Smith's 137th career game came when he was 29 years, 205 days old. Tomlinson's came when he was 30 years, 166 days old — essentially a full year older. And after 137 games, the younger Smith was far more productive than the older Tomlinson. The 29-year-old Smith rushed for 4.2 yards per carry and 1,332 yards, including seven 100-yard games. The 30-year-old Tomlinson rushed for just 730 yards and has gone 22 straight games without reaching the 100-yard mark. Smith at 30 also was far more productive than Tomlinson at 30, rushing for 1,397 yards at 4.2 yards per carry, with nine 100-yard games despite missing one game and having just one carry in another game. Smith, of course, remained productive for several years after that, even surpassing at age 35 what Tomlinson produced at age 30.

As the example of Smith and Tomlinson shows, being nearly equal in games played and career rushing yards does not make two running backs equal at that stage of their careers. A similar example is that of Steven Jackson and Priest Holmes. Jackson has 6,707 rushing yards after 84 career games, which is almost equal to Holmes' 6,692 after 87 career games. Jackson just completed a 1,416-yard season. Holmes after 87 games had just completed a 1,420-yard season. They were nearly equal in career games, career yards and current level of production. The difference, though, is their age. Jackson is 26 years old. Holmes, who was a rookie at age 26, was 30 years old after 87 games. He would go on to rush for just 1,480 more yards in his career.

A running back's career rushing yardage can be projected based on the two factors I mentioned earlier: age and recent level of production. Doug Drinen at pro-football-reference.com once attempted to do this by using a variation of Bill James' Favorite Toy, a method used to project career statistics for baseball players. The Favorite Toy calculates a player's remaining seasons based on his age and calculates his established level of production based on his previous three seasons. Those figures are then used to project his totals for future seasons, which can be added to his previous totals to project his career totals. Realizing that a running back's production would decline more quickly than a baseball player's production, Drinen made a slight adjustment to James' formula, changing one value from 0.6 to 0.7. Testing Drinen's formula by comparing its projections to the actual results for 20 running backs who had completed their careers since the mid-1980s, I found that their projected career totals using his formula were too low early in their careers and too high later in their careers. It became apparent to me that the shape of a running back's career doesn't match what is projected by the formula Drinen used.

The formula that James and Drinen used to determine a player's established level of production is this: three times the most recent season's total, plus two times the previous season's total, plus the total from the season before that, all divided by six. (Or, put another way, half of the most recent season's total, plus one-third of the previous season's total, plus one-sixth of the total from the season before that.) Drinen then calculated the player's remaining seasons by subtracting 70 percent of a player's age from 24. For example, a 27-year-old running back would be said to have 5.1 seasons remaining, because 70 percent of 27 is 18.9. (At the time of Drinen's post, James calculated baseball players' remaining seasons by using 24 minus 60 percent of the player's age, but he since has changed his formula to use 21 minus half of the player's age, with a minimum of 1.5 seasons remaining.) The established level of production times the number of remaining seasons would then give the projected totals in future seasons. Adding that total to the player's actual total so far would yield the player's projected career total.
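The two formulas described above can be written out in a few lines of Python. This is only a sketch of the arithmetic as described in this post; the function names and the sample running back are made up for illustration and do not come from James or Drinen.

```python
def established_level(most_recent, previous, two_back):
    """Weighted average of the last three seasons (weights 3, 2, 1)."""
    return (3 * most_recent + 2 * previous + two_back) / 6


def remaining_seasons(age, base=24.0, rate=0.7):
    """Remaining seasons as described for Drinen's variant: base - rate * age."""
    return base - rate * age


def projected_career_total(career_so_far, level, seasons_left):
    """Yards so far plus established level times remaining seasons."""
    return career_so_far + level * seasons_left


# Hypothetical 27-year-old back (not a real player): 1,200, 900 and 600 yards
# in his last three seasons, most recent first, and 4,000 career yards so far.
level = established_level(1200, 900, 600)
seasons = remaining_seasons(27)
total = projected_career_total(4000, level, seasons)
print(f"level={level:.2f} seasons={seasons:.1f} projected total={total:.0f}")
```

Passing `base=21.0` and `rate=0.5` would give the newer James version mentioned above, although the 1.5-season minimum is not handled in this sketch.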

In my attempts to find a more accurate projection, it became apparent to me that, at least for running backs, a player's "remaining seasons" should not be considered to be the specific number of seasons that he will yet play. Rather, it merely is a number equivalent to his number of remaining seasons at his established level of production. In other words, a 28-year-old running back whose established level of production is 1,000 yards and who is projected to have four seasons remaining isn't necessarily projected to rush for 1,000 yards each season at ages 28, 29, 30 and 31, then end his career. Rather, he simply is projected to rush for 4,000 more yards in his career, which could mean rushing for 900 yards at age 28, then 800 at 29, 700 at 30, 600 at 31, 500 at 32 and 500 at age 33, or some other combination of ages and production that would equal a total of 4,000 yards. After moving past the idea that a player's career has a predetermined end point, and thus a certain number of "remaining seasons" based on a formula, I realized that it is not necessary to project how many more seasons a player will play. Rather, it's necessary only to project how many more total yards he will gain. That might seem obvious now, but it greatly simplified the exercise of coming up with a more accurate way to project the career yardage for running backs.

What I did next was to take the list of running backs with more than 9,000 career rushing yards, then choose those who had ended their careers most recently. I eliminated three players — Barry Sanders, Warrick Dunn and Tiki Barber — for several reasons. Sanders walked away from the game too soon, so using his numbers to help project career totals would skew the results. Dunn never carried the ball more than 286 times in a season and averaged just 222 carries per season. My primary goal was to project career totals for running backs who might challenge Smith's record, and a player with that few carries wouldn't have a chance. Using Dunn's numbers might not have affected the results significantly, but I saw no reason to include him. I excluded Barber for reasons that were a combination of those for excluding Dunn and Sanders. Barber averaged just 126 carries in his first five seasons, then retired at age 31 after rushing for 1,662 yards in a season. No player with so few carries in his first five seasons or who retires early will challenge Smith's record, so I saw no reason to include Barber, either.

That left me with nine of the top running backs whose careers had ended recently. I decided to include the No. 2 all-time rusher as well, although I adjusted Walter Payton's 1982 yardage to account for the strike-shortened, nine-game season. That left me with these 10 running backs —

Name             Career yards
Emmitt Smith     18,355
Walter Payton    16,726
Curtis Martin    14,101
Jerome Bettis    13,662
Marshall Faulk   12,279
Thurman Thomas   12,074
Corey Dillon     11,241
Ricky Watters    10,643
Eddie George     10,441
Shaun Alexander  9,453

For those 10 running backs, I calculated their established performance level at each age (starting at 24) and the total yardage they gained after each age, then determined the average mathematical relationship between those numbers for each age. By dividing their average remaining career yardage by their average established performance level at each age, I arrived at a more realistic equivalent of James' and Drinen's "remaining seasons." For now, I'll just call it the "age coefficient." I used some smoothing of the numbers starting after age 31, because players' careers were ending and the sample size was getting smaller.
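Here is a rough sketch of that calculation for anyone who wants to repeat it. The per-player numbers below are placeholders invented for the example, not the actual figures for the 10 backs, and the smoothing after age 31 is left out because the exact smoothing isn't described here.

```python
from statistics import mean

# Made-up (established level, remaining career yards) pairs, keyed by age.
# These are NOT the post's data, only placeholders showing the calculation.
SAMPLE = {
    24: [(1000.0, 8000.0), (1500.0, 12500.0)],
    25: [(1100.0, 7000.0), (1400.0, 9000.0)],
}


def age_coefficient(pairs):
    """Average remaining yards divided by average established level."""
    levels = [level for level, _ in pairs]
    remaining = [yards for _, yards in pairs]
    return mean(remaining) / mean(levels)


coefficients = {age: age_coefficient(pairs) for age, pairs in SAMPLE.items()}
for age, coef in sorted(coefficients.items()):
    print(f"After age {age}: {coef:.2f}")
```

Note that this divides one average by the other, as described above, rather than averaging each player's individual ratio. The two approaches generally give different numbers.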

Here is the age coefficient I ended up with for each age, along with the "remaining seasons" projected by Drinen's formula (note that "after age" refers to the remaining seasons after the completion of the season in which the player was that age on Dec. 31) —

After age  Age coefficient  Drinen's remaining seasons
24         8.05             6.5
25         6.39             5.8
26         4.84             5.1
27         3.90             4.4
28         3.18             3.7
29         2.54             3.0
30         1.52             2.3
31         0.96             1.6
32         0.83             0.9
33         0.72             0.2
34         0.62             ??

These numbers confirm my initial observation that Drinen's formula underestimated career totals early in players' careers and overestimated them later in their careers, although it again underestimates them after age 33. Perhaps Drinen set a minimum number of remaining seasons, as James did, but he did not mention it in his blog post. Comparing each method to the actual career yardage for various running backs from recent seasons, my method was consistently closer to the actual results. So, although the numbers could be adjusted if necessary, they seem for now to be a pretty good method of projecting career rushing yardage.
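The comparison between the two columns can be checked with a short script. One assumption is built in: the "remaining seasons" column matches 24 minus 70 percent of age only when the age plugged in is the one the back will be the following season (the "after age" value plus one). Age 34 is left out because the table has no Drinen value for it.

```python
# Age coefficients from the table in the post (key = "after age").
AGE_COEFFICIENT = {24: 8.05, 25: 6.39, 26: 4.84, 27: 3.90, 28: 3.18,
                   29: 2.54, 30: 1.52, 31: 0.96, 32: 0.83, 33: 0.72}


def drinen_remaining(after_age):
    """24 - 0.7 * age, using the age the back will be next season.

    Assumption: the table's column is reproduced only with after_age + 1.
    """
    return 24 - 0.7 * (after_age + 1)


comparison = {}
for age, coef in AGE_COEFFICIENT.items():
    drinen = drinen_remaining(age)
    comparison[age] = "under" if drinen < coef else "over"
    print(f"after age {age}: coefficient {coef:.2f}, "
          f"Drinen {drinen:.1f}, Drinen is {comparison[age]}")
```

With the table's numbers, Drinen's figure comes out lower than the age coefficient after ages 24, 25 and 33, and higher at every age in between.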

Going back to Tomlinson, he has 12,490 rushing yards after his season at age 30. Based on his past three seasons, his current level of production is 980.67 yards. Using an age coefficient of 1.52, he is projected to rush for just 1,491 more yards in his career, which would leave him at 13,981 yards, good for merely fifth on the all-time list.
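Tomlinson's numbers can be reproduced step by step as a check. In this sketch the 730-yard season comes from earlier in this post, while the 1,110 yards for 2008 and 1,474 yards for 2007 are not stated here; they are assumed inputs, and they do produce the 980.67 figure quoted above.

```python
def established_level(most_recent, previous, two_back):
    """Weighted average of the last three seasons (weights 3, 2, 1)."""
    return (3 * most_recent + 2 * previous + two_back) / 6


AGE_COEFFICIENT_AFTER_30 = 1.52   # from the age coefficient table
CAREER_YARDS = 12490              # Tomlinson's total after his age-30 season

# 730 (2009) is from the post; 1,110 (2008) and 1,474 (2007) are assumed here.
level = established_level(730, 1110, 1474)
future_yards = round(level * AGE_COEFFICIENT_AFTER_30)
career_total = CAREER_YARDS + future_yards

print(f"level={level:.2f} future={future_yards} career={career_total}")
```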

Another player who some people think could make a run at Smith's record is Adrian Peterson, who has 4,484 yards after age 24. His current production level of 1,501.67 yards multiplied by the age coefficient of 8.05 would give him 12,088 future yards for a total of 16,572 yards, which would leave him third all-time. As impressive as Peterson has been, he hasn't been able to match Smith thus far in his career. At the same age, Smith had rushed for 1,215 more yards than Peterson and had a production level of about 73 more yards in a season. Smith's projected career total at Peterson's age would have been 18,216 yards, which is just 139 yards less than his actual total of 18,355 yards.

Peterson's chances of making a run at Smith's record might be slim, but they're better than the chances of any other current player. Here's a look at the active running backs who are projected to finish with more than 10,000 career yards, along with how far they are behind Smith at the same age:

Name                 Age on Dec. 31  Current total  Smith's total at same age  Deficit at same age
Adrian Peterson      24              4,484          5,699                      1,215
LaDainian Tomlinson  30              12,490         13,963                     1,473
Clinton Portis       28              9,696          11,234                     1,538
Maurice Jones-Drew   24              3,924          5,699                      1,775
Steven Jackson       26              6,707          8,956                      2,249
Chris Johnson        24              3,234          5,699                      2,465
Edgerrin James       31              12,246         15,166                     2,920
Jamal Lewis          30              10,607         13,963                     3,356
Frank Gore           26              5,561          8,956                      3,395
Matt Forte           24              2,167          5,699                      3,532
DeAngelo Williams    26              3,850          8,956                      5,106
Fred Taylor          33              11,540         17,162                     5,622
Thomas Jones         31              9,217          15,166                     5,949

And here are their projected totals —

Name                 Age on Dec. 31  Current level  AC    Projected yards  Projected career total
Adrian Peterson      24              1,501.67       8.05  12,088           16,572
Chris Johnson        24              1,617.0*       8.05  13,017           16,251
LaDainian Tomlinson  30              980.67         1.52  1,491            13,981
Clinton Portis       28              953            3.18  3,031            12,727
Edgerrin James       31              437.5          0.96  420              12,666
Maurice Jones-Drew   24              1,098.17       8.05  8,840            12,764
Steven Jackson       26              1,222.33       4.84  5,916            12,623
Fred Taylor          33              520.17         0.72  375              11,915
Jamal Lewis          30              801.33         1.52  1,218            11,825
Frank Gore           26              1,089.00       4.84  5,271            10,832
Matt Forte           24              1,083.5*       8.05  8,722            10,889
Thomas Jones         31              1,324.83       0.96  1,272            10,489
DeAngelo Williams    26              1,183.00       4.84  5,726            9,576

* For Chris Johnson and Matt Forte, each of whom has played only two seasons, I calculated their current level as if they had rushed for the same yardage in 2007 as they did as rookies in 2008. The result, then, is equal to their career average.
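The projected totals for the active backs can be recomputed from three inputs per player: current total, current level and age coefficient. The sketch below takes those inputs straight from the figures in this post; the `project` helper and the rounding to the nearest yard are choices made for this example.

```python
# (name, current total, current level, age coefficient), all from the post.
BACKS = [
    ("Adrian Peterson", 4484, 1501.67, 8.05),
    ("Chris Johnson", 3234, 1617.0, 8.05),
    ("LaDainian Tomlinson", 12490, 980.67, 1.52),
    ("Clinton Portis", 9696, 953.0, 3.18),
    ("Edgerrin James", 12246, 437.5, 0.96),
    ("Maurice Jones-Drew", 3924, 1098.17, 8.05),
    ("Steven Jackson", 6707, 1222.33, 4.84),
    ("Fred Taylor", 11540, 520.17, 0.72),
    ("Jamal Lewis", 10607, 801.33, 1.52),
    ("Frank Gore", 5561, 1089.0, 4.84),
    ("Matt Forte", 2167, 1083.5, 8.05),
    ("Thomas Jones", 9217, 1324.83, 0.96),
    ("DeAngelo Williams", 3850, 1183.0, 4.84),
]


def project(current_total, level, coefficient):
    """Return (projected future yards, projected career total)."""
    future = round(level * coefficient)
    return future, current_total + future


projections = {name: project(total, level, coef)
               for name, total, level, coef in BACKS}

# Print the backs from highest to lowest projected career total.
for name, (future, career) in sorted(projections.items(),
                                     key=lambda item: -item[1][1]):
    print(f"{name:<20} {future:>7,} {career:>7,}")
```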

One other active running back I should mention is Jonathan Stewart, whose 1,969 career yards at age 22 are just 531 behind Smith's 2,500 at the same age. Given that Stewart shares carries with DeAngelo Williams, he's unlikely to challenge Smith's record unless Williams suffers a major injury or one of the two players changes teams soon. The age coefficients that I calculated begin after age 24, by which age most running backs have played at least three seasons, so I can't project Stewart's career total using that method. I can, however, project a total using Smith's career path. At age 22, Smith had a production level (calculated the same way I did for Chris Johnson and Matt Forte) of 1,250 yards. In the remainder of his career, he rushed for 15,855 yards, which would give him an age coefficient of 12.68. Multiplying that by Stewart's current level of production (984.5) gives him 12,483 projected yards for a projected career total of 14,452 yards. It's highly doubtful that Stewart will be able to match Smith's career path, especially given that Stewart currently is a backup, but even doing so would leave him almost 4,000 yards behind Smith's record.
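The Stewart projection follows the same arithmetic, with a one-player coefficient taken from Smith's career in place of the averaged age coefficient. A quick sketch using only the figures above, with the coefficient rounded to two decimals as it is in the text:

```python
SMITH_RECORD = 18355
SMITH_YARDS_AT_22 = 2500
SMITH_LEVEL_AT_22 = 1250

STEWART_YARDS_AT_22 = 1969
STEWART_LEVEL = 984.5

# Smith's remaining yards after age 22 divided by his level at that age.
smith_remaining = SMITH_RECORD - SMITH_YARDS_AT_22
coefficient = round(smith_remaining / SMITH_LEVEL_AT_22, 2)

stewart_future = round(coefficient * STEWART_LEVEL)
stewart_total = STEWART_YARDS_AT_22 + stewart_future
shortfall = SMITH_RECORD - stewart_total

print(coefficient, stewart_future, stewart_total, shortfall)
```

Using the unrounded coefficient would shift the projection by only a few yards, so the conclusion is the same either way.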

So, exactly what will it take for a running back to challenge Smith's record, short of the NFL expanding the regular season? Most likely, it will take a player who entered the NFL by age 21, as Smith did. Smith essentially has a head start over any back who is a rookie at age 22 or 23. It obviously also will take a player who is among the most productive backs in history during his prime. Smith remains the most productive back in history from ages 22 to 27 — a period of his career that has nothing to do with entering the NFL early or his extraordinary longevity. And it almost certainly will take a player who can match or surpass that longevity. Smith holds the record for rushing yards at 28 and older by more than 1,000 over Walter Payton and by more than 1,700 over every other back who had a heavy workload before age 28. And Smith had 14 seasons with at least 240 rushing attempts and 900 rushing yards; no other player has had more than 10 of either one.

Finally, what would have happened if Barry Sanders had not retired after age 30? Based on this method of projecting yardage, he would have gained 2,567 more yards in his career (a production level of 1,688.67 times an age coefficient of 1.52). That would have given him 17,836 career yards, which is 519 fewer than Smith's record. Sanders would have had to exceed his projected longevity and production in order to finish with more yards than Smith's career total of 18,355. It's possible that he could have, if only he would have had the desire to keep playing, but it's not something we can say was statistically likely to happen.
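To put a rough number on how far Sanders would have had to exceed the projection, the calculation can be run in reverse. In this sketch, Sanders' career total is not stated directly in the post; it is backed out from the 17,836 and 2,567 figures above. The "required coefficient" is an illustration added here, not part of the original method.

```python
SMITH_RECORD = 18355
SANDERS_LEVEL = 1688.67
AGE_COEFFICIENT_AFTER_30 = 1.52

# Career total implied by the post: projected total minus projected yards.
sanders_career = 17836 - 2567

projected_future = round(SANDERS_LEVEL * AGE_COEFFICIENT_AFTER_30)
projected_total = sanders_career + projected_future

# Yards needed to finish one yard ahead of Smith, and the coefficient
# that would have been required at Sanders' established level.
yards_to_pass = SMITH_RECORD - sanders_career + 1
required_coefficient = yards_to_pass / SANDERS_LEVEL

print(projected_total, yards_to_pass, f"{required_coefficient:.2f}")
```

Under these assumptions, Sanders would have needed a coefficient a little above 1.8 after age 30, compared with the 1.52 that the 10 backs averaged.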

1 comment:

Savoian said...

Dear Adam. I have your site listed as a "Favorite" and you never cease to deliver. This analysis is brilliant and I look forward to reading more of your fantastic articles. Shant in Los Angeles.