Adding error bars to averages

A batting average is a record of what a player has achieved. The fewer matches played, the less meaningful that average. I propose an approach to estimate the range of possible long term averages a player might have, based on their current average plus the number of innings played. I’ll talk about Mark Ramprakash too, just to keep things fun.

Imagine a new batsman on the scene. Assume we magically know they have the technique to average exactly 30. By using a Monte Carlo simulation, we can see the paths their average might take by chance.*

Fig 1 – Expected range of averages after 15/30/100 innings. Even after 30 innings a player’s data may not reflect their talent, with 1% averaging 20, and another 1% averaging 40- just through luck. Curves should be smooth but I only ran the simulation 150,000 times, causing some noise.
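To make the idea concrete, here is a minimal sketch of this kind of simulation. It is not the ball-by-ball model behind the chart: it assumes each completed innings is a single draw from a geometric distribution with the stated mean, and that every innings ends in a dismissal. The function names and the seed are illustrative choices only.

```python
import math
import random
import statistics


def simulate_innings(true_average, rng):
    """One completed innings, drawn from a geometric distribution on 0, 1, 2, ...

    Assumption: the chance of being dismissed before the next run is constant,
    which gives a mean score equal to true_average.
    """
    p_out = 1.0 / (true_average + 1.0)
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    return int(math.log(u) / math.log(1.0 - p_out))


def simulate_averages(true_average, innings, careers, seed=1):
    """Observed averages for many independent careers of the same length."""
    rng = random.Random(seed)
    averages = []
    for _ in range(careers):
        runs = sum(simulate_innings(true_average, rng) for _ in range(innings))
        averages.append(runs / innings)  # every innings assumed to end in a dismissal
    return averages


if __name__ == "__main__":
    # Spread of observed averages for a "true 30" batsman after 15/30/100 innings.
    for n in (15, 30, 100):
        averages = simulate_averages(30, n, careers=2000)
        print(n, round(statistics.mean(averages), 1), round(statistics.stdev(averages), 1))
```

The mean stays close to 30 at every career length; it is the spread around it that shrinks as the innings pile up.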

By embracing the uncertainty in averages we can better interpret them. Using the simulated data, we can estimate the uncertainty (standard deviation) of a player’s average once they have played a given number of games.**

Fig 2 – Standard deviation in batting average as a function of innings played and average. Note that 68% of results lie within one standard deviation of the mean, and 95% within two standard deviations. Also note the rapid reduction in uncertainty during the first 40 innings.

A standard deviation of 10 means a 95% chance the true average is within 20 runs of the observed average. That’s where we are for a top order batsman after 15 innings: it’s too early to use the data to conclude, though qualitative judgements on technique are possible.
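If innings scores really are geometric, the standard deviation has a simple closed form, which gives a quick sanity check on the simulated values. The sketch below rests on that assumption and on treating the observed average as if it were the true one; the function names are hypothetical, and the simulated figures in Fig 2 may differ a little.

```python
import math


def average_std_dev(true_average, dismissals):
    """Standard deviation of an observed average, assuming geometric innings scores."""
    single_innings_sd = math.sqrt(true_average * (true_average + 1.0))
    return single_innings_sd / math.sqrt(dismissals)


def interval_95(observed_average, dismissals):
    """Rough 95% band: two standard deviations either side of the observed average."""
    sd = average_std_dev(observed_average, dismissals)
    return observed_average - 2 * sd, observed_average + 2 * sd


if __name__ == "__main__":
    print(round(average_std_dev(40, 15), 1))
    print(interval_95(40, 15))
```

For a batsman averaging 40 after 15 dismissals this gives a standard deviation of about 10.5, in line with the "standard deviation of 10" quoted above.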

There are four practical uses for this analysis:

1. Identifying players out of their depth

Mark Ramprakash averaged 53 in First Class but only half as much in Tests. We can now quantify how likely it is that he wasn’t going to cut it as a Test batsman: and thus propose an approach for when players should be dropped.

Ramprakash’s career: 671 dismissals in First Class means a negligible level of uncertainty in his FC abilities (albeit we could fit an age curve to this for a better estimate of his peak talent). 86 Test dismissals averaging 27.3 (that takes me back to the 1990s).

Let’s assume Tests were about 20% harder than 1990s First Class cricket. Ramps’ theoretical Test average would be 53 * 0.8 = 42.4. Now for the distribution of averages for a batsman with that skill level playing 86 innings:

Fig 3: Range of averages after 86 dismissals if a player’s true average were 42.4 (i.e. the implied average Ramprakash would have had in Tests based on his FC record). Note that over the 2,000 iterations none came out with an average lower than Ramprakash’s 27.3.

Figure 3 tells us it’s almost certain Ramprakash wasn’t the same player in Tests.

But when should he have been dropped? He got a lot of chances. Assume the weakest specialist batsman averages 35. A player should be dropped when they are underperforming the reasonable range of scores that a batsman averaging 35 would produce.

Fig 4 – chance a player was unlucky vs dismissals. Note this is for a player averaging 27 who needs to average 35. The further their performance is from the target average, the faster it becomes clear that it’s not just bad luck.

We can see that after 86 dismissals there was a less than 2% chance Ramprakash was capable of averaging 35 in Tests***. Personally, once a player is down to a 15% chance of just having been unlucky, I’d be looking to drop them. That’s 25 innings averaging 27 or under. There’s some evidence that England are already thinking along these lines. Jennings got 31 Test innings (averaging 25), Denly 26 (30), Compton 27 (29), Malan 26 (28), Hales 21 (27), Vince 22 (25), Stoneman 19 (28). Note that a stronger county or 50 over record should get a player more caps- as it increases the chance that early Test struggles were bad luck (after 30 Test innings Kallis averaged 29, he ended up averaging 55).
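The "chance a player was unlucky" can be approximated without a simulation. This sketch assumes geometric innings scores and uses a normal approximation to the spread of the average, so it will not match the simulated chart exactly; the function name is hypothetical.

```python
import math


def chance_unlucky(observed_average, target_average, dismissals):
    """P(a batsman of target ability averages observed_average or less), by luck alone.

    Assumptions: geometric innings scores and a normal approximation to the
    spread of the average, so the numbers are indicative only.
    """
    sd = math.sqrt(target_average * (target_average + 1.0) / dismissals)
    z = (observed_average - target_average) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # A player averaging 27 who needs to average 35, at various career lengths.
    for n in (10, 25, 50, 86):
        print(n, round(chance_unlucky(27, 35, n), 3))
```

For a player averaging 27 against a target of 35, this gives just under 2% after 86 dismissals and roughly 13% after 25, in the same region as the simulated figures. Pointing it at the earlier question (averaging 27.3 with an implied ability of 42.4 after 86 dismissals) gives a chance well under 1%.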

I’m intrigued by the possibilities this method presents – I’ll follow up at a later date by looking at promising 2nd XI players who have struggled in County Cricket, and assessing whether they deserve another shot, or they’ve probably had their chips.

2. Adding error bars to averages

Remember here where I analyzed county players by expected D1 average? I now have the tools to add error bars to those ratings. Back in September I rated Pope above Root. What I wasn’t able to do at that time was reflect the uncertainty in Pope’s ranking because he only had 42 completed innings. Will cover that in a future blog post (with two small children at home, it’s surprisingly difficult to find time for analysis). Spoiler alert – after that many innings, we can say Pope’s expected Division 1 average was 61 (+/-14).

3. Modelling

I can use the uncertainty in Pope’s rating when modelling match performance. Something like for each innings assigning him an expected average based on the distribution of possible averages he might end up with in the long term. That uncertainty will be one of the inputs in my next Test match model (along with Matchups, realistic bowling changes, impact of ball age).

There will be greater uncertainty in a player’s rating when they have just stepped up or down a level.
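One way to feed that uncertainty into a match model is a two-step draw per innings. The sketch below is only an illustration: it assumes the +/-14 can be treated as one standard deviation of a normal distribution around the rating, and it reuses the geometric scoring assumption rather than a ball-by-ball model. The function name is hypothetical.

```python
import math
import random


def draw_innings_score(rating, rating_sd, rng):
    """Simulate one innings for a player whose long-term average is uncertain."""
    # Step 1: pick a plausible long-term average (assumed normal, floored at 1 run).
    true_average = max(1.0, rng.gauss(rating, rating_sd))
    # Step 2: play the innings with that ability (geometric scores, as an assumption).
    p_out = 1.0 / (true_average + 1.0)
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return int(math.log(u) / math.log(1.0 - p_out))


rng = random.Random(42)
scores = [draw_innings_score(61, 14, rng) for _ in range(20000)]
print(round(sum(scores) / len(scores), 1))
```

The long-run mean still sits near the rating; what changes is that the spread of simulated scores now reflects doubt about the player as well as luck on the day.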

4. Matchups

We can move from the limited “Jones averages 31 against left handers” to the precise “Jones averages 31 +/- 12 against left handers”, just by taking into account the number of dismissals involved. The cricketing world can banish the cherry pickers and charlatans with this simple change, where stats come with error bars.

***

There’s plenty to chew on here. I’ve not found any similar analysis of cricket before – kindly drop me a line and tell me what you think.

* A bit more detail – my model assumes a geometric distribution of innings-by-innings scoring. With that, one can assign probabilities of all possible outcomes to each ball, then simulate an innings ball-by-ball. To see the spread of outcomes after 100 innings, I ran the simulation 150,000 times, then grouped scores into 1,500 batches of 100 innings. Previous discussion about the limitations of using the first 30 innings as a guide to future performance is here.

** This isn’t perfect. This method estimates the range of observed averages from a given level of ability. In the real world it’s the other way around. That’s a more complicated calculation.

*** Actually slightly better than 2%, because his first class record was so strong (114 hundreds). Ramprakash was an unusual case: there was an argument for him playing far fewer Tests, and an equally good one for him to have been managed better and picked more consistently. I did a twitter poll which was split down the middle on these two choices. Nobody thought stopping slightly earlier would have been the right choice.

First Day Blues – when multiple debutants struggled with the bat

44 Test players picked up a pair on debut. This article covers when a raft of new faces are introduced, and things don’t go to plan.

While looking at some proper analysis (“has professionalism seen an increase in the depth of batting lineups?”), I noticed the torrid time Pakistan Women had at the hands of Denmark in the 1997 Women’s World Cup. That inspired me to trawl through the records and see what we can learn from history.

This could be interpreted as being somewhat cruel – that’s not my intention. Just a bit of trivia, and the pleasure of hearing some new stories from scorecards of the past.

5. Sri Lanka vs Pakistan, 1994 Test. Pakistan won by an innings and 52 runs. Debutants scored 19-6. Average 3.2 runs per wicket.

In their defence, two of the three hapless debutants were batting at 10 and 11 (see here). Also Pakistan had Younis, Akram and Mushtaq Ahmed.

4. New Zealand vs Australia, 1946 Test. Australia won by an innings and 103 runs. Debutants scored 35-12. Average 2.9 runs per wicket.

Hard to be too critical as countries rebuilt after World War Two. New Zealand were outclassed, making just 96 runs in the match. Len Butterfield and Gordon Rowe bagged two of the 44 pairs mentioned above. 32 year old Butterfield went wicketless in his only Test, and final First Class match.

There were two silver linings. It was the only Test for Ces Burke (2-30) thus securing a career average of 15. Also, New Zealand didn’t stay in the doldrums for long: going on an unbeaten run of six draws after this defeat.

3. Turkey vs Luxembourg, 2019 T20I. Luxembourg won by 8 wickets. Debutants scored 21-10. Average 2.1 runs per wicket.

The Romania Cup in 2019 is best known for bringing Pavel Florin into the limelight. It also yielded this blowout – 21 runs off the bat, 28 all out. One boundary in 69 balls of T20, the top scorer made seven.

During the tournament Turkey were rolled for the three lowest T20I scores ever recorded. On two of these occasions they were bowled out in the first ten overs.

Luxembourg’s chase is on Youtube. Turkey look really raw – at 21:10 Serkan Kizilkaya takes a wicket while fine leg was sprinting to third man, having not noticed the single off the previous ball.

Let’s try to “take the positives”: Peshawar Zalmi of the Pakistan Super League hosted two of the Turkish team during the 2019 PSL, as part of a programme to support Turkey Developing Sports Branches Federation. One success was the development of 19-year-old Mehmat Sert, whose 42 runs were 31% of Turkey’s tally in the Romania Cup.

2. Pakistan Women vs Denmark Women, 1997 ODI. Denmark Women won by 8 wickets. Debutants scored 3-6. Average 0.5 runs per wicket.

This is my favourite of the five tales. Denmark Women, in the ’97 World Cup, beating Pakistan. There’s no writeup I can find, so crumbs from the scorecard will do:

Pakistan were inserted. From 58-4 when Asma Farzand was run out, the other five debutants contributed 0-5 from 19 balls as Susanne Neilsen and Janni Jonsson ran amok. Somehow (if Cricinfo is to be believed), Shazia Hassan managed to be LBW without facing a ball.

There were 29 extras in Pakistan’s 65 all out – 45% of the runs were sundries. Let me know if you can find a higher ratio in international adult Cricket.

Despite it being a limited overs game, Pakistan’s quickest scorer went at 28 runs per hundred balls.

1. Mali Women vs Rwanda Women, 2019 T20. Rwanda Women won by 10 wickets. Debutants scored 1-10. Average 0.1 runs per wicket.

The card: 1 0 0 0 0 0 0 0 0 0* 0

Rwanda truly turned the screw. Six wicket maidens and two maidens. They knocked the target off in four balls – just think of the net run rate.

Take pity on Margueritte Vumiliya – Rwanda’s opening bowler had figures of 3-3-0-2 and got pipped to the player of the match award.

I mentioned Turkey being bowled out twice in under 10 overs. The only other international side to manage that was Mali Women. Twice.

***

Just because New Zealand and Sri Lanka went on to become strong teams, doesn’t mean that Turkey or Mali Women will. Denmark Women folded in 1999. What did we learn from this? Nothing. In my excitement to say something about Denmark Women’s win in 1997, I’ve created a listicle.

Test cricket’s evolution and professionalism

Imagine a sport where only a handful of its best players participated full time. There would be an elite few head and shoulders above the rest, and a lot of weak players. That’s how the era of amateur cricket looks statistically.

Here I’ll demonstrate that the quantum leap in Test Cricket was the 1960s, with professionalism ensuring the brightest talent wasn’t lost to the game.

A 1950s professional cricketer could earn twice what a manual labourer could.[1] A good wage, but sporting careers are short. There’s no way cricket was attracting all the talent that was out there. In 1963 British county cricket turned fully professional. I don’t know about the evolution in other countries, but it’s striking that in 1962 Richie Benaud was described as “a newspaper reporter by profession” when being recognised as one of Wisden’s Cricketers of the Year.

In the two decades after the Second World War, the depth of talent increased. We can see that in the distribution of batting averages:

Fig 1 – Top order Test averages. Min 10 Tests.

The 1960s distribution reflects a mature sport: lots of players of similar ability, a sprinkling of duffers, and few standing out from the crowd.

Contrast that with the 1930s – over a quarter of the players averaged over 50. Admittedly there were only 42 players that met the criteria, and averages were noisier because there were fewer Tests played then. Bradman’s average should be considered as a function of the era he played in: in the 1930s four others averaged over 65, nobody has achieved that in the last four decades.

There were far fewer batsmen averaging under 25 by the 1960s: this will be a function of a more talented player pool. Interestingly, this wasn’t driven by improving the batting of wicket-keepers: they averaged two runs per wicket less in the 1960s than the 1930s.

Here’s the trend year by year:

Fig 2 – “Mean absolute deviation” is a measure of the extent to which performances differ from the mean. The higher it is, the more outliers there were. While there is a lot of noise, the trend is of a reduction over time.

But what about all the developments since then- improvements in bats, coaching, and technique? These improve all players similarly, so don’t impact the mean absolute deviation. Thus, they aren’t detected by this technique: there will never be one number that says how high the standard of cricket was at a point in time.
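A short sketch of the calculation shows why a uniform improvement is invisible to this measure. The averages below are made-up numbers for illustration, and the five-run improvement is a hypothetical stand-in for better bats or coaching.

```python
def mean_absolute_deviation(values):
    """Average distance of each value from the mean of the group."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


# Hypothetical batting averages for one era, for illustration only.
era = [20.0, 30.0, 40.0, 50.0]
# The same players if better bats added five runs to everyone's average.
era_with_better_bats = [a + 5.0 for a in era]

print(mean_absolute_deviation(era))                   # 10.0
print(mean_absolute_deviation(era_with_better_bats))  # 10.0
```

This holds when everyone gains the same number of runs. If the improvement were proportional instead, the deviation would scale with it, so the argument leans on the additive reading.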

For completeness, here’s the decade-by-decade view:

Fig 3 – “Mean absolute deviation” by decade. Top order batsmen, min 10 Tests.

The maturity of Test Cricket was complete by the 1960s. Note that there wasn’t significant impact from the addition of Test teams through the years: indicating sides were generally added when ready (some would say we waited too long).

Professionalism swelled the ranks of the most talented. What we don’t know is the proportion of the high potential players that ever play cricket: could Rooney have been better than Root?

The logical extension to this maturity analysis would be to look at T20 and/or women’s cricket. Let me know if you’d find this interesting.

***

P.S. while researching this piece, a story from the late David Sheppard about the social division between amateurs and professionals (like Tom Graveney) caught my eye…

When I was at Cambridge we played against Gloucestershire at Bristol. I had made some runs, and, as we came off the field, Tom Graveney, with whom I had made friends in 2nd XI matches said, “Well played, David.” A few minutes later the Gloucestershire captain walked into our dressing-room and came over to me. “I’m terribly sorry about Graveney’s impertinence,” he said. “I think you’ll find it won’t happen again”.[2]

[1] Rain Stops Play, Andrew Hignell

[2] Amateurs and professionals in post-war British sport, edited by Dilwyn Porter & Adrian Smith

How to win a Super Over

I had a piece published in Vox Cricket’s first issue.

Suggest you read the full article there. In case you don’t fancy clicking, here are the key drivers to Super Over success…

  1. Score at least twelve runs if batting first
  2. Pick a set batsman if batting first
  3. Don’t let the number three play it safe if batting first
  4. Stay calm when chasing
  5. Pick a bowler to trouble their opening batsmen
  6. Put the best batsman on strike for the first ball
  7. Plan for a second super over

That’s seven factors without even considering lines, lengths, field placings or shot selection. Super Overs might look like a six ball thrashabout – but there are subtle forces at play.

Women’s T20 World Cup – Rating the teams

Now that every team (bar Pakistan) have played, I can use the batting and bowling records of each starting XI to paint a picture of what we can expect to happen in the group stages.

This is a quick and dirty piece of analysis – I’ve only used ODI and T20I data between the top nine teams. Scarcity of T20I data meant ODI was used as a proxy – scaling the averages down to 76% of their ODI value and the strike rates up to 147%. Time will tell how good this method is.
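As a sketch of that proxy, the conversion is just two multiplications. The factors below assume the 76% and 147% figures are multipliers applied to the ODI numbers (0.76 and 1.47), which is my reading rather than something spelled out; the example batter is hypothetical.

```python
# Conversion factors as read from the text: averages scaled to 76%,
# strike rates scaled to 147%. Treat both as assumptions.
AVERAGE_FACTOR = 0.76
STRIKE_RATE_FACTOR = 1.47


def odi_to_t20(odi_average, odi_strike_rate):
    """Rough T20 expectation for a batter from her ODI record."""
    return odi_average * AVERAGE_FACTOR, odi_strike_rate * STRIKE_RATE_FACTOR


# Hypothetical batter: ODI average 40 at a strike rate of 80.
t20_average, t20_strike_rate = odi_to_t20(40.0, 80.0)
print(round(t20_average, 1), round(t20_strike_rate, 1))  # 30.4 117.6
```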

Somehow watching sport without understanding context and probabilities no longer satisfies me – I want to know what is happening, and to do that data is required. Hence this piece.

The below chart ranks batting strength on the x-axis (expected runs on an average pitch against an average attack). The y-axis is the same but for runs conceded. The ideal team would be in the bottom right of the chart.

The big three stand out: Australia, New Zealand and England. This is consistent with the ICC rankings.

Let’s look at the groups.

Group A is marginally stronger. Despite beating Australia, India aren’t all that hot at batting – remove Shafali Verma early and the rest of the order are unlikely to score at much over a run a ball. Both India’s wins have come after Verma set a platform. Bangladesh have what is on paper an economical bowling attack, though having slipped up against India, they’ll have a tall order containing Australia and New Zealand.

Current expectation is that two of Australia, New Zealand and India should go through. Australia vs New Zealand on 2nd March is the final game of the group, and is likely to decide both who goes through and the position they go through in.

Group B is more clear cut. England lost to South Africa, which was seen as something of an upset, though player data indicates the sides are fairly well matched.

Aside from Chloe Tryon, South Africa aren’t an explosive batting unit. What they have in their favour is that they are dependable. Strong averages down the order mean they will rarely get rolled. That should be good enough to get them three wins out of four and into the semi finals. Note that the women’s version of T20 cricket is subtly different – with lower averages, teams are at much greater risk of being bowled out: so the averages of the lower middle order matter.

England are a similar proposition to South Africa – no stars with the bat, yet a top eight who should all yield more than a run a ball. Hard to see anyone other than England and South Africa progressing.

Being frank, West Indies and Pakistan are holed below the water line once three wickets are down. Look out for them wasting good starts.

Wrapping up, it’s hard to look past the big three teams. Still, South Africa at odds of 14-1 look tempting since I’d expect them to be the fourth semi-finalist. (Odds as at 25th Feb).

What can we learn from England Lions tours?

England Lions have three First Class games in Australia, starting on 15 February. Here I look at the merit of a Lions tour, and what we can learn from them. I’ll start by busting a couple of common myths, and then consider the current squad.

Myth 1: Lions performances as auditions before Test squad selection

Only a genius or a fool would pick a player based on one performance. There are very few geniuses.

Consider the stand out performances in Lions history: 11 players have registered 150+ scores, including Michael Yardy, Chris Read and Eoin Morgan. These gentlemen didn’t have the batting to thrive in Test Cricket – so don’t assume that anyone getting a daddy hundred is the next Ollie Pope.

Nobody should be given the message that a big red ball score on tour will secure them a Test place. Even if that’s what might happen…

Myth 2: Lions performances are a predictor of Test batting success

There are very few Lions First Class matches. Just nine players have more than ten games under their belts. That means Lions averages are poor indicators of Test average. Look how scattered the plot is:

Appendix 1 – Lions vs Test batting average for players dismissed ten or more times in each

The chart below shows how First Class records are more reliable for Test selection than Lions stats.

Appendix 2 – FC vs Test average, same players as Appendix 1.
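How scattered a plot is can be summarised with a correlation coefficient. The sketch below shows the calculation only: the three lists are placeholder numbers, not real player records, and are there to show the mechanics rather than to support the conclusion.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder numbers, NOT real player records: one entry per player.
lions_averages = [55.0, 28.0, 41.0, 33.0, 47.0]
first_class_averages = [38.0, 41.0, 45.0, 36.0, 43.0]
test_averages = [29.0, 35.0, 42.0, 27.0, 38.0]

print(round(pearson_correlation(lions_averages, test_averages), 2))
print(round(pearson_correlation(first_class_averages, test_averages), 2))
```

A value near 1 means the earlier record tracks Test average closely; a value near 0 means the plot is essentially a cloud.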

Without ratings for every player the Lions come up against, any batting data is of limited use. Ideally one would combine Lions data and county stats to enrich the picture we have of a player.

A healthy average over three games doesn’t tell you a lot. A few fifties in a Lions tour don’t necessarily make a “horses for courses” selection next time England tour there. No matter how tempting.

Not entirely myth: Lions performances as a predictor of Test bowling success

We’ve seen that you can judge a bowler on fewer matches than a batsman.

Lions stats are almost as good at predicting Test performances as First Class Cricket (Appendix 1). Consider those with 15+ Lions and Test wickets: the two highest averaging Lions found Test Cricket the hardest (Plunkett and Rashid). However, they also averaged over 30 in First Class, so they’d be expected to struggle in Tests even if we didn’t have Lions data to go on.

OK MR SMARTYPANTS … WHAT IS THE POINT OF LIONS TOURS? OTHER THAN RATING BOWLERS, A BIT.

There are five reasons I can think of to include someone in a Lions First Class squad. Three are in the following table; the other two are “getting a good look at players outside the county 1st XI” and “keeping on some white ball players that are around anyway”.

Let’s look at the red ball squad:

Lewis Gregory doesn’t tick any of the boxes, as captain I think his role is slightly different: a senior player, just falling short of the Test squad. It may be harsh to label the three “White Ball Player Kept On” players as such – Abell and Kohler-Cadmore are better than some players in the original squad. Still, that’s been their route to selection – being retained as many of the group will leave early to join the Test squad.

The Lions XI that takes the field against a Cricket Australia XI on February 15th will reveal something about the ECB’s goals.

The strongest XI? If played in English conditions it would be Sibley, Jennings, Northeast, Kohler-Cadmore, Lawrence, Bracey+, Gregory (c), Bess, OE Robinson, Overton, Gleeson.

***

A final word on a similar theme- the recent U19 World Cup has drawn a few optimistic predictions about players based on a handful of games. Note that only two players have scored three centuries in U19 World Cups. Shikhar Dhawan is a success story, yet Jack Burnham still a work in progress.

***

Appendix 1 – Correlations with Test performance

Appendix 3 – Lions vs Test bowling average for players who took 15 or more wickets at each level.
Appendix 4 – FC vs Test average, same players as Appendix 3.

Appendix 2 – Kookaburra red ball blues

In a recent Cricinfo article, Saqib Mahmood described the challenges of switching from the Dukes ball in England to the Kookaburra ball overseas. I was hoping for some predictive data- showing that some players struggle with the Kookaburra ball with the Lions, and those struggles would continue at Test level. The results are inconclusive.

Bowlers: Things Change

Writing last week about the careers of batsmen and the predictive power of their early performances, I glossed over something important. Batsmen don’t get magically better as they play more Tests.* Which supports my hypothesis that there is no benefit in giving a batsman experience at Test level. A batsman has a level of ability, which is revealed in Test Cricket as they play more games. It’s thus easy to model a batsman’s expected scores.**

What about bowlers?

Fig 1- Average after 25 wickets (x-axis) plotted against rest of career average for the 50 highest wicket taking Test bowlers since 2000

Noisy, as expected. This is (on average) after only seven Tests. Let’s skip forward to when they have 100 wickets:

With 100 wickets players are well into their careers – yet there’s still no consistent pattern. I’m going to split these 50 players into two groups now: the main sequence, who behave nicely and whose past performance is a good guide to future success, and the others.

Here are the 32 well behaved players:

For two thirds of the players, once they have 100 wickets their future is neatly mapped out, and you can approximate that they’ll play at that level until they get dropped. What about the others?

Let’s reveal who these miscreants are. Amazing how career averages can gloss over being rubbish to start with (Flintoff) or how the mighty fell (Harmison).

Fig 5 – Outliers – bowlers whose first 100 wickets this century were not predictive of future performance. “Wickets” and “Career” columns refer to post-2000 only.

Test career average is no good to measure these players. And they make up one third of the bowlers I’ve looked at. Crumbs, my models have been wrong all this time to use Test career average to measure current skill levels.

What causes this? Many possibilities: injury; being a late bloomer; switching from batting all rounder to bowling all rounder; getting “found out” as opponents learn your varieties and batsmen adapt.

How can we identify these players in advance? How do you know for sure who is now better or worse than their career average? With a spreadsheet, you won’t know. That’s a problem for me, because that’s all I’ve got. If you can read technique and separate the irrelevant detail from the significant change, then maybe. Perhaps there should be a “days since last ran” metric, like in horse racing, and anyone returning from a long layoff should be treated as a different player.

If we can’t identify the outliers, how can we rank every player accurately with one methodology? The good news – unlike batsmen, bowlers yield more data per match because they take lots of wickets per game. Whereas for a batsman we would use Difficulty-Adjusted-Career-Average, for a bowler we can use Difficulty-Adjusted-Last-Four-Year-Average, or similar.

Here’s the predictive power of more recent data. It may not look much better to the eye, but mathematically this is a better fit:

Fig 6 – Bowling averages for players with >100 red ball wickets in 2016-18 and >30 County Championship wickets in 2019. A strong correlation given the limited test period (2019 being only 14 matches maximum).

What have we learned? We should predict bowling performances based on what bowlers have achieved recently – because for about a third of players their career average has limited predictive power. That means my model should pick up last-four-year performances, falling back to career records where there is too little recent data.
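That rule is easy to express in code. This is a minimal sketch, not the model itself: the 100-wicket threshold is a hypothetical choice, the example bowler is made up, and no difficulty adjustment is applied.

```python
def bowling_average(runs_conceded, wickets):
    """Runs conceded per wicket; undefined (None) with no wickets."""
    return runs_conceded / wickets if wickets else None


def rating_average(recent, career, min_recent_wickets=100):
    """Prefer the last-four-years record, fall back to career if the sample is thin.

    recent and career are (runs_conceded, wickets) tuples. The 100-wicket
    threshold is a hypothetical choice, not a figure from the analysis.
    """
    recent_runs, recent_wickets = recent
    if recent_wickets >= min_recent_wickets:
        return bowling_average(recent_runs, recent_wickets)
    return bowling_average(*career)


# Hypothetical bowler: 120 wickets for 3,000 runs recently, 300 for 9,000 overall.
print(rating_average((3000, 120), (9000, 300)))  # 25.0
# Same bowler with only 40 recent wickets: fall back to the career figure.
print(rating_average((1000, 40), (9000, 300)))   # 30.0
```

A smoother alternative would be to blend the two records in proportion to the recent sample size, but a hard cutoff is the simplest reading of the rule.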

* The line of best fit when plotting past vs future averages is a straight line that almost passes through zero.

**You also need to adjust for the age curve – batsmen get better as they get older, then drop off in their mid thirties. Also there will be the odd outlier (Ramprakash and Hick, for example, never made it at Test level), though examples of players with abnormal records after 50 Tests are likely to be rare.

How many innings before we can accurately predict T20I Strike Rate?

Last time I looked at how long it takes for averages to mean something. Thought I’d try the same analysis for 20-20 Strike Rates. How long before a player’s 20-20 SR is a fair representation of that player?

Play for long enough and a batsman’s Strike Rate reflects their ability. However, in the early stages their career Strike Rate will be volatile as the sample is small. One significant factor is the impact of average on Strike Rate: most innings accelerate as they go on, so one big score early on will give a player a temporarily favourable career SR.

The below chart shows T20I Strike Rates for all players with 60+ Innings since 2009, split by their first ten, twenty or thirty innings (x axis) and then subsequent innings (y axis). Note that the acceleration in T20 scoring in recent years means most players scored faster in their later innings.

Consider the players who had a SR of 130 in their first 30 innings: one (Dilshan) stuttered and struck at 114 afterwards. Another (Nabi) scored at 156 per hundred balls in subsequent T20Is. If you have a player that has scored at eight an over in their first 30 innings, you may only know that they’ll score at between seven and nine per over from then on. Not very insightful.
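For anyone switching between the two units, the arithmetic is below. Strike Rate is runs per 100 balls and an over is six balls; the function names are illustrative.

```python
def strike_rate(runs, balls):
    """Runs scored per 100 balls faced."""
    return 100.0 * runs / balls


def strike_rate_from_runs_per_over(runs_per_over):
    """Convert runs per six-ball over into runs per 100 balls."""
    return runs_per_over * 100.0 / 6.0


for rpo in (7, 8, 9):
    print(rpo, round(strike_rate_from_runs_per_over(rpo)))
```

Seven, eight and nine an over come out at Strike Rates of 117, 133 and 150, which is the width of the band being described.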

Tom Banton has a T20 SR of 160 after 25 dismissals. That’s too few innings to be confident in him maintaining that scoring rate, but enough to say he’s probably a 140+ SR batsman.

Another recent example comes from Dawid Malan:

I don’t know what else I can do to break into the team for the T20 World Cup. I don’t know how you can be under pressure with an average over 57 and a strike rate over 150

Dawid Malan, Sky Sports Cricket Blog

Malan has done very well in his nine T20Is. Yet that tells us little about how we would expect him to perform in the future. Fortunately, T20 players get a lot of stamps in their passports- Malan scored at 145 per 100 balls in the Bangladesh Premier League and 148 in the most recent Blast. It’s just a case of doing the legwork to calculate an expected Strike Rate at international level. I’ll leave it to the T20 experts to work out whether Malan is worth a spot in the World Cup squad.

Of England’s current players, only Roy and Morgan have more than 30 completed innings in T20Is. There’s insufficient international data. Yet most batsmen have played over 100 innings in T20 leagues – plenty to have a good read on them.

Summing up, there are too few T20Is to use them to set expected average/strike rate in later T20Is. Far better to set this expectation based on club stats, adjusted for difficulty. There’s even enough data to weight analysis towards more recent performances. Also, beware small sample sizes: even 30 completed innings are too few. Anything under 100 innings and you should apply some judgement to the data.

On the limitations of averages

Averages are the currency of red ball cricket. We know they get misused (eg. after just a handful of games Ben Foakes averages 41) and when abused they have little predictive power. What I hadn’t realised is just how limited averages are: we almost never have a satisfactory sample size for someone’s average to be the definitive measure of their ability.

Number of innings before you can rely on an average

We can all agree that averages after a couple of innings are of very little value. By “value” I mean predictive power: what does it tell you about what will happen next?

Ben Foakes averaging 42 after five Tests doesn’t mean a lot. But how about Keaton Jennings averaging 25 after 17 Tests?

The below charts show the limitations of averages by comparing them after 10/20/30 Tests (x-axis) with those players’ averages for the rest of their careers (y-axis). The sample is players since 2000 who played more than 70 Tests.

It’s quite striking how dispersed the data is. Not just the 10 Test version (Stuart Broad averaged more than Steve Smith), but even over a longer horizon: Michael Vaughan averaged 53 in his first 30 Tests of this century, then 36 in his last 50 Tests (32% less).

Modelling and True Averages

Sports models are often positively described as “simulating the game 10,000 times”. This isn’t just to make the model sound authoritative, it can take that many simulations to get an answer not influenced by the laws of chance. When I look at an innings in-running, balancing speed against accuracy, I’ll run at least a thousand simulations – any fewer and the sample size will impact results. An example from today – Asad Shafiq’s expected first innings average was 55, yet a 1,000 iteration run of the model gave his average as 54.3. Close, but not perfect.

Shouldn’t it be the same with averages? If we don’t have a thousand innings, lady luck will have played a part. We never have a thousand innings.

Looking at modelled data, I find that after 35 innings (c. 20 Tests), there is still a one-in-five chance that someone’s average differs by more than 20% from what they would average in the long term. A batsman that would average 40 in the long run could, through bad luck, average 32 after 20 Tests.

Fig 2 – Theoretical evolution of average and how it converges with true average (based on Red Ball Data model).

Sir Donald Bradman had a 99.94 average at the end of his career (70 completed innings). There’s a c.40% chance that, had he played enough innings for his average to be a true reflection of his ability, it would have ended up more than 10% higher or lower. We don’t know how good Bradman was*.
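The Bradman figure can be roughly reproduced on the back of an envelope, reading it as the chance that his long-run average lies more than 10% above or below 99.94. The sketch rests on two assumptions that do not come from the model: that the standard deviation of a single innings is about as large as the batsman’s average (as it is for an exponential distribution), and that an average over 70 innings is close to normally distributed.

```python
import math
from statistics import NormalDist

completed_innings = 70   # Bradman's completed Test innings, from the text
tolerance = 0.10         # 10% either side of the observed average

# Assumption: single-innings standard deviation is roughly equal to the average,
# so an average over n innings has a relative standard deviation of 1 / sqrt(n).
relative_sd = 1.0 / math.sqrt(completed_innings)

# Normal approximation: two-sided chance of being more than `tolerance` away.
z = tolerance / relative_sd
chance_outside = 2.0 * (1.0 - NormalDist().cdf(z))

print(f"relative standard deviation: {relative_sd:.1%}")
print(f"chance of being more than 10% out: {chance_outside:.0%}")
```

Under those assumptions the relative standard deviation is about 12% and the chance comes out at roughly 40%.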

Implications

  • Don’t blindly slice & dice averages – they’ll tell you a story that isn’t true. Yes, if you have a mechanism to test (e.g. Ross Taylor before and after eye surgery), there might be a real story. But just picking a random cutoff will mean you misread noise as signal (Virat Kohli averaged 44 up to Sept 2016, but 70 since then).
  • Use age adjusted career averages as a best view of future performance.
  • First Class data has to be a factor in judging Test batsmen, even when they have played 30 Tests. Kane Williamson averaged just 30 in his first 20 Tests. Credit to the New Zealand selectors for persevering.
  • There has to be a better metric than batting average. Times dismissed vs Expected Wickets times (Strike Rate / Mean Strike Rate) is one that I’d expect to become commonplace in future. Another might be control % in the nets. Yes, I went there: I believe there is some merit in the “he’s hitting it nicely in the nets” line of reasoning.
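The first point, about cutoffs, is easy to demonstrate. The sketch below simulates batsmen whose ability never changes, then goes looking for the cutoff that produces the biggest before/after gap in their average. Everything in it is hypothetical: a true average of 50, 150 completed innings, exponentially distributed scores, and at least 30 innings on each side of the cutoff.

```python
import random
import statistics

TRUE_AVERAGE = 50.0   # hypothetical, and constant for the whole career
INNINGS = 150         # hypothetical career length (all innings completed)
MIN_SIDE = 30         # minimum innings on each side of the cutoff

def biggest_gap(scores):
    """Largest before/after difference in average over all allowed cutoffs."""
    gaps = []
    for cut in range(MIN_SIDE, len(scores) - MIN_SIDE + 1):
        before = statistics.fmean(scores[:cut])
        after = statistics.fmean(scores[cut:])
        gaps.append(abs(after - before))
    return max(gaps)

rng = random.Random(7)
gaps = []
for _ in range(300):
    # Assumption: each innings is an independent exponential draw.
    career = [rng.expovariate(1.0 / TRUE_AVERAGE) for _ in range(INNINGS)]
    gaps.append(biggest_gap(career))

median_gap = statistics.median(gaps)
print(f"Median best-looking 'before vs after' gap: {median_gap:.1f} runs")
```

None of these simulated players changed at all, yet a motivated search for a dividing line will usually find a gap that looks like a story.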

This analysis can be repeated for 20-20 – I’ll cover that in my next post.

Further reading

Owen Benton already covered the modelled side of averages here. He found an 80% chance that a batsman’s average is within 20% of their true average after 50 innings, which is in line with my modelling. His approach is rather practical: what’s the chance an inferior batsman has the better average after x innings?
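That question lends itself to a quick simulation. The sketch below is not Benton’s method or mine, just an illustration under stated assumptions: two hypothetical batsmen with true averages of 40 and 35, every innings completed, and scores drawn from an exponential distribution. The function name `chance_inferior_leads` is invented for the example.

```python
import random

def chance_inferior_leads(better, worse, innings, trials=5_000, seed=3):
    """Estimate how often the weaker batsman has the higher observed average."""
    rng = random.Random(seed)
    leads = 0
    for _ in range(trials):
        # Assumption: each innings is an independent exponential draw.
        better_avg = sum(rng.expovariate(1.0 / better) for _ in range(innings)) / innings
        worse_avg = sum(rng.expovariate(1.0 / worse) for _ in range(innings)) / innings
        if worse_avg > better_avg:
            leads += 1
    return leads / trials

results = {n: chance_inferior_leads(40.0, 35.0, n) for n in (10, 50, 100)}
for n, p in results.items():
    print(f"after {n:>3} innings: {p:.0%} chance the 35-average player is ahead")
```

The chance shrinks as the innings pile up, but under these assumptions it stays well above zero even after 100 innings.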

*Factor in Bradman’s 295 completed First Class innings at an average of 95 and we can get precision on how good he was. But that sentence would lack punch, and this blog’s barely readable at the best of times.

Best bowlers of the last 50 years

During a rain delay at Johannesburg last week, the radio commentators were putting an all-time England XI together. The usual arguments ensued: how can you compare players across eras? Is bowling average the sole measure? While looking at something quite unrelated, I realised I’d stumbled upon a new way of comparing players which is perfect for this question.

The metric is “percentage impact on batsman’s average”. For instance, batsmen generally scored 31% below their average facing Malcolm Marshall, making him the best Test bowler of the last 50 years.
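As an illustration only, here is one plausible way such a figure could be computed: compare a bowler’s average with the mean career average of the batsmen he dismissed. This is an assumed reading of the metric rather than the exact calculation behind the numbers in this post, and the bowlers and figures in the example are made up.

```python
def impact_on_average(runs_conceded, victim_career_averages):
    """Bowling average relative to the mean career average of the batsmen dismissed.

    Assumed definition: -0.30 means batsmen scored 30% below their averages.
    """
    wickets = len(victim_career_averages)
    bowling_average = runs_conceded / wickets
    expected_average = sum(victim_career_averages) / wickets
    return bowling_average / expected_average - 1.0

# Two hypothetical bowlers, both with 5 wickets for 105 runs (average 21).
strong_victims = [45.0, 38.0, 31.0, 22.0, 14.0]   # mean career average 30
weak_victims = [30.0, 25.0, 20.0, 15.0, 15.0]     # mean career average 21

impact_strong = impact_on_average(105, strong_victims)
impact_weak = impact_on_average(105, weak_victims)

print(f"dismissing stronger batsmen: {impact_strong:+.0%}")
print(f"dismissing weaker batsmen:   {impact_weak:+.0%}")
```

Both bowlers have the same bowling average, but on this reading only the first has held batsmen below their usual output.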

Here are the bowlers since 1970 with at least 150 wickets at under 25 apiece, ranked by their impact on a batsman’s average:

Fig 1 – Impact on Batsman’s average, leading bowlers of the last 50 years in Tests. Note that these have been adjusted for players like Ravi Jadeja, who has mostly played at home, mainly when conditions favour a second spinner.

There are four other players whose average flatters them, where Impact on Batsman’s Average is a better metric. Joel Garner picked up 92 of his 259 wickets against a mixed England team. Muttiah Muralitharan and Waqar Younis benefited from a disproportionate number of games against Bangladesh, Zimbabwe and (in Younis’ case) Sri Lanka. Wasim Akram is the hardest to explain: 38% of his wickets were against batsmen with career averages under 20 (a 25% figure would be more normal).

Did you spot Vernon Philander muscle in at fourth on the list? A phenomenal bowler. His average (and Impact on Batsman’s average) may be boosted by favourable conditions where he happened to play most of his away games: England, Australia and New Zealand. Still, I won’t fudge the numbers: he has a brilliant record and South Africa will miss him.

Here’s a comparison of Philander and Muralitharan:

Fig 2 – Philander was better than Muralitharan.

Contextual Averages

England have been defending Joe Denly’s average (30) lately by saying that his performances are better than they appear because of the conditions he has played in.

This piece supports that approach: Marshall and Garner had the same bowling average, but Marshall was 10% better than Garner. If averages can mask that kind of difference over a whole career, imagine how skewed an average could be after ten Tests.

Further Reading

ICC’s all time rankings. The ICC have listed players according to their peak performances, while I have used their career. Consider Akram – his average puts him fourteenth on the list but, accounting for who he dismissed, the ICC rankings take him all the way down to 76th. That supports my calculation that he had a -11% impact on batsmen’s averages.