Kohli’s ODI run ranges are as expected for a phenomenal batsman

Clive (@vanillawallah) was looking at Kohli’s scores in ODIs since the last World Cup, suggesting that:

  1. Kohli is consistent
  2. He succeeds more than he fails

To check this, I compared Kohli’s performances against what my model would expect him to do. Kohli’s run ranges are broadly in line with what you would expect given his average: his consistency is a consequence of his ability, rather than a specific trait of his batting.

I modelled 1,000 innings for Kohli batting at 3 for India, with an assumed average of 95 (his average over the last 54 games / 3 ½ years).
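For anyone who wants to try something similar, here is a much simpler stand-in than the model behind these results. It is a minimal sketch that assumes a memoryless batsman: after every run he is dismissed with a fixed probability, chosen so the mean score matches the assumed average. It ignores not-outs, strike rate and match situation, and the run-range buckets are an illustrative choice.

```python
import random

# Score ranges used to summarise a batch of innings (illustrative buckets).
RANGES = [("0-9", 0, 9), ("10-19", 10, 19), ("20-29", 20, 29),
          ("30-49", 30, 49), ("50-99", 50, 99), ("100+", 100, None)]


def simulate_scores(average, n_innings=1000, seed=1):
    """Simulate scores for a memoryless batsman.

    After each run the batsman survives with probability 1 - p_out, so the
    score is geometric on {0, 1, 2, ...} with mean (1 - p_out) / p_out.
    Setting p_out = 1 / (average + 1) makes the mean score equal `average`.
    Not-outs are ignored: every simulated innings ends in a dismissal.
    """
    rng = random.Random(seed)
    p_out = 1.0 / (average + 1.0)
    scores = []
    for _ in range(n_innings):
        runs = 0
        while rng.random() >= p_out:  # survived: add a run and carry on
            runs += 1
        scores.append(runs)
    return scores


def run_ranges(scores):
    """Count how many scores fall into each range in RANGES."""
    counts = {label: 0 for label, _, _ in RANGES}
    for score in scores:
        for label, low, high in RANGES:
            if score >= low and (high is None or score <= high):
                counts[label] += 1
                break
    return counts


kohli = simulate_scores(average=95)
print(run_ranges(kohli))
```

With an average of 95 this stand-in puts about 35% of innings at 100 or more, since the chance of surviving 100 runs is (95/96)^100.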

The results show slightly more single-figure scores in the real world than in the model, offset by slightly fewer scores in the teens. This is likely due to the small sample size.

Two interesting observations:

  1. In a quarter of innings he would (and did) score a hundred. Phenomenal.
  2. The run distribution is skewed towards the 30-50 range by Kohli running out of time – caused by India successfully chasing down targets and the match ending while he is mid-innings.
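The "running out of time" effect can be sketched by capping some innings. The assumptions are all hypothetical: in a share of innings (`chase_share`) the match ends once he has added `cap` runs, with `cap` drawn uniformly between 20 and 150; otherwise he is dismissed with probability 1/(average + 1) after each run. Capped innings end not out, which pulls some would-be hundreds back into lower ranges.

```python
import random


def simulate_with_chase_cap(average, chase_share=0.4, n_innings=1000, seed=2):
    """Memoryless batsman whose innings can be cut short by a successful chase.

    Hypothetical assumptions: in `chase_share` of innings the match ends once he
    has added `cap` runs (cap drawn uniformly from 20 to 150), leaving him not out.
    Returns a list of (runs, not_out) pairs.
    """
    rng = random.Random(seed)
    p_out = 1.0 / (average + 1.0)  # gives a mean score of `average` when uncapped
    innings = []
    for _ in range(n_innings):
        cap = rng.randint(20, 150) if rng.random() < chase_share else None
        runs = 0
        while True:
            if cap is not None and runs >= cap:
                innings.append((runs, True))   # target reached, not out
                break
            if rng.random() < p_out:
                innings.append((runs, False))  # dismissed
                break
            runs += 1
    return innings


def share_of_hundreds(innings):
    """Fraction of innings with a score of 100 or more."""
    return sum(runs >= 100 for runs, _ in innings) / len(innings)


uncapped = simulate_with_chase_cap(95, chase_share=0.0)
capped = simulate_with_chase_cap(95, chase_share=0.4)
print(share_of_hundreds(uncapped), share_of_hundreds(capped))
```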

Rest of the Top 3

Clive also pulled in data on all other top 3 ODI batsmen since the last World Cup. This is a much larger sample, and checking its distribution is a way of verifying my modelling.

Simulating 1,000 innings with two openers, one averaging 35 and one averaging 45, reasonably reflects the real-world distribution of scores that Clive showed.
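A rough sketch of the two-opener set-up, assuming each innings is memoryless (dismissal probability 1/(average + 1) after each run, not-outs ignored) and that the two batsmen share the innings equally. This is an illustrative simplification, not the full model.

```python
import random


def simulate_openers(averages=(35, 45), n_innings=1000, seed=3):
    """Alternate innings between batsmen with the given averages.

    Each innings is memoryless: after every run the batsman is dismissed with
    probability 1 / (average + 1), giving a mean score equal to his average.
    """
    rng = random.Random(seed)
    scores = []
    for i in range(n_innings):
        average = averages[i % len(averages)]  # half the innings for each batsman
        p_out = 1.0 / (average + 1.0)
        runs = 0
        while rng.random() >= p_out:
            runs += 1
        scores.append(runs)
    return scores


scores = simulate_openers()
hundreds = sum(s >= 100 for s in scores) / len(scores)
print(f"mean {sum(scores) / len(scores):.1f}, hundreds {hundreds:.1%}")
```

Under this model about 6% of the 35-average opener's innings and 11% of the 45-average opener's reach 100, so the pair would make a hundred in roughly one innings in twelve.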

Two exceptions:

– The real world has more low scores (probably from the times when weaker openers have been selected).

– More hundreds modelled than seen.

P.S. Appreciate this is White Ball ODI Cricket rather than Red Ball Data. Don’t tell the Branding Police.

England’s current Test batting, in the context of the last 20 years

In this post I consider the evolution of England’s batting – how it steadily improved through the 2000s, peaked in 2010-11 (as England became World Number 1), tailed off from 2013, and is only recently recovering.

The Data

I took the career averages of the top 7 batsmen for each England Test since 2000, and adjusted them for the age of the batsmen (I’ll cover how I do that in a later post). To eliminate artificially low results, Nightwatchmen are excluded. Where someone only played a few Tests, I made a judgement about what their long term average would have been had they played more Tests.

To bring out the trend, the chart above is smoothed with a moving average of the last five Tests.
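A minimal sketch of the two steps. It assumes the headline figure for each Test is the sum of the seven age-adjusted averages (218 works out at about 31 per batsman, 305 at about 44), and that the smoothing is a trailing window over the current and previous four Tests. How the earliest Tests were handled isn't stated; here they simply average whatever is available.

```python
def team_total(top7_averages):
    """Headline team figure: the sum of the top 7's age-adjusted averages."""
    if len(top7_averages) != 7:
        raise ValueError("expected exactly seven batsmen")
    return sum(top7_averages)


def trailing_moving_average(values, window=5):
    """Average of the current value and up to window - 1 previous values.

    Early points use however many Tests are available, so the series keeps
    its length. A trailing window reacts late to change, so the smoothed
    curve lags the raw one.
    """
    smoothed = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed


# Hypothetical per-Test totals, for illustration only.
totals = [270, 262, 281, 275, 290, 300]
print(trailing_moving_average(totals))
```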

Evolution: 2000 to 2019

Weakest Team: 29th June 2000, vs West Indies (Home) – Age adjusted Average 218

Atherton, Ramprakash, Vaughan, Hick, Stewart, Knight, White

If you don’t want to remember how bad England used to be, I suggest you skip to the next paragraph. Don’t worry, we’ll be talking about 2005 soon enough.

Let’s reel off why this team was the weakest this century: England had no batsmen in the top 10. Over five tests the highest total England could manage was 303. Only Trescothick and Atherton averaged over 29. Ramprakash and Hick never settled at Test level. By 2000 Hick was 34 and Stewart was 35. The weakest link was White – averaging 25 in 30 Tests is not enough for a number 7. The need for “the next Botham” was real.

That England won that series 3-1 was down to Cork, Gough, Caddick and White dominating with the ball rather than England imposing themselves with the bat.

England vs West Indies in 2000 marked a watershed for West Indies cricket: this was their first series defeat in England since 1969. Their record in Tests in England since is W1 D2 L13. England were on the up.

2005 Ashes Winners – Age adjusted Average 268

Trescothick, Strauss, Vaughan, Bell, Pietersen, Flintoff, Jones.

That’s more like it. This side included five of the fifteen England batsmen in the modern era to average over 40. It was a good batting side (rather than a great one), with room for improvement at 6 and 7. Flintoff was a proper all-rounder, a luxury England had not had for a long time, although his career batting average was 32. Geraint Jones was carried that summer, averaging 25 (in line with his career average of 24).

Strongest Team: 26th December 2010, vs Australia (Away) – Total Average 305

Strauss, Cook, Trott, Pietersen, Collingwood, Bell, Prior

It’s too soon for many to realise just how good this team was: World no.1 from August 2011 to August 2012, with a strong enough four-man bowling attack to confidently play six specialist batsmen.

In the 2010-11 Ashes series six of the top 7 averaged over 40; they accrued nine hundreds in only five tests.

But the side was ageing: Collingwood 34, Strauss 33, Pietersen 30. The two oldest (Strauss and the retiring Collingwood) needed to be replaced in 2011. As it was, Strauss stayed on for 18 Tests, but would pass 100 only twice more, averaging 31 after the 2010/11 Ashes.

After a decade of continuous improvement the team had peaked. Remarkably, they had a top 7 who all had career averages over 40. Such players are hard to replace: a Test team are doing well if they can unearth someone every other year who can average 40.

What happened next?

As players retired they were, predictably, hard to replace. England were also unlucky in that Pietersen and Trott didn’t go on to play full careers with England.

By March 2014 England’s ICC Test Rating had slumped to 100: they went from best in the world to average in 38 months. In May 2014 they lost a home series against Sri Lanka. The side had some stars (Cook and Root) and some young players picked too early to succeed (Ali and Buttler), while Bell had gone on a bit too long.

Current Team: 23rd Jan 2019, vs West Indies (Away) – Expected Average 268

Jennings, Burns, Bairstow, Root, Stokes, Buttler, Foakes

Not bad: probably the best selections that could have been made, and the side should be too strong for the West Indies.

It’s important to see this side for what it is: lacking in stars, yet well balanced with three all-rounders. With Ali bolstering the batting at 8, this team are likely to continue the trend of winning at home but losing away against the top 6.

Verifying the Data

To check this model (age-adjusted batting average) against reality, I compared it with the ICC rankings. The correlation is clear. It’s worth noting that, since the age-adjusted batting average is smoothed using a 5-point moving average, there is a time lag in the orange curve. The correlation is surprising, as the ability of the top 7 batsmen makes up less than half of a team’s strength (the remainder being bowling ability and tail batting strength).

Conclusion: England 2019 are at about the level of the 2005 Ashes side, achieved by having no weak links rather than by being packed with world-beating batsmen.

35 is the new (and old) 40

Managers tend to pick a strategy that is the least likely to fail, rather than to pick a strategy that is most efficient. The pain of looking bad is worse than the gain of making the best move.

Moneyball (2003)

In the last 35 years England have had just 15 batsmen who averaged more than 40 over their career. Expectations should shift: aspire to players averaging 40; accept batsmen averaging 35.

The chart below may surprise you – it surprised me. How could barely any recent English batsman reach the benchmark set for them? Averaging 40 (at least in my head) was a minimum, not an elite average.

The data speaks for itself: 45 isn’t the new 40. 35 is the benchmark, and has been for a long time.

We, the red ball loving hordes (and our journalist generals) need to help the selectors by having realistic expectations.

The selectors should return the favour: stick with players that are good enough, even if they aren’t stars, and even if pundits are piling on the pressure.

Next time someone is 10 tests into their career, averaging 34 and with the data saying they would average 35 long term, let’s not call for a change because they aren’t scoring enough. Only remove them if a better prospect comes along – not someone with similar numbers who we might want to gamble on.
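How noisy is an average after 10 Tests? A rough sketch, assuming about 18 completed innings in 10 Tests, no not-outs, and memoryless scoring (dismissal probability 1/(average + 1) after each run) for a batsman whose true long-term average is 35. All of these are simplifying assumptions.

```python
import random
from statistics import mean, pstdev


def short_career_spread(true_average=35, innings=18, trials=2000, seed=4):
    """Mean and spread of the average a batsman posts over a short career.

    Hypothetical model: each innings is memoryless and ends in a dismissal,
    so the observed average is total runs / innings.
    """
    rng = random.Random(seed)
    p_out = 1.0 / (true_average + 1.0)
    observed = []
    for _ in range(trials):
        total = 0
        for _ in range(innings):
            while rng.random() >= p_out:
                total += 1
        observed.append(total / innings)
    return mean(observed), pstdev(observed)


centre, spread = short_career_spread()
print(f"average {centre:.1f}, spread +/- {spread:.1f}")
```

Under these assumptions the standard deviation comes out at roughly 8 runs, so a true 35 batsman averaging 27 or 43 after 10 Tests would be unremarkable.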

There’s a great case study: Andrew Strauss retired in 2012, and received wisdom is that he is yet to be replaced as an opener. We wanted the next Strauss. We should have been looking for the next Rob Key (15 tests averaging 31 between 2003-2005 while we waited for the next star batsman to come along).

Instead Cook was partnered by Compton (average 31) – Root (42) – Carberry (28) – Robson (31) – Trott (12) – Lyth (20) – Ali (14) – Hales (27) – Duckett (23) – Hameed (32) – Jennings (27) – Stoneman (28) – Jennings

Remember who Carberry got his runs against? An away Ashes series in 2013: Harris, Johnson, Siddle, Lyon, Watson. Those 281 runs were well earned.

With hindsight, pretty much every pick between Robson and Jennings was an error. England had viable replacements for Strauss three times: Compton, Carberry and Robson. Having rejected them, England played people out of position (Trott / Ali) and gambled on youth (Duckett / Hameed) as the next cabs off the rank, moving ever further down the list of possibles.

England chose weaker options because they weren’t willing to settle for a batsman averaging in the low 30s. That cost England runs, and since the selectors are employed to pick the best team possible, this is a failure, and one they aren’t criticised enough for. Fear not, dear reader: we know England’s best batting options, and will collectively tut if the selectors deviate from them!

Conclusion: England should hold their nerve, even if Burns and Jennings are only averaging 33 coming into the Ashes.

Using CricViz False Shot % as an alternative to Averages

CricViz now use False Shot Percentages as a metric for assessing batsmen. Most recently they have done this as one factor when considering Australia’s options for the Sri Lanka tour.

A key point is that False Shots and averages are not equivalent: if two batsmen both have a 10% False Shot rate, the more attacking batsman will average more because they will score more runs for each error they make. One has to combine False Shot Rate and Strike Rate to get a useful metric.

As such, I’ve used the data CricViz published, and overlaid it with First Class Strike Rates to give an expected average derived from False Shot %.
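A minimal sketch of one way to combine the two, assuming a fixed share of false shots ends in a dismissal. The `dismissals_per_false_shot` default of 0.1 is a hypothetical calibration that would need fitting against real averages across a pool of players; this is not CricViz's method.

```python
def expected_average(strike_rate, false_shot_rate, dismissals_per_false_shot=0.1):
    """Expected average from strike rate and false-shot rate.

    Hypothetical model: a fixed fraction k of false shots ends in a dismissal,
    so balls per dismissal = 1 / (false_shot_rate * k), and
    average = runs per ball * balls per dismissal.
    """
    if not 0 < false_shot_rate < 1:
        raise ValueError("false_shot_rate should be a fraction, e.g. 0.104")
    balls_per_dismissal = 1.0 / (false_shot_rate * dismissals_per_false_shot)
    return strike_rate / 100.0 * balls_per_dismissal


# Two batsmen with the same 10% false-shot rate: the quicker scorer averages more.
print(expected_average(50, 0.10), expected_average(70, 0.10))
```

Because the average scales linearly with strike rate for a given false-shot rate, two batsmen with identical error rates can end up with very different expected averages.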

The chart shows that Maxwell leads the options, due to his Strike Rate of >70 runs per hundred balls combined with a healthy 10.4% False Shot rate. This is interesting because his 3-year Sheffield Shield average was only 43. It’s worth bearing in mind that he isn’t a Red Ball regular, with only 962 runs in the last 3 years.

Handscomb (real-world average 50, False Shot average 57) can feel hard done by to have missed out on selection. Given that he averages 38 in Tests, leaving him out looks an odd choice.

There is evidence that Pucovski is as good as the hype, with CricViz’s data suggesting not only that he has performed well (FC average 49 after 8 games), but that it isn’t a fluke (very low False Shots imply he may have been unlucky to average only 49 in those 8 matches). Still, it’s a small sample size.

Conclusions: False Shots combined with Strike Rate are a potentially useful tool for predicting player averages when limited data is available (such as for young players). However, more evidence of long-term correlations is required before False Shot % and Strike Rate replace averages.

A review of England’s batting options

Eeny meeny miny moe

Anon, Pre-1820

Whinging about selection is part of how I traditionally spend the days leading up to an England Test. It’s my habit, and I’m probably not alone in that.

With the new(ish) England selection panel of Ed Smith, Trevor Bayliss, and James Taylor, whinging about batting selection has been more difficult.

Burns in for Cook? The logical choice. Moeen Ali recalled? Makes sense. Buttler plucked from White Ball obscurity? Not what I would have done (Hildreth or Livingstone), but OK.

Looking for some whinging ammunition ahead of England’s first warm up game against a West Indies Board XI on 15th Jan*, I did some analysis of England qualified batsmen. Specifically, their records in the last 3 years of all Red Ball Cricket (Test to 2nd XI, adjusted for difficulty).

What I expected to see was a clear hierarchy of players, with some of my favourites at the top, and England’s sub-optimal picks somewhere down the list. Actually, the selectors’ choices are supported by the data, and England have a big group of players who are of very similar abilities.

Below I’ve grouped players by expected Test average, based on the last 3 years:

World Class (Expected Average 42+) – Root & Bairstow

Test Regulars (Expected Average 35-42) – Pope, Burns, Ali, Stokes

Plausible Selections (Expected Average 30-35) – Stoneman, Roy, Buttler, Westley, Wells, Jennings, Livingstone, Gubbins, Brown, Ballance, Foakes, Clarke, Hales, Denly, Woakes, Duckett.

Wildcards (data says Expected Average >30, but there are reasons to be suspicious) – Northeast: mostly driven by 2016 scores in Division 2, with a poor run at Hampshire lately. Hughes: scored 425-3 in the 2nd XI over the last 3 years, but didn’t play a first class game in 2018 and only made 209 runs at 23 in the 2018 North Staffs Premier League, so it’s probably safe to rule him out of Ashes contention.
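The data-driven part of this grouping is a simple threshold rule. A sketch, assuming the lower bound of each band is inclusive (the bands above meet at 35 and 42) and using a hypothetical "Below 30" label for everyone else. The Wildcards are a judgement call on top of the numbers, so they are not coded here.

```python
def selection_bucket(expected_average):
    """Map an expected Test average to a group (lower bounds inclusive)."""
    if expected_average >= 42:
        return "World Class"
    if expected_average >= 35:
        return "Test Regulars"
    if expected_average >= 30:
        return "Plausible Selections"
    return "Below 30"


def group_players(expected):
    """expected: dict of player name -> expected Test average.

    Returns bucket -> list of names, best expected average first.
    """
    groups = {}
    for name, avg in sorted(expected.items(), key=lambda kv: -kv[1]):
        groups.setdefault(selection_bucket(avg), []).append(name)
    return groups


# Hypothetical players and numbers, for illustration only.
print(group_players({"Player A": 44.1, "Player B": 36.0,
                     "Player C": 31.5, "Player D": 28.0}))
```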

Conclusion:

From a batting perspective, England have chosen well. They’ve picked all the World Class and Regular players (apart from Pope, who only has 32 completed innings, and is on the fringes of the squad). All their other batsmen are from the Plausible Selections bucket. England have a lot of Plausible Selections; it doesn’t really matter which of them they pick. Dropping Buttler for Hales would be worth about 4 runs over the course of a Test. As long as the selectors keep picking players that are amongst the best available, I’ll cut them some slack.

Other Discoveries:

  • England’s batting is weaker than at the start of the decade. England were spoiled by a team with 7 batsmen who averaged over 40, such as the side that beat South Africa by an innings in Durban in 2009. Pragmatically, they now use 2 or 3 all-rounders (Stokes, Ali, Woakes), and often use 8 batsmen to do the job that 7 did at the start of the decade.
  • A number of players who currently average under 30 in Tests have been tried: Stoneman, Westley, Jennings, Duckett, Hales, Pope. This analysis indicates that these were good selections, and much of the underperformance is due to chance. An example: Stoneman averaged 28 in 11 Tests, against an expectation of 34. But 11 Tests is a small sample size, and 7 of those Tests were away, including an Ashes series.
  • Bairstow is one of England’s two best batsmen. Dropping him would be an error.

*England’s Squad to tour the West Indies (Batsmen only):

Joe Root (Yorkshire) (captain), Moeen Ali (Worcestershire), Jonny Bairstow (Yorkshire), Rory Burns (Surrey), Jos Buttler (Lancashire), Joe Denly (Kent), Ben Foakes (Surrey), Keaton Jennings (Lancashire), Ben Stokes (Durham), Chris Woakes (Warwickshire)

Explaining the Underperformance of Overseas batsmen in County Cricket

At Globogym we’re better than you. And we know it!

Dodgeball – 2005

In last week’s blog, the data showed how poorly some overseas players performed in First Class cricket compared with their Test performances.

Surprisingly, overseas players perform 21% worse in Division 1 than their Test average. Contrast that with England players, who do 28% better. Two examples jump out: Pujara scoring 172 runs at 14, and Kane Williamson scoring 260 runs at 26. How can we explain those scores?

As there have been only 20 non-England Test players in Division 1 over the last three years, the sample size is too small for meaningful analysis. To get more insight, I’ve combined Division 1 and Division 2, which increases the sample size to 331 completed innings. I then found 3 factors which influence performance:

  • SA / NZ / Australian players outperform other nations (probably because these are the countries with conditions most similar to those in England).
  • Test players will average more in Division 2 than Division 1.
  • Top order (1-3) batsmen are most affected by English conditions (this makes sense – they will face lengthy spells against the best County bowlers with the ball swinging and seaming more than they are used to). Middle order players (numbers 4-7) are unaffected, while tailenders get a boost to their average.

I created a model to quantify this behaviour, combining these factors. The best fit to the data is as follows:

  • SANZAR +10%, others -10%
  • Top order -25%, Middle order +3%, Lower order +25%
  • Division 2 +10%

Applying this makes Pujara’s performance less of an outlier and more a function of his batting at number 3, which made him the wrong type of overseas batsman to go for. Using my model, his expected average in D1 is just 36; while he underperformed this, it’s no longer an outlier. Similarly, Azhar Ali (Test Avg 48) would be expected to average 33, and averaged 34.
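A sketch of the best-fit model as code. The post doesn't say whether the factors multiply or add; this version assumes they multiply, and treats positions 8 to 11 as the lower order. "SANZAR" here stands for South Africa, New Zealand and Australia.

```python
# Factors from the best fit; combining them by multiplication is an assumption.
NATION_FACTOR = {"SANZAR": 1.10, "other": 0.90}
DIVISION_FACTOR = {1: 1.00, 2: 1.10}


def order_factor(position):
    """Top order (1-3) -25%, middle order (4-7) +3%, lower order (8-11) +25%."""
    if not 1 <= position <= 11:
        raise ValueError("batting position must be 1-11")
    if position <= 3:
        return 0.75
    if position <= 7:
        return 1.03
    return 1.25


def expected_county_average(test_average, nation_group, position, division=1):
    """Expected first-class average in England for an overseas Test player."""
    return (test_average * NATION_FACTOR[nation_group]
            * order_factor(position) * DIVISION_FACTOR[division])


# A number 3 from outside SA/NZ/Australia with a Test average of 48, in Division 1.
print(round(expected_county_average(48, "other", 3), 1))
```

This gives about 32 for a Test average of 48, close to, though slightly below, the 33 quoted for Azhar Ali; the original calculation may combine or round the factors differently.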

But – the current iteration of the model has arbitrary cut-offs (why should a number 4 outscore a number 3 by 25%?) and the above table has a high standard deviation. I’ll enhance it once it can be tested against 2019 data.

What the current model can do is make predictions:

Poor 2019 Overseas Player selections

Azhar Ali will be playing for Somerset next season. He’ll be 34 by then, and will be expected to average 30. I hope they aren’t paying him too much. Next season could be the one where Somerset’s batting frailty bites.

Bancroft at Durham and Joe Burns at Lancashire should struggle at the top of the order.

Top 2019 Overseas Player picks

1. S. Marsh had better hope Glamorgan bat him below 3: he could do well if he avoids the new ball.

2. Temba Bavuma isn’t the strongest Test batsman, but as a 28-year-old he’ll be at or near his peak, and Division 2 cricket with Northamptonshire should suit him. It helps that he doesn’t start until 14th May.

3. Bowlers! Abbas, Worrall, and Siddle should be far more valuable than top order batsmen. That said, I’ve not done the analysis of bowlers yet. Watch this space.

Test vs County Cricket Averages

“Coach woulda put me in fourth quarter, we would’ve been state champions. No doubt. No doubt in my mind.”

Napoleon Dynamite (2004)

It’s often assumed that we cannot compare Test and first class batting performances – the old comparing ‘apples to oranges’ conundrum. But if we can quantify the relative values of the different formats, we can compare like with like.

Looking at batting performance of players who’ve played across multiple formats in English* domestic cricket (2016-2018), one can assess the relative difficulty of each tier. My analysis found that it’s 19% harder to bat in Test Cricket than it is in Division 1.

If a player averages 40 in Division 1 – the data says you could expect him to average 31 in Test cricket, 44 in Division 2, and 54 in the 2nd XI.

That tells us that you’d need to consistently average over 55 in Division 2 to average 40 in Test cricket, hence so few England players have been picked from those ranks in recent years.
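A sketch of the conversion, assuming averages scale by a fixed ratio between tiers. The ratios are taken from the worked example (40 in Division 1 against 31 in Tests, 44 in Division 2 and 54 in the 2nd XI) rather than from the 19% headline figure.

```python
# Averages equivalent to 40 in Division 1, read off the worked example.
EQUIVALENT_TO_D1_40 = {"Test": 31.0, "D1": 40.0, "D2": 44.0, "2nd XI": 54.0}


def convert_average(average, from_tier, to_tier):
    """Scale an average between tiers, assuming a fixed ratio between each pair."""
    return average * EQUIVALENT_TO_D1_40[to_tier] / EQUIVALENT_TO_D1_40[from_tier]


print(round(convert_average(41, "D1", "Test"), 1))  # a 41 average in Division 1
print(round(convert_average(40, "Test", "D2"), 1))  # D2 average needed for 40 in Tests
```

These come out at about 32 and about 57 respectively, in line with the Hildreth example and the "over 55 in Division 2" threshold.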

It also means that Hildreth (who I’ve previously thought of as an England option as he averages 41 in Division 1) would be expected to average 32 in Tests, and therefore isn’t the batsman we are looking for.

A few examples of 2016-2018 Division 1 and Test averages:

Note that only Root and Buttler underperformed in Division 1 relative to Test Cricket.

At this point it’s worth going into the assumptions: professionally, I’m always keen to show where the data ends and the judgement begins. The data can tell us performances for each player who crosses tiers. Judgement needs to be applied to appraise that data and turn it into a single factor.

Some options:

  • Jonas (@cric_analytics) has looked at players with a minimum of 10 innings in both competitions. The downside is that this excludes valid data points. For instance, Ben Stokes scored 226 @ 28.3 in D1 in the last 3 years, 10 runs below his Test average. That should count towards the total, even if it’s a small sample. Jonas reckoned on a 20% gap between Test and County cricket, slightly wider than my data suggests.
  • Include all overlap – the risk is that this is skewed by a few high/low scores from one-test wonders against weak/strong opponents. This gives a mere 2% difference between Test and D1.
  • Overseas players included: this gave an 8% gap between D1 and Test – but playing away from home knocks 10% off batting average, so this is not a fair comparison. To put it another way, Pujara playing for Yorkshire averaged 14, because every game was an away game.
  • I have used relative performance for English players with >4 completed innings in each format, and weighted the overall result according to the lower of each player’s completed-innings counts in the two formats. For instance, Ben Stokes has played 8 completed D1 innings but 46 Test innings, so his result is weighted with a factor of 8, while Dawid Malan played 36 completed D1 innings and 26 Test innings, so he is more useful for this exercise and receives a weighting of 26.
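A sketch of that weighting, assuming "relative performance" means each player's Test average divided by his Division 1 average, and that the weighted ratios are averaged (rather than, say, pooling all runs and dismissals). The sample data is hypothetical: the innings counts mirror the Stokes and Malan examples, but the averages are made up.

```python
def weighted_test_to_d1_ratio(players, min_innings=5):
    """Weighted mean of each player's Test average / Division 1 average.

    players: iterable of (test_avg, test_innings, d1_avg, d1_innings), where
    innings are completed innings. Players need at least `min_innings` in both
    formats (">4"), and each is weighted by the smaller of their two counts.
    """
    weighted_sum = 0.0
    total_weight = 0
    for test_avg, test_inns, d1_avg, d1_inns in players:
        weight = min(test_inns, d1_inns)
        if weight < min_innings:
            continue  # too few completed innings in one of the formats
        weighted_sum += weight * test_avg / d1_avg
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no player meets the innings threshold")
    return weighted_sum / total_weight


# Hypothetical data; the third player is excluded (only 3 Test innings).
sample = [(30.0, 46, 40.0, 8), (40.0, 26, 40.0, 36), (50.0, 3, 10.0, 50)]
print(weighted_test_to_d1_ratio(sample))
```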

Adjusting for the level individuals are playing at allows comparison of players in different tiers. In future posts I’ll look at some implications of this data:

  1. 2nd XI players with the potential to be First Class batsmen
  2. England’s best available batsmen
  3. Overseas players: who has & hasn’t succeeded – will look at any trends in the data.
  4. It’ll take more number crunching, but I’m interested in linking First Class and List A performance to see how well correlated they are, and using that to gauge the quality of players for whom limited data is available (there are a lot of players with a handful of FC games behind them, too few completed innings to fairly appraise them).

*I know it’s English and Welsh. Sorry Glamorgan. There isn’t an easy word for English and Welsh, so I’ll use English as shorthand for English and Welsh.

The Journey Begins

Thanks for joining me!

“You’d better listen to her, because the Pentagon does”

Top Gun (1986)

A bit about me before I get into the numbers:

It’s easy to have an opinion, and particularly easy to broadcast that view online. Filtering out the noise is a challenge.

So why should anyone care what I think about cricket?

Well, my CV for starters: a Masters degree in Physics from Oxford (the 4th year was focused on simulations of Earth’s atmosphere), then I qualified as an accountant and spent 2 years in Banking Front Office (where I cut my teeth on Excel modelling). After a further role in Banking Finance, I’m now working for a FTSE-100 retailer, doing modelling and strategy.

It’s not quite the Pentagon, but you should listen to me, because some people at a FTSE-100 retailer do.

In 2011 I built a test match simulator – which could predict the outcome of an innings from a given starting point, based on ball by ball bowler vs batsman probabilities, and running the simulated innings enough times to get a reasonable sample (>1,000). This was mainly for gambling, and it works.
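The core idea of such a simulator can be sketched in a few lines: draw each ball from a table of outcome probabilities, play on until the innings ends, and repeat many times. The probabilities below are invented for illustration, and a single table stands in for the batsman-vs-bowler pairings, so this is a toy version rather than the simulator itself. It also ignores declarations, extras and the clock.

```python
import random

# Hypothetical outcome probabilities for a single ball (runs, or "W" for a wicket).
# A fuller model would hold a table like this for each batsman-bowler pairing.
OUTCOMES = [0, 1, 2, 3, 4, 6, "W"]
WEIGHTS = [0.55, 0.25, 0.06, 0.01, 0.09, 0.02, 0.02]


def simulate_test_innings(start_runs, start_wickets, rng):
    """Play balls from the given position until ten wickets have fallen."""
    runs, wickets = start_runs, start_wickets
    while wickets < 10:
        outcome = rng.choices(OUTCOMES, weights=WEIGHTS)[0]
        if outcome == "W":
            wickets += 1
        else:
            runs += outcome
    return runs


def chance_of_reaching(target, start_runs=0, start_wickets=0, n_sims=1000, seed=5):
    """Share of simulated innings that reach `target` from the given position."""
    rng = random.Random(seed)
    totals = [simulate_test_innings(start_runs, start_wickets, rng)
              for _ in range(n_sims)]
    return sum(total >= target for total in totals) / n_sims


# e.g. 150 for 6: how often does the innings reach 250?
print(chance_of_reaching(250, start_runs=150, start_wickets=6))
```

With these made-up numbers each wicket is worth about 44 runs on average (0.88 runs per ball over 50 balls), which is the kind of sanity check worth running before trusting any output.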

Later I expanded this to cover the two white ball formats, though the 50 over model has always received more attention than the 20-20 one – I don’t mind 20-20, but I struggle to love it.

With a full-time job and a young family, cricket data comes third on the list, and that means I will focus on red ball cricket. There are a lot of professionals who have got further than me in 20-20, and I’m not going to stand out by splitting my efforts across 3 formats.

Let’s see if I can come up with some original thoughts, and some predictions which stand the test of time.

Ed Bayliss, Dec 2018.