Rating Wicket Keepers

A good wicket keeper takes their chances, scores runs, and doesn’t allow many byes.

Only one of those can be easily measured – we use the batting average.

For the others, a proxy will have to do: percentage of team dismissals (measuring ability to hold catches), and byes per hundred overs (in lieu of measurement of fielding errors).

Here’s a chart showing those three measures (the larger bubbles are higher batting averages).

Fig 1: Wicket keepers who have played >20 Tests during the 2010s.

Now we are getting somewhere. The ideal player is at the top left with a big bubble. AB de Villiers wins. There’s insight in this chart too: Adnan Akmal’s career passed me by, but through the above we can see a decent gloveman, albeit one not up to the batting requirements of Test cricket.

How about comparing players? De Villiers beats Mushfiqur Rahim on all three measures. But what about Tim Paine vs Sarfaraz Ahmed? We need to apply a weighting to each factor. Here’s my estimate and why:

  • Byes: runs impact = Byes/100 overs*1.62 because there are 162 overs in the average match.
  • Average: runs impact = Average * 1.48 (because the average wicket keeper is dismissed 1.48 times per match).
  • Percentage of team dismissals: runs impact = (% team dismissals – 28%) * 146.5 (I’ve estimated 4.5 dismissals per match for the keeper that takes 100% of chances, which would be 28% of team dismissals. At 32.6 runs per wicket, the perfect keeper is worth 146.5 runs more than a non-catching keeper. All keepers are between 0-146.5).
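For anyone who wants to play with the weightings, here is a rough Python sketch of them. Two things are my assumptions rather than anything stated above: the catching term is scaled as share / 28% so that it spans the quoted 0 to 146.5 range (the formula as written subtracts 28% instead), and the three components are combined by adding batting and catching and subtracting byes. The function names are made up for illustration. As a sanity check, 4.5 dismissals at 32.6 runs per wicket comes to roughly 146.7, close to the quoted 146.5.

```python
# Weights quoted in the list above.
HUNDRED_OVERS_PER_MATCH = 1.62   # 162 overs in the average match / 100
KEEPER_OUTS_PER_MATCH = 1.48     # times the average keeper is dismissed per match
PERFECT_SHARE = 0.28             # share of team dismissals for a flawless keeper
PERFECT_KEEPER_RUNS = 146.5      # quoted value of a flawless keeper


def byes_cost(byes_per_100_overs):
    """Runs conceded in byes per match."""
    return byes_per_100_overs * HUNDRED_OVERS_PER_MATCH


def batting_value(average):
    """Runs scored per match."""
    return average * KEEPER_OUTS_PER_MATCH


def catching_value(share_of_team_dismissals):
    """ASSUMPTION: scaled as share / 28% so the result spans 0 to 146.5."""
    return share_of_team_dismissals / PERFECT_SHARE * PERFECT_KEEPER_RUNS


def keeper_rating(average, share_of_team_dismissals, byes_per_100_overs):
    """ASSUMPTION: batting plus catching, minus byes."""
    return (batting_value(average)
            + catching_value(share_of_team_dismissals)
            - byes_cost(byes_per_100_overs))


# De Villiers' batting average with the gloves, as quoted in this piece.
print(round(batting_value(59), 1))  # 87.3
```

The byes term is small next to the other two, which is why the ranking is driven mostly by batting and catching.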

Here are the same players ranked according to those weightings:

Fig 2: Wicket keepers who have played >20 Tests during the 2010s, now rated according to the weightings above.

AB de Villiers is comfortably the Red Ball Data Wicket Keeper of the Decade (RBDWKotD). I’m sure he’ll be chuffed. Was he flattered by averaging 59 with the gloves? No – he averaged 57 overall this decade.

There are secondary effects which one could measure (for instance, a left handed first slip taking chances from the keeper, reducing the number of dismissals). One might also want to consider intangibles, such as the ability to be distractingly inane, or contribute to strategy. These are judgemental, and I’m not qualified to opine on them.

There’s a point at the end of all this. Should Ben Foakes be in the England team?

Fig 3: Rating England’s best available wicket keepers

Wow, that’s surprised me. It has changed my mind – I was expecting this to vindicate the selection of Jos Buttler. If Foakes is as good a keeper as we are told, it is sufficient to outweigh his inferior batting. There is little to choose between Jonny Bairstow and Buttler. Note that the usual disclaimer applies: need at least 20 Tests to judge, and only Bairstow has hit that threshold. Oh, and I’ve used my (more detailed) batting ratings for this chart, while Fig 2 uses averages while keeping wicket in Tests.

The case against Zak Crawley

Running England’s first innings (NZ vs Eng, 29th Nov 2019) through my model told me that Zak Crawley had a median first innings score of 12. Absurdly low.

Rather than just spout that opinion in a tweet, I’ll walk you through how the model got there, and we’ll see if there are any gaps in logic. Have England made a terrible selection?

Zak Crawley’s County Cricket Record – Average 31

County Championship Division Two: 2017-18 – Runs 830. Dismissed 30 times. Average 27.7

County Championship Division One: 2019 – Runs 820. Dismissed 24 times. Average 34.2.

Second XI Cricket: 2016-18 – Runs 708. Dismissed 22 times. Average 32.2.
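The averages above are just runs divided by dismissals. A quick sketch to check them follows. Pooling the two County Championship divisions to get the headline figure is my assumption about how the "Average 31" was reached.

```python
def batting_average(runs, dismissals):
    """Batting average: runs scored per dismissal."""
    return runs / dismissals


# (runs, dismissals) as quoted above.
records = {
    "Division Two 2017-18": (830, 30),
    "Division One 2019": (820, 24),
    "Second XI 2016-18": (708, 22),
}

for label, (runs, outs) in records.items():
    print(f"{label}: {batting_average(runs, outs):.1f}")

# ASSUMPTION: the headline county figure pools both Championship divisions.
championship = batting_average(830 + 820, 30 + 24)
print(f"County Championship overall: {championship:.1f}")  # 30.6
```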

Redballdata.com Ratings – Expected D1 Average 30 – Rating those performances, and placing more weight towards recent performances, Crawley’s expected Division One average next year is 29.6.

Adjusting for Age – Expected D1 Average 30.4 – Zak Crawley is 21. He gets a c.3% boost to his expected average because his average is based on runs scored when he was 18/19/20.

Adjust for this innings – Expected Average 19.4 – A Test Match, away, against a strong New Zealand attack is much harder than a county game. That has a severe impact on average.
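To see how much each step moves the number, the multipliers can be backed out from the three averages quoted above. This is only arithmetic on the published figures, not the model itself, and the variable names are mine.

```python
base_rating = 29.6     # expected Division One average from the ratings
after_age = 30.4       # after the age adjustment
after_context = 19.4   # after adjusting for an away Test against New Zealand

age_factor = after_age / base_rating
context_factor = after_context / after_age

print(f"age boost: {age_factor - 1:.1%}")         # 2.7%
print(f"context cut: {1 - context_factor:.1%}")   # 36.2%
```

So the step from county cricket to this particular Test innings removes over a third of the expected average.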

Run that expected average of 19.4 through the model, and it predicted a median score of 12.0. That is what you would expect from a number eight batsman, not a specialist.

Gaps and biases

Let’s look at this from England’s point of view – why is Crawley in the team? I can think of three reasons:

  • He was in the squad, but they didn’t really expect him to play. (That links to home advantage getting bigger as a series goes on: in this case it’s because injury means that a squad player, there to gain experience, gets drafted into the team.)
  • England selectors use a different age curve and/or bias towards recent matches – bumping up Crawley’s expected average (along with every other young player).
  • Something in performance-specific data (that doesn’t show up in averages) makes the England selectors think he’ll be especially suited to batting in New Zealand.

What happened?

Crawley made one run before Wagner got him. That additional innings has moved his expected average down a little more.

Getting your eye in across formats

As ever-shorter forms of Cricket proliferate, fast starters with the bat are becoming more valuable (just ask Dane Vilas). Let’s take stock of how players start across the three formats.

Tests, ODIs, and 20-20 are essentially the same game. Each match starts on a fresh pitch, with a different ball, bowlers and climatic conditions from the last game. It takes more than one over to get one’s eye in, so batsmen are simultaneously settling in against multiple bowlers no matter the flavour of Cricket they are playing. The key differences which might impact how batsmen settle in are the white ball deviating less than the red ball, and the intent of batsmen differing.

How do we measure how settled a batsman is in each format in a given stage of the innings? In Tests we can use a batsman’s average ball-by-ball. However, in white ball Cricket Strike Rate is the prize (a cynic might suggest the curves suit my story so I’ve cherry picked).

The first thing to note is that these curves are quite similar – 20-20 batsmen really do take time to go through the gears. I’ll wager the slow acceleration is not through lack of effort. Players will be going as fast as they think they can: accelerating faster risks their wicket.

Data sources

Normally, this site is based on numbers I’ve crunched. This piece is an exception – while the ODI analysis is my own, I’ve based the 20-20 curve on this piece from sportdw.com. An aside – I looked at this back in 2017, and the bookmakers weren’t getting their “what happens next ball” odds right. Wish I’d been in a position to capitalise, but with a young family such activity was very much on the back burner. Anyway, it’s a fine post from sportdw.com, so take a look.

For Test matches I’ve used the analysis mentioned in my last post, Bayesian survival analysis of batsmen in Test Cricket. I had to make some assumptions to take Stevenson & Brewer’s player specific data and create a general case. Any errors in that process are mine – if you’d like to know about “getting your eye in” for Test Cricket, please read their work.

The current state

Let’s think about this behaviour from a theoretical perspective: in any innings a team are looking to make the best use of their resources. In Tests that’s as simple as maximising runs per innings: pick players that score the most runs per wicket, regardless of whether they start badly then get good, or quickly settle but then don’t improve much. Appraising Test players is easy – just look at the average*!

ODIs and 20-20s need to manage two scarce resources: wickets and balls. In 20-20 the number of overs is more of a constraint than in one day Cricket, which is to the advantage of faster starters. After 12 balls a 20-20 batsman is at full speed, while an ODI player is only scoring at 90% of their career SR.

What happens next

Data on 20-20 is everywhere. That’s partly why I don’t focus on it: the people that analyse 20-20 for a living are doing a great job. Seeing what’s available publicly, I can only imagine what depth of analysis the richest franchises have locked away on their laptops! I would be very surprised if there aren’t “getting your eye in” curves for each batsman, and teams optimising batting orders to get the fast starters into the right roles, alongside the high-average-slow-start-but-quick-when-they-get-going types.

Personally, I like a blunt approach – and assume all players play themselves in in the same way. Partly this is because I lack data – so have to make assumptions else I’d never have a model. Ultra short form games derived from Cricket will test that approach, and we will learn more about Cricket over the next couple of years.

Comparison with baseball

Baseball is a little like Cricket. Yet batters are up and running after just a handful of pitches. Why? A combination of there being only one pitcher, the ball not bouncing (so the ground matters less), and long breaks between innings mean that there’s limited benefit from repeated plate appearances in a game. Adding the yellow line for baseball shows just how similar the curves for Cricket’s three formats are. For now.

*For consistency I should say “factor in how many innings they have played, whether they were home or away, their record in each competition adjusted for difficulty, what number they were batting, the grounds they were playing on, how old they are, which attacks they played against, what innings of the match each innings was” – but my writing barely flows at the best of times; that detour would not have been helpful. Hopefully you know what I meant.

Is middle order batting easier than opening?

I have a theory that openers are better than middle order batsmen with the same average. If someone averages 35 against the new ball, that has to be better than averaging 35 against the third change bowler using a 60-over-old lump of leather.

Here’s Michael Carberry’s take on the unique challenges of opening:

As openers, we don’t have the luxury of being able to come in against the old ball where it’s doing less. You see it on the first morning of a match. Everyone’s prodding the wicket. ‘Oh yeah, this looks a belter’. It’s never a belter when you’re facing the new ball. If the ball is going to do something, generally you’re the one who’s going to get it.

If Carberry is right, once openers see off the new ball, their expected runs for the rest of the innings should be higher than their career average – they’ve done the hard part.

How to prove it though? The proper way would be to show what various batsmen average at differing stages of their innings, against particular bowlers, against both old and new ball, and when the bowler is in their first, second, third spells. That’s not complicated, but would be time consuming, starting from a ball by ball database of Test Cricket. I haven’t done that. Instead, I’ve looked at what happens to a batsman’s average once they are “in”. Previous analysis tells me a batsman is fully in by the time they are 30 not out.

Fig 1 – difference between Runs scored and Expected Innings Average once a batsman has got to 30 runs. This is the boost to someone’s average once they have “got their eye in”. Split by batting position. Expected Average is career average, adjusted for home advantage, innings number, ground. Test matches 2005-2019.

Analysis

  1. The benefit from “getting your eye in” is worth about five runs on a player’s average (i.e. if you average 40, by the time you get to 30* your expected average goes up to 45).
  2. Surprisingly, openers don’t get a further boost once they get to 30. This is odd – by the time an opener is on 30, 20 overs would have gone, the three best bowlers would have bowled six or seven overs and the ball would no longer be hooping round corners. I’ve definitely watched England play, rocking back and forth in my seat saying “if they can just get to 20 overs, see off the new ball, it will get easier. It’ll be all right”. Turns out that was piffle. It gets easier (c.12%), but it’s not a violent swing into the batsman’s favour.
  3. Weaker middle order batsmen get the biggest benefit from getting to 30. I think that’s because they really are the easiest times to bat – 40+ overs into the innings, tired bowlers, etc. In other words, these players aren’t becoming relatively better once they are in – they just tend to be building an innings as conditions become more favourable.

Conjecture

Put the above analysis together, and I’ll give you a second hypothesis – collapses in red ball Cricket are partly because lower middle order batsmen’s averages flatter them. A batsman that averages 30 can make hay in helpful conditions – yet they only average 30. That must mean that they average less than 30 in challenging conditions. Maybe when the going gets tough, the middle order will disproportionately get blown away. Unfortunately for me, that hypothesis doesn’t show up in the numbers. Yet.

Methodology

Since “Expected Innings Average” (EIA) is a non-standard metric, it’s worth explaining what it is and how I’ve derived it, else you’d have every reason to dismiss this as someone fitting the data to match their hypothesis.

EIA was calculated for every innings where a batsman scored over 30. Their runs in that innings (minus 30) were compared to their EIA to get a view of how their average (once they had got their eye in) compared to what one would expect from when they started their innings. Thus Benefit from getting eye in = Runs scored – EIA – 30.

To calculate EIA I started with the batsman’s career average, adjusted for the runs per wicket on that ground and for the innings of the match, then added or subtracted 8.5% depending on Home/Away. To adjust for not outs, I added the EIA to the not-out score.

For instance, when Virender Sehwag scored 319, his expected average was 49 (Career Average) * 1.2 (Ground adjustment for Chennai) * 1.17 (Innings Adjustment – for the 2nd innings of the match) * 1.085 (Playing at home) = 74. Conditions were favourable – but he still exceeded expectation by 245.
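Here is a small sketch of that calculation. The function names are mine, and it uses the rounded factors quoted in the Sehwag example, so the EIA comes out at about 74.6 rather than exactly 74 (the published figure presumably used unrounded factors).

```python
HOME_AWAY_SWING = 0.085  # add 8.5% at home, subtract 8.5% away


def expected_innings_average(career_avg, ground=1.0, innings=1.0, home=None):
    """Career average scaled by ground, innings-of-match and home/away factors."""
    home_factor = {True: 1 + HOME_AWAY_SWING,
                   False: 1 - HOME_AWAY_SWING,
                   None: 1.0}[home]
    return career_avg * ground * innings * home_factor


def benefit_from_getting_in(score, eia, not_out=False, threshold=30):
    """Runs scored - EIA - 30; a not-out score has the EIA added to it first."""
    runs = score + eia if not_out else score
    return runs - eia - threshold


eia = expected_innings_average(49, ground=1.2, innings=1.17, home=True)
print(round(eia, 1))                                 # 74.6
print(round(319 - eia, 1))                           # 244.4 above expectation
print(round(benefit_from_getting_in(319, eia), 1))   # 214.4 once the 30 is removed
```

Note the two different comparisons: "exceeded expectation by 245" is runs minus EIA, while the benefit metric in Fig 1 also subtracts the 30 runs it took to get in.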

In case you aren’t a fan of the above, I also calculated the impact based on raw averages. It doesn’t reveal much. Just goes to show how important the context of an innings is: raw averages are just too simplistic.

Fig 2 – difference between Average and Career Average once a batsman has got to 30 runs. Test matches 2005-2019. Note how much more volatile this chart is than Fig 1. Also that (using raw averages only) numbers three and four appear to have a negative impact from getting their eye in!

Further reading

Michael Carberry’s recent interview in Wisden is linked here.

Here’s a proper statistician’s view of the early stages of a batsman’s innings – Bayesian survival analysis of batsmen in Test cricket. Note how low effective averages are when a player is on less than ten. A far more pronounced effect than I had expected.

Opening batsmen: the divergence of ODI and Test players

Before the Ashes, Gio Colussi of The Cricket Academy analysed the two batting lineups and pointed out the White Ball bias in the England camp – they had picked batsmen who were stronger ODI players. He did not expect this to work out well for England. He was right.

Wind the clock back. The good old days. Specifically the noughties (or 2000s, or whatever). An opening batsman fulfilled the same role in Tests or ODIs. Hence their ODI and Test averages were similar, and you could use one to predict the other with a fair degree of confidence.

Fig 1 – Averages of openers to have played >20 innings in Tests and ODIs from 2000-2009

The correlation is so good that the names get all jumbled up on the straight line running from (20,20) to (50,50). Yes, there are some Test specialists there (Cook, Strauss) but most of the 23 players that meet the criteria for inclusion behave as expected.

That correlation has broken down now.

Fig 2 – Averages of openers to have played >20 innings in Tests and ODIs from 2012-2019. Note the same axes as Fig 1.

There are three distinct types of player, reflected in the clustering in the chart:

  • Versatile elite batsmen (Warner, Iqbal) – just as good in either format, average over 40 in both.
  • Test specialists (Latham, Azhar Ali) – who are/were good enough to play in ODI Cricket, but averaged at least five lower in ODIs
  • ODI specialists (Hales, Guptill) – averaging under 30 in Tests.
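Those three groups can be written down as a simple rule of thumb. The thresholds come from the list above; the order in which the rules are checked, and the "unclassified" fallback for anyone who fits none of them, are my assumptions.

```python
def classify_opener(test_avg, odi_avg):
    """Rough bucket for an opener based on Test and ODI averages."""
    if test_avg > 40 and odi_avg > 40:
        return "versatile elite"
    if test_avg < 30:
        return "ODI specialist"
    if test_avg - odi_avg >= 5:
        return "Test specialist"
    return "unclassified"  # ASSUMPTION: fallback for players outside all three groups


# Jason Roy's figures as quoted later in this piece: Test 19, ODI 43 as an opener.
print(classify_opener(test_avg=19, odi_avg=43))  # ODI specialist
```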

I’m reminded of the film Titanic (1997) explaining the captain’s complacency: “26 years of experience working against him”. That line stuck with me – it’s easy to assume past trends will continue, and that you can use opening the batting in ODIs as a pathway into opening in Tests.

Not any more. Unless the player is good (and I mean really good), the best predictor I can see for successfully opening the batting in Tests is successfully opening the batting in red ball Cricket. Think about Jason Roy – ODI Average 43 as an opener, Test Average 19. I don’t think anyone is now expecting him to average 35 in Tests as an opener. Yet someone must have thought he could, else he wouldn’t have been picked.

redballdata.com – closing the stable door after the horse has bolted!

PS. This piece serves as another reminder to me to continually check that the trends I’ve seen still hold – else one day I could be the mug taking Fig.1 to a meeting, persuading everyone to pick the best ODI openers to open the batting in Tests.

County grounds ranked by ease of batting

In this piece I’ll look at which grounds are best for red ball batting, and use that to see the impact on averages: how much of a boost do Surrey’s batsmen get from playing at the Oval?

Fig 1 – County grounds ranked according to runs per wicket in County Championship matches over the period 2017-19. Grounds where fewer than 100 wickets fell in that time are excluded.

So what?

Beyond it being a spot of trivia, I can immediately see two reasons why this matters.

i. High scoring grounds harm the county’s league position

In County Cricket there are 16 points for a win, 5 for a draw and none for losing. A win and a loss is worth 16 points, while two draws is worth 10. Drawing is bad*.

Fig 2 – Runs per Wicket in the County Championship over 2017-19 plotted against the Draw percentage for that ground. Higher runs per wicket are associated with more draws.

And yet there are teams producing high scoring pitches, boosting the chances of a draw, and reducing their chances of picking up 16 points.

Compare Gloucestershire’s two home grounds since 2017: at Bristol (32 Runs per Wicket), W2 L4 D8. Cheltenham (28 Runs per Wicket), W4 L1 D2. Excluding bonus points, Cheltenham is worth an extra 5.4 points per match. While that’s an extreme example, and the festival only takes place in the summer months, there’s still the question “why make Bristol so good for batting”?
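The 5.4 points figure is easy to reproduce from the win/loss/draw records above, ignoring bonus points. The function name is mine.

```python
WIN_POINTS, DRAW_POINTS, LOSS_POINTS = 16, 5, 0


def points_per_match(wins, losses, draws):
    """Average result points per match, excluding bonus points."""
    matches = wins + losses + draws
    total = wins * WIN_POINTS + draws * DRAW_POINTS + losses * LOSS_POINTS
    return total / matches


bristol = points_per_match(wins=2, losses=4, draws=8)      # 72 points from 14 matches
cheltenham = points_per_match(wins=4, losses=1, draws=2)   # 74 points from 7 matches

print(round(bristol, 2))               # 5.14
print(round(cheltenham, 2))            # 10.57
print(round(cheltenham - bristol, 1))  # 5.4
```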

Maybe a deeper look at the data will reveal why Gloucestershire and Surrey don’t try to inject a bit more venom into Bristol and The Oval; for now it looks like an error.

*There’s an exception: a team that is targeting survival in Division 1 might choose to prepare a flat track and harvest batting points plus drawn match points in certain situations. For the other 15 counties, drawing is still bad.

ii. Averages should be adjusted to reflect where people play their Cricket.

When using data to rank county batsmen and bowlers, the one gap that I couldn’t quantify was the impact of how batting or bowling friendly each player’s home county is. With this data we can add an extra level of precision to each player’s ratings.

How would we do that? It would be wrong to simply take the difficulty of a player’s home ground as the adjustment – because there are also away games. The logical approach would be to take the average of that player’s home grounds (50%, weighted by the various home grounds that county uses) and the other teams in that division (50% weighting).
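A minimal sketch of that 50/50 blend is below. Weighting home grounds by matches hosted is my assumption about what "weighted by the various home grounds" means. The Gloucestershire inputs are the runs per wicket and match counts quoted earlier for Bristol and Cheltenham, while the away figure is a placeholder I made up purely so the example runs.

```python
def weighted_home_rpw(grounds):
    """grounds: list of (runs_per_wicket, matches_hosted) pairs."""
    total_matches = sum(matches for _, matches in grounds)
    return sum(rpw * matches for rpw, matches in grounds) / total_matches


def blended_rpw(home_rpw, away_rpw, home_weight=0.5):
    """Half the player's cricket at home, half at the rest of the division."""
    return home_weight * home_rpw + (1 - home_weight) * away_rpw


# Bristol: 32 runs per wicket over 14 matches; Cheltenham: 28 over 7.
home = weighted_home_rpw([(32, 14), (28, 7)])

AWAY_RPW_PLACEHOLDER = 30.0  # HYPOTHETICAL value, not taken from the data

print(round(home, 1))                                     # 30.7
print(round(blended_rpw(home, AWAY_RPW_PLACEHOLDER), 1))  # 30.3
```

Comparing the blended figure with the division-wide runs per wicket would then give the percentage adjustment for that county.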

Fig 3 – Impact on batting average from the relative batting friendliness of that county’s grounds (2017-19).

For instance, Ollie Pope’s average is artificially inflated by 10% from being based at The Oval. That takes his rating (expected Division 1 average) down to 54.6 from the suspiciously strong 60.7.

Fig 4 – Selected players’ expected averages, now we can adjust for each player’s home county

Equally, Tom Abell clambers up the ranks of 2019’s County batsmen: his rating jumps 7.1% to 35.6 from 33.2. Not an extreme move, but a nice boost to go from 50th to 31st on the list.
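Both adjustments are a straight percentage applied to the unadjusted rating, which a couple of lines of Python confirm. The function name is mine; the numbers are the ones quoted above.

```python
def adjust_rating(rating, ground_effect_pct):
    """Apply a percentage adjustment for how batting-friendly a county's grounds are."""
    return rating * (1 + ground_effect_pct / 100)


print(round(adjust_rating(60.7, -10), 1))   # Pope: 54.6
print(round(adjust_rating(33.2, 7.1), 1))   # Abell: 35.6
```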

This takes us one step closer to a ratings system that captures everything quantifiable. Before next season I’ll adjust the ratings of batsmen and bowlers to reflect this factor.

Further reading

A summary from 2004 of the county grounds and how they play http://www.bookmakers1.com/englishcricketgrounds.html

Remarkable how many of the descriptions feel alien now – you wouldn’t believe that Taunton was “an absolutely stonking batting track”.

Underrated Bowlers – 2019 season

This is my first attempt at something difficult: finding the best players that aren’t regularly playing County Cricket, but that are good enough to do so. In theory there shouldn’t be very many players like this – because counties will know who their best players are.

I’ve used my database of bowling performances from 2016-19 in County Championship and 2nd XI Championship Cricket and picked out six that have promising data.

Time will tell how many of these players get regular first team cricket (and succeed) in 2020.

Fig 1 – Strongest bowlers that played three or fewer County Championship matches in 2019.

I’ve looked at players that have been selected for no more than three County Championship matches in 2019, for reasons other than injury.

Note that England’s Matt Parkinson only played four games for Lancashire in the 2019 County Championship, so might have made it onto a list like this, but he is unlikely to be under anyone’s radar now he’s in the Test squad.