Jofra Archer is struggling with the bat in Test cricket, averaging eight and lengthening the tail. Yet he has a First Class average of 26. Is he getting an easy ride batting down the order for Sussex, then being found out at the highest level? Let’s find out.
Recap – Linking Division 2 and Test Batting
Previous workings showed that a player would expect to average 72% as much in Tests as they do in Division 2 (D2). There isn’t that much data though: most Test players are drawn from the top division. Just four players have over 20 completed innings at both levels over the last four years:
Not a bad fit – D2 averages do have reasonable predictive power of Test performance for batsmen (please take note Mo Bobat). You just need to play a decent number of games in both formats.
But what about tail enders in Tests?
Most of the overseas players in D2 are batsmen. There aren’t many bowlers in D2 to have also played Test cricket lately. Here’s the data for the five lower order batsmen to have eight or more completed innings in Tests & D2:
Remember none of these players has 20 completed innings in both formats, so expect volatility. Archer and Mohammad Abbas are the outliers: Archer averaged nearly four times as much in D2, while Abbas has a slightly higher Test average.
Across the five players, their Test average is 63% of their D2 batting average (for all players this figure is 72%).
Tail enders in D2 vs D1
Data is lacking on tail enders in D2 and Tests. Let’s answer a different question. If we are happy with the standard of D1, then all we need to do is demonstrate similar averages for the lower order in D2 and D1, and we can conclude that Jofra Archer is good at batting.
The above chart is for all batsmen that have >15 completed innings in D1 and D2. If anything the trend is for higher averages in D1. Can’t explain that, but at least that gives some comfort that the tail isn’t getting an easier ride in the lower division.
Jofra Archer would be a very unusual player if he continues to average under ten in Tests. I would expect him to average 17 in Tests based on all available red-ball innings. It just happens that the County Championship has seen the best of his batting, and Test cricket the worst.
Sporting autobiographies tend to wash over me. A reminder of a forgotten series, but not much more. I was recently given a copy of On Fire: My Story of England’s Summer to Remember by Ben Stokes, and made some notes along the way to keep my focus.
If you have a copy of the book, you might find it insightful to read this analysis alongside it. Page references are for the hardback copy. If you don’t have a copy, the below analysis (hopefully) speaks for itself.
p10 [On Alex Hales] “The collective feeling was that he should have felt comfortable enough in the environment that we had created to let on what was happening”.
Somewhat naive – if someone doesn’t feel able to speak up, it isn’t necessarily their fault. It is easy to think that in any team we have a positive and open culture, though we only find that out when it is tested.
p50 [Moeen Ali explains to Stokes how to succeed in ODI middle overs bowling] “Bowling in one-day cricket, I have found you need to change your mindset. It’s not about trying to get the batsmen out. The building of pressure will lead to wickets”.
Note that this strategy works with a strong batting lineup (as England now have). With mediocre batting, attempting to contain will just get you milked and vulnerable to a late onslaught with wickets in hand. It’s surprising that Moeen is delivering this insight – shouldn’t this come from the management and coaching staff? Maybe this is just being told this way to make a better story.
p66 [On putting Buttler up the order in ODIs] – “I’m normally at the forefront of calls for him to get his pads on once we get beyond the 20-over mark with plenty of wickets in hand.”
Stokes and I have come to the same conclusion. If cricketing experience and modelling are in harmony, one is on fairly safe ground.
p71 / 117 “Don’t wait until you get into the middle to decide how you’re going to play”. / “Jimmy Neesham was in my sights. This wasn’t a pre match plan”.
I like the approach to planning an innings. Yet I wonder what caused this apparent contradiction in approach with Stokes not taking his own advice and winging it against Neesham.
p115 [on Colin de Grandhomme getting Joe Root out] – “he certainly would not have been the bowler expecting to be troubling him had you considered the key match-ups before the final”.
A rare sentence in a book that isn’t big on analytics – does that mean England are using match-ups to plan which bowlers to attack? It doesn’t sound that way from the way Stokes describes his approach to an innings elsewhere in the book. Or is it that this kind of analysis would not be of interest to the target audience so is kept to a minimum?
p121 Stokes tells Jofra Archer (facing the last ball of the penultimate over) not to score a single so Stokes can face the last over. A logical approach to chasing, maximising expected runs from the seven balls remaining. I wonder if they would have made the same choice if batting first? Do teams leave runs unscored because in the first innings there isn’t the same pressure?
p129 sets out how Stokes scored 84* in the World Cup Final then went straight into the super over, scoring 8* (3). Carrying on his innings gave England an advantage – look at how strike rate evolves with balls faced. I estimate this benefit to be worth at least one expected run in a super over. Two insights: consider keeping your not-out batsman at the crease for the super over; also worth looking at whether the team batting first in super overs has an advantage from a gambling perspective.
p172 “Edgbaston … has the best atmosphere”.
There’s a contradiction in Stokes’ writing – he is in a bubble when batting, doesn’t want external factors to influence him, yet the enthusiasm of the crowd matters to him.
p196 “Lord’s is such a fast scoring ground for a batsman who is in… no matter what the pitch is like… It really can give you 20 per cent more value for your shots than other grounds”. Nice hypothesis, and I know what he means. It isn’t true though. Here’s some charts:
p176 “if an umpire gets overturned from “out” to “not out” by his TV colleague, a reluctance to give more “outs” usually follows.”
p242 [England are nine wickets down, with eight to win. Stokes and Jack Leach are batting well together] “doubts started to creep in as I questioned myself on how to play going forward. Should I continue in the same manner?” Here, Stokes is struggling with a statistical question – the equation has changed. With 73 to win, England just needed to bat to maximise expected runs to have the best chance of winning. In other words, follow the approach I set out earlier this year. However, with eight to win, England needed to give themselves the best chance of scoring those runs. That might mean hitting boundaries off the last two balls of an over – which ordinarily would leave the number 11 facing a whole over, but in this case wins the game. Stokes’ cricketing instincts align with a statistical approach.
p260 Old Trafford is described by Stokes as generally the flattest pitch in the country. I wonder what he means by this – it certainly isn’t the easiest to bat on – averages are 9% higher at The Oval in Tests. In ODIs Old Trafford is the lowest scoring of England’s grounds.
Overall, ignore the bits where Stokes talks about his current colleagues (because he will only praise them) and there’s still plenty to get your teeth into. We get a lot of insight into Stokes’ process when batting, and his soft skills as a senior player in the team. I wonder, is this a 304 page application to succeed Joe Root as England Test captain?
David Warner seems to have a preference for familiar conditions. After 82 Tests he averages 66 at home, 33 away. 2019 has been a rollercoaster: averaging 9.5 touring England, then dismissed just three times amassing 551 runs in Australia.
Would we expect that trend to continue? No. I’ll exhibit two bits of evidence against some players being disproportionately dominant at home. Firstly tracking a recent crop of players, and secondly by demonstrating that the great players in home conditions are what we would expect from chance.
We consider players that did relatively well at home up to a point in time (31/12/2016), and see if this continued, or if they regressed to the mean.
The above table indicates Home : Away Average Ratio (HAAR) history is a poor predictor of future returns. Elgar was great then OK. Amla was rubbish then brilliant. Plotting the data shows just how scattered the 12 data points are.
Putting it another way, if you had spent your Christmas 2016 holiday seeking home ground heroes, you would have been wasting your time*. Pujara, Broad and Elgar had HAARs around 2 (just like Warner does now), but past performance is no guarantee of future success – all three of them subsequently performed no better than average.
And the players that favoured touring? Three of the four who were stronger away pre-31/12/16 flipped to subsequently be better at home. The one exception was Ben Stokes: in his career he averages 36 at home and 38 away. Take that nugget with a pinch of salt: if Stokes is better on tour why does he average 44% more batting at home in ODIs?
Now to compare HAARs for Test cricket’s highest run scorers vs the theoretical distribution after 50 innings at home and 50 away:
Randomness plays a huge part (possibly up to 100%) in explaining the variation in Home:Away Average Ratios of Test cricketers.
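To get a feel for how much spread chance alone can produce, here is a rough simulation. It is not the method behind the chart above, just a sketch under stated assumptions: innings scores are drawn from an exponential distribution, every innings ends in a dismissal, and the player's true home average is 1.2 times their away average (the 48 and 40 are made-up figures chosen to give that ratio).

```python
import random
import statistics


def simulate_haar(true_home=48.0, true_away=40.0, innings=50, trials=2000, seed=1):
    """Simulate Home:Away Average Ratios for a player whose true ability never changes.

    Assumptions (mine, not the blog's): exponential scores, no not-outs,
    50 innings at home and 50 away per simulated career.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        # Observed averages over a finite sample of innings
        home = sum(rng.expovariate(1.0 / true_home) for _ in range(innings)) / innings
        away = sum(rng.expovariate(1.0 / true_away) for _ in range(innings)) / innings
        ratios.append(home / away)
    return ratios


ratios = sorted(simulate_haar())
median_haar = statistics.median(ratios)
low_haar = ratios[int(0.05 * len(ratios))]    # 5th percentile
high_haar = ratios[int(0.95 * len(ratios))]   # 95th percentile
print(f"median {median_haar:.2f}, 90% of careers between {low_haar:.2f} and {high_haar:.2f}")
```

Even with identical underlying ability, the simulated ratios spread some way either side of 1.2, which is the sense in which an extreme HAAR can be produced by luck rather than by a genuine home specialist.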
There are other factors I’ve not included (for instance, a player might only struggle in swinging conditions). If there are specific cases where you think a player thrives only at home (or away), then please let me know.
Where does this leave us? Hopefully (for Warner’s sake) he has a few more years of Test cricket in him. That would also be useful for this blog – I look forward to reporting at the end of 2021 that Warner’s HAAR over the last two years has been the standard 1.2, and that past outperformance at home is no guarantee of future success.
*An aside – there’s a line from I, Robot: “I’m sorry: my responses are limited – you must ask the right questions”. While I wouldn’t normally take lessons from fictional holograms, I like the message in this. You can do decent-looking research, but if you start with the wrong question you’ll be wasting your time. In this example, “who are the best batsmen in home conditions” is the wrong question; one should ask “is there anything special about the ratio of a batsman’s home average to their away average?”
A good wicket keeper takes their chances, scores runs, and doesn’t allow many byes.
Only one of those can be easily measured – we use the batting average.
For the others, a proxy will have to do: percentage of team dismissals (measuring ability to hold catches), and byes per hundred overs (in lieu of measurement of fielding errors).
Here’s a chart showing those three measures (the larger bubbles are higher batting averages).
Now we are getting somewhere. The ideal player is at the top left with a big bubble. AB de Villiers wins. There’s insight in this chart: Adnan Akmal’s career passed me by – but through the above we can see a decent gloveman, albeit not up to the batting requirements of Test cricket.
How about comparing players? De Villiers beats Mushfiqur Rahim on all three measures. But what about Tim Paine vs Sarfaraz Ahmed? We need to apply a weighting to each factor. Here’s my estimate and why:
Byes: runs impact = Byes/100 overs*1.62 because there are 162 overs in the average match.
Average: runs impact = Average * 1.48 (because the average wicket keeper is dismissed 1.48 times per match).
Percentage of team dismissals: runs impact = (% team dismissals – 28%) * 146.5 (I’ve estimated 4.5 dismissals per match for the keeper that takes 100% of chances, which would be 28% of team dismissals. At 32.6 runs per wicket, the perfect keeper is worth 146.5 runs more than a non-catching keeper. All keepers are between 0-146.5).
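The three weightings can be put into a single runs-per-match figure. This is a minimal sketch that takes the formulas literally; adding the batting and dismissals terms and subtracting byes is my reading of how they combine, and the two keepers are hypothetical, not players from the chart. Note that, as written, a 28% share of team dismissals maps to zero on the dismissals term.

```python
def keeper_runs_per_match(batting_average, byes_per_100_overs, dismissal_share):
    """Combine the three weightings into one runs-per-match number.

    dismissal_share is a fraction (0.25 means 25% of team dismissals).
    The signs used to combine the terms are an assumption.
    """
    batting = batting_average * 1.48              # dismissed 1.48 times per match
    byes = byes_per_100_overs * 1.62              # 162 overs in the average match
    catching = (dismissal_share - 0.28) * 146.5   # formula as stated in the post
    return batting - byes + catching


# Hypothetical keepers, for illustration only
keeper_a = keeper_runs_per_match(batting_average=40, byes_per_100_overs=3, dismissal_share=0.25)
keeper_b = keeper_runs_per_match(batting_average=30, byes_per_100_overs=2, dismissal_share=0.27)
print(round(keeper_a, 1), round(keeper_b, 1))
```

With these inputs the batting term dominates: ten runs of average is worth 14.8 runs per match, while two percentage points of dismissals is worth about 2.9.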
Here’s the same players ranked according to those weightings:
AB de Villiers is comfortably the Red Ball Data Wicket Keeper of the Decade (RBDWKotD). I’m sure he’ll be chuffed. Was he flattered by averaging 59 with the gloves? No – he averaged 57 overall this decade.
There are secondary effects which one could measure (for instance, a left handed first slip taking chances from the keeper, reducing the number of dismissals). One might also want to consider intangibles, such as the ability to be distractingly inane, or contribute to strategy. These are judgemental, and I’m not qualified to opine on them.
There’s a point at the end of all this. Should Ben Foakes be in the England team?
Wow, that’s surprised me. It has changed my mind – I was expecting this to vindicate the selection of Jos Buttler. If Foakes is as good a keeper as we are told, it is sufficient to outweigh his inferior batting. There is little to choose between Jonny Bairstow and Buttler. Note that the usual disclaimer applies: need at least 20 Tests to judge, and only Bairstow has hit that threshold. Oh, and I’ve used my (more detailed) batting ratings for this chart, while Fig 2 uses averages while keeping wicket in Tests.
As ever shorter forms of Cricket proliferate, fast starters with the bat are becoming more valuable (just ask Dane Vilas). Let’s take stock of how players start across the three formats.
Tests, ODIs, and 20-20 are essentially the same game. Each match starts on a fresh pitch, with different ball, bowlers and climatic conditions to the last game. It takes more than one over to get one’s eye in, so batsmen are simultaneously settling in against multiple bowlers no matter the flavour of Cricket they are playing. The key differences which might impact how batsmen settle in are the white ball deviating less than the red ball, and the intent of batsmen differing.
How do we measure how settled a batsman is in each format in a given stage of the innings? In Tests we can use a batsman’s average ball-by-ball. However, in white ball Cricket Strike Rate is the prize (a cynic might suggest the curves suit my story so I’ve cherry picked).
The first thing to note is that these curves are quite similar – 20-20 batsmen really do take time to go through the gears. I’ll wager the slow acceleration is not through lack of effort. Players will be going as fast as they think they can: accelerating faster risks their wicket.
Normally, this site is based on numbers I’ve crunched. This piece is an exception – while the ODI analysis is my own, I’ve based the 20-20 curve on this piece from sportdw.com. An aside – I looked at this back in 2017, and the bookmakers weren’t getting their “what happens next ball” odds right. Wish I’d been in a position to capitalise, but with a young family such activity was very much on the back burner. Anyway, it’s a fine post from sportdw.com, so take a look.
For Test matches I’ve used the analysis mentioned in my last post, Bayesian survival analysis of batsmen in Test Cricket. I had to make some assumptions to take Stevenson & Brewer’s player specific data and create a general case. Any errors in that process are mine – if you’d like to know about “getting your eye in” for Test Cricket, please read their work.
The current state
Let’s think about this behaviour from a theoretical perspective: in any innings a team are looking to make the best use of their resources. In Tests that’s as simple as maximising runs per innings: pick players that score the most runs per wicket, regardless of whether they start badly then get good, or quickly settle but then don’t improve much. Appraising Test players is easy – just look at the average*!
ODIs and 20-20s need to manage two scarce resources: wickets and balls. In 20-20 the number of overs is more of a constraint than in one day Cricket, which is to the advantage of faster starters. After 12 balls a 20-20 batsman is at full speed, while an ODI player is only scoring at 90% of their career SR.
What happens next
Data on 20-20 is everywhere. That’s partly why I don’t focus on it: the people that analyse 20-20 for a living are doing a great job. Seeing what’s available publicly, I can only imagine what depth of analysis the richest franchises have locked away on their laptops! I would be very surprised if there aren’t “getting your eye in” curves for each batsman, and teams optimising batting orders to get the fast starters into the right roles, alongside the high-average-slow-start-but-quick-when-they-get-going types.
Personally, I like a blunt approach – and assume all players play themselves in in the same way. Partly this is because I lack data – so have to make assumptions else I’d never have a model. Ultra short form games derived from Cricket will test that approach, and we will learn more about Cricket over the next couple of years.
Comparison with baseball
Baseball is a little like Cricket. Yet batters are up and running after just a handful of pitches. Why? A combination of there being only one pitcher, the ball not bouncing (so the ground matters less), and long breaks between innings mean that there’s limited benefit from repeated plate appearances in a game. Adding the yellow line for baseball shows just how similar the curves for Cricket’s three formats are. For now.
*For consistency I should say “factor in how many innings they have played, whether they were home or away, their record in each competition adjusted for difficulty, what number they were batting, the grounds they were playing on, how old they are, which attacks they played against, what innings of the match each innings was” – but my writing barely flows at the best of times; that detour would not have been helpful. Hopefully you know what I meant.
I have a theory that openers are better than middle order batsmen with the same average. If someone averages 35 against the new ball, that has to be better than averaging 35 against the third change bowler using a 60 over old lump of leather.
Here’s Michael Carberry’s take on the unique challenges of opening:
As openers, we don’t have the luxury of being able to come in against the old ball where it’s doing less. You see it on the first morning of a match. Everyone’s prodding the wicket. ‘Oh yeah, this looks a belter’. It’s never a belter when you’re facing the new ball. If the ball is going to do something, generally you’re the one who’s going to get it.
If Carberry is right, once openers see off the new ball, their expected runs for the rest of the innings should be higher than their career average – they’ve done the hard part.
How to prove it though? The proper way would be to show what various batsmen average at differing stages of their innings, against particular bowlers, against both old and new ball, and when the bowler is in their first, second, third spells. That’s not complicated, but would be time consuming, starting from a ball by ball database of Test Cricket. I haven’t done that. Instead, I’ve looked at what happens to a batsman’s average once they are “in”. Previous analysis tells me a batsman is fully in by the time they are 30 not out.
The benefit from “getting your eye in” is worth about five runs onto a player’s average (ie. if you average 40, by the time you get to 30* your expected average goes up to 45).
Surprisingly, openers don’t get a further boost once they get to 30. This is odd – by the time an opener is on 30, 20 overs would have gone, the three best bowlers would have bowled six or seven overs and the ball would no longer be hooping round corners. I’ve definitely watched England play, rocking back and forth in my seat saying “if they can just get to 20 overs, see off the new ball, it will get easier. It’ll be all right”. Turns out that was piffle. It gets easier (c.12%), but it’s not a violent swing into the batsman’s favour.
Weaker middle order batsmen get the biggest benefit from getting to 30. I think that’s because they really are the easiest times to bat – 40+ overs into the innings, tired bowlers, etc. In other words, these players aren’t becoming relatively better once they are in – they just tend to be building an innings as conditions become more favourable.
Put the above analysis together, and I’ll give you a second hypothesis – collapses in red ball Cricket are partly because lower middle order batsmen’s averages flatter them. A batsman that averages 30 can make hay in helpful conditions – yet they only average 30. That must mean that they average less than 30 in challenging conditions. Maybe when the going gets tough, the middle order will disproportionately get blown away. Unfortunately for me, that hypothesis doesn’t show up in the numbers. Yet.
Since “Expected Innings Average” (EIA) is a non-standard metric, it’s worth explaining what it is and how I’ve derived it, else you’d have every reason to dismiss this as someone fitting the data to match their hypothesis.
EIA was calculated for every innings where a batsman scored over 30. Their runs in that innings (minus 30) were compared to their EIA to get a view of how their average (once they had got their eye in) compared to what one would expect from when they started their innings. Thus Benefit from getting eye in = Runs scored – EIA – 30.
To calculate EIA I started with the batsman’s career average, then adjusted for the runs per wicket on that ground and for the innings of the match, then added or subtracted 8.5% depending on Home/Away. To adjust for not outs, I added the EIA to the not-out score.
For instance, when Virender Sehwag scored 319, his expected average was 49 (Career Average) * 1.2 (Ground adjustment for Chennai) * 1.17 (Innings Adjustment – for the 2nd innings of the match) * 1.085 (Playing at home) = 74. Conditions were favourable – but he still exceeded expectation by 245.
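The Sehwag example can be reproduced in a few lines. This is a sketch of the arithmetic described above, with function and parameter names of my own choosing; the adjustment factors are the ones quoted for that innings.

```python
def expected_innings_average(career_average, ground_factor=1.0, innings_factor=1.0,
                             home=True, home_away_adjustment=0.085):
    """Career average scaled by ground, innings-of-match and home/away factors."""
    venue_factor = 1 + home_away_adjustment if home else 1 - home_away_adjustment
    return career_average * ground_factor * innings_factor * venue_factor


def benefit_from_eye_in(runs_scored, eia, threshold=30):
    """Benefit from getting eye in = Runs scored - EIA - 30."""
    return runs_scored - eia - threshold


# Sehwag's 319: career average 49, Chennai 1.2, 2nd innings of the match 1.17, at home
sehwag_eia = expected_innings_average(49, ground_factor=1.2, innings_factor=1.17, home=True)
sehwag_excess = 319 - sehwag_eia
sehwag_benefit = benefit_from_eye_in(319, sehwag_eia)
print(round(sehwag_eia, 1), round(sehwag_excess, 1), round(sehwag_benefit, 1))
```

Unrounded, the product comes to roughly 74.6, which is the 74 quoted above. The excess over expectation is the raw score minus EIA, while the "benefit" measure also subtracts the 30 runs needed to count as being in.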
In case you aren’t a fan of the above, I also calculated the impact based on raw averages. It doesn’t reveal much. Just goes to show how important the context of an innings is: raw averages are just too simplistic.
Michael Carberry’s recent interview in Wisden is linked here.
Wind the clock back. The good old days. Specifically the noughties (or 2000s, or whatever). An opening batsman fulfilled the same role in Tests or ODIs. Hence their ODI and Test averages were similar, and you could use one to predict the other with a fair degree of confidence.
The correlation is so good that the names get all jumbled up on the straight line running from (20,20) to (50,50). Yes, there’s some Test specialists there (Cook, Strauss) but most of the 23 players that meet the criteria for inclusion behave as expected.
That correlation has broken down now.
There are three distinct types of player, reflected in the clustering in the chart:
Versatile elite batsmen (Warner, Iqbal) – just as good in either format, average over 40 in both.
Test specialists (Latham, Azhar Ali) – who are/were good enough to play in ODI Cricket, but averaged at least five lower in ODIs
ODI specialists (Hales, Guptill) – averaging under 30 in Tests.
I’m reminded of the film Titanic (1997) explaining the captain’s complacency: “26 years of experience working against him”. That line stuck with me – it’s easy to assume past trends will continue, and that you can use opening the batting in ODIs as a pathway into opening in Tests.
Not any more. Unless the player is good (and I mean really good), the best predictor I can see for successfully opening the batting in Tests is successfully opening the batting in red ball Cricket. Think about Jason Roy – ODI Average 43 as an opener, Test Average 19. I don’t think anyone is now expecting him to average 35 in Tests as an opener. Yet someone must have thought he could, else he wouldn’t have been picked.
redballdata.com – closing the stable door after the horse has bolted!
PS. This piece serves as another reminder to me to continually check that the trends I’ve seen still hold – else one day I could be the mug taking Fig.1 to a meeting, persuading everyone to pick the best ODI openers to open the batting in Tests.
In this piece I’ll look at which grounds are best for red ball batting, and use that to see what impact that has on averages: how much of a boost do Surrey’s batsmen get from playing at the Oval?
Beyond it being a spot of trivia, I can immediately see two reasons why this matters.
i. High scoring grounds harm the county’s league position
In County Cricket there are 16 points for a win, 5 for a draw and none for losing. A win and a loss is worth 16 points, while two draws is worth 10. Drawing is bad*.
And yet there are teams producing high scoring pitches, boosting the chances of a draw, and reducing their chances of picking up 16 points.
Compare Gloucestershire’s two home grounds since 2017: at Bristol (32 Runs per Wicket), W2 L4 D8. Cheltenham (28 Runs per Wicket), W4 L1 D2. Excluding bonus points, Cheltenham is worth an extra 5.4 points per match. While that’s an extreme example, and the festival only takes place in the summer months, there’s still the question “why make Bristol so good for batting”?
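The 5.4 figure follows directly from the points system. A quick check, using only the win, loss and draw counts quoted above and ignoring bonus points:

```python
WIN_POINTS, DRAW_POINTS, LOSS_POINTS = 16, 5, 0


def points_per_match(wins, losses, draws):
    """Average match points per game, excluding bonus points."""
    matches = wins + losses + draws
    total = wins * WIN_POINTS + draws * DRAW_POINTS + losses * LOSS_POINTS
    return total / matches


bristol = points_per_match(wins=2, losses=4, draws=8)      # 72 points from 14 matches
cheltenham = points_per_match(wins=4, losses=1, draws=2)   # 74 points from 7 matches
gap = cheltenham - bristol
print(round(bristol, 1), round(cheltenham, 1), round(gap, 1))
```

Cheltenham's seven matches yield more points in total than Bristol's fourteen, which is where the gap of about 5.4 points per match comes from.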
Maybe a deeper look at the data will reveal why Gloucestershire and Surrey don’t try to inject a bit more venom into Bristol and The Oval; for now it looks like an error.
*There’s an exception: a team that is targeting survival in Division 1 might choose to prepare a flat track and harvest batting points plus drawn match points in certain situations. For the other 15 counties, drawing is still bad.
ii. Averages should be adjusted to reflect where people play their Cricket.
When using data to rank county batsmen and bowlers, the one gap that I couldn’t quantify was the impact of how batting or bowling friendly each player’s home county is. With this data we can add an extra level of precision to each player’s ratings.
How would we do that? It would be wrong to simply take the difficulty of a player’s home ground as the adjustment – because there are also away games. The logical approach would be to take the average of that player’s home grounds (50%, weighted by the various home grounds that county uses) and the other teams in that division (50% weighting).
For instance, Ollie Pope’s average is artificially inflated by 10% from being based at The Oval. That takes his rating (expected Division 1 average) down to 54.6 from the suspiciously strong 60.7.
Equally, Tom Abell clambers up the ranks of 2019’s County batsmen: his rating jumps 7.1% to 35.6 from 33.2. Not an extreme move, but a nice boost to go from 50th to 31st on the list.
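A minimal sketch of the adjustment, assuming the blended effect is expressed as a fractional inflation (positive for batting-friendly, negative for bowler-friendly) and applied multiplicatively. The function names and the 20% home effect in the blend example are hypothetical; only the 10% and 7.1% figures and the ratings come from the text.

```python
def blended_ground_effect(home_effects, division_effect):
    """50% weight on home grounds (weighted by share of home games), 50% on the rest.

    home_effects: list of (share_of_home_games, effect) pairs, shares summing to 1.
    division_effect: average effect of the other teams' grounds in the division.
    """
    home = sum(share * effect for share, effect in home_effects)
    return 0.5 * home + 0.5 * division_effect


def adjust_rating(rating, inflation):
    """Strip out the ground effect: a 10% inflation scales the rating by 0.9."""
    return rating * (1 - inflation)


# Hypothetical: a single home ground worth +20%, neutral away grounds, blends to +10%
example_effect = blended_ground_effect([(1.0, 0.20)], division_effect=0.0)

pope = adjust_rating(60.7, 0.10)      # inflated by 10%
abell = adjust_rating(33.2, -0.071)   # rating rises by 7.1%
print(round(example_effect, 2), round(pope, 1), round(abell, 1))
```

Applied this way, the quoted percentages reproduce the ratings in the text: 60.7 becomes 54.6 and 33.2 becomes 35.6.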
This takes us one step closer to a ratings system that captures everything quantifiable. Before next season I’ll adjust the ratings of batsmen and bowlers to reflect this factor.
This is my first attempt at something difficult: finding the best players that aren’t regularly playing County Cricket, but that are good enough to do so. In theory there shouldn’t be very many players like this – because counties will know who their best players are.
I’ve used my database of bowling performances from 2016-19 in County Championship and 2nd XI Championship Cricket and picked out six that have promising data.
Time will tell how many of these players get regular first team cricket (and succeed) in 2020.
I’ve looked at players that have been selected for no more than three County Championship matches in 2019, for reasons other than injury.
Note that England’s Matt Parkinson only played four games for Lancashire in the 2019 County Championship, so might have made it onto a list like this, but he is unlikely to be under anyone’s radar now he’s in the Test squad.