James Anderson and the timescales of chance

Cricviz put out a tweet recently showing how James Anderson’s Expected Average has steadily improved over his career. That consistency is in contrast to volatility in his actual average. In this piece I’ll explore why Expected Average may be more reliable than actual averages.

Background

It’s worth recapping what Expected Average is. Cricviz use ball tracking to build a database of many deliveries, so they know for a given ball (perhaps an 82mph full away-swinger) how many runs would be scored on average, and how likely it is to take a wicket. Expected Runs divided by Expected Wickets is Expected Average*.

Why is Expected Average compelling? It captures that which is under the bowler’s control. The trajectory of a ball, its seam position, pace and spin are controllables. Everything else is outside the bowler’s sphere of influence: who are you bowling to? Do they edge it to first slip, edge it for four, or miss it completely? Expected Average always rewards good bowling, unlike the fickle real world.

Expected Average beats actual average

Returning to Anderson: looking at his bowling average year-by-year you’d say he blows hot and cold. Seven years averaging under 24, but five years averaging over 30. Contrast that with his (very stable) Expected Average. Using Expected Average (xA) the volatility disappears – it tells a different story: Anderson has been a consistent bowler, improving steadily to become the player he is today. I find that easier to believe.

The two metrics (actual average and Expected Average) can be reconciled by assuming that actual average is a function of Expected Average and luck. Using my formula for error bars [1 standard deviation = xA × xW^(-0.5), where xW is Expected Wickets], we get the following chart, which shows that Anderson’s ups and downs might be quite normal (i.e. two thirds of the time the blue line is between the grey lines).
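To make the error-bar formula concrete, here is a minimal Python sketch. The function names and the season figures (an xA of 25 from 36 expected wickets) are made up for illustration; they are not Anderson’s numbers or anything Cricviz have published.

```python
def luck_std_dev(expected_average: float, expected_wickets: float) -> float:
    """One standard deviation of luck around xA, using xA * xW^(-0.5)."""
    if expected_wickets <= 0:
        raise ValueError("expected_wickets must be positive")
    return expected_average * expected_wickets ** -0.5


def luck_band(expected_average: float, expected_wickets: float) -> tuple[float, float]:
    """Lower and upper one-standard-deviation bounds on the actual average."""
    sd = luck_std_dev(expected_average, expected_wickets)
    return expected_average - sd, expected_average + sd


# Hypothetical season: xA of 25 from 36 expected wickets (illustrative only).
low, high = luck_band(25.0, 36.0)
print(f"Actual average should land between {low:.1f} and {high:.1f} about two thirds of the time")
```

On those invented figures one standard deviation is a little over four runs, so a bowler with an underlying 25 could quite plausibly finish the year averaging 21 or 29.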

How can we get past the impact of luck? Look back at the formula. Uncertainty scales with the inverse square root of wickets. More matches, more wickets, less of a look-in for Lady Luck. Let’s consider a rolling four-year horizon, where luck is corralled into a +/-2 impact on average (below). Average and Expected Average are nicely aligned. Expected Average bypasses the need for luck to iron itself out over time: it’s a better metric. Everything wrapped up in a neat little package.
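Turning the formula around shows how many expected wickets it takes to squeeze luck into a given band: if 1 standard deviation = xA × xW^(-0.5), then xW = (xA / standard deviation)². A rough sketch follows; the function name and the two xA values are hypothetical, chosen only to show the scale.

```python
def wickets_for_band(expected_average: float, target_sd: float) -> float:
    """Expected wickets needed so that one standard deviation equals target_sd.

    Inverts sd = xA * xW^(-0.5) to give xW = (xA / sd)^2.
    """
    if expected_average <= 0 or target_sd <= 0:
        raise ValueError("inputs must be positive")
    return (expected_average / target_sd) ** 2


# Hypothetical Expected Averages (illustrative only), each with a +/-2 band.
for xa in (24.0, 28.0):
    print(f"xA {xa:.0f}: about {wickets_for_band(xa, 2.0):.0f} expected wickets")
# xA 24: about 144 expected wickets
# xA 28: about 196 expected wickets
```

Halving the band takes four times the wickets, which is why a single summer tells you so little.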

Limitations of Expected Average

So far, so predictable. Now the fun part. Here’s a hypothesis for you: Expected Average is incomplete as it misses the impact of “setting a batsman up”.

A googly pitching on the fourth stump is a fine ball. But isn’t it better after a sharply turning leg break that beat the outside edge? Or an inswinger after four well targeted outswingers?

Did you notice how Anderson has outperformed his Expected Average on a rolling four-year basis recently? What if that’s not noise, but rather a master craftsman conditioning a batsman to play the last ball rather than the next one? And knowing from experience just what to bowl? That would manifest itself as Anderson taking wickets with balls that are better than they appear in isolation on a highlights package.

There could be a number of other causes (attacking fields / batsmen playing him on reputation), and it’s a bit rich me trying to throw shade on Cricviz’s metrics by mining the jeepers out of the data in one tweet. Still, food for thought. I’d give you something concrete, but that will take much more hoovering up of the trail of breadcrumbs Cricviz leave through their Twitter account and blogs.

The future of Expected Average

Cricviz will take their time building trust in Expected Averages. They have a tough concept to sell if they want xW, xA and xR to enter the lexicon: most of what we see (and debate) is noise. Signal takes years. People won’t want to hear that because it’s boring, counter-intuitive and at odds with standard narratives.

Expect them to make the case for luck’s impact slowly. They’ve done the right thing having strong communicators on board. Here’s an example, suggesting that Anderson’s ups and downs this summer were just luck. That’s something we can believe: a run of fortune ruining a week. The harder message to land will be the impact of luck on a series or even a year. Discretion being the better part of valour, Cricviz didn’t actually state Anderson’s xA in the piece, just that it was better than his actual average. They waited until the numbers aligned before quoting xA.

The data revolution will be televised. It just might take a while to convince everyone it happened.

*I can’t be sure this is their exact definition, but it’s close enough.

England Batsmen at Lord’s

There are suspicions afoot that England have an ODI weakness at the home of Cricket.

CricViz’s analysis is here. In a nutshell, England struggle when the ball does a bit. Lord’s is a prime example of that, hence England have lost two of their last five games there and are vulnerable. It’s a neat piece of work.

And yet… Cricket is an individual sport masquerading as a team one. “England” as a batting lineup is a myth. In this piece I’ll explore the expected top seven for the game on 25th June 2019 and their track record in white ball cricket at Lord’s.

Firstly, ODI records.

Fig 1: ODI Records of selected England players at Lord’s

We can eliminate Bairstow, Root and Morgan from our enquiries. They have done well. Also, it’s Morgan’s home ground – surely he is familiar enough with conditions to not be at a disadvantage?

Note how Roy and Hales have been something of a flop at Lord’s. They aren’t playing tomorrow so we can put them to one side. That leaves Vince, Stokes, Buttler, Ali & Woakes under the spotlight. None of them have played a T20I at Lord’s but we can look at their Test Match record.

Fig 2: Test Records

Stokes has a decent red ball record at Lord’s. It’s not the same discipline, so I’ll let you make your own mind up.

List A records – note the very small sample size. Because Stokes, Buttler, Ali & Woakes all play in the North group, they rarely get the chance to play at Lord’s. Can’t read much into this.

How about the 20-20 record?

Oh. As far as I can tell none of Stokes / Buttler / Ali / Woakes have batted in a 20-20 at Lord’s. Vince has, and it hasn’t gone well.

What can we conclude? Firstly, county players generally stick to their half of the country when it comes to white ball Cricket, and many will only have strapped on their coloured pads in a minority of England’s grounds. Secondly, the jury is still out on Stokes / Buttler / Ali at Lord’s. More data please! Finally, over six white ball innings and four Test innings Vince has 151 runs at 15.1 – that’s not good.

Using CricViz False Shot % as an alternative to Averages

CricViz now use False Shot Percentages as a metric for assessing batsmen. Most recently they have done this as one factor when considering Australia’s options for the Sri Lanka tour.

A key point is that False Shots and averages are not equivalents – if two batsmen both have a 10% False Shot rate, the more attacking batsman will average more because they will score more runs for each error they make. One has to combine False Shot Rate and Strike Rate to get a useful metric.
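One simple way to combine the two, sketched in Python below, is to treat average as runs per ball divided by dismissals per ball. It rests on an assumption that is mine rather than CricViz’s: that a fixed share of false shots ends in a dismissal. The function name, the `dismissals_per_false_shot` parameter and its value of 0.12 are all hypothetical.

```python
def false_shot_average(
    strike_rate: float,
    false_shot_pct: float,
    dismissals_per_false_shot: float,
) -> float:
    """Implied batting average from Strike Rate and False Shot %.

    strike_rate: runs per hundred balls.
    false_shot_pct: false shots per hundred balls.
    dismissals_per_false_shot: ASSUMED constant share of false shots that
        cost a wicket (hypothetical, not a published CricViz figure).
    """
    if false_shot_pct <= 0 or dismissals_per_false_shot <= 0:
        raise ValueError("false shot inputs must be positive")
    runs_per_ball = strike_rate / 100.0
    dismissals_per_ball = (false_shot_pct / 100.0) * dismissals_per_false_shot
    return runs_per_ball / dismissals_per_ball


# Two hypothetical batsmen, both with a 10% False Shot rate.
blocker = false_shot_average(40.0, 10.0, 0.12)
dasher = false_shot_average(70.0, 10.0, 0.12)
print(f"Strike Rate 40: {blocker:.1f}, Strike Rate 70: {dasher:.1f}")
```

Under that assumption the implied averages scale directly with Strike Rate, so the batsman scoring at 70 comes out 1.75 times higher than the one scoring at 40 despite making errors just as often.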

As such, I’ve used the data CricViz published, and overlaid that with First Class Strike Rates to give an expected average derived from False Shot %.

The chart shows that Maxwell leads the options (due to his Strike Rate of >70 runs per hundred balls, combined with a healthy 10.4% False Shot rate). This is interesting because his 3 year Sheffield Shield average was only 43. It’s worth bearing in mind that he isn’t a Red Ball regular, with only 962 runs in the last 3 years.

Handscomb (real world average 50, False Shot average 57) can feel hard-done-by to have missed out on selection. He averages 38 in Tests; it looks an odd choice.

There is evidence that Pucovski is as good as the hype – CricViz’s data suggests that not only has he performed well (FC Average 49 after 8 games), but that it isn’t a fluke (a very low False Shot rate implies he may have been unlucky to average only 49 in those 8 matches). Still, it’s a small sample size.

Conclusions: False Shots combined with Strike Rate are a potentially useful tool in predicting player averages when limited data is available (such as for young players). However, more evidence of long term correlations is required before False Shot % and Strike Rate replace averages.