James Anderson – Red Ball Data

Cricviz put out a tweet recently showing how James Anderson’s Expected Average has steadily improved over his career. That consistency is in contrast to volatility in his actual average. In this piece I’ll explore why Expected Average may be more reliable than actual averages.

Background

It’s worth recapping what Expected Average is. Cricviz use ball tracking to build a database of many deliveries, so they know for a given ball (perhaps an 82mph full away-swinger) how many runs would on average be scored, and the likelihood of taking a wicket. Expected Runs divided by Expected Wickets is Expected Average*.
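To make that definition concrete, here is a minimal Python sketch. The per-ball figures are invented purely for illustration (they are not Cricviz data), and the function name is my own. It assumes each delivery carries an expected-runs value and a wicket probability, then divides total Expected Runs by total Expected Wickets.

```python
def expected_average(deliveries):
    """Expected Runs divided by Expected Wickets over a set of deliveries.

    Each delivery is an (expected_runs, wicket_probability) pair.
    """
    total_xr = sum(xr for xr, _ in deliveries)
    total_xw = sum(xw for _, xw in deliveries)
    if total_xw == 0:
        raise ValueError("expected wickets is zero, so the average is undefined")
    return total_xr / total_xw


# Hypothetical over: made-up values, not real ball-tracking output
over = [(0.4, 0.02), (0.5, 0.01), (0.3, 0.03),
        (0.6, 0.01), (0.4, 0.02), (0.5, 0.01)]
xa = expected_average(over)  # 2.7 expected runs / 0.10 expected wickets
```

Note that the division happens once on the totals rather than ball by ball, since a single delivery with a tiny wicket probability would otherwise dominate.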

Why is Expected Average compelling? It captures that which is under the bowler’s control. The trajectory of a ball, its seam position, pace and spin are controllables. Everything else is outside the bowler’s sphere of influence (who are you bowling to? Do they edge it to first slip, edge it for four, or miss it completely?) Expected Average always rewards good bowling, unlike the fickle real world.

Expected Average beats actual average

Returning to Anderson: looking at his bowling average year-by-year you’d say he blows hot and cold. Seven years averaging under 24, but five years averaging over 30. Contrast that with his (very stable) Expected Average. Using xA the volatility disappears – it tells a different story: Anderson has been a consistent bowler, improving steadily to become the player he is today. I find that easier to believe.

The two metrics (actual average and Expected Average) can be reconciled by assuming that actual average is a function of Expected Average and luck. Using my formula for error bars [1 standard deviation = xA * xW^(-0.5)], we get the following chart, which shows that Anderson’s ups and downs might be quite normal (i.e. two thirds of the time the blue line is between the grey lines).
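For anyone who wants to play with the error bars, the formula can be sketched in a few lines of Python. The function names are mine, and the example inputs (an Expected Average of 25 over 25 expected wickets) are hypothetical round numbers rather than Anderson’s real figures.

```python
import math


def average_error_bar(xa, xw):
    """One standard deviation on a bowling average: xA * xW^(-0.5)."""
    if xw <= 0:
        raise ValueError("expected wickets must be positive")
    return xa / math.sqrt(xw)


def one_sigma_band(xa, xw):
    """Lower and upper bounds one standard deviation either side of xA."""
    sd = average_error_bar(xa, xw)
    return xa - sd, xa + sd


# Hypothetical season: xA of 25 across 25 expected wickets
low, high = one_sigma_band(25.0, 25.0)  # sd is 25 / 5 = 5, so 20 to 30
```

On those made-up numbers, an actual average anywhere between 20 and 30 would sit inside one standard deviation of the Expected Average.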

How can we get past the impact of luck? Look back at the formula. Uncertainty scales with the inverse square root of wickets. More matches, more wickets, less of a look in for Lady Luck. Let’s consider a rolling four year horizon, where luck is corralled into a +/-2 impact on average (below). Average and Expected Average nicely aligned. Expected Average bypasses the need for luck to iron itself out over time: it’s a better metric. Everything wrapped up in a neat little package.
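The inverse square root effect is easy to see with a rough Python sketch. The Expected Average of 26 and the wicket counts (40 in a year, 160 over four years) are assumptions chosen for illustration, not Anderson’s actual record.

```python
import math

XA = 26.0  # hypothetical Expected Average


def luck_band(xa, wickets):
    """One standard deviation of luck on the average: xA / sqrt(wickets)."""
    return xa / math.sqrt(wickets)


# Assumed wicket counts: 40 in a single year, 160 over a four year window
for label, wickets in [("one year", 40), ("four years", 160)]:
    print(f"{label}: +/-{luck_band(XA, wickets):.1f}")
```

Quadrupling the number of wickets halves the band, which is why a four year window tames luck so much better than a single season.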

Limitations of Expected Average

So far, so predictable. Now the fun part. Here’s a hypothesis for you: Expected Average is incomplete as it misses the impact of “setting a batsman up”.

A googly pitching on the fourth stump is a fine ball. But isn’t it better after a sharply turning leg break that beat the outside edge? Or an inswinger after four well targeted outswingers?

Did you notice how Anderson has outperformed his Expected Average on a rolling four year basis recently? What if that’s not noise, but rather a master craftsman conditioning a batsman to play the last ball rather than the next one? And knowing from experience just what to bowl? That would manifest itself as Anderson taking wickets with balls that are better than they appear in isolation on a highlights package.

There could be a number of other causes (attacking fields / batsmen playing him on reputation), and it’s a bit rich me trying to throw shade on Cricviz’s metrics by mining the jeepers out of the data in one tweet. Still, food for thought. I’d give you something concrete, but that will take much more hoovering up the trail of breadcrumbs Cricviz leave through their twitter account and blogs.

The future of Expected Average

Cricviz will take their time building trust in Expected Averages. They have a tough concept to sell if they want xW, xA and xR to enter the lexicon: most of what we see (and debate) is noise. Signal takes years. People won’t want to hear that because it’s boring, counter-intuitive and at odds with standard narratives.

Expect them to make the case for luck’s impact slowly. They’ve done the right thing having strong communicators on board. Here’s an example, suggesting that Anderson’s ups and downs this summer were just luck. That’s something we can believe: a run of fortune ruining a week. The harder message to land will be the impact of luck on a series or even a year. Discretion being the better part of valour, Cricviz didn’t actually state Anderson’s xA in the piece, just that it was better than his actual average. They waited until the numbers aligned before quoting xA.

The data revolution will be televised. It just might take a while to convince everyone it happened.

*I can’t be sure this is their exact definition, but it’s close enough.

When I was at university there was a rumour that one of the Geology professors was about to predict a massive earthquake in South America. This would have been a career limiting move if nothing happened.

In the end neither the bold prediction nor the earthquake materialised.

I thought of that professor’s reputational gamble when I had the idea of asking whether Chris Woakes might be preferred to James Anderson for the Fourth Ashes Test. To misquote Nasser Hussain, “No Ed Bayliss, you cannot do that.”

The scenario

If you are reading this years from now, Sir James Anderson is currently England’s best bowler, though he doesn’t bat very well. Woakes is a decent batsman, and almost good enough to get into the England team as a bowler. Woakes shores up a mediocre top seven and gives the team balance, especially as Jack Leach is a non-batting spinner. Anderson pulled up during the first Test with a calf injury. He missed the next two Tests and has been added to the squad for the fourth. The series is level 1-1 with two to play. Current speculation is that Woakes might make way for Anderson.

When weighing the merit of the two players, I’ll look at two factors: England and Australia’s expected runs. To do this, I’ll run my model using each player’s career record as the input* and see how the different teams fare.

Batting

If Woakes were dropped, England would have Broad, Leach and Anderson as a long tail. That means a higher probability that a good batsman gets left stranded and not out. The following table shows the impact on expected runs over the course of a match of replacing Woakes with Anderson and rejigging the batting order:

England would expect to score 29 runs fewer per match with Anderson rather than Woakes.

Interesting that Broad batting at ten outscores Leach in that position by so much – I think it’s because the likely partnerships with Leach at ten (9th wicket: Broad-Leach, 10th wicket: Leach-Anderson) won’t last long.

Bowling

From a bowling perspective, Anderson has an average that’s four runs per wicket better than Woakes. Their strike rates are similar (Anderson 56, Woakes 59). It’s likely this gap is narrower in English conditions (both average 23 at home), but let’s use the raw data rather than run the risk of flattering Woakes.

Note that England have a solid fifth bowler in Ben Stokes (unlike some teams that would need to use a part-timer if they are bowling all day).

Running this through the model, adjusting for home advantage and Australia’s brittle batting order, the benefit of Anderson’s bowling over Woakes is 13 runs per match. Not enough to offset the weaker batting.

That seems a little low to me: four wickets per match at four extra runs per wicket would be 16 runs. I think it ends up lower because Australia are away from home and aren’t that strong at batting.
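The arithmetic behind that comparison is simple enough to lay out in Python. This only tallies the figures already quoted above (29 runs of batting, 13 runs of bowling, and the four-by-four rule of thumb); it is not the model itself, and the variable names are mine.

```python
# Figures quoted in the text, in runs per match
batting_cost = 29      # fewer runs England score with Anderson for Woakes
bowling_benefit = 13   # fewer runs Australia score, per the model

# Back-of-envelope check: four wickets per match at four runs per wicket
naive_bowling_benefit = 4 * 4

# Negative numbers mean the swap leaves England worse off overall
net_model = bowling_benefit - batting_cost
net_naive = naive_bowling_benefit - batting_cost
```

Either way the swap comes out negative: 16 runs worse off on the model’s figure, and still 13 runs worse off using the more generous rule of thumb.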

Conclusions

Bringing Anderson into the team for Woakes would be a mistake. Maybe there’s a case for such a change in a must-win match (as the odds of a draw are reduced), but the model does not support such a change for the fourth Test.

It’s important to put this analysis into context. I’m not saying that all specialist bowlers should be replaced by all-rounders. Nor am I saying that Anderson shouldn’t be in the team because he can’t bat.

The head-to-head between Woakes and Anderson is considered in this specific scenario where England have a high quality fifth bowler (Test average 32), but two weak batsmen in Broad and Leach.

James Anderson is England’s best bowler. If fit he should play. If Anderson is fit one needs to reframe the question: you can pick two of Woakes, Broad and Archer. Just make sure one of them is Woakes. Whatever you do, don’t bring in Anderson for Woakes.

*This might be slightly contentious. Any debate on this topic (though the participant may not realise it) will boil down to whether they believe that career record is the right input to use. For example, I’m not making an adjustment for Woakes’ unusually strong home record, nor am I adjusting to reflect more recent performances (which would boost Anderson’s bowling). Nor am I adjusting because Woakes hasn’t scored many runs this series.


James Anderson and the timescales of chance