There is, as always, a lot of speculation from political commentators about what will be in this year’s Federal Budget.
Will the pension indexation be changed? Will there be a tax on multi-national profit shifting? Or will there be a tax on bank deposits?
But from the perspective of a data analyst, there is a more efficient way of predicting the contents of this year’s budget. We can simply look for patterns in previous budget speeches and make our forecast based on the probability of them happening again.
Helpfully, the last 19 years of budget second readings are available online. The most straightforward way to examine their contents is to find the most frequently used words across the whole set of budgets delivered between 1996-1997 and 2014-2015.
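For readers who want to try this at home, the word count described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind this analysis: the function names `tokenize` and `top_words` are made up for the example, and the two short sentences are invented stand-ins, not text from any budget speech.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(speeches, n=10):
    """Return the n most frequent words across a list of speech texts."""
    counts = Counter()
    for text in speeches:
        counts.update(tokenize(text))
    return counts.most_common(n)


# Invented stand-in sentences (an assumption for illustration, NOT real budget text).
toy_speeches = [
    "A new plan for Australia and a new start.",
    "Australia needs new roads.",
]

print(top_words(toy_speeches, 3))
```

In practice you would probably also drop very common words such as “the” and “and” before counting, otherwise they tend to crowd out the interesting ones.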
Amongst the most frequently used words are “Australia” and “Australian”, which is not particularly surprising. This is, after all, the Australian federal budget.
“New” also makes it into the top of the frequency list. This tells us something about what the budget is typically used for: the unveiling of new policy and funding objectives. So, despite claims that this year’s budget will be “dull and boring”, treasurer Joe Hockey will be breaking with a strong tradition if he announces that “next year, we’ll just keep doing the same thing as last year”.
The focus on Australia and on novelty in the federal budget is in keeping with the patterns detected in successful Australian Research Council Grants.
The frequency with which individual words are used is interesting, but it misses some of the context regarding how the words are used. To gain some insight into this we can use an application of probability theory known as bigrams. These are pairs of words that frequently occur in a set order.
Bigrams are one of the ways predictive text works on your phone. If you frequently text your romantic partner, then your phone will learn that the word “love” is most likely followed by the word “you”. But if you more frequently text your ice cream vendor, then your phone will learn that the word “love” is most likely to be followed by the word “chocolate”.
Using the same approach to the budget second readings we can see that if the treasurer says “economic”, the most probable next word is “growth”. And if the treasurer says “aged”, the most probable next word is “care”.
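The “most probable next word” idea can be sketched as follows. This is a hedged illustration under simple assumptions: the probability is just the share of times a word follows another, the helper names `bigram_counts` and `most_probable_next` are hypothetical, and the sample sentences are invented rather than quoted from any speech.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(texts):
    """Map each word to a Counter of the words that immediately follow it."""
    follows = defaultdict(Counter)
    for text in texts:
        words = tokenize(text)
        # Pair each word with its successor: (w0, w1), (w1, w2), ...
        for first, second in zip(words, words[1:]):
            follows[first][second] += 1
    return follows


def most_probable_next(follows, word):
    """Return (next_word, estimated probability), or None if the word was never followed."""
    next_words = follows.get(word)
    if not next_words:
        return None
    candidate, count = next_words.most_common(1)[0]
    return candidate, count / sum(next_words.values())


# Invented sentences (an assumption for illustration, NOT real budget text).
toy_texts = [
    "Economic growth is strong.",
    "Economic growth and economic reform.",
    "Aged care matters.",
]

follows = bigram_counts(toy_texts)
print(most_probable_next(follows, "economic"))
print(most_probable_next(follows, "aged"))
```

Counting pairs within each text separately avoids creating a spurious bigram from the last word of one speech and the first word of the next.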
It’s telling that at the top of the list is the bigram “Mr. Speaker”. This is because, like all speeches in the House of Representatives, the budget second reading is addressed to the speaker of the house. And, even though the 2013-14 and 2014-15 budgets were addressed to female speakers, the previous fourteen years of male honorifics made this bigram top the list.
But if we want to predict the contents of this year’s budget, what we really need to do is see how the bigrams that have been used have changed over time.
Looking through the bigrams, two really caught my eye. The phrases “million over” and “billion over” were both used frequently. Looking through the text of the speeches, these phrases are used in relation to the proposed spend on an initiative or the forecast deficit.
To divine if we are planning to spend more billions than millions, I looked at the trends over time.
And, indeed, we do seem to be spending more than ever. In the last budget delivered by the former Labor Government, the phrase “billion over” was mentioned four times, while “million over” wasn’t mentioned at all.
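Tracking a phrase over time, as in the “million over” versus “billion over” comparison above, amounts to counting it separately in each speech. A rough sketch, assuming the speeches are held in a dictionary keyed by a year label; the names `count_phrase` and `phrase_trend`, the labels, and the sentences are all invented for illustration and are not real budget data.

```python
import re


def count_phrase(text, phrase):
    """Count whole-word, case-insensitive occurrences of a phrase in a text."""
    words = [re.escape(word) for word in phrase.split()]
    # Allow any run of whitespace (including line breaks) between the words.
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def phrase_trend(speeches_by_year, phrases):
    """Return {year_label: {phrase: count}} with the year labels in sorted order."""
    return {
        year: {phrase: count_phrase(text, phrase) for phrase in phrases}
        for year, text in sorted(speeches_by_year.items())
    }


# Hypothetical labels and invented sentences (NOT real budget text or figures).
toy_speeches_by_year = {
    "year_1": "We will spend $5 million over four years and $2 million over two years.",
    "year_2": "We will invest $3 billion over five years.",
}

trend = phrase_trend(toy_speeches_by_year, ["million over", "billion over"])
for year, counts in trend.items():
    print(year, counts)
```

The same function works for any other phrase of interest, such as a policy area, by changing the list of phrases passed in.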
Let’s turn to the substance of the budget: the policy areas that are funded. In the figure below, the rise and fall of four policy areas is charted by the number of mentions they warranted in the last nineteen budgets.
There are some interesting patterns, not just in changes over time, but in the different areas addressed by Liberal and Labor budgets. Firstly, it’s clear that Wayne Swan was the only treasurer to talk up climate change.
Interestingly, Swan also talked plenty about tax cuts. In the first Swan budget these cuts were the result of the Labor Party’s decision to match the Coalition’s announcement of tax cuts during the election campaign.
Health care has seen its ups and downs as an area of policy focus across the last nineteen years, but was mentioned more times in Labor budgets. Small business has been talked about by treasurers from both sides of politics, which is consistent with its political importance to both.
Finally, no analysis of the bigrams used in budget speeches would be complete without a look back at one of the most infamous pairs of words used by a federal treasurer.
In 2008, Wayne Swan seemed to fixate on the term “working families”. Despite there being no clear definition of what it meant, he used it repeatedly. This reached a high point in the 2008-’09 budget speech, in which he used the phrase 16 times.
Interestingly, despite ridicule by multiple commentators, the phrase wasn’t completely dropped, but had small echoes in 2010-’11 and 2012-’13. This can be compared to the less jargon-laden “income earners”, which has been used by treasurers from both sides of politics over the years.
So, let’s make our predictions. These data suggest that this year’s budget will include much that is “new”. It will talk mostly about “Australia”, and it will definitely not mention “working families”. Beyond that, it’s up to the pundits to pick what will likely be announced.