Data scraping

So much data is freely available on the net that we can now answer interesting questions by interrogating these data rather than recruiting participants and asking them. The advantages are numerous, and several academics I respect, including Prof. Dorothy Bishop, have extolled the virtues of using data scraped from the internet.

So, I thought I’d give it a go. There is a great MATLAB function available from the File Exchange which downloads the contents of tables from web pages. I pointed it at two great sites which house the results of more than a hundred years of Australian Rules Football and Rugby League matches played in Australia.
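For anyone who wants to try something similar without MATLAB, here is a minimal Python sketch of the table-reading step, using only the standard library. It is not the File Exchange function mentioned above. The `TableParser` class, the `parse_table` helper, the column layout and the sample page are all made up for illustration, and it assumes simple, non-nested tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> row in an HTML page."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def _close_cell(self):
        # Store the current cell (if any) in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> before the next cell
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return all table rows found in an HTML string as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page. A real page could be fetched with
# urllib.request.urlopen(url).read().decode() before parsing.
SAMPLE = """
<table>
<tr><th>Season</th><th>Home</th><th>Away</th><th>Home score</th><th>Away score</th></tr>
<tr><td>1999</td><td>Team A</td><td>Team B</td><td>90</td><td>72</td></tr>
</table>
"""
rows = parse_table(SAMPLE)
```

Each entry of `rows` is one table row as a list of strings, so scores still need converting to numbers before any analysis.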

I’ve had a suspicion for a few years that the home ground advantage in Australian sports has been declining. I imagine that part of what makes up a home ground advantage is the benefit of a crowd which is enthusiastically partial to the home team. However, in the 21st century, I suspect that people move around the country more, so fans are more likely to attend away games. Another part of the home ground advantage is, I suspect, the familiarity the home team has with the ground. But there are now several ‘shared’ grounds in use, particularly in the AFL.

So, I wrote this script to get the data, calculate the proportion of home side wins per season and graph it…
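The calculation itself is simple enough to sketch in a few lines of Python. This is an illustration only, not the MATLAB script: the `home_win_proportion` function and the match results are invented, and it assumes that a draw counts as a match the home side did not win.

```python
from collections import defaultdict


def home_win_proportion(matches):
    """Return {season: proportion of matches won by the home side}.

    Each match is a (season, home_score, away_score) tuple.
    Assumption: a draw counts as a match the home side did not win.
    """
    played = defaultdict(int)
    won = defaultdict(int)
    for season, home_score, away_score in matches:
        played[season] += 1
        if home_score > away_score:
            won[season] += 1
    return {season: won[season] / played[season] for season in sorted(played)}


# Invented results, purely to show the input format.
matches = [
    (1999, 90, 72),   # home win
    (1999, 60, 81),   # away win
    (1999, 75, 75),   # draw
    (1999, 101, 88),  # home win
    (2000, 70, 65),   # home win
    (2000, 55, 93),   # away win
    (2000, 88, 60),   # home win
]
proportions = home_win_proportion(matches)
```

Plotting `proportions` against season for each code would give the kind of graph described here.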


As you can see, there doesn’t appear to be a clear trend over time in either code. So, my suspicions are not confirmed. There is of course much more that could be done to analyse these data (points difference, attendance, the role of specific teams, etc.).
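One way to go beyond eyeballing the graph would be to fit a straight line to the per-season proportions and look at its slope. The Python sketch below does this with ordinary least squares. It is not part of the analysis above, the `trend_slope` function is hypothetical, and the two series are invented to show what a flat and a declining pattern would produce.

```python
def trend_slope(seasons, proportions):
    """Least-squares slope of proportion against season (change per season)."""
    n = len(seasons)
    mean_x = sum(seasons) / n
    mean_y = sum(proportions) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(seasons, proportions))
    sxx = sum((x - mean_x) ** 2 for x in seasons)
    return sxy / sxx


# Invented series, not real results.
seasons = [2000, 2001, 2002, 2003, 2004]
flat = [0.58, 0.58, 0.58, 0.58, 0.58]
declining = [0.62, 0.61, 0.60, 0.59, 0.58]

flat_slope = trend_slope(seasons, flat)            # close to zero
declining_slope = trend_slope(seasons, declining)  # about -0.01 per season
```

A slope near zero would be consistent with no trend, although a proper test would also need to account for the season-to-season noise.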

But, it was fun today just to start scraping data from the internet. My next target will be a little less frivolous…

