AFL at the centre of the universe

With six different NFL clubs now eager to sign Jarryd Hayne, I thought it would be interesting to put this latest code conversion in a historical and global context. Using this list of players who have played more than one code of football, I generated this graph. The size of each node reflects the number of links to and from that code (these links are known as edges in graph theory). The redder the node, the more closely it is linked to AFL alone. Interestingly, Rugby Union and AFL have exactly the same number of edges. However, the AFL is the only code with edges linking to every other node, which places it squarely at the centre of the international football universe. That is, players have come to the AFL from, or left the AFL for, every other code on the graph.

One other feature of the graph is worth noting: the two most interlinked codes are rugby union and rugby league. Just as importantly, the exchange is fairly symmetrical. There is only a slightly stronger flow from union to league than from league to union. Compare this to the plight of Gaelic football, which is a serious net contributor to every code except the gridirons without getting much back at all!



The Chaser’s Media Circus vs Spicks and Specks (a statistical showdown)

This one is for fans of ABC television entertainment shows and statistics…

It’s a story about the number of ways we can arrange the elements which make up a set, which is given by the factorial. But let’s start at the beginning.

The new show on the ABC by the team from The Chaser is called Media Circus. In the show two teams compete in various games to answer questions about the news. One of the games is called That’s All We Have Time For. In it each team is given a set of four news items and a set of four time durations. The aim of the game is to match each news item with the time spent on it by the Australian media.

Likewise, the great music quiz show Spicks and Specks, which ran on the ABC from 2005 to 2011, had a game called Sir Mix’n’Matchalot in which contestants were given a set of three musicians’ names and a set of three facts, with the aim being to match the right musician to the right fact.

In both games, one point is given for each correct match.

The key thing to realise at this point is that, mathematically, these games use the same basic idea but change one important feature: the number of elements in the sets (three in Spicks and Specks and four in The Chaser’s Media Circus).

So what is the basic idea at the heart of both games? The basic idea is to arrange the set given to the contestant (news items or musicians’ names) in the order which matches the answers (time durations or facts).

Sometimes it is easier to see the statistics of a situation if we remove all the real world stuff and replace it with numbers. So let’s invent a new game called The Sequence. I give you the numbers one to three and you have to arrange them in the order I am thinking of.

Obviously there is no way to solve this except through chance. So, what is the chance of solving it? If you take your time you can set up a table like the one below with all the possible orders of the set of numbers from one to three.

Possible answers to The Sequence

1 2 3
1 3 2
2 1 3
2 3 1
3 1 2
3 2 1

So there are six ways to arrange a set of three elements. This means that the chance of getting the right answer in The Sequence (or in Sir Mix’n’Matchalot) is 1 in 6.
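The table above can also be generated rather than written out by hand. Here is a minimal Python sketch that does so; the function name `all_orders` is my own invention for illustration and is not part of either game.

```python
from itertools import permutations

def all_orders(n):
    """Return every possible ordering of the numbers 1..n as tuples."""
    return list(permutations(range(1, n + 1)))

orders = all_orders(3)
for order in orders:
    print(*order)  # one possible answer to The Sequence per line

# Only one of these orders is the one I am thinking of
print(f"Chance of guessing it: 1 in {len(orders)}")
```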

And what if we changed The Sequence to have four elements like That’s All We Have Time For? For sets larger than three elements, listing every order by hand can be time consuming. So, we can use something called the factorial. The factorial of n, the number of elements in the set, is found by multiplying together all the positive integers less than or equal to n. So, for the original Sequence game that would be 3*2*1 which is 6 (which matches what we did by hand). For the new Sequence that would be 4*3*2*1 which is 24. This means that the chance of getting all four matches right in That’s All We Have Time For by chance alone is 1 in 24.
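If you would rather not trust the multiplication, the factorial can be checked against a brute-force count. The sketch below is only an illustration: `count_orders_by_hand` is a made-up helper that lists every order and counts them, and the result is compared with Python's built-in `math.factorial`.

```python
import math
from itertools import permutations

def count_orders_by_hand(n):
    """Count the orderings of n elements by listing every one of them."""
    return sum(1 for _ in permutations(range(n)))

for n in (3, 4):
    by_hand = count_orders_by_hand(n)
    by_formula = math.factorial(n)  # n * (n-1) * ... * 2 * 1
    print(f"n={n}: {by_hand} orders by listing, {by_formula} by factorial, "
          f"chance of a perfect score = 1/{by_formula}")
```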

So, the first thing you can see is that it is harder to get maximum points in the game on The Chaser’s Media Circus than on Spicks and Specks. But that’s not why it’s a better game statistically.

It’s not just that That’s All We Have Time For has more elements than Sir Mix’n’Matchalot, but rather that That’s All We Have Time For has an even number of elements.

This is about the way the games are scored. Remember, players are given one point for each correct answer.

To start thinking about this, it’s useful to think about why we give different points to players in games when different feats are achieved. A goal in the AFL is given six points, a behind one point. In this case I think it is safe to conclude that we give more points to goals because they are harder feats to achieve.

Hence, it would make sense if, on Spicks and Specks and The Chaser’s Media Circus, one point was given for each correct answer to the game because it is harder to get more matches.


Actually, no. That’s not right.

Or rather, that is right for The Chaser’s Media Circus but not for Spicks and Specks.

Below are two graphs which show respectively the number of chances and the probability of getting each number of points available for games such as That’s All We Have Time For and Sir Mix’n’Matchalot with sets which range in size from 1 to 5.

Fig 1B


The first thing you might notice is that for each set of n elements, there is no way of getting n-1 points: if all but one of the matches are right, the last one has nowhere else to go and must be right too. So, in Spicks and Specks, no one ever scored 2 points and in The Chaser’s Media Circus no one will ever score 3 points. This is related to another topic, the degrees of freedom, which we can talk about another time.

The bigger problem is with the probability of scoring 1 point. For the game to make any sense, it should be harder to score 1 point than 0 points. This is indeed the case for sets with four elements, such as in The Chaser’s Media Circus. In this case the probability of scoring 1 point is 33% and the probability of scoring 0 points is 37.5%. But as you can see, for sets with three elements (such as in Spicks and Specks), the probability of scoring 1 point by chance alone is 50%, but the probability of scoring 0 points is actually less, at 33%. This is the equivalent of AFL players scoring 6 points for a goal, 1 point for a behind but 10 points for missing everything. The reward is greater for the easier task.
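These probabilities can be reproduced by brute force: try every possible arrangement, count how many elements land in the right place, and tally the scores. The sketch below assumes every arrangement is equally likely (that is, the contestant is guessing blindly); `score_distribution` is a name I made up for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def score_distribution(n):
    """Probability of each score when n items are matched purely at random."""
    answer = tuple(range(n))
    counts = Counter(
        sum(guess[i] == answer[i] for i in range(n))  # points for this guess
        for guess in permutations(answer)
    )
    total = sum(counts.values())
    return {score: Fraction(counts.get(score, 0), total) for score in range(n + 1)}

for n in (3, 4):  # Sir Mix'n'Matchalot, then That's All We Have Time For
    for score, prob in score_distribution(n).items():
        print(f"n={n}: P({score} points) = {prob} ({float(prob):.1%})")
```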

If you look at the other lines you might see a pattern. For sets with 1, 3 and 5 elements, it is easier to score 1 point than 0 points. For sets with 2 and 4 elements, it is harder to score 1 point than no points at all. This is shown again for a larger range of set sizes in the figures below.

Fig 4A
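For larger sets, listing every arrangement gets slow, so here is a sketch that counts the ways of scoring 0 or 1 point directly. It leans on a standard counting result that the post itself does not use: the number of arrangements with nothing in the right place (usually called derangements) satisfies D(n) = (n-1) * (D(n-1) + D(n-2)). The function names are my own.

```python
from math import comb

def derangements(n):
    """Number of orderings of n elements with no element in its correct place."""
    if n == 0:
        return 1
    if n == 1:
        return 0
    a, b = 1, 0  # D(0), D(1)
    for k in range(2, n + 1):
        a, b = b, (k - 1) * (a + b)  # D(k) = (k-1) * (D(k-1) + D(k-2))
    return b

def ways_to_score(n, points):
    """Orderings of n elements with exactly `points` elements in the correct place."""
    # Choose which elements are correct, then scramble all of the others.
    return comb(n, points) * derangements(n - points)

for n in range(1, 9):
    zero, one = ways_to_score(n, 0), ways_to_score(n, 1)
    likelier = "1 point" if one > zero else "0 points"
    print(f"n={n}: {zero} ways to score 0, {one} ways to score 1 -> {likelier} is more likely")
```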

To understand this it might help to remember what the points are gained for: getting one of the matches right. So, for the scoring system to make sense, there should be more ways of arranging the elements of the set so that none of them match the answer than ways in which one of them matches the answer.

And here is the difference between sets with odd and even numbers of elements: one of the ways we can arrange the elements is the reverse of the correct answer. For sets with an even number of elements, this arrangement scores no points because none of the elements match the answer. But as you can see in the table below, for sets with an odd number of elements, this arrangement gains one point because the middle element is correct.

Three member set

Answer 1 2 3
Reverse arrangement 3 2 1
Correct? No Yes No

Four member set

Answer 1 2 3 4
Reverse 4 3 2 1
Correct? No No No No
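The two tables above can be extended to any set size with a few lines of Python. This is just a sketch of the reverse-arrangement check, with `reverse_score` as a made-up name.

```python
def reverse_score(n):
    """Points earned by answering with the exact reverse of the correct order."""
    answer = list(range(1, n + 1))
    guess = answer[::-1]
    # One point for every position where the reversed guess still matches
    return sum(g == a for g, a in zip(guess, answer))

for n in range(1, 8):
    print(f"{n} elements: the reversed answer scores {reverse_score(n)}")
```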

Hence, the new show by the team from The Chaser has a scoring system which more appropriately rewards players for achieving feats of greater difficulty, unlike Spicks and Specks, which got things a bit back to front.

AFL Trades 2014

As of 2pm today, the trade period for the AFL has closed. It was a strange new world for sports fans in Australia. The trade period had its own hashtag, its own radio station and its own sponsor. But what actually happened? Well, after two weeks of trading, this is the result. The width of the arrows represents the number of players traded, and the direction of the arrows represents their movement.


Here are the finished trades as a static image.


AFL team connectivity

With the season over, it can be hard for football fans to find something to talk about. Luckily, there is the trade period. With trades almost finished, I thought it would be interesting to see what this movement of players does to the connectivity of the league. If we add one unit of link weight to a pair of clubs every time the same player has played a game for both clubs, then this is the graph we get…


It makes a lot of sense. The older clubs are the most heavily interconnected. The clubs which joined in the original wave of expansion, Adelaide and West Coast, are well connected too, with Port Adelaide and Fremantle close behind. GWS and the Suns are still not well connected, but they will be soon. The really interesting cases are Brisbane (Bears) and Brisbane (Lions), which are well connected even though they didn’t exist at the same time! The interesting case of Melbourne University is also there. The club only existed in the VFL between 1908 and 1914.
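For anyone wanting to try the link weighting themselves, here is a minimal sketch. It assumes the data arrives as (player, club) records, one per game, and that a pair of clubs gains one unit of weight for each player who has played for both. The player names and records are made up purely for illustration, and `club_link_weights` is a hypothetical helper, not the code used for the graph.

```python
from collections import Counter
from itertools import combinations

def club_link_weights(games):
    """Weight each pair of clubs by the number of players who played for both."""
    clubs_by_player = {}
    for player, club in games:
        clubs_by_player.setdefault(player, set()).add(club)

    weights = Counter()
    for clubs in clubs_by_player.values():
        # Every pair of clubs this player appeared for gains one unit of weight
        for pair in combinations(sorted(clubs), 2):
            weights[pair] += 1
    return weights

# Made-up example records: (player, club), one per game played
games = [
    ("Player A", "Carlton"), ("Player A", "Carlton"), ("Player A", "Collingwood"),
    ("Player B", "Carlton"), ("Player B", "Collingwood"), ("Player B", "Essendon"),
    ("Player C", "Essendon"),
]

for (club_1, club_2), weight in sorted(club_link_weights(games).items()):
    print(f"{club_1} - {club_2}: {weight}")
```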

The take home message is that we are all connected!

NRL Grand Final Twitter Data

It’s over for another year…with the NRL Grand Final over I’ll have to think of new data to analyse until the 2015 season starts. The game on Sunday did provide some interesting twitter data. As I’ve found before, the twitter community surrounding #NRL is far more interconnected than the corresponding community for #AFL. And, once again, we see the impact of the major media outlets. I’m still a bit surprised that the twitter communities don’t separate out by club loyalty, but here is more evidence that sport unites us. Here is the network map for twitter activity on #NRL during the NRL Grand Final.


And here is the word frequency map. I like the impact that ’43’ had in the tweets. It was certainly a long time between drinks!


The Finals

Over the next two weeks I’ll be looking at the twitter communities that emerge around the AFL and NRL Grand Finals. The first pass at this is below…a map of the tweets to the hashtag #AFL between 6am and midnight on the day of the AFL Grand Final. I collected around 1500 tweets, 376 of which were retweeted on the day. The map shows a much higher level of network completion than the map generated by the tweets from round 10 (see below). I had thought that the communities detected by the modularity algorithm might split the network into two based on their support for the Hawks or the Swans. But that clearly hasn’t happened. It seems that sport really is something that unites Australians and we will retweet interesting sports opinions and photos regardless of the team they are associated with.


And here is the word frequency graph. It looks like #gohawks outstripped #goswans considerably. I’m a little disappointed that ‘kiss’ didn’t make it into the most frequently used words!
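A hashtag count like this is simple to reproduce once the tweets have been collected. The sketch below is not the code behind the graph; the sample tweets are invented for illustration and `hashtag_counts` is a hypothetical helper. It lower-cases each hashtag so that #GoHawks and #gohawks are counted together.

```python
import re
from collections import Counter

def hashtag_counts(tweets):
    """Count how often each hashtag appears, ignoring upper/lower case."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in re.findall(r"#\w+", text))
    return counts

# Invented sample tweets, for illustration only
sample_tweets = [
    "What a mark! #AFL #gohawks",
    "Come on Swans #AFL #goswans",
    "#GoHawks #everymoment",
]

for tag, count in hashtag_counts(sample_tweets).most_common():
    print(tag, count)
```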

The use of the #everymoment hashtag by Hawks supporters is particularly interesting. Use of the hashtag has been promoted during the season by the club itself. There is a bit of background and a collection of tweets on the club website. It’s likely that, in addition to encouraging fans to tweet about their experience, the use of the hashtag also helps distinguish tweets in support of the Hawks in the AFL from tweets about other sports teams with similar names, such as the Seattle Seahawks (in the NFL) and the Chicago Blackhawks (in the NHL).


And here is the breakdown for tweets before the game, during each quarter and after the game. Interestingly, the hashtags #gohawks and #goswans were even in frequency before the game started. But from the first quarter on, #goswans really dropped off the map…







The time dimension in twitter communities

It was pointed out to me in a few comments that the analysis of twitter communities for AFL and NRL followers graphed below misses an important dimension: time. So, here are the same graphs but animated to show the development of the retweets between accounts over time.

AFL Twitter Community


NRL Twitter Community


A few things are clear from the addition of time as a dimension in these graphs:

1) There are clearly events (probably actual game moments) which trigger multiple tweets at the same time. The moments are interspersed with long periods in which nothing seems to happen.

2) Retweets follow the original tweet very quickly. This makes sense in that twitter is a medium for communicating the present moment.