Introduction to Psychology – Selection Bias

Last weekend I helped out at Flinders University’s Open Days. One of the comments that I heard a lot was ‘I want to study psychology because I’m really interested in people.’ And that’s a great start. But it’s also worth thinking about why the study of psychology is different to a general interest in people and their behaviour. That difference is one thing: science.

Psychologists work hard to stay scientific when they study human behaviour. This helps avoid the biases that can easily lead us to false conclusions. So, here is a brief lecture on one of those biases and how staying scientific can help you avoid it.


The statistics, they are a changin’

So there is little doubt that many fields of science, especially psychology, could benefit from a change in research practices. There is ample evidence of p-hacking and optional stopping in the scientific literature. And many practising scientists have called publicly for a change in the culture of science.

But the retracted papers keep coming. And, even if this doesn’t actually mean that science is broken, it certainly means that the community of scientists hasn’t embraced the utopian new world of robust statistics yet.

So, recently I’ve started thinking that this might not be that surprising. After all, if you really want people to come together and dream of a better future then you probably have to use more than reason and logic.

Luckily, there is a well-developed tool for getting people to join hands and walk boldly toward a brighter day. That tool is the guitar, and I used it with a class in Advanced Research Methods at the University of South Australia.

Enjoy, be good to each other, love your data.


First speeches in the Australian Parliament

One of the features of Australian politics which makes it different to American politics is the greater level of party unity our parliamentarians show. Here in Australia, it is big news when a member of the government states that he holds a personal and political position on a topic which is different to that of the Prime Minister. But in the United States it takes a Presidential candidate calling another member of his party ‘an idiot’ to make the news.

There is, however, one time when tradition in the Australian Parliament encourages politicians to speak their minds, to talk about their personal views and beliefs, their background and their journey to Parliament House. This is the first speech.

This is a courtesy extended to every new member and senator and allows them to speak without interruption about themselves and their plans as a parliamentarian.

So, I thought that an analysis of the words most frequently used by new members from the major parties and from the cross benches might give some insight into the stories that motivate parliamentarians from different sides of the chambers.

I collected the first speeches of all new members and senators that were delivered in 2014 and 2015 and analysed the frequency of each word used.
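For anyone who wants to try this at home, here is a minimal sketch of a word frequency count in Python. It is not the code I used for the analysis. The stop-word list, the function name `word_frequencies` and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# A tiny, assumed stop-word list. A real analysis would use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "i", "is",
              "for", "my", "this", "that", "it", "be"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each word occurs, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


# A made-up sentence standing in for a real first speech.
sample = ("I thank the people of Australia. Australia is a great country "
          "and I thank my family.")

# Print the three most frequent words with their counts.
for word, count in word_frequencies(sample).most_common(3):
    print(word, count)
```

Running the same function over each group of speeches separately gives one frequency list per group, which is all you need to compare them.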

[Figures: most frequently used words for Coalition, Labor and other parliamentarians]

The first thing that’s noticeable is the similarities. Members and senators of all persuasions talk a lot about ‘Australia’, ‘Australians’ and being ‘Australian’. Interestingly, the graphs of all parties show a much higher frequency of adjectives and adverbs than I’ve shown before in budget second readings and in ARC grants. Things are ‘great’ and there are ‘many’ ‘more’.

The differences are also really profound. Despite Labor receiving such a drubbing at the election at which these parliamentarians were elected, the most frequently used word by Labor speakers was their party name. Other core Labor issues were right behind: union and support.

Coalition parliamentarians were less obviously partisan but one word really stood out. These were the only members and senators who were frequently thankful.

The cross benches are characteristically complicated. There’s certainly no obvious theme, which isn’t surprising considering the very mixed allegiances of the cross bench that was elected at the 2014 election.

What will (probably) feature in next week’s federal budget

There is, as always, a lot of speculation from political commentators about what will be in this year’s Federal Budget.

Will the pension indexation be changed? Will there be a tax on multi-national profit shifting? Or will there be a tax on bank deposits?

But from the perspective of a data analyst, there is a more efficient way of predicting the contents of this year’s budget. We can simply look for patterns in previous budget speeches and make our forecast based on the probability of them happening again.

Helpfully, the last 19 years of budget second readings are available online. The most straightforward way to look at their contents is to look at the most frequently used words across that whole set of budgets read between 1996-1997 and 2014-2015.


Amongst the most frequently used words are, naturally, “Australia” and “Australian”, which is not particularly surprising. This is, after all, the Australian federal budget.

“New” also makes it into the top of the frequency list. This tells us something about what the budget is typically used for: the unveiling of new policy and funding objectives. So, despite claims that this year’s budget will be “dull and boring”, treasurer Joe Hockey will be breaking with a strong tradition if he announces that “next year, we’ll just keep doing the same thing as last year”.

The focus on Australia and on novelty in the federal budget is in keeping with the patterns detected in successful Australian Research Council Grants.

The frequency with which individual words are used is interesting, but it misses some of the context regarding how the words are used. To gain some insight into this we can use an application of probability theory known as bigrams. These are pairs of words that frequently occur in a set order.

Bigrams are one of the ways predictive text works on your phone. If you frequently text your romantic partner then your phone will learn that the word “love” is most likely followed by the word “you”. But if you more frequently text your ice cream vendor, then your phone will learn that the word “love” is most likely to be followed by the word “chocolate”.


Using the same approach to the budget second readings we can see that if the treasurer says “economic”, the most probable next word is “growth”. And if the treasurer says “aged”, the most probable next word is “care”.
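The bigram idea can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the code behind the figures. The function names and the sample sentence are invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z]+", text.lower())


def bigram_counts(words):
    """Count each pair of adjacent words, keeping their order."""
    return Counter(zip(words, words[1:]))


def most_probable_next(words, first):
    """Return (next_word, probability) for the most common follower of `first`.

    Returns None if `first` is never followed by another word.
    """
    followers = Counter(b for a, b in zip(words, words[1:]) if a == first)
    if not followers:
        return None
    word, count = followers.most_common(1)[0]
    return word, count / sum(followers.values())


# A made-up sentence, not a quote from any budget speech.
sample = ("economic growth is strong and economic growth will continue "
          "despite economic uncertainty")
words = tokenize(sample)

print(bigram_counts(words).most_common(1))
print(most_probable_next(words, "economic"))
```

In the sample, “economic” is followed by “growth” twice and “uncertainty” once, so “growth” is the most probable next word with an estimated probability of two in three.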

It’s telling that at the top of the list is the bigram “Mr. Speaker”. This is because, like all speeches in the House of Representatives, the budget second reading is addressed to the speaker of the house. And, even though the 2013-14 and 2014-15 budgets were addressed to female speakers, the previous fourteen years of male honorifics made this bigram top the list.

But if we want to predict the contents of this year’s budget, what we really need to do is see how the bigrams that have been used have changed over time.

Looking through the bigrams, two really caught my eye. The phrases “million over” and “billion over” were both used frequently. Looking through the text of the speeches, these phrases are used in relation to the proposed spend on an initiative or the forecast deficit.

To divine if we are planning to spend more billions than millions, I looked at the trends over time.


And, indeed, we do seem to be spending more than ever. In the last budget delivered by the former Labor Government, the phrase “billion over” was mentioned four times, while “million over” wasn’t mentioned at all.
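Tracking a phrase across years is just a matter of counting it in each speech separately. Here is a minimal sketch, assuming the speeches are held in a dictionary keyed by budget year. The `speeches` dictionary, its text and the function name `phrase_count` are made up for illustration and are not quotes from real budgets.

```python
import re


def phrase_count(text, phrase):
    """Count whole-word, case-insensitive occurrences of a phrase."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Made-up snippets keyed by budget year.
speeches = {
    "1996-97": "We will provide $50 million over four years "
               "and $20 million over two years.",
    "2013-14": "We will invest $3 billion over five years "
               "and $1 billion over four years.",
}

# One row per year: year, count of "million over", count of "billion over".
for year, text in sorted(speeches.items()):
    print(year,
          phrase_count(text, "million over"),
          phrase_count(text, "billion over"))
```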

Let’s turn to the substance of the budget: the policy areas that are funded. In the figure below the rise and fall of four policy areas are charted by the number of mentions they warranted in the last nineteen budgets.

There are some interesting patterns, not just in changes over time, but in the different areas addressed by Liberal and Labor budgets. Firstly, it’s clear that Wayne Swan was the only treasurer to talk up climate change.

Interestingly, Swan also talked plenty about tax cuts. In the first Swan budget these cuts were the result of the Labor Party’s decision to match the coalition’s announcement of tax cuts during the election campaign.


Health care has seen its ups and downs as an area of policy focus across the last nineteen years, but was mentioned more times in Labor budgets. Small business has been talked about by treasurers from both sides of politics. This is consistent with the political importance of small business to both sides of politics.

Finally, no analysis of the bigrams used in budget speeches would be complete without a look back at one of the most infamous pairs of words used by a federal treasurer.

In 2008, Wayne Swan seemed to fixate on the term “working families”. Despite no clear definition of what it meant, he used it repeatedly. This reached a high point at the 2008-’09 budget speech, in which he used the phrase 16 times.

Interestingly, despite ridicule by multiple commentators, the phrase wasn’t completely dropped, but had small echoes in 2010-’11 and 2012-’13. This can be compared to the less jargon-laden “income earners” which has been used by treasurers from both sides of politics over the years.


So, let’s make our predictions. These data suggest that this year’s budget will include much that is “new”. It will talk mostly about “Australia”, and it will definitely not mention “working families”. Beyond that, it’s up to the pundits to pick what will likely be announced.

When is the best time to ask a question of the MATLAB user community?

As I have mentioned before, I find it a little embarrassing but I still use MATLAB for most of my programming. One of the reasons I haven’t moved on to nice new open source languages is that so many of my peers still use MATLAB. So, if I have a question about how to do something, it’s easy to search the internet and find an answer. And, in the very unusual case that my question hasn’t been asked before, I can pose the question to the large and generally helpful MATLAB user community.

But, as a researcher in the southern hemisphere I started to detect an interesting pattern in the number of responses to questions at different times of the year. To test if my suspicions were true I looked through 12113 users from MATLAB’s community to see if their contributions were more likely to occur at different times of the year.
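A monthly tally like this is simple to sketch. The example below is in Python rather than MATLAB and uses a handful of invented dates, not the real community data. The function name `contributions_by_month` is my own.

```python
from collections import Counter
from datetime import date


def contributions_by_month(dates):
    """Return a list of 12 counts, one per calendar month, January first."""
    counts = Counter(d.month for d in dates)
    return [counts.get(month, 0) for month in range(1, 13)]


# Invented contribution dates for illustration only.
sample = [
    date(2014, 1, 5),
    date(2014, 1, 20),
    date(2014, 2, 11),
    date(2014, 7, 3),
]

monthly = contributions_by_month(sample)
print(monthly)
```

Dividing each count by the total turns the tally into the share of contributions made in each month, which is easier to compare when users have been active for different numbers of years.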


The results are clear. Users are more likely to contribute in January, February and March than other times of the year. My suspicion is that this is because most users are in the northern hemisphere and these are the months when the weather is worst. So, there’s nothing much to do with your time but contribute to the MATLAB user community.

So, the moral of the story is: if you want your question answered, ask it when it starts to snow.

Do early career centuries make for a better batsman?

As a practicing scientist early in my career I’m often told that to ensure a long and prosperous career, I need to win a big grant soon because the modern world of science doesn’t have the time to wait for me to grow on the job. So, I have some sympathy for Australia’s most scrutinized cricketer – Shane Watson. Several commentators have made the argument that while Watson is clearly a talented player, he never made enough big scores early enough in his career. In particular, attention has been drawn to the fact that he has only made four centuries in test matches.

But is a big score early in a career really that important for a long and prosperous career as a batsman? Anecdotally we might think of cases where batsmen have impressed us with a brilliant debut. I was at the Adelaide Oval in 1991 and saw Mark Waugh’s century on debut against England. It’s a great thing for a sports fan to see someone play with what can only be called great talent. It makes us think that if this is what he can do now, he’ll surely go on to do even greater things with time.

Luckily for us, we can check if this is in fact true. Data on cricketers is readily available online. I collected the data on the top 60 batsmen in cricket history by career average and looked to see how long they played for before making a century. These are the results…


If scoring an early century was indeed predictive of a great career with a high average then the data would line up on a negatively sloped line from the top left to the bottom right. But this is clearly not the case. Of those 60 batsmen with the highest career averages, went 20 or more innings before making it to three figures. This includes current wonder boy Steve Smith.
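One way to put a number on whether the points line up along a negatively sloped line is a Pearson correlation coefficient. This is a sketch I am adding for illustration, not the analysis behind the graph. The five data points are invented and do not belong to any real batsman.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented values: innings played before a first century, and career average.
innings_before_first_century = [2, 6, 11, 23, 35]
career_average = [47.5, 52.0, 44.0, 56.5, 50.0]

r = pearson_r(innings_before_first_century, career_average)
print(round(r, 2))
```

A value near -1 would support the idea that early centuries go with high averages. A value near 0 would suggest no straight-line relationship at all.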

Most interesting though is Mark’s brother Steve Waugh, who eventually made a century in his 120th innings with a 177 not out, also against England, but at Headingley in Leeds. I grew up admiring both the Waugh brothers and that great day at Adelaide Oval was wonderful to watch. But there is no doubting the numbers: Steve went on to play 260 innings at an average of 51 to Mark’s 209 innings at an average of 42.

Still, I thought it might be worth investigating this a little further. What if things have changed in recent years? Is it still possible to do a Steve Waugh?


In the above graph, career average is represented by the size of the circle. So, if anything, there are more great players scoring maiden centuries late in their careers now than ever. That said, there are more players scoring more early career centuries now as well. So, the data really doesn’t suggest any association between big early scores and high career averages at any time in history. And this is true of all test playing nations.

So, I still have sympathy for Watson. He gives me hope that if I keep working away, I’ll score it big eventually.

My summer holiday report

I got a great Christmas present this year: two books by Nathan Yau from Flowing Data. They include some great suggestions for thinking about the design of figures, inventive ways to analyse data and some nice new sources of data as well. One delightful idea was a neat way to view the overall story from a large number of photos in a single graph. Nathan uses the example of his wedding photos. By extracting the most common colour from each photo and then arranging them by the time they were taken you get a lovely impression of what the focus of the celebration was at each moment.

After Christmas my family and I went to the beach for a week and I took a bunch of pictures. So, I thought I could apply the skills I learnt from Nathan’s book to these pictures. It was a fun process and I think the result is not only beautiful but tells the story of our summer holiday really well.

The Process

1) The first step was choosing the most common colour in each photo. The first time I tried this I came up with a week’s worth of white photos. Then I realised that white (or possibly black) would always win because there is very little chance of any other colour being well represented among the 16,777,216 potential colours that are recorded in an RGB image. So, I reduced the number of colours used in each photo using a process called colour quantization. With a bit of trial and error, mapping each photo with 16 colours removed most cases in which white was the most common colour but still gave some nice variation between the photos.

2) Below the graph of photograph colours I’ve included some data showing the weather for each day. I got this from
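Step 1 above can be sketched in plain Python. This is a simplified stand-in, not the method I actually used. It snaps each colour channel to a small number of evenly spaced levels rather than building a 16-colour palette per photo. The function names, the `levels` parameter and the pixel values are all invented for illustration.

```python
from collections import Counter


def quantize(pixel, levels=2):
    """Snap each 0-255 channel to the centre of one of `levels` equal bins."""
    step = 256 / levels
    return tuple(int((int(c // step) + 0.5) * step) for c in pixel)


def dominant_colour(pixels, levels=2):
    """Return the most common colour after quantizing every pixel."""
    counts = Counter(quantize(p, levels) for p in pixels)
    return counts.most_common(1)[0][0]


# Invented pixels: three slightly different sea blues and two identical whites.
pixels = [
    (20, 90, 200),
    (25, 100, 210),
    (30, 95, 190),
    (255, 255, 255),
    (255, 255, 255),
]

# Without quantization the exact-match winner is white.
print(Counter(pixels).most_common(1)[0][0])

# With quantization the three blues fall into one bin and win.
print(dominant_colour(pixels))
```

With two levels per channel there are only eight possible colours, so similar shades are pooled together. A higher `levels` value keeps more variation between photos at the cost of letting white win more often.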

The Result

[Figure: the most common colour in each holiday photo in the order taken, with the weather for each day shown below]

As you can see – the photographs are dominated by the colours of the beach: sandy brown, blue sky and green sea. It was definitely a summer beach holiday. There are several parts of the story from the week that are nicely shown:

1) I didn’t take the most photographs on the days with the best weather. In fact, Wednesday had clearly the best weather (highest temperature and lowest humidity) and I left the camera in the beach house. We were having too much fun to take the time to take pictures.

2) A cold front came through on the Thursday (you can see the spike in humidity and the decrease in temperature) but we still played at the beach. Interestingly, the front seems to have changed the colour of the water, from blue/grey to green.

It was a great summer holiday and we had a lot of fun. Thanks to Nathan Yau’s books for suggesting a great way to remember the fun.