# Jeromy Anglim's Blog: Psychology and Statistics

## Thursday, February 18, 2010

### Analysis of Winter Olympic Medal Data Using R

The Winter Olympics are on. The Guardian's DataBlog has graciously compiled a database on Winter Olympic Medals. Thus, I thought I'd run a few quick analyses on the data in R. In this post I was hoping to show how one could quickly churn out some basic analyses (and answer some interesting questions) using R.

First, a disclaimer: I ran these analyses in about 45 minutes. Thus, I make no claims of perfect accuracy or in the source data provided by the Guardian. The data also does not include 2010 medals.

Below you will see:
1. The R Console input and output
2. The plots
3. The source code on its own

CONSOLE INPUT AND OUTPUT
> # tips on reading a Google Spreadsheet:
> # http://www.guardian.co.uk/news/datablog/2010/feb/11/winter-olympics-medals-by-country
>
> savePlot <- TRUE # optional variable used to save or not save plots in code
>
> # remove rows that do not contain data
> medals$Year <- as.numeric(medals$Year)
Warning message:
NAs introduced by coercion
> medals <- medals[!is.na(medals$Year), ] > > > # Quick look at data > head(medals) Year City Sport Discipline NOC Event Event.gender 1 1924 Chamonix Skating Figure skating AUT individual M 2 1924 Chamonix Skating Figure skating AUT individual W 3 1924 Chamonix Skating Figure skating AUT pairs X 4 1924 Chamonix Bobsleigh Bobsleigh BEL four-man M 5 1924 Chamonix Ice Hockey Ice Hockey CAN ice hockey M 6 1924 Chamonix Biathlon Biathlon FIN military patrol M Medal 1 Silver 2 Gold 3 Gold 4 Bronze 5 Gold 6 Silver > sapply(medals, function(x) cbind(sort(table(x), decreasing = TRUE)))$Year
[,1]
2006  252
2002  234
1998  205
1994  183
1992  171
1988  138
1984  117
1980  115
1976  111
1968  106
1972  105
1964  103
1960   81
1956   72
1948   68
1952   67
1936   51
1924   49
1932   42
1928   41

$City [,1] Turin 252 Salt Lake City 234 Innsbruck 214 Nagano 205 Lillehammer 183 Albertville 171 Lake Placid 157 Calgary 138 Sarajevo 117 St. Moritz 109 Grenoble 106 Sapporo 105 Squaw Valley 81 Cortina d'Ampezzo 72 Oslo 67 Garmisch-Partenkirchen 51 Chamonix 49$Sport
[,1]
Skiing     1060
Skating     758
Biathlon    162
Bobsleigh   133
Luge        108
Ice Hockey   69
Curling      21

$Discipline [,1] Speed skating 455 Cross Country S 399 Alpine Skiing 367 Figure skating 207 Biathlon 162 Bobsleigh 115 Ski Jumping 114 Luge 108 Short Track S. 96 Nordic Combined 84 Ice Hockey 69 Freestyle Ski. 54 Snowboard 42 Curling 21 Skeleton 18$NOC
[,1]
NOR  280
USA  216
URS  194
AUT  185
GER  158
FIN  151
CAN  119
SUI  118
SWE  118
GDR  110
ITA  101
FRA   83
NED   78
RUS   76
FRG   41
CHN   33
JPN   32
KOR   31
TCH   25
EUN   23
GBR   21
EUA   19
CZE   10
LIE    9
POL    8
CRO    7
AUS    6
BLR    6
BUL    6
EST    6
HUN    6
BEL    5
KAZ    5
UKR    5
SLO    4
YUG    4
ESP    2
LUX    2
PRK    2
DEN    1
LAT    1
NZL    1
ROU    1
SVK    1
UZB    1

$Event [,1] individual 195 500m 133 1500m 111 downhill 97 slalom 96 1000m 94 giant slalom 90 5000m 78 singles 72 ice hockey 69 10km 60 50km 60 K90 individual (70m) 60 pairs 60 10000m 57 two-man 57 four-man 55 4x10km relay 51 15km 48 20km 45 4x7.5km relay 42 alpine combined 42 3000m 39 30km mass start 39 doubles 36 K120 individual (90m) 36 super-G 36 5km 30 moguls 30 4x5km relay 27 ice dancing 27 aerials 24 curling 21 10km pursuit 18 18km 18 Half-pipe 18 K120 team (90m) 18 Team 18 15km mass start 15 3000m relay 15 30km 15 3x5km relay 15 5000m relay 15 7.5km 15 Giant parallel slalom 12 Combined 10km + 15km pursuit 9 Combined 5km + 10km pursuit 9 12.5km pursuit 6 Alpine combined 6 giant-slalom 6 Snowboard Cross 6 Sprint 1,5km 6 sprint 1.5km 6 Team pursuit 6 Team sprint 6 12,5km mass start 3 3x7.5km relay 3 4x6km relay 3 5km pursuit 3 combined (4 events) 3 Combined 15 + 15km mass start 3 Combined 7.5 + 7.5km mass start 3 five-man 3 Individual 3 Individual sprint 3 military patrol 3 sprint 3$Event.gender
[,1]
M 1386
W  802
X  123

$Medal [,1] Gold 774 Silver 773 Bronze 764 > > > # How many medals have been awarded in each Olympics? > medalsByYear <- aggregate(medals$Year, list(Year = medals$Year), length) > if (savePlot == TRUE) png("fig1.png") > plot(x ~ Year, medalsByYear, ylim = c(0,max(x)), + ylab = "Total Medals Awarded", bty="l", + main = "Total Medals Awarded in Winter Olympics by Year") > if (savePlot == TRUE) dev.off() windows 2 > > # How has the amount of medals awarded to males and females changed over the years? > # Get data. > medalsByYearByGender <- aggregate(medals$Year,
+     list(Year = medals$Year, Event.gender = medals$Event.gender), length)
> medalsByYearByGender <- medalsByYearByGender[medalsByYearByGender$Event.gender != "X", ] > > # Plot results. > if (savePlot == TRUE) png("fig2.png") > plot(x ~ Year, medalsByYearByGender[medalsByYearByGender$Event.gender == "M", ],
+     ylim = c(0,max(x)), pch = "m", col = "blue",
+     ylab = "Total Medals Awarded", bty="l",
+     main = "Total Medals Awarded in Winter Olympics\n by Gender and by Year")
> points(medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"], + medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"],
+     col = "red", pch = "f")
> if (savePlot == TRUE) dev.off()
windows
2
>
> # Table of proportion female
> propFemalePerYear <- medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] / ( + medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] +
+       medalsByYearByGender[medalsByYearByGender$Event.gender == "M", "x"]) > propFemalePerYear <- round(propFemalePerYear, 2) > cbind(Year = medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"],
+     PropFemale = propFemalePerYear)
Year PropFemale
[1,] 1924       0.07
[2,] 1928       0.08
[3,] 1932       0.08
[4,] 1936       0.12
[5,] 1948       0.18
[6,] 1952       0.23
[7,] 1956       0.26
[8,] 1960       0.38
[9,] 1964       0.37
[10,] 1968       0.37
[11,] 1972       0.36
[12,] 1976       0.35
[13,] 1980       0.34
[14,] 1984       0.36
[15,] 1988       0.37
[16,] 1992       0.43
[17,] 1994       0.43
[18,] 1998       0.44
[19,] 2002       0.45
[20,] 2006       0.46
>
>
> # Which countries have won the most medals?
> sort(table(medals$NOC), dec = TRUE) NOR USA URS AUT GER FIN CAN SUI SWE GDR ITA FRA NED RUS FRG CHN JPN KOR TCH EUN 280 216 194 185 158 151 119 118 118 110 101 83 78 76 41 33 32 31 25 23 GBR EUA CZE LIE POL CRO AUS BLR BUL EST HUN BEL KAZ UKR SLO YUG ESP LUX PRK DEN 21 19 10 9 8 7 6 6 6 6 6 5 5 5 4 4 2 2 2 1 LAT NZL ROU SVK UZB 1 1 1 1 1 > > > # Of the countries that have won more than 50 medals, > # which have the highest percentage of gold medals? > NOC50Plus <- names(table(medals$NOC)[table(medals$NOC) > 50]) > medalsSubset <- medals[medals$NOC %in% NOC50Plus, ]
> medalsByMedalByNOC <- prop.table(table(medalsSubset$NOC, medalsSubset$Medal), margin = 1)
> medalsByMedalByNOC <- medalsByMedalByNOC[order(medalsByMedalByNOC[, "Gold"],
+         decreasing = TRUE), c("Gold", "Silver", "Bronze")]
> round(medalsByMedalByNOC, 2)

Gold Silver Bronze
RUS 0.43   0.32   0.25
URS 0.40   0.29   0.30
GER 0.37   0.37   0.26
SWE 0.36   0.26   0.37
USA 0.36   0.37   0.27
ITA 0.36   0.31   0.34
GDR 0.35   0.33   0.32
NOR 0.35   0.35   0.30
SUI 0.32   0.31   0.36
NED 0.32   0.38   0.29
CAN 0.32   0.32   0.36
FRA 0.30   0.29   0.41
AUT 0.28   0.35   0.38
FIN 0.27   0.38   0.34
>
>
> # How many different countries have won medals by year?
> listOfYears <- unique(medals$Year) > names(listOfYears) <- unique(medals$Year)
> totalNocByYear <- sapply(listOfYears,  function(X)
+       length(table(medals[medals$Year == X, "NOC"]))) > > # Table > totalNocByYear 1924 1928 1932 1936 1948 1952 1956 1960 1964 1968 1972 1976 1980 1984 1988 1992 10 12 10 11 13 14 13 14 14 15 17 16 19 17 17 20 1994 1998 2002 2006 22 24 24 26 > > # Plot > if (savePlot == TRUE) png("fig3.png") > plot(x= names(totalNocByYear), totalNocByYear, + ylim = c(0, max(totalNocByYear)), + xlab = "Year", + ylab = "Total Number of Countries", + bty = "l", + main = "Total Number of Countries\n Winning Medals By Year") > if (savePlot == TRUE) dev.off() windows 2 > > # Which Countries have won a medal at every Olympics? > propYearsOnePlusMedals <- apply(table(medals$NOC, medals$Year) > 0, 1, mean) > > #Answer > names(propYearsOnePlusMedals[propYearsOnePlusMedals == 1.0]) [1] "AUT" "CAN" "FIN" "NOR" "SWE" "USA" > > # Table Sorted by Proportion of Olympics with a Medal > cbind(sort(propYearsOnePlusMedals, decreasing = TRUE)) [,1] AUT 1.00 CAN 1.00 FIN 1.00 NOR 1.00 SWE 1.00 USA 1.00 FRA 0.95 SUI 0.95 ITA 0.80 GBR 0.65 NED 0.65 TCH 0.55 JPN 0.50 GER 0.45 URS 0.45 FRG 0.35 GDR 0.30 HUN 0.30 CHN 0.25 KOR 0.25 POL 0.25 AUS 0.20 BEL 0.20 BLR 0.20 BUL 0.20 LIE 0.20 RUS 0.20 CZE 0.15 EUA 0.15 UKR 0.15 CRO 0.10 ESP 0.10 EST 0.10 KAZ 0.10 PRK 0.10 SLO 0.10 YUG 0.10 DEN 0.05 EUN 0.05 LAT 0.05 LUX 0.05 NZL 0.05 ROU 0.05 SVK 0.05 UZB 0.05  THE PLOTS THE R SOURCE CODE # tips on reading a Google Spreadsheet: # http://blog.revolution-computing.com/2009/09/how-to-use-a-google-spreadsheet-as-data-in-r.html # Data taken from:"https://spreadsheets.google.com/ccc?key=0AgdO92JOXxAOdDVlaUpkNlB2WERtV3l1ZVFYbzllQWc" # http://www.guardian.co.uk/news/datablog/2010/feb/11/winter-olympics-medals-by-country googleLink <- "http://spreadsheets.google.com/pub?key=tsddww6vOYePkhPSxRpDeYw&single=true&gid=1&output=csv" medals <- read.csv(googleLink, stringsAsFactors = FALSE) savePlot <- TRUE # optional variable used to save or not save plots in code # remove rows that do not contain data medals$Year <- as.numeric(medals$Year) medals <- medals[!is.na(medals$Year), ]

# Quick look at data
sapply(medals, function(x) cbind(sort(table(x), decreasing = TRUE)))

# How many medals have been awarded in each Olympics?
medalsByYear <- aggregate(medals$Year, list(Year = medals$Year), length)
if (savePlot == TRUE)  png("fig1.png")
plot(x ~ Year, medalsByYear, ylim = c(0,max(x)),
ylab = "Total Medals Awarded", bty="l",
main = "Total Medals Awarded in Winter Olympics by Year")
if (savePlot == TRUE) dev.off()

# How has the amount of medals awarded to males and females changed over the years?
# Get data.
medalsByYearByGender <- aggregate(medals$Year, list(Year = medals$Year, Event.gender = medals$Event.gender), length) medalsByYearByGender <- medalsByYearByGender[medalsByYearByGender$Event.gender != "X", ]

# Plot results.
if (savePlot == TRUE)  png("fig2.png")
plot(x ~ Year, medalsByYearByGender[medalsByYearByGender$Event.gender == "M", ], ylim = c(0,max(x)), pch = "m", col = "blue", ylab = "Total Medals Awarded", bty="l", main = "Total Medals Awarded in Winter Olympics\n by Gender and by Year") points(medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"],
medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"], col = "red", pch = "f") if (savePlot == TRUE) dev.off() # Table of proportion female propFemalePerYear <- medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] / (
medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] + medalsByYearByGender[medalsByYearByGender$Event.gender == "M", "x"])
propFemalePerYear <- round(propFemalePerYear, 2)
cbind(Year = medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"], PropFemale = propFemalePerYear) # Which countries have won the most medals? sort(table(medals$NOC), dec = TRUE)

# Of the countries that have won more than 50 medals,
# which have the highest percentage of gold medals?
NOC50Plus <- names(table(medals$NOC)[table(medals$NOC) > 50])
medalsSubset <- medals[medals$NOC %in% NOC50Plus, ] medalsByMedalByNOC <- prop.table(table(medalsSubset$NOC, medalsSubset$Medal), margin = 1) medalsByMedalByNOC <- medalsByMedalByNOC[order(medalsByMedalByNOC[, "Gold"], decreasing = TRUE), c("Gold", "Silver", "Bronze")] round(medalsByMedalByNOC, 2) # How many different countries have won medals by year? listOfYears <- unique(medals$Year)
names(listOfYears) <- unique(medals$Year) totalNocByYear <- sapply(listOfYears, function(X) length(table(medals[medals$Year == X, "NOC"])))

# Table
totalNocByYear

# Plot
if (savePlot == TRUE)  png("fig3.png")
plot(x= names(totalNocByYear), totalNocByYear,
ylim = c(0, max(totalNocByYear)),
xlab = "Year",
ylab = "Total Number of Countries",
bty = "l",
main = "Total Number of Countries\n Winning Medals By Year")
if (savePlot == TRUE) dev.off()

# Which Countries have won a medal at every Olympics?
propYearsOnePlusMedals <- apply(table(medals$NOC, medals$Year) > 0, 1, mean)

names(propYearsOnePlusMedals[propYearsOnePlusMedals == 1.0])

# Table Sorted by Proportion of Olympics with a Medal
cbind(sort(propYearsOnePlusMedals, decreasing = TRUE))


1. Hi,
Thank you for all of these and for the videos!
the input data from Google is no more available.
Could you please put also the data in a txt format or something. Thank you!
Marius

2. @Anonymous
I have a github project that you can check out here:

https://github.com/jeromyanglim/Sweave_Winter_Olympics

The data file is here:

https://github.com/jeromyanglim/Sweave_Winter_Olympics/blob/master/data/medals.csv