FlowingData reader Chris asks:
I was wondering, have you ever considered doing a Chernoff faces tutorial for R? I think Chernoff faces are pretty interesting and I haven't seen much about them on the web.
This wasn't the first time someone's asked how to make Chernoff faces, so I did a quick search. Guess what. There's an R library for that. This tutorial describes how to apply Chernoff faces to your own data.
Chernoff Faces
The point of Chernoff faces is to display multiple variables at once by positioning parts of the human face, such as ears, hair, eyes, and nose, based on numbers in a dataset. The assumption is that we can read people's faces easily in real life, so we should be able to recognize small differences when they represent data. Now that's a pretty big assumption, but debate aside, they're fun to make.
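To make the idea concrete, here is a toy sketch of how one variable could drive one facial feature. This is not how aplpack draws its faces; the function name rescale01, the values in rate, and the -1 to 1 curvature range are all made up for illustration.

```r
# Toy illustration only: squeeze a variable into the 0-1 range.
rescale01 <- function(x) {
  span <- max(x) - min(x)
  if (span == 0) return(rep(0.5, length(x)))  # constant input: use the midpoint
  (x - min(x)) / span
}

rate <- c(2, 5, 11)  # made-up values for a single variable

# Hypothetical feature: mouth curvature from -1 (frown) to 1 (smile).
mouth_curve <- 2 * rescale01(rate) - 1
mouth_curve
```

The smallest value lands on one extreme of the feature and the largest on the other, which is why a face only means something relative to the other faces in the same plot.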
We've seen them applied to baseball players and judge ratings. In this tutorial, we'll look at US crime rate by state.
Download R
Like in previous tutorials, we'll be using R (surprise, surprise), the software environment for statistical computing and graphics, to make our Chernoff faces, so if you haven't already, download and install R before moving on. It's free, open-source, and a one-click install. Go on, I'll wait for you.
Step 1. Install package
Once you've opened up R, the first thing we need to do is install aplpack (Another Plot Package) by Peter Wolf. Go to the "Packages & Data" menu in R, and select the "Package Installer." Select "CRAN (binaries)" in the dropdown menu if it's not already on that, and then click on "Get List." Scroll down to "aplpack," click on the "Install Selected" button, and installation should begin.
Alternatively, you can also just type this in the R console:
install.packages("aplpack")
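If you keep your commands in a script and rerun it, you probably don't want to reinstall every time. Here's a minimal sketch that installs only when the package is missing. The helper name needs_install is made up for this example, and the interactive() check is my own choice so that batch runs never kick off a download.

```r
# Hypothetical helper: TRUE when a package is not installed yet.
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Install aplpack only if it's missing, and only in an interactive session.
if (interactive() && needs_install("aplpack")) {
  install.packages("aplpack")
}
```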
Step 2. Load the data
Next we need to load the data into the R environment. Like I said, we'll be looking at crime rates by state. I got the data from Infochimps, which is actually from Table 301 of the 2008 US Statistical Abstract, but it's typically a headache going through dot gov navigation, so I avoid it when I can.
I cleaned the datafile I got from Infochimps a little bit more so it only includes the numbers we're interested in. You can find it here, but you don't need to download it. We'll load it directly into R via the URL using the read.csv() command.
crime <- read.csv("...")  # put the URL of the datafile linked above inside the quotes
To view the data, type the following:
crime[1:6,]
This shows you the first six rows of our dataset. Note that there are eight columns. The first column is the state name, except for two rows: the US average and, further down, the District of Columbia. The other seven columns are categories of crime.
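A few other base R functions give the same kind of quick look. The sketch below uses state.x77, a table that ships with R, as a stand-in so it runs without the download. It is not the crime data; it just has a similar shape (a state name plus seven numeric columns), and the name demo is made up.

```r
# Stand-in with the same layout as crime: state name, then seven numeric columns.
demo <- data.frame(state = rownames(state.x77), state.x77[, 1:7])
rownames(demo) <- NULL

head(demo)   # first six rows, same as demo[1:6, ]
dim(demo)    # number of rows, then number of columns
str(demo)    # column names and types at a glance
```

With the real data, swap demo for crime.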
Step 3. Make some faces
Once the data is in, it's actually really easy to make some faces using the faces() function from the aplpack package. So far we've only installed the package, so now we'll load it:
library(aplpack)
If you get errors when you try to load it, check that the package installed correctly in Step 1.
Okay, let's make some faces:
faces(crime[,2:8])
Here we're telling R to use the faces() function, using columns 2 through 8 of our crime data. Remember, the first column is state name. You get something that looks like this:
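If counting column positions feels fragile, one option is to pick the numeric columns by type instead. A small sketch, again using the built-in state.x77 table as a stand-in for crime (it is not the crime data), with the faces() call guarded so the snippet still runs when aplpack isn't installed:

```r
# Stand-in with the same layout as crime: state name, then seven numeric columns.
demo <- data.frame(state = rownames(state.x77), state.x77[, 1:7])
rownames(demo) <- NULL

# TRUE for every numeric column, FALSE for the state names.
is_num <- vapply(demo, is.numeric, logical(1))

if (requireNamespace("aplpack", quietly = TRUE)) {
  aplpack::faces(demo[, is_num])  # same columns as demo[, 2:8]
}
```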
Step 4. Change features
This is pretty much what we want except for two things. The first is that the faces are labeled with numbers. That isn't of much use without a key. The second is that some of the faces are smiling. For more positive datasets like quality of life or baseball stats, that would make sense. The higher the value, the better. This is crime data though. The higher the value, the worse. Smiles for a high rate of larceny theft don't seem quite right.
Unfortunately, the faces() function doesn't let us choose what face parts to associate with each metric. Features are assigned by column order, so we need to find a workaround. According to the documentation (view by typing ?faces), the curve of the smile is applied to the sixth column in the input matrix, which is larceny theft in this case.
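To see which column ends up on which face part, you can line the column names up against the feature order. The feature list below is the first seven entries as I recall them from ?faces, so treat it as an assumption and check it against your installed version. The data is the built-in state.x77 stand-in, not the crime data.

```r
# First seven face features in the order I recall from ?faces (verify locally).
features <- c("height of face", "width of face", "shape of face",
              "height of mouth", "width of mouth", "curve of smile",
              "height of eyes")

# Stand-in with the same layout as crime: state name, then seven numeric columns.
demo <- data.frame(state = rownames(state.x77), state.x77[, 1:7])

# Pair each input column with the feature it would drive.
mapping <- data.frame(column = names(demo)[2:8], feature = features)
print(mapping)
```

With the real data, names(crime)[2:8] in place of names(demo)[2:8] shows which crime category drives the smile.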
Ah. Here's what we'll do. We make the sixth column in our data all the same value. That way all smile curves will be neutral. Here's how we can do that:
crime_filled <- cbind(crime[,1:6], rep(0, nrow(crime)), crime[,7:8])
The cbind() function combines columns side by side. In the above, we take the first six columns of crime, add a column of zeros with one entry per row of our crime data, and end with the last two columns of crime. We save the result in a variable called crime_filled. As in Step 2, you can type the following to see the first rows of crime_filled.
crime_filled[1:6,]
Notice the new column of zeros?
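Here's the same pattern on the built-in state.x77 stand-in (not the crime data), with a name on the filler column so it's easy to spot. The names demo, demo_filled, and filler are made up for this sketch.

```r
# Stand-in with the same layout as crime: state name, then seven numeric columns.
demo <- data.frame(state = rownames(state.x77), state.x77[, 1:7])
rownames(demo) <- NULL

# First six columns, a constant column, then the remaining two.
demo_filled <- cbind(demo[, 1:6], filler = 0, demo[, 7:8])

ncol(demo_filled)         # one more column than demo
names(demo_filled)[2:8]   # a 2:8 selection puts 'filler' in sixth position

# Which data column a 2:8 selection leaves out:
setdiff(names(demo_filled)[-1], names(demo_filled)[2:8])
```

One thing the last line shows: with nine columns in the filled table, selecting 2:8 leaves the final data column out of the faces. If you want it included, 2:9 should hand it to the next feature in line, though I'd check the result against ?faces.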
Now use faces() with crime_filled:
faces(crime_filled[,2:8])
We get similar faces, but with no more smiles:
Step 5. Add labels
Instead of numbers, it'd be much more useful to include state names. Easy.
faces(crime_filled[,2:8], labels=crime_filled$state)
It's the same call as before, but with the labels argument set to the state column of crime_filled, so each face is labeled with its state name.
Much more useful now. We can easily associate the faces with a state. It's a little cluttered, but we can fix that up easy in Illustrator.
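If you'd rather cut some clutter before leaving R, shorter labels help. This is a sketch using the state.name and state.abb vectors that ship with R. The helper short_label is made up, and it assumes the state column holds full state names; anything it can't match, like the District of Columbia or the US average row, keeps its original label.

```r
# Hypothetical helper: swap full state names for two-letter abbreviations.
short_label <- function(x) {
  x <- as.character(x)                    # works for factors too
  abb <- state.abb[match(x, state.name)]  # NA when a name isn't one of the 50 states
  ifelse(is.na(abb), x, abb)              # fall back to the original label
}

short_label(c("Alabama", "District of Columbia", "Wyoming"))

# With the tutorial's objects, the call would look like this:
# faces(crime_filled[,2:8], labels=short_label(crime_filled$state))
```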
Step 6. Fix up in Illustrator (optional)
You can pretty much stop here if you like, but as most of you know, I like to save the image as a PDF, bring it into Adobe Illustrator, and clean things up to make it more readable. You can also try Inkscape, the open-source alternative, although I haven't used it myself.
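One way to get that PDF straight from the console is R's pdf() device. A minimal sketch: the helper save_as_pdf, the file names, and the 10-inch page size are all my own picks, and a placeholder plot stands in for the faces so the snippet runs without aplpack or the crime data.

```r
# Hypothetical helper: open a PDF device, run the drawing code, always close it.
save_as_pdf <- function(path, draw, width = 10, height = 10) {
  pdf(path, width = width, height = height)  # page size in inches
  on.exit(dev.off())                          # close the file even if draw() fails
  draw()
  invisible(path)
}

# Placeholder drawing, written to a temporary folder.
out <- file.path(tempdir(), "faces_demo.pdf")
save_as_pdf(out, function() plot(1:10, main = "placeholder plot"))
file.exists(out)

# With the tutorial's objects, the call would look like this:
# save_as_pdf("crime_faces.pdf", function()
#   faces(crime_filled[,2:8], labels=crime_filled$state))
```

Because the PDF is vector-based, the labels and face outlines should stay editable once you open the file in Illustrator or Inkscape.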
After some label cleanup and some annotation, here's our final result. What's going on there, Washington, D.C.?
Not too bad, right?
Read the R documentation on faces() for more details on what else you can do with the function. Remember, documentation is your friend when it comes to making full use of R.
Now go on. Have some fun with your new Chernoff toy.