Blog Archives

Appropriate Mappings


Vox Article on viral memes and charitable giving

First, a disclaimer. This is not a post about the actual issues this article raises; just about the presentation of those claims. The image from the article has appeared in numerous places and been referenced by a number of news sources, as well as appearing in my Facebook and Twitter feeds.

And it’s a bad image.

One minor issue is that it is hard to work out which circle relates to which disease, as the name of the disease only appears on the legend, so you are constantly moving your eyes from the grey dot on the left, to the legend, to the grey dot on the right. That makes it hard to make much sense of the chart. The fact that the legend doesn’t seem to have any order to it doesn’t help either. If this were 20 diseases instead of eight, the chart would be doomed!

Kudos for picking appropriate colors though. They used a natural mapping (pink <–> breast cancer; red <–> AIDS), which helps a bit.

The more worrying issue is that it makes a classic distortion mistake; look at the right side and rapidly answer the question, using just the images, not the text: “How many more deaths are there due to the purple disease than the blue disease?” 

Using the image as a guide, your answer is likely to be in the range 10 to 20 times as many, because the ratio of the areas is about that amount. When you look at the text, though, it’s actually only about four times. The numbers are not encoding the area, which is what we see; they are encoding the radius (or diameter), which we do not immediately perceive.

The result is a sensationalist chart. It takes a real difference, but sensationalizes it by exaggerating the difference dramatically. If you want to use circles, map the variable of interest to AREA, not RADIUS. It fits our perceptions much more truthfully. (It’s not actually perfect; we tend to see small circles as larger than they really are; but it’s much, much better.)
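To put numbers on the distortion, here is a small Python sketch. It is only an illustration: the values 400 and 100 are made up (chosen to give the roughly 4x ratio mentioned above, not taken from the article), and the function names are my own.

```python
import math


def radius_for_area(value, scale=1.0):
    """Radius of a circle whose AREA is proportional to `value`."""
    return math.sqrt(scale * value / math.pi)


def seen_ratio_radius_mapped(a, b):
    """Ratio of circle areas when the values are mapped to RADIUS.

    Area grows with the square of the radius, so the ratio we see
    is the square of the true ratio.
    """
    return (a / b) ** 2


def seen_ratio_area_mapped(a, b):
    """Ratio of circle areas when the values are mapped to AREA."""
    area_a = math.pi * radius_for_area(a) ** 2
    area_b = math.pi * radius_for_area(b) ** 2
    return area_a / area_b


# Hypothetical values with a true ratio of 4
big, small = 400, 100
print("radius-mapped, seen ratio:", seen_ratio_radius_mapped(big, small))
print("area-mapped, seen ratio:", round(seen_ratio_area_mapped(big, small), 6))
```

With the radius mapping a true 4x difference shows up as a 16x difference in area, which sits squarely in the "10 to 20 times" range; with the area mapping the circles differ by the same factor as the data.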

So, here’s a reworking:

WhereWeDonate Vs. Diseases That Kill

I tried to keep close to the original color mappings, as they are pretty good, but have used width to encode the variable of interest, keeping the height of the rectangle fixed. I also labeled the items on both sides so we can see much more easily that heart disease kills about 4x as many people as Chronic Obstructive Pulmonary Disease. 

I also added some links between the two disease rankings to help visually link the two and aid navigation. The result is, I believe, not only more truthful, but easier to use. In short, it works.

Comics and Visualization

Understanding Comics book cover; Scott McCloud

Although this book is over a decade old now (and Scott has a number of later books that follow on from this one), this is still a highly valuable book to read, getting great reviews from famous artists as a fundamental resource for comic book writers. I read this from the perspective of a visualization expert, and found a number of interesting points in the book, especially the earlier sections. He defines comics as “juxtaposed pictorial and other images in deliberate sequence, intended to convey information and/or to produce an aesthetic response in the viewer” (p.9), which, to my mind, allows many visualizations to fit his definition! The concept of small multiples, when presented in a “deliberate order” such as via a trellis display, fits particularly well into this definition, so I was encouraged to read on. Some highlights of the book, from my point of view:

  • The use of simpler icons / symbols to make depictions of reality more universal; that argument resonates more strongly with me than Tufte’s data-ink concept. I feel more convinced by the argument that additional detail is bad when it makes it harder for us to understand the high-level picture because it draws us too much into the physicality of the shapes being used.
  • McCloud presents a triangular space, the vertices of which are “reality”, “language” and “the picture plane” into which comic styles can be placed. I think there is also value in looking at various styles of visualization and seeing where they fit in. Treemaps, for example, have more “realistic” versions using cushions, while keeping the same structure. Scientific, geographic or fluid display visualizations are more realistic than, say, statistical graphics.
  • “Less is More” applied to the number of intermediate representations used: this argues that for visualizations of, say, a process evolving over time, we should not simply slice at even times, but instead look for important features we want to show, and show fewer frames.
  • Lots of good stuff on how time is perceived when displayed as a sequence.
  • “Can Emotions be Visible?” is the motivating question for chapter five. I would be very curious to see if we could apply his ideas to visualizations; maybe people like pie charts because they seem warm, serene and quiet, whereas a line chart with gridlines is rational, conservative and dynamic?

As an aside, I included a comic in my book on Visualizing Time, more as a whimsy than anything else, but I’m glad that I have at least a tenuous link with Scott McCloud’s highly recommended book!

Visualizing Tennis

I’m a member of the American Statistical Association’s “Statistics in Sport” section and I’m also British by birth, so Andy Murray’s success at Wimbledon this year was interesting to me for two reasons. I took a look at some of the data on Murray (collected by IBM’s SlamTracker initiative) with a view to doing a little visual analysis, so now I have another reason to be interested …

I found some data on his performance over a few years leading up to Wimbledon 2013 and wanted to look at trends. Now usually I prefer to create several linked visualizations and look at them together, but for this data I found that several of the stats I was interested in worked nicely when plotted in the same system. Here’s what I came up with:


Read the rest of this entry

Vega: A New Grammar-Based Specification for Visualizations

I’m a big fan of using languages for visualization rather than canned chart types. I’ve been working with the Grammar of Graphics approach for a number of years within SPSS and now IBM, and my book “Visualizing Time” is composed 95% of Grammar-based visualizations. It’s pretty safe to say it’s my preferred approach.

Protovis (the forerunner of D3, to a great extent) was built on the Grammar approach; Bostock and Heer’s 2009 article (on Heer’s site) gives a very good statement of the benefits of the Grammar-based approach as opposed to the “Chart Type” approach:

The main drawback of [the chart type] approach is that it requires a small, closed system. If the desired chart type is not supported, or the desired visual parameter is not exposed in the interface, no recourse is available to the user and either the visualization design must be compromised or another tool adopted. Given the high cost of switching tools, and the iterative nature of visualization design, frequent compromise is likely.

Read the rest of this entry

Chord Display (Music)

iTunes Music with a RAVE Chord visualization

I took the data from my last post, aggregated up some fields and made a Chord Diagram for it, using RAVE. I was lazy and didn’t do a stellar job on rolling up years, so the year indicated is actually the center of a 4-year span — so 2007 is actually [2005.5, 2009.5] which is a little odd.

No big insights here: podcasts are all recent; alternative music is mostly recent too (Eels and Killers are artists with a large number of songs in my library). Interesting that I didn’t buy a lot of music from around 1999 …

I thought there were more packages that could do chord visualizations, but was only able to find some D3 examples.

iTunes Music to Data, via Python

Music Treemap

8000+ iTunes songs by genre and artist, colored by rating (ManyEyes version)

The track information stored in iTunes is pretty interesting from a visualization point of view, as it contains dates, durations, categories, groupings: all the sorts of things that make for complex, interesting data to look at. The only issue is … it’s in iTunes, and I’d like to get a CSV version of it so I can use it in a bunch of tools.

So, here is the result; a couple of Python scripts that use standard libraries to read the XML file exported by iTunes and convert it to CSV. It’s not general or robust code, just some script that worked for me and should be pretty easy to modify for you. I’m not a Pythonista, mostly doing Java, so apologies for non-idiomatic usage. Feel free to correct or suggest in the comments as this is also a learning exercise for me.
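To give a flavor of the approach, here is a minimal sketch using only the standard library. It is not the script from the full post. It assumes the exported XML is a property list with a top-level "Tracks" dictionary, and the field names in FIELDS are assumptions you should check against your own export.

```python
import csv
import plistlib

# Assumed track keys; adjust to whatever your exported library contains
FIELDS = ["Name", "Artist", "Album", "Genre", "Year", "Total Time", "Rating"]


def tracks_to_rows(library, fields=FIELDS):
    """Turn the parsed library dict into a list of rows, one per track.

    Missing fields become empty strings so every row has the same length.
    """
    tracks = library.get("Tracks", {})
    return [[track.get(field, "") for field in fields]
            for track in tracks.values()]


def itunes_xml_to_csv(xml_path, csv_path, fields=FIELDS):
    """Read an exported iTunes library XML file and write it out as CSV."""
    with open(xml_path, "rb") as fp:
        library = plistlib.load(fp)  # the export is a property list
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(fields)  # header row
        writer.writerows(tracks_to_rows(library, fields))
```

Calling itunes_xml_to_csv("Library.xml", "tracks.csv") would then produce one CSV row per track; both file names are placeholders.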

Read the rest of this entry

From the Vaults: Maps are Just Another Element

For the Grammar of Graphics language-based approach to visualization, and therefore in the RAVE visualization system, maps are simply another element that can be used within the grammatical formulation.

Although most people consider a map a very different entity from a bar chart, all that really differs between a bar chart and a map of areas like the one included here is that instead of representing a row of data by a bar, we use a polygon (or set of polygons) on a map. Otherwise their properties ought to be the same — we can apply color, patterns, labels, transparency. We can set a summary statistic when there are multiple values for each polygon to reflect min, max, mean, median, range, or any of the regular sets of items. We can flip, transpose and panel the charts. Essentially, from the grammatical point of view, if you can do it to a bar chart, you can do it to a map. The only limitation is that whereas the sizes of the bars can be set or determined by data, the map polygons cannot, so setting sizes on the map polygons has no effect.

US Choropleth

Orthogonality is also important, so we can say we want a point element instead of a polygon, as in the above, where we’ve added a second element to a RAVE US Map conveying different data as well as being a good place to put labels.