Vega: A New Grammar-Based Specification for Visualizations
I’m a big fan of using languages for visualization rather than canned chart types. I’ve been working with the Grammar of Graphics approach for a number of years within SPSS and now IBM, and my book “Visualizing Time” is composed 95% of Grammar-based visualizations. It’s pretty safe to say it’s my preferred approach.
Protovis (the forerunner of D3, to a great extent) was built on the Grammar approach; Bostock and Heer’s 2009 article (on Heer’s site at http://hci.stanford.edu/jheer/files/2009-Protovis-InfoVis.pdf) gives a very good statement of the benefits of the Grammar-based approach as opposed to the “Chart Type” approach:
The main drawback of [the chart type] approach is that it requires a small, closed system. If the desired chart type is not supported, or the desired visual parameter is not exposed in the interface, no recourse is available to the user and either the visualization design must be compromised or another tool adopted. Given the high cost of switching tools, and the iterative nature of visualization design, frequent compromise is likely.
Within the Grammar-based systems (ggplot by Hadley Wickham is another popular one) most of the teams went with a programming model; you write programs in some computer language using the grammar components as building blocks. The SPSS approach of defining a specification language as a way to access the underlying programming language was unique. In fact, SPSS designed two languages: a “user friendly” one called GPL and a more detail-oriented one called ViZml. You can get an overview of these grammar-based specifications in SPSS’s documentation – here’s a direct link to the ViZml one:
So I was quite excited to take a look at Vega, the new layer on top of D3. It’s exactly what we had done in SPSS (and now IBM): “a higher-level visualization specification language”. The goals of the two systems are a little different: the IBM system is designed to keep attention on the high level without requiring low-level programming, whereas D3 and Vega focus on letting programmers work with low-level visualization entities better and faster. Still, they serve roughly the same tasks, so it’s informative to look at how the various specification languages describe things. I’ll take the Vega “try this first” tutorial bar chart as an example.
In GPL (2000) a bar chart was specified like this:
SOURCE: s = userSource(id("Employeedata"))
DATA: jobcat=col(source(s), name("jobcat"), unit.category())
DATA: salary=col(source(s), name("salary"))
SCALE: linear(dim(2), include(0))
GUIDE: axis(dim(2))
GUIDE: axis(dim(1))
ELEMENT: interval(position(jobcat*salary))
Pretty compact, but it did make a lot of decisions for you, such as default colors, bar appearance and so on. I’m going to skip the ViZml (2004) version (if you are interested, you can see examples in the ViZml docs linked above) and go directly to a comparison with the specification language we are currently working on at IBM and using as a core visualization capability. The engine is called RAVE, and the language is VizJSON.
I took the Vega example and recreated it as a VizJSON spec. Here are the two charts generated by the systems:
Sample Bar Charts (Vega first, VizJSON second)
A few minor differences; most notable are different default fonts and different default ticks on the vertical axis, but pretty much the same. Unfortunately, for legal reasons (that I do not 100% understand), I cannot show the exact VizJSON specification, so I have taken the specification and modified it a little so that the specification below, while not VizJSON, has the same structure and is almost identical in length. Mostly names have been changed.
Vega/VizJSON Comparison (pdf)
The Vega version is a bit longer; partly because in Vega each mark defines its own coordinate systems piecemeal by defining scales for each position (“x”, “y”, etc). VizJSON groups scales into coordinate systems and shares the coordinate systems among multiple elements, which is a little more compact.
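To make that structural difference concrete, here is a minimal Python sketch. Both shapes are hypothetical simplifications invented for illustration: the first dictionary only loosely echoes the idea of marks naming a scale per position, and the grouped output is not VizJSON (which I cannot show). The function name and all keys are assumptions.

```python
# Hypothetical, simplified shapes; neither dict is a real Vega or VizJSON spec.

# Style A: every mark names a scale for each positional channel.
per_mark = {
    "scales": {"x": {"type": "ordinal"}, "y": {"type": "linear"}},
    "marks": [
        {"type": "rect", "x": {"scale": "x"}, "y": {"scale": "y"}},
        {"type": "text", "x": {"scale": "x"}, "y": {"scale": "y"}},
    ],
}


def group_into_coordinates(spec):
    """Collect identical (x, y) scale pairs into shared coordinate systems."""
    coordinates = []  # one entry per distinct (x, y) scale pair
    index = {}        # (x scale name, y scale name) -> coordinate id
    elements = []
    for mark in spec["marks"]:
        key = (mark["x"]["scale"], mark["y"]["scale"])
        if key not in index:
            index[key] = f"coord{len(coordinates)}"
            coordinates.append({
                "id": index[key],
                "dimensions": [spec["scales"][name] for name in key],
            })
        # Style B: the element points at a shared coordinate system instead.
        elements.append({"type": mark["type"], "coordinates": index[key]})
    return {"coordinates": coordinates, "elements": elements}


grouped = group_into_coordinates(per_mark)
```

With two marks sharing the same pair of scales, the grouped form states the coordinate system once and each element carries a single reference to it, which is where the saving in length comes from.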
Another reason is based on the difference in philosophy between the core engines; D3 assumes you are designing for a specific data set and so you have to tell it to allocate space for the axes, whereas RAVE adapts to the data unless you override it with a specific preference. This is why Vega has the padding element at the top; the data has y values in the range [0,100] and, for the default font, this requires a left padding of 30 pixels to make space for the ticks. RAVE works this out for you, so the padding is not needed in the VizJSON specification.
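As a rough illustration of what “working it out for you” can mean, the Python sketch below estimates the left padding from the widest tick label. This is not RAVE’s algorithm; the function name and the pixel constants (average character width, tick length, gap) are assumptions picked so that the example lands on the 30 pixels mentioned above.

```python
def estimate_left_padding(tick_values, char_width_px=7, tick_length_px=6, gap_px=3):
    """Estimate space needed left of the plot for vertical-axis tick labels.

    All pixel constants are illustrative assumptions, not measured font metrics.
    """
    labels = [str(value) for value in tick_values]
    widest = max(len(label) for label in labels)
    return widest * char_width_px + tick_length_px + gap_px


# Ticks for data in the range [0, 100]; the widest label is "100".
ticks = range(0, 101, 20)
padding = estimate_left_padding(ticks)
```

A system that adapts to the data would recompute this whenever the data or font changes, whereas a fixed padding value in the specification has to be revisited by hand.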
Vega also has finer control over the hover behavior; VizJSON currently does not allow such control and you have to use the programmatic interface to set the hover style.
These are minor details though; the main take-away here is that, despite a difference in the underlying engines and a different set of goals, the languages are very similar: it would be a simple job to write a translator from one to the other, for example. To a great extent this demonstrates that the “Grammar of Graphics” approach is a very robust and powerful solution. It’s a language that works.
Which, considering I have been working on it for over a decade, is good news!