Vega: A New Grammar-Based Specification for Visualizations

I’m a big fan of using languages for visualization rather than canned chart types. I’ve been working with the Grammar of Graphics approach for a number of years within SPSS and now IBM, and 95% of the visualizations in my book “Visualizing Time” are Grammar-based. It’s pretty safe to say it’s my preferred approach.

Protovis (to a great extent the forerunner of D3) was built on a Grammar approach. Bostock and Heer’s 2009 article (available on Heer’s site) gives a very good statement of the benefits of the Grammar-based approach as opposed to the “Chart Type” approach:

The main drawback of [the chart type] approach is that it requires a small, closed system. If the desired chart type is not supported, or the desired visual parameter is not exposed in the interface, no recourse is available to the user and either the visualization design must be compromised or another tool adopted. Given the high cost of switching tools, and the iterative nature of visualization design, frequent compromise is likely.

Within the Grammar-based systems (ggplot by Hadley Wickham is another popular one), most of the teams went with a programming model: you write programs in some computer language using the grammar components as building blocks. The SPSS approach of defining a specification language as a way to access the underlying programming language was unique. In fact, SPSS designed two languages: a “user friendly” one called GPL and a more detail-oriented one called ViZml. You can get an overview of these grammar-based specifications in SPSS’s documentation; here’s a direct link to the ViZml one:

So I was quite excited to take a look at Vega, the new layer on top of D3. It’s exactly what we had done in SPSS (and now IBM): “a higher-level visualization specification language”. The goals of the two systems are a little different. The IBM system is designed to focus attention on the high level without requiring low-level programming, whereas D3 and Vega focus on letting programmers work with low-level visualization entities better and faster. Still, they serve roughly the same tasks, so it’s informative to look at how the various specification languages describe things. I’ll take the Vega “try this first” tutorial bar chart as an example.

In GPL (2000) a bar chart was specified like this:

SOURCE: s = userSource(id("Employeedata"))
DATA: jobcat=col(source(s), name("jobcat"), unit.category())
DATA: salary=col(source(s), name("salary"))
SCALE: linear(dim(2), include(0))
GUIDE: axis(dim(2))
GUIDE: axis(dim(1))
ELEMENT: interval(position(jobcat*salary))

Pretty compact, but it did make a lot of decisions for you, such as default colors, bar appearance and so on. I’m going to skip the ViZml (2004) version (if you are interested, you can see examples in the ViZml docs linked above) and go directly to a comparison with the specification language we are currently working on in IBM and are using as a core visualization capability. The engine is called RAVE, and the language VizJSON.
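The GPL statements above all share a regular “KEYWORD: body” shape, and it is the * (cross) operator in position(jobcat*salary) that puts jobcat on dim(1) and salary on dim(2), the dimension the SCALE and first GUIDE statements refer to. The Python sketch below makes that concrete. It is a deliberately naive reading, not a real GPL parser: it assumes one statement per line and a single, un-nested position() call, and the function names are hypothetical.

```python
import re

# The GPL bar chart from the post, verbatim.
GPL_SPEC = """\
SOURCE: s = userSource(id("Employeedata"))
DATA: jobcat=col(source(s), name("jobcat"), unit.category())
DATA: salary=col(source(s), name("salary"))
SCALE: linear(dim(2), include(0))
GUIDE: axis(dim(2))
GUIDE: axis(dim(1))
ELEMENT: interval(position(jobcat*salary))
"""


def parse_statements(spec: str) -> list[tuple[str, str]]:
    """Split a GPL-style spec into (KEYWORD, body) pairs, one per line.

    Naive assumption: every non-blank line holds exactly one statement.
    """
    statements = []
    for line in spec.splitlines():
        line = line.strip()
        if not line:
            continue
        keyword, _, body = line.partition(":")  # split on the first colon only
        statements.append((keyword.strip(), body.strip()))
    return statements


def position_dimensions(statements) -> dict[int, str]:
    """Map each variable in the ELEMENT's position(a*b*...) to its dimension.

    The cross operator '*' puts the first variable on dim(1), the second on
    dim(2), and so on. Nested functions inside position() are not handled.
    """
    for keyword, body in statements:
        if keyword == "ELEMENT":
            match = re.search(r"position\(([^)]*)\)", body)
            if match:
                names = [n.strip() for n in match.group(1).split("*")]
                return {i + 1: name for i, name in enumerate(names)}
    return {}


stmts = parse_statements(GPL_SPEC)
print([k for k, _ in stmts])       # ['SOURCE', 'DATA', 'DATA', 'SCALE', 'GUIDE', 'GUIDE', 'ELEMENT']
print(position_dimensions(stmts))  # {1: 'jobcat', 2: 'salary'}
```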

I took the Vega example and recreated it as a VizJSON spec. Here are the two charts generated by the systems:


Sample Bar Charts (Vega first, VizJSON second)

There are a few minor differences, the most notable being different default fonts and different default ticks on the vertical axis, but the charts are pretty much the same. Unfortunately, for legal reasons (that I do not 100% understand), I cannot show the exact VizJSON specification, so I have modified it a little: the specification below is not VizJSON, but it has the same structure and is almost identical in length. Mostly, names have been changed.

Vega/VizJSON Comparison (pdf)

The Vega version is a bit longer, partly because in Vega each mark defines its own coordinate system piecemeal, by defining scales for each position (“x”, “y”, etc.). VizJSON groups scales into coordinate systems and shares those coordinate systems among multiple elements, which is a little more compact.

Another reason is based on the difference in philosophy between the core engines; D3 assumes you are designing for a specific data set and so you have to tell it to allocate space for the axes, whereas RAVE adapts to the data unless you override it with a specific preference. This is why Vega has the padding element at the top; the data has y values in the range [0,100] and, for the default font, this requires a left padding of 30 pixels to make space for the ticks. RAVE works this out for you, so the padding is not needed in the VizJSON specification.
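To show the kind of calculation that replaces a hand-set padding value, here is a minimal Python sketch: choose “nice” tick values for the data range, find the widest tick label, and add room for the tick mark and a gap. This is not RAVE’s or D3’s actual algorithm. The font metrics (7 px per character, 6 px ticks, a 3 px gap) are assumptions picked so that the [0,100] example lands on the 30 pixels mentioned above; a real engine would measure rendered text.

```python
import math

# Hypothetical metrics, not taken from any real engine.
CHAR_WIDTH_PX = 7    # assumed width of one character in the default font
TICK_LENGTH_PX = 6   # assumed length of each tick mark
LABEL_GAP_PX = 3     # assumed gap between a tick mark and its label


def nice_step(lo: float, hi: float, target_ticks: int = 5) -> float:
    """Pick a 'nice' tick step: 1, 2, 5 or 10 times a power of ten."""
    raw = (hi - lo) / target_ticks
    magnitude = 10 ** math.floor(math.log10(raw))
    for multiple in (1, 2, 5, 10):
        if raw <= multiple * magnitude:
            return multiple * magnitude
    return 10 * magnitude  # safeguard; raw / magnitude is always below 10


def tick_labels(lo: float, hi: float, target_ticks: int = 5) -> list[str]:
    """Format the tick values from lo up to hi in steps of nice_step."""
    step = nice_step(lo, hi, target_ticks)
    n = int(math.floor((hi - lo) / step + 1e-9))  # tolerance for float error
    return [f"{lo + i * step:g}" for i in range(n + 1)]


def left_padding(lo: float, hi: float, target_ticks: int = 5) -> int:
    """Estimate the left padding needed for a vertical axis over [lo, hi]."""
    widest = max(len(label) for label in tick_labels(lo, hi, target_ticks))
    return widest * CHAR_WIDTH_PX + TICK_LENGTH_PX + LABEL_GAP_PX


print(tick_labels(0, 100))   # ['0', '20', '40', '60', '80', '100']
print(left_padding(0, 100))  # 30
```

The point of the sketch is the dependency: change the data range to [0,10000] and the widest label grows to five characters, so the padding grows with it instead of staying at a fixed, hand-chosen value.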

Vega also has finer control over the hover behavior; VizJSON currently does not allow such control and you have to use the programmatic interface to set the hover style.
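In Vega, hover styling lives in the specification itself: a mark can carry separate property sets, such as “enter”, “update” and “hover”, and the hover set applies while the pointer is over the mark. The Python sketch below is a simplified model of that idea, not Vega’s implementation: it resolves a mark’s visual properties for a given state by merging the property sets in order. The mark layout loosely follows Vega’s style; the function name and values are illustrative.

```python
# A Vega-style mark with separate property sets per interaction state.
# Simplified: real Vega marks also reference data and scales.
BAR_MARK = {
    "type": "rect",
    "properties": {
        "enter": {"y2": {"value": 0}},
        "update": {"fill": {"value": "steelblue"}},
        "hover": {"fill": {"value": "red"}},
    },
}


def resolve_properties(mark: dict, hovered: bool) -> dict:
    """Merge property sets in order: enter, update, then hover if active.

    Later sets override earlier ones, so the hover fill wins while hovered.
    Only literal {"value": ...} entries are handled in this sketch.
    """
    sets = ["enter", "update"] + (["hover"] if hovered else [])
    resolved = {}
    for name in sets:
        for prop, spec in mark["properties"].get(name, {}).items():
            resolved[prop] = spec["value"]
    return resolved


print(resolve_properties(BAR_MARK, hovered=False))  # {'y2': 0, 'fill': 'steelblue'}
print(resolve_properties(BAR_MARK, hovered=True))   # {'y2': 0, 'fill': 'red'}
```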

These are minor details, though; the main take-away here is that, despite a difference in the underlying engines and a different set of goals, the languages are very similar. It would be a simple job to write a translator from one to the other, for example. To a great extent this demonstrates that the “Grammar of Graphics” approach is a very robust and powerful solution. It’s a language that works.
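As a rough illustration of how small such a translator could be, here is a Python sketch of the central structural step: collecting the scales that each Vega-style mark references into shared coordinate systems, so marks using the same x and y scales point at one system. The input is a trimmed Vega-like spec. The output shape (coordinateSystems, elements, coordinates) is entirely hypothetical, since VizJSON itself is not public, and a real translator would also carry over data, axes and styling.

```python
# A trimmed, Vega-like input: scales are defined once, and each mark's
# "enter" properties name the scale used for each position.
SPEC = {
    "scales": [
        {"name": "x", "type": "ordinal", "range": "width"},
        {"name": "y", "type": "linear", "range": "height"},
    ],
    "marks": [
        {"type": "rect", "properties": {"enter": {"x": {"scale": "x"}, "y": {"scale": "y"}}}},
        {"type": "text", "properties": {"enter": {"x": {"scale": "x"}, "y": {"scale": "y"}}}},
    ],
}


def to_coordinate_systems(vega_like: dict) -> dict:
    """Group per-mark scale references into shared coordinate systems.

    Marks whose (x, y) scale pair is identical share one coordinate system.
    The output layout is a hypothetical stand-in for a VizJSON-like spec.
    """
    scales = {s["name"]: s for s in vega_like["scales"]}
    systems, elements, index_of = [], [], {}
    for mark in vega_like["marks"]:
        enter = mark["properties"]["enter"]
        key = (enter["x"]["scale"], enter["y"]["scale"])
        if key not in index_of:
            index_of[key] = len(systems)
            systems.append({"x": scales[key[0]], "y": scales[key[1]]})
        elements.append({"type": mark["type"], "coordinates": index_of[key]})
    return {"coordinateSystems": systems, "elements": elements}


result = to_coordinate_systems(SPEC)
print(len(result["coordinateSystems"]))  # 1
print(result["elements"])  # [{'type': 'rect', 'coordinates': 0}, {'type': 'text', 'coordinates': 0}]
```

Both marks end up sharing coordinate system 0, which is the compactness difference described earlier: the scale pairing is stated once rather than once per mark.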

Which, considering I have been working on it for over a decade, is good news!

About workingvis

Visualization is the science of making pictures out of data so that they inform the viewer and allow them to understand the data and take action based on what can be seen. I create new methods of interacting with data using a computer interface and try to understand what tools help people model their data and find patterns and unusual features. I have a background in statistics and statistical graphics, and work with computer scientists as well as statisticians. My particular interests include research into:
* Fundamental methods for interaction with data views
* Statistical methods to improve or motivate visualization design
* The interface between statistical models and statistical graphics
* Visualization of large weighted graphs
* Ways to use knowledge discovery techniques with visualization

Specialties: visualization, research, statistics, statistical modeling, graphics, information visualization, agile development, spatial statistics, time series

Posted on 2013/06/26, in Design. 4 Comments.

  1. GGPLOT (Wickham) for R is another grammar-based language that is making big inroads over the older “chart types” and “programmers only” models.

  2. Do you know where one can get a look at the VizJSON specification, or any details of the grammar? I’ve been unable to locate it.

    If you can point me somewhere – please send the answer to as well as whatever you post on this blog.

  3. Did Thom ever receive an answer? I am also interested in seeing documentation on the VizJSON but it seems to be non-existent.

    Can you please share the details? Otherwise I will have to conclude it was never meant for public use.

    • After a lot of work internally, it looks like answer (B) is the eventual one: it is not going to be released for public consumption.

      On a much more positive note, I have been working on a new, simpler system and have just received permission to put it into Open Source, so in a few days you should see a post with full details, including docs and specs and how to grab and use it.

      It’s a slightly different take on a visualization language than VizJSON, but has the huge advantage that you can actually get hold of it!
