Visualization seems pretty easy, so it’s often left as an afterthought. But visuals can be an immensely effective (or destructive) form of communication. Even so, many, if not most, people fail to tap into their power because they make prominent mistakes. (Sorry for the ego blow, homies.)
But that need not be the case. Although visualization is a process, not a result, once you know what to look for, you can easily cut down on those big mistakes and make graphs that—while not perfect—will be consistently good.
For our purposes, the things to look for can be summed up as “think about your readers while recognizing your practical limitations.”
1. Why are you making this graph?
If you’re reading this, I imagine it’s usually one of these two.
B) You want to “facilitate understanding” of an insight you came across for your audience. (e.g. blogging on hockey hyphen graphs dot com)
I also imagine it’s the second, most of the time.
2. Who is the audience? What will be most effective for them?
We usually, but not always, have a layperson audience.
Put yourself in your readers’ shoes for a minute. How likely are you to retweet a graph you don’t understand? Continue reading an article with figures as foreign-looking as Arabic? Revisit a site where every post is a struggle to get through because the visuals are so unhelpful that you really have to buckle down and read slowly?
If you don’t have significant interest, you’ll pass. Hell, sometimes, even if I’m interested in the material, I’ll give up instead of continuing to slog through something I find abstruse.
Now, there are certain situations where it will take time to read and interpret a chart (generally when there is a lot of information presented—WOWY and game on-ice shot head-to-heads, for example). But if you’re trying to illustrate a particular point and your visual is more Guernica than Sistine Chapel-Madonna in terms of figuring out exactly what’s going on, then there’s a problem.
Ultimately, graphs that are too simple are not going to lose you readership. Graphs that are too complicated will (unless your intent is to show that something is complex or confusing—in that case, go ahead). So make a quick sketch. Show a relative or friend (or complete stranger) a draft of your graph. Tweet out early stages and ask for feedback, or create a draft-chart account if you’re that much of a boss.
When in doubt, simplify your charting game. Tough-to-read, brooding sexiness only works in movies.
3. What should you graph?
From before, we know we want our graphs to be accessible. We also want them to be useful. With that in mind:
- Make sure the data you plot illustrate what you want them to. Does the graph actually show what you want it to, or does it merely show something related? Make sure it’s the former.
- Make sure the data you plot do not show more than is necessary to the point of distraction.
The possession charts from hockeystats.ca (and others) are great examples of hitting the sweet spot between ambition and restraint. It’s easy to go overboard. They could have different colors or labels for different strength situations and shot types. But instead (in the case of hockeystats specifically), they just stick to CF for each team, add a little dot for goals, and have a dropdown for changing strength setting. That’s it. The graph gives you enough information to be satisfied, but not so much that you’re left feeling overwhelmed.
- Label your axes, give the chart a title, and include a legend if necessary. These are basic elements of style that help readers and make you seem more like a pro (even though you’re probably not literally one).
Asmae: “I get a little frustrated when there’s no legend explaining acronyms. When I was new to hockey twitter, I really liked the way Micah prefaced every graph with a conclusion or something to watch for. It really teaches you to pick up on the right things when you face other charts. Also helps if reading charts is not easy for you at first.”
- Make the variables plotted as easily understood as possible. Maybe even throw in a little bubble with a clarifying annotation, if readers are unlikely to understand a variable you used and its explanation is short. Have a caption or caption-like sentence below the graph to give readers a quick gist of what they’re supposed to see or how to read the graph. Hell, you could even replace the 10-character “abbreviation” of a variable with its actual formula or a four-word description.
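Here’s a rough sketch of the labeling advice above, assuming Python with matplotlib (my choice of tool for the example, not a requirement). It swaps an abbreviation for a short description and adds a title, axis labels, a legend, and a caption-like sentence. The names `DESCRIPTIONS`, `readable_label`, and `draw_labeled_chart` are hypothetical, and so is the "Shot attempts" y-axis label.

```python
# Hypothetical lookup: abbreviation -> short plain-language description.
DESCRIPTIONS = {
    "CF": "Shot attempts for (CF)",
    "CA": "Shot attempts against (CA)",
}


def readable_label(variable):
    """Return a reader-friendly label, falling back to the raw name."""
    return DESCRIPTIONS.get(variable, variable)


def draw_labeled_chart(game_numbers, series, title, takeaway):
    """Plot each series with a title, axis labels, a legend, and a caption.

    `series` maps a variable name (e.g. "CF") to a list of values.
    matplotlib is imported here so readable_label works without it.
    """
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for variable, values in series.items():
        ax.plot(game_numbers, values, label=readable_label(variable))
    ax.set_title(title)
    ax.set_xlabel("Game number")
    ax.set_ylabel("Shot attempts")  # assumed unit for this illustration
    ax.legend()
    # Caption-like sentence under the plot telling readers what to look for.
    fig.text(0.5, 0.01, takeaway, ha="center")
    return fig
```

Unknown variable names pass through `readable_label` unchanged, so you can add descriptions gradually.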
4. What tools should you use?
You’re not lacking for options.
Excel is usually fine. You can reproduce simple Excel charts in Google Sheets, which has the added bonus of interactivity. (Google Sheets also has a few other types of charts, like moving scatterplots, or “motion charts,” which can be handy sometimes.) Tableau is also good for interactivity and flexibility, is better at handling larger amounts of data, and gives viewers flexibility to manipulate filters to explore the data on their own.
If you need more flexibility, there are several well-resourced options: R (ggplot2 and various packages for other types of visuals), MATLAB, and Python (matplotlib/seaborn) are a few of the most popular ones. If you have big website/visualization plans, you might want to check out D3.js.
Of these tools, MATLAB (like statistical packages such as Stata and SPSS) is not ordinarily free, but if you’re in university, it may be worth checking whether your school provides current students with free licenses.
Some other guidelines, from me and the rest of the H-G crew:
- “Make sure it’s readable – text size shouldn’t be tiny.” — Carolyn
- Use a color scheme that is colorblind-friendly. Red-green colorblindness is common, affecting more than 1 in 20 men and 1 in 200 women. So if your color scheme is red and green, you should definitely change it.
Orange and blue is generally reliable. You can also use different shades of the same color or a colorblind-safe spectrum. More here (and elsewhere).
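If you want a concrete starting point, here is a minimal sketch using the Okabe-Ito palette, a widely used colorblind-safe set of eight colors. Picking that palette is my assumption, not something prescribed above, and the helper names `two_series_colors` and `assign_colors` are hypothetical.

```python
# Okabe-Ito colorblind-safe palette (hex codes), orange and blue listed first.
OKABE_ITO = {
    "orange": "#E69F00",
    "blue": "#0072B2",
    "sky blue": "#56B4E9",
    "bluish green": "#009E73",
    "yellow": "#F0E442",
    "vermillion": "#D55E00",
    "reddish purple": "#CC79A7",
    "black": "#000000",
}


def two_series_colors():
    """Return the orange/blue pair for a simple two-series chart."""
    return OKABE_ITO["orange"], OKABE_ITO["blue"]


def assign_colors(categories):
    """Give each category its own palette color, in palette order.

    Raises ValueError rather than reusing a color, since needing more
    than eight colors is a hint that the chart is too busy.
    """
    palette = list(OKABE_ITO.values())
    if len(categories) > len(palette):
        raise ValueError("more categories than colorblind-safe colors")
    return dict(zip(categories, palette))
```

The resulting hex strings can be passed to whatever tool you’re charting with.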
- Avoid dual axes. They may look sexy, but unless they’re linear transformations of each other—shots per minute and shots per 60 minutes, for example—they’ll be confusing. Remember, you usually want the reader doing as little work as possible and for your message to be as clear as possible. (And, frankly, I find understanding and insight far sexier than general appearance.)
Instead of dual axes, stack your graphs, like Micah does in his (wonderful) player overview charts. There are five graphs sharing a single x-axis scale (game number). It’s easy enough to make vertical comparisons and none of the five feels cluttered.
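A rough sketch of both ideas, assuming Python with matplotlib. This is not how Micah’s charts are built, just one way to get stacked panels on a shared x-axis. The function names are hypothetical, and the conversion pair covers the one dual-axis case described above as safe (per minute versus per 60 minutes).

```python
def to_per_60(rate_per_minute):
    """Convert a per-minute rate to a per-60-minutes rate."""
    return rate_per_minute * 60.0


def to_per_minute(rate_per_60):
    """Inverse of to_per_60."""
    return rate_per_60 / 60.0


def draw_stacked_panels(game_numbers, panels):
    """Draw one small line chart per variable, all sharing one x-axis.

    `panels` maps a y-axis label to a list of values, one per game.
    matplotlib is imported here so the converters work without it.
    """
    import matplotlib.pyplot as plt

    # squeeze=False keeps `axes` two-dimensional even with a single panel.
    fig, axes = plt.subplots(len(panels), 1, sharex=True, squeeze=False)
    for ax, (label, values) in zip(axes[:, 0], panels.items()):
        ax.plot(game_numbers, values)
        ax.set_ylabel(label)
    # Only the bottom panel needs the shared x-axis label.
    axes[-1, 0].set_xlabel("Game number")
    return fig
```

If you really do want a second scale for a linear rescaling, matplotlib’s `ax.secondary_yaxis("right", functions=(to_per_60, to_per_minute))` draws it from the conversion pair, so the two scales can never disagree.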
- Avoid 3D. Never use it purely for aesthetic purposes; the third axis should display another dimension of the data. Even then, if it makes the graph confusing or difficult to read, go with a 2D graph instead of stuffing all the information into a 3D one.
Here’s an example of a good 3D graph. Most 3D graphs are not good.
- The basics are sexy. If a line graph, bar graph, or scatterplot could be effective, don’t forgo them simply because you’re tired of them. Only forgo them if you think your idea will better “facilitate understanding” for your readers.
- Go easy on the decoration.
Micah: “Don’t include anything that’s not data. That’s the most important.”
Conor: Don’t overdo the gridlines or background or helper images/graphics. “Know what you want to communicate, make sure it is of value, and graph it simply and clearly.”
- Go easy on how much information you present.
Ben: “At least for a general audience, tread carefully whenever you go beyond two variables (not counting, like, time variables). Unless they are very clearly delineated, and clearly expressing their signals, the meaning will get lost or be overwhelming.”
Don’t put too much into your graphs. If you’re thinking about using color, size, and opacity in a scatterplot, just don’t—try to limit yourself to three variables (of which two are ‘x’ and ‘y’). Your graph should be readable in a glance (unless it’s reference) and the reader’s attention should be quickly (or, at least, easily) drawn to the parts that are important.
On a related note, not everything needs to be a different color. If you’re plotting a sextuple bar graph (that is, a bar graph with clusters of six bars) and your point is to highlight the height of one of the bars, you can use just two colors: one for the bar you’re interested in and one for the other five.
In short, don’t make the graph too cluttered or visually busy. Or, to put it another way, if you don’t need something in your graph, consider removing it.
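The two-color idea can be sketched in a few lines of Python. The specific hex codes and the name `highlight_colors` are my assumptions for illustration.

```python
HIGHLIGHT = "#D55E00"  # assumed accent color for the bar you care about
MUTED = "#B0B0B0"      # assumed neutral gray for everything else


def highlight_colors(labels, focus):
    """Return one color per bar: the accent for `focus`, gray for the rest."""
    if focus not in labels:
        raise ValueError(f"{focus!r} is not one of the bars")
    return [HIGHLIGHT if label == focus else MUTED for label in labels]
```

In matplotlib, the returned list can be passed straight to `ax.bar(..., color=colors)`. Most other tools accept a per-bar color list in some form.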
- Don’t be misleading. If you’re willfully dishonest, we’re revoking your fancystats card.
- Start your axes at “zero” whenever possible. This is related to the previous point. You can portray 50% Corsi vs 51% Corsi as a huge difference by starting your y-axis at 49% and ending it at 52%, but that would be misleading to your readers.
You need to think about the point you’re trying to make. If it’s that there’s a whale of a difference between 50% and 51%…there’s just not, so don’t force it.
I put zero in quotes because, when appropriate, it’s not the worst thing to place your x-axis at y = 50% (which is a zero Corsi differential). Just use a sensible baseline so you’re not misleading.
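To see how much a truncated axis exaggerates, you can do the arithmetic directly. This is a minimal sketch in Python using the 50% and 51% values and the 49% axis start from the example above. The function name `drawn_height_ratio` is hypothetical.

```python
def drawn_height_ratio(smaller, larger, baseline=0.0):
    """How many times taller the larger bar is drawn than the smaller one.

    Bars are drawn from `baseline` up to their value, so the visible
    heights are (value - baseline). Only meaningful when both bars rise
    above the baseline.
    """
    if smaller <= baseline:
        raise ValueError("the smaller bar must rise above the baseline")
    return (larger - baseline) / (smaller - baseline)


honest = drawn_height_ratio(50.0, 51.0, baseline=0.0)      # 1.02
truncated = drawn_height_ratio(50.0, 51.0, baseline=49.0)  # 2.0
```

With the axis starting at zero, the 51% bar is drawn 2% taller than the 50% bar. Starting the axis at 49% draws it twice as tall, even though the underlying numbers haven’t changed.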
- Be careful with pie charts. If there are lots of divisions, your message might be lost in all the visual clutter. In that case, Google Sheets’ interactive treemap might be helpful (since you can aggregate items that only show up individually when you click on them).
Pie charts can also distort area if you try to make them 3D or give them shading. That’s visually misleading.
You can also use stacked bar graphs (where the y-axis is “% of total”) instead of pie charts.
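Both alternatives start from the same bookkeeping: convert counts to “% of total” and fold the tiny slices together. Here is a rough Python sketch. The 5% cutoff, the “Other” label, and the function names are all assumptions for illustration, not recommendations.

```python
def percent_of_total(counts):
    """Convert raw counts to percentages of the total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {name: 100.0 * value / total for name, value in counts.items()}


def collapse_small_slices(counts, min_share=5.0, other_label="Other"):
    """Merge categories below `min_share` percent into a single bucket."""
    shares = percent_of_total(counts)
    kept = {n: v for n, v in counts.items() if shares[n] >= min_share}
    remainder = sum(v for n, v in counts.items() if shares[n] < min_share)
    if remainder:
        kept[other_label] = kept.get(other_label, 0) + remainder
    return kept
```

For example, with placeholder counts `{"A": 50, "B": 30, "C": 17, "D": 2, "E": 1}`, the two smallest categories are merged into a single “Other” bucket of 3, and the output of `percent_of_total` is what you’d plot on a “% of total” stacked bar.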
- Pick the right kind of chart. Here are four common mistakes:
  - Line graphs are for time series. If your x-axis is not time, date, or game, you should probably use a bar graph.
  - If your variables don’t intuitively go together, don’t use stacked bars—use grouped bars.
  - If your heat map barely shows any differences in color and you’re trying to show that there are, in fact, differences, then a heat map was not the right choice. On the other hand, if your scatterplot has large dense regions, you may be better off with a heat map.
  - Radar charts are best if the data are cyclical (e.g. showing a player’s performance by month throughout their career). Otherwise, there are better options available.
- If the chart might be difficult to decipher on small screens, throw in a link to a bigger version. Remember, many people read on mobile.
- Get someone else to look at your work. At the very least, look at it with a fresh mind after a day or two. If you find yourself having to decipher your graph, then that’s a sign that your graph may be too complicated.
- “Make sure your chart is telling a story – clearly. Just like in writing you should omit needless words, you should omit needless viz.” — Carolyn