Unnecessary or poorly planned visual elements can confuse or distort your messages. Graphs can easily be produced by various software packages, but be careful not to let the bells and whistles lure you into poor practices – just because you can doesn’t mean you should.
Pie charts
Circle or ‘pie’ charts use pie segments to represent relative proportions of a total measure (i.e. a part-to-whole relationship). However, because of their shape, the size of these segments is difficult for readers to judge, mentally assign a value to, and compare (Few 2012). The choice of colour can also influence how well readers can compare the size of sections – sometimes a smaller section will appear larger because it is in a strong colour.
Although pie charts have undeniable visual appeal and can be useful in certain situations, many authoritative sources advise against using them for these reasons. If accuracy is important, use a simple bar graph in which each bar represents one proportion of the total measure. Ensure that the bars add up to 100% or the total absolute value.
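As a minimal sketch of the check described above, the following Python converts raw counts into percentage shares, confirms that the shares account for the whole, and prints them as simple text bars. The function names and the survey counts are hypothetical, invented only for illustration.

```python
def percentage_shares(counts):
    """Convert raw counts per category into percentage shares of the total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {name: 100.0 * value / total for name, value in counts.items()}


def text_bar_chart(shares, width=50):
    """Render shares as horizontal text bars, largest share first."""
    lines = []
    ordered = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    for name, share in ordered:
        bar = "#" * round(share / 100.0 * width)
        lines.append(f"{name:<12}{bar} {share:.1f}%")
    return "\n".join(lines)


# Hypothetical survey counts, invented for illustration only.
counts = {"Category A": 45, "Category B": 30, "Category C": 15, "Category D": 10}
shares = percentage_shares(counts)

# The bars should account for the whole: shares must sum to 100%.
assert abs(sum(shares.values()) - 100.0) < 1e-9

print(text_bar_chart(shares))
```

Because every bar starts from the same baseline, readers compare lengths rather than angles or areas.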
Sometimes, you will need to compare a series of related part-to-whole relationships – for example, the proportion of each species’ total population located in each Australian state or territory, for several species. You probably want the reader to compare the size of populations in the same state, across species. Visually, this is very difficult to do with pie charts. Instead, consider using a stacked bar graph, where the colour and order of like parts (e.g. states or territories) are identical in every bar, and each bar represents a whole group (e.g. species):
Converting a pie chart to a bar graph
Radar graphs
Radar graphs, also known as spider or web charts, are designed to plot 1 or more series of values over multiple quantitative categories by providing an axis for each category, arranged radially as equi-angular spokes around a central point. Values increase from the centre outwards, and each axis can have its own quantitative measure. The values for each category are plotted as points. Often, all the category values in a single series are connected by lines, forming an irregular polygon:
These graphs are unfamiliar to most readers and make it difficult to compare parts of the data. Readers may infer correlations or make comparisons across adjacent categories that are invalid.
Instead, plot the data as bar graphs with clustered columns, or a trellis graph for multiple data series.
Clutter and distraction
Garish, multicoloured or zebra-striped graphs with large amounts of bolded text, heavy lines and effects (e.g. 3D, perspective, shadow) can bamboozle and repel readers, as well as distort the data. Stick to the principle ‘less is more’ and strip all unnecessary ink off the page.
Eliminate ‘graph junk’ – remove citations, caveats, logos, background shading, borders, and other nonrelevant data or graphic elements within the graph. This helps to reduce ‘ink on the page’ and minimise the visual impact of nondata elements:
A graph that uses too much colour and contains visual clutter
The same graph as above, cleared of clutter
Meaningless colours
Be restrained with use of colour. It can be more powerful and effective to use a single splash of a strong colour to highlight a key point than to colour the entire graph.
Adjust colours to suit the design concept, provided that this will not interfere with the integrity of the data (such as a map key, or colours relating directly to data or existing schemes). List colours and patterns in the legend in the same order in which they appear in the figure. In a series of figures, use the same colours for the same categories. Ensure that there is enough contrast between colours – steps of 20% tint for monochrome shading – or use patterns or patterned lines (e.g. dotted, dashed). Avoid zebra striping and rainbow colours:
A graph using an unnecessary rainbow of meaningless colours
Look for opportunities to use colour meaningfully, and be careful not to imply meaning where there is none. For example, you could use 2 shades of a single colour to indicate different years of data, rather than red and green, unless you want to imply a judgement about the data series. The graph below uses 2 shades of blue to distinguish between 2 related data series (samples 1 and 2), where there is no need to colour each of the categories (laboratories B to I) differently:
A graph using limited colour in a meaningful way
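One way to obtain related shades of a single colour in 20% steps is to mix the base colour with white. The sketch below assumes that a tint is a simple linear mix with white in RGB; design and print software may define tints differently. The base colour and the function names are hypothetical.

```python
def tint(hex_colour, strength):
    """Mix a colour with white: strength 1.0 keeps the full colour, 0.2 gives a 20% tint."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    hex_colour = hex_colour.lstrip("#")
    channels = [int(hex_colour[i:i + 2], 16) for i in (0, 2, 4)]
    # Move each channel from white (255) towards the base colour.
    mixed = [round(255 + (c - 255) * strength) for c in channels]
    return "#{:02x}{:02x}{:02x}".format(*mixed)


def tint_ramp(hex_colour, steps=5):
    """Return tints from full strength downwards in equal steps (5 steps: 100%, 80%, ... 20%)."""
    return [tint(hex_colour, (steps - i) / steps) for i in range(steps)]


BASE_BLUE = "#003366"  # hypothetical base colour, chosen only for illustration
for strength, colour in zip((100, 80, 60, 40, 20), tint_ramp(BASE_BLUE)):
    print(f"{strength:>3}% tint: {colour}")
```

For 2 related data series, picking 2 well-separated entries from the ramp (e.g. the 100% and 40% tints) keeps the series distinct without implying unrelated categories.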
Also keep in mind an individual’s ability to perceive particular colours – a common form of colourblindness makes red and green both appear brown to affected people. The graphs below show how someone with red–green colourblindness might perceive a red and green figure. For many people, colours with insufficient contrast are also difficult to perceive. This is a component of accessibility for figures:
A graph using red and green, as it appears to a person with full colour vision
The same red and green graph as above, viewed by someone with red–green colourblindness
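Contrast between 2 colours can be estimated numerically. The sketch below uses the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG 2.x), which is not a method given in this guidance; it is swapped in here as one commonly used measure. It compares lightness only and does not simulate colourblindness, so treat it as a rough screening check.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB colour (channels 0-255), per the WCAG 2.x definition."""
    def linearise(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearise(ch) for ch in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """WCAG contrast ratio between two colours: 1 (no contrast) to 21 (black on white)."""
    luminances = (relative_luminance(rgb_a), relative_luminance(rgb_b))
    lighter, darker = max(luminances), min(luminances)
    return (lighter + 0.05) / (darker + 0.05)


# Illustrative colours only: pure red against a mid green.
RED, GREEN = (255, 0, 0), (0, 128, 0)
print(f"red vs green: {contrast_ratio(RED, GREEN):.2f}")
print(f"black vs white: {contrast_ratio((0, 0, 0), (255, 255, 255)):.2f}")
```

A red and green pair such as this one scores low (below 2), which suggests that the 2 colours differ mainly in hue rather than lightness and may be hard to tell apart.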
Multiple graphs in a single layout
Sometimes it is better to show more than 1 graph rather than squeeze a large amount of information into a single graph. Think about the overall layout and how the reader will view the graphs in relation to other graphs. You may want readers to notice a similar pattern of data values across 2 or more graphs, or make a series of comparisons for the same population across multiple graphs. These similarities or comparisons are encouraged when you use the same design and scale for all graphs, and align the graphs vertically or horizontally on the page.
The figure below shows a clustered bar graph representing a multivariate dataset – that is, the data include a series of groups (clinics) and subcategories within these groups (procedures). The multitude of colours used to differentiate subcategories is overwhelming, and the number of columns makes it almost impossible to compare values for the subcategories:
Multivariate dataset squeezed into a single cluster bar graph
The figure below shows the same data as a stacked bar graph. This appears to save space and looks technically appealing. However, because the y axis scale must accommodate total patients, it is difficult to determine or compare values for the various subcategories:
Multivariate dataset squeezed into a single stacked bar graph
A clearer, alternative approach to datasets such as this is a trellis graph. A trellis graph presents a series of identically designed graphs in a single layout. These graphs are most effective for showing similarities or differences in the pattern of data values across multiple groups or categories. Avoid using trellis graphs if readers need to identify specific data values.
To make a trellis graph, identify the 2 variables in your dataset that have the most dependent relationship (e.g. procedures and patient numbers), and plot these as the x and y axes. Then repeat this graph as many times as required for the third grouping variable (e.g. clinics; see figure below). Align the graphs in a vertical or horizontal trellis, depending on which variable you want readers to compare most readily. Overall patient numbers are not needed now that the reader can clearly see patient numbers in each procedure:
Multivariate dataset presented as a trellis of simple graphs
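The data preparation behind a trellis can be sketched as follows: group the records by the third variable (clinic), give every panel the same category order, and compute a single y axis maximum so that all panels share one scale. The records, names and the choice to show missing combinations as 0 are assumptions made for illustration; the resulting panels could be drawn with any graphing tool.

```python
from collections import defaultdict

# Hypothetical records (clinic, procedure, patients), invented for illustration.
records = [
    ("Clinic 1", "Procedure A", 40), ("Clinic 1", "Procedure B", 25),
    ("Clinic 2", "Procedure A", 10), ("Clinic 2", "Procedure B", 55),
    ("Clinic 3", "Procedure A", 30), ("Clinic 3", "Procedure B", 20),
]


def trellis_panels(rows):
    """Group rows into one panel per clinic with a shared category order and y axis limit."""
    procedures = sorted({procedure for _, procedure, _ in rows})
    by_clinic = defaultdict(dict)
    for clinic, procedure, patients in rows:
        by_clinic[clinic][procedure] = patients
    # Every panel lists the same procedures in the same order.
    # Assumption: a missing clinic/procedure combination is shown as 0.
    panels = {
        clinic: [values.get(procedure, 0) for procedure in procedures]
        for clinic, values in sorted(by_clinic.items())
    }
    # One y axis maximum for all panels keeps bar heights comparable across clinics.
    y_max = max(max(values) for values in panels.values())
    return procedures, panels, y_max


procedures, panels, y_max = trellis_panels(records)
for clinic, values in panels.items():
    print(clinic, dict(zip(procedures, values)), f"y axis: 0 to {y_max}")
```

Using one shared maximum, rather than letting each panel choose its own scale, is what allows readers to compare patterns across panels at a glance.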
Axes that distort the data
Discontinuous or exponential scales are sometimes used on axes when the range of data values is very wide – that is, very small data values need to be compared with very large data values. A discontinuous axis skips a number of intervals at some point along the axis before continuing. An exponential axis has unevenly spaced intervals, becoming smaller with distance away from the origin. These axis scales are not readily understood by most readers and may give a distorted visual impression or, worse, misrepresent the message of the data.
Find another way to depict the values so that their full scale and the contrast between them is clear – perhaps as 2 different graphs, or a combination of an overall graph and a zoomed-in graph of the critical portion that you want the reader to notice.
Axes that do not start at zero can also distort a reader’s perception of the information; however, they can be useful when the variation between data points occupies only a small range at large values. Make sure the axis labels are clear and, if comparing multiple graphs of similar datasets, ensure that the scale and divisions are consistent, to allow accurate comparison.
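The size of this distortion can be worked out with simple arithmetic. In the sketch below, the values and the function name are hypothetical; it compares the true ratio between 2 values with the ratio of the bar lengths a reader would see when the axis starts above zero.

```python
def apparent_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the drawn bar lengths when the axis begins at axis_start."""
    if min(value_a, value_b) <= axis_start:
        raise ValueError("both values must lie above the axis start")
    return (value_a - axis_start) / (value_b - axis_start)


# Hypothetical values, invented for illustration only.
small, large = 95.0, 100.0
true_ratio = apparent_ratio(large, small)          # axis starts at zero
drawn_ratio = apparent_ratio(large, small, 90.0)   # axis starts at 90
print(f"true ratio {true_ratio:.2f}, drawn ratio {drawn_ratio:.2f}")
# prints: true ratio 1.05, drawn ratio 2.00
```

Here a difference of about 5% is drawn as one bar twice the length of the other, which is why clear axis labels matter whenever the axis does not start at zero.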
Dual-scaled axes place 2 different quantitative scales on a single graph (e.g. one y axis on the left and another on the right). The most common reason for presenting them is to show the reader similarities in the pattern of values for both measures. However, dual-scaled axes require readers to override their natural inclination to compare data values across the 2 measures – a comparison that is invalid. Consequently, it is almost always better to plot 2 separate graphs (Few 2008b). Use the same design for these graphs to draw attention to similarities in the patterns of data values across the 2 measures, and place the graphs close together on the page. Consider using labels on only one of the x axes, to further link the graphs in the reader’s mind. That is, remove labels from the top or bottom graph (but keep the axis line and tick marks).
Graphs can contain negative values (below zero). For negative values, the axis extends down (vertical y axis) or left (horizontal x axis) from zero. Zero should be clearly marked on the axis, often with a heavier or darker tick line. If both axes are quantitative and span zero, they should intersect at zero. If only 1 axis is quantitative and spans zero, it is customary to place the category axis at the end of the quantitative axis so that the axis labels do not obscure the data. An exception is graphs displaying deviation data, where the goal of the graph is to highlight differences between recorded measurements and some meaningful baseline. This baseline is often represented by the x axis, which intersects the y axis at a zero point that sits approximately halfway up the y axis.
Inaccurate displays
We do not perceive area as accurately as we perceive length, particularly when comparing the areas of different-sized circles or irregular shapes. For example, we can fairly accurately judge whether one bar is twice as tall as another, but cannot reliably judge whether one circle has twice the area of another. Comparative circle size (‘bubble’ charts) may be acceptable for conveying a general impression – that is, ‘this one is bigger than that one’ – but not for accuracy. Consider using bars instead if accurate perception of the relative sizes is critical.
Using pictures to replace bars on graphs (i.e. an illustrated graph) should also be done with care – if a picture of a tree is scaled up proportionally, are readers expected to consider only the change in height or the change in area as well? Also, it may be unclear where the actual data value is in the picture – is it the tip of the highest blade on a wind turbine or is it the top of the main tower?
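The ambiguity between height and area follows from basic geometry: when a shape is scaled uniformly, its area grows with the square of the linear scale. The sketch below, with hypothetical function names, shows the area effect of scaling a picture and the radius needed if a circle's area (rather than its radius) is meant to be proportional to a value.

```python
import math


def area_scale_factor(linear_scale):
    """Factor by which area grows when a shape is scaled uniformly in both directions."""
    return linear_scale ** 2


def bubble_radius(value, reference_value, reference_radius=1.0):
    """Radius that makes a circle's area proportional to value."""
    if value < 0 or reference_value <= 0:
        raise ValueError("value must be non-negative and the reference positive")
    return reference_radius * math.sqrt(value / reference_value)


# A picture drawn twice as tall (and twice as wide) covers 4 times the area.
print(area_scale_factor(2))

# To show a value twice as large by area, the radius grows by sqrt(2), not by 2.
print(round(bubble_radius(2, 1), 3))
```

A tree picture scaled to double height therefore looks 4 times as large by area, so a reader judging by area would overestimate the change.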