Dataviz logo
Oxford Consultants for Social Inclusion logo

Case study: Practical steps for improving visualisation

Principles of good visualisation

In the previous guide to what is good visualisation, we highlighted three key principles of visualisation:

  • Design for your audience: Think about how to emphasise the key point(s) that you are trying to convey to this audience with this particular visualisation
  • Accurately represent the data: The visualisation should show the underlying data without distortion, and avoid common pitfalls that obscure the real information.
  • Keep it clear: The visualisation should focus on the message(s) for the audience, and all visual clutter kept to a minimum (except where useful to highlight key points).

Practical steps for good visualisation

In this guide, we set out 13 practical steps for good visualisation. Some of the steps are straightforward to implement, for example ensuring that you are not using 3D effects that hide the data. Others require more work, for example testing your visualisation with key audiences. For each, we show before and after visualisations.

Design for your audience
  1. Test your visualisation with the key audience
  2. Know when to use graphs, and when to use tables
  3. Limit the number of categories shown in a visualisation - be selective in what you present and emphasise the key message(s)
  4. Try to avoid using pie charts (unless your audience really does not like bar-charts!)

Accurately represent the data

  1. Keep the zero on the axis scale
  2. For bar-charts, set the base of the bars to zero (not the lowest value)
  3. Avoid varying the size of objects in graphs, except to convey difference in values
  4. Avoid using line charts where data is only available for a small number of timepoints
Keep it clear
  1. Avoid using visualisation effects such as 3D that can hide the data
  2. When choosing colours to use, limit the number of colours used and ensure that different colours can be distinguished
  3. Where colour is needed, use solid blocks of colour and avoid fill patterns
  4. Avoid using strong or bold colours for the background in a visualisation
  5. When creating choropleth maps, choose colours to help users identify patterns and relationships between areas

1. Test your visualisation with the key audience

In Leicestershire, crime data at ward level was presented to local decision makers as a bar-chart showing rates per 1,000 population, with annual targets shown as an overlaid line-chart (see figure below).

Leicestershire crime data at Ward level

The graphic presents detailed data on crime levels and targets. However, several weaknesses were identified when it was used to present data to decision-makers:

  • The graphic only shows one crime type, for one year - so there is no information on trends, or how this crime data relates to other crime data
  • Although easy to pick out the highest crime levels (tallest bars), it is less easy to quickly identify the picture in other areas
  • Key points picked up by decision-makers from the graphic were the actual crime figures ward-by-ward. As a result, discussions could focus on the data values rather than the broader trends and patterns.

Leicestershire 'dot' graphic

Leicestershire researchers identified that using a 'dot' graphic was more appropriate for decision-makers (see figure above). The graphic represents crime levels by how much the dot is filled in, and shows data at ward level for 5 major crime types (and all crime), with 3 year trends shown.

The dot graphic has the following advantages over the bar-chart for presenting crime data to decision-makers:

  • Shows trends in crime levels over time, and allows meaningful comparisons between different crime types, for all wards (6 crime types, 15 wards and 3 years of data mean there are 270 pieces of information on display)
  • Easy to identify where (and when) crime levels are high - "more orange means more crime"
  • Enables decision-makers to focus on trends and priorities, rather than making sense of actual crime figures ward-by-ward

This example was taken from Audit Commission (2008), In the know: Using information to make better decisions, a discussion paper.

2. Know when to use graphs, and when to use tables

Graphs and tables both have their uses. Tables are useful when:

  • You want to show individual data values to the audience, and allow them to compare between individual values
  • Precise values are required
  • The quantitative information to be communicated involves more than one unit of measure

Graphs are useful when:

  • You want to reveal relationships between multiple values, for example broad comparison of trends over time or differences between areas
  • General patterns are the key point you want to present, rather than the exact data values (as in the Leicestershire example of presenting crime data, shown above)

3. Limit the number of categories shown in a visualisation - be selective in what you present and emphasise the key message(s)

Viewers have difficulty discriminating between too many categories on a single visualisation. Where there is a large amount of data to be compared a data-table may be more useful, with clustered bar-charts used to illustrate some of the key findings. If it is difficult for users to pick out the key points from your visualisation, consider showing a subset of the important data.

Before: Bar chart with a large number of categories
After: Bar chart with fewer categories for easier comparison

  • The figures above both compare cancer mortality rates against other key causes of mortality, for a set of areas.
  • "Before": each of the key causes of mortality is presented on a single graph and it is difficult for the viewer to compare across areas or groups.
  • "After": The researcher has decided to present only the cancer and heart disease mortality rates for the areas, and moved the other datasets into a data-table
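
One way to apply this step is to decide up front which categories go on the chart and which move to a supporting data-table. The sketch below is illustrative only: the function name `split_for_chart` and all of the figures are hypothetical, and are not taken from the example above.

```python
def split_for_chart(rates, chart_categories):
    """Split one area's indicators into a charted subset and a table remainder."""
    missing = [name for name in chart_categories if name not in rates]
    if missing:
        raise KeyError(f"unknown categories: {missing}")
    chart = {name: rates[name] for name in chart_categories}
    table = {name: value for name, value in rates.items() if name not in chart}
    return chart, table


# Hypothetical mortality rates per 100,000 population for a single area.
area_rates = {
    "Cancer": 180.0,
    "Heart disease": 150.0,
    "Stroke": 60.0,
    "Respiratory disease": 55.0,
    "Accidents": 20.0,
}

# Chart only the two causes that carry the key message; tabulate the rest.
chart, table = split_for_chart(area_rates, ["Cancer", "Heart disease"])
print("Chart:", chart)
print("Table:", table)
```

Naming the charted categories explicitly, rather than picking them automatically, keeps the choice of key message with the researcher.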

4. Try to avoid using pie charts (unless your audience really does not like bar-charts!)

Simple or stacked bar-charts are more effective than pie charts for presenting data that shows how indicators are composed of sub-indicators. Some information visualisation researchers suggest that pie-charts are not useful for comparing between data values, and should not be used at all!

Before: Comparison is difficult using pie charts
After: Comparison is easier using a 'scarf' chart

  • The two figures above compare the ethnic composition of two areas
  • "Before": uses two pie charts. However, it can be difficult to compare the composition of the two areas using the size of the pie segments, unless the data values are read from the chart. In addition, the designer has shown the data to five decimal places, making it harder for viewers to quickly read the data values.
  • "After": using a form of stacked bar-chart. The addition of the comparison lines turns this into a 'scarf' chart. Parallel comparison lines show that the two groups are equal-sized, while comparison lines that diverge or converge show that one group is larger than the other. For example, it is easy to see that Asian groups make up more of the population in area B (even without referring to the data labels)
  • An alternative option would be to use a clustered bar-chart showing the four ethnic groups side-by-side for the areas.
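
The comparison that a scarf chart makes visible can also be checked numerically. The sketch below converts counts to percentage shares, rounds them to one decimal place for labelling (rather than five), and reports the percentage-point gap between two areas. The counts and the group names other than "Asian" are hypothetical.

```python
def shares(counts, decimals=1):
    """Convert raw counts to percentage shares, rounded for display."""
    total = sum(counts.values())
    return {group: round(100 * n / total, decimals) for group, n in counts.items()}


def share_differences(shares_a, shares_b, decimals=1):
    """Percentage-point difference (B minus A) for each group."""
    return {g: round(shares_b[g] - shares_a[g], decimals) for g in shares_a}


# Hypothetical population counts for two areas.
area_a = {"White": 700, "Asian": 150, "Black": 100, "Other": 50}
area_b = {"White": 600, "Asian": 250, "Black": 100, "Other": 50}

shares_a, shares_b = shares(area_a), shares(area_b)
for group, diff in share_differences(shares_a, shares_b).items():
    print(f"{group:<6} A {shares_a[group]:5.1f}%  B {shares_b[group]:5.1f}%  ({diff:+.1f} pts)")
```

A difference of zero corresponds to parallel comparison lines on the scarf chart; a non-zero difference corresponds to lines that diverge or converge.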

5. Keep the zero on the axis scale

Visualisations that do not show zero on the axis are likely to exaggerate differences between values.

Before: Non-zero axis
After: Axis that starts at zero

  • The two figures above both show the change in overall employment rate over the period 1991 to 2009 for two hypothetical areas
  • "Before": does not show zero on the axis. As a result, small differences are exaggerated - and the small changes from 1991 to 2009 can appear very significant.
  • "After": more accurately presents the data, showing that the changes for Area A and B are very small over the period, relative to the indicator values.
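
The size of this exaggeration can be estimated by comparing how tall two values are drawn on a truncated axis with their true ratio. The following is a minimal sketch; the function name `drawn_ratio` and the employment rates are hypothetical.

```python
def drawn_ratio(value_a, value_b, axis_min=0.0):
    """Ratio of the plotted heights of two values on an axis starting at axis_min."""
    if min(value_a, value_b) <= axis_min:
        raise ValueError("both values must lie above the axis minimum")
    return (value_b - axis_min) / (value_a - axis_min)


# Hypothetical employment rates (%) for one area in two years.
rate_1991, rate_2009 = 70.0, 72.0

true_ratio = drawn_ratio(rate_1991, rate_2009)              # axis starts at zero
truncated_ratio = drawn_ratio(rate_1991, rate_2009, 68.0)   # axis starts at 68

print(f"axis from zero: later value drawn {true_ratio:.2f}x as tall")
print(f"axis from 68:   later value drawn {truncated_ratio:.2f}x as tall")
```

With these figures a rise of two percentage points is drawn as a doubling when the axis starts at 68, although the underlying value has grown by under 3%.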

6. For bar-charts, set the base of the bars to zero (not the lowest value)

Bars that represent data values should always start at zero. Basing the bar on any other data value either exaggerates or hides differences between data.

Before: Chart that does not use zero as base
After: Chart that uses zero as base

  • The two figures above both show the change in the number of Jobseekers Allowance (JSA) claimants between 2007 and 2008 for a set of neighbourhoods.
  • "Before": the designer has set the lowest value as the base of the graph. This results in a misleading representation of the data, hiding the small reductions in some areas and emphasising the increases in other areas.
  • "After": uses zero as the base of the bars, so accurately represents the data values as the height of the bars, and allows the positive and negative values to be easily distinguished. In addition, the bars have been sorted by data value rather than the name of the area, again making it easier to quickly identify those areas faring particularly well or particularly badly over the 12 month period.
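
Both improvements in the "After" chart, a shared zero baseline and bars sorted by value, can be sketched with a plain-text chart. The area names and changes below are hypothetical, and the helper assumes whole-number changes so that one `#` stands for one claimant.

```python
def sort_by_change(changes):
    """Return (area, change) pairs ordered from largest fall to largest rise."""
    return sorted(changes.items(), key=lambda item: item[1])


def text_bar(change, width=12):
    """Draw a bar that grows left (fall) or right (rise) from a shared zero line."""
    bar = "#" * abs(change)
    if change < 0:
        return bar.rjust(width) + "|"
    return " " * width + "|" + bar


# Hypothetical change in claimant numbers between two years.
claimant_change = {"Area A": 8, "Area B": -2, "Area C": 3, "Area D": -1, "Area E": 11}

for area, change in sort_by_change(claimant_change):
    print(f"{area} {text_bar(change)} {change:+d}")
```

Because every bar is anchored on the same `|` zero line, small reductions stay visible on the left instead of being hidden by a baseline set at the lowest value.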

7. Avoid varying the size of objects in graphs, except to convey difference in values

Varying the size of chart objects, such as the area or volume, implies to the user that something is different about the data.

Before: Bar chart with varied bar widths
After: Bar chart with equal bar widths

  • The two figures above both show the proportion of people travelling to work by car and train for the same set of areas.
  • "Before": the wider bars can lead users to think that the data values are larger for the car dataset, even where height is the only dimension used to represent data values.
  • "After": correctly displays the same width for both sets of values.

8. Avoid using line charts where data is only available for a small number of timepoints

Trends should not be plotted with 2 or 3 data points - as viewers are likely to generalise to a long-term trend. If there are more data points, these should be shown. If there are no more data points, designers should consider using a bar-chart to present the data.

Before: Small number of timepoints
After: Greater number of timepoints to clarify trend

  • The examples above show change over time in the proportion of people who have been victims of crime in the last year.
  • "Before": the data is shown for three timepoints only. The data suggests that the level of crime is rising sharply.
  • "After": presents the data for a greater number of timepoints, which makes it easier to see how the upward trend relates to past fluctuations in the data.
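
The rule of thumb in this step can be written as a small guard that falls back to a bar-chart when there are too few timepoints. This is only a sketch: the function name `chart_type_for` is hypothetical, and the default threshold of four simply encodes the advice above not to plot trends with 2 or 3 data points.

```python
def chart_type_for(timepoints, min_points_for_line=4):
    """Suggest 'line' only when there are enough timepoints to show a trend."""
    return "line" if len(timepoints) >= min_points_for_line else "bar"


print(chart_type_for([2007, 2008, 2009]))          # three timepoints
print(chart_type_for(list(range(2000, 2010))))     # ten timepoints
```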

9. Avoid using visualisation effects such as 3D that can hide the data

Effects such as 3D should only be used when absolutely necessary. The extra dimension can hide trends in the data, and distract viewers from the data values. "Data graphics should draw viewers' attention to the sense and substance of the data not the quality of the graphical art"[1]

Before: Bar chart with 3D
After: Bar chart without 3D
Before: Line chart with 3D
After: Line chart without 3D

  • The two figures above both compare housing tenure rates for the same set of areas.
  • "Before": 3-D shading is used to add an additional dimension of depth. This might look nice, but does not add any information - and can obscure differences in the data. For example, can you identify which year the population of Outer London overtook that of Inner London?
  • "After": the values are easier to interpret without the perspective illusion.

Using a third axis allows users to explore an extra variable in bar-charts. However, it can be very difficult to read and interpret values. "Simulating 3-D space on a 2-D surface works nicely for paintings or technical illustrations, but quantitative values cannot effectively be communicated in this manner"[2].

Before: Bar chart with third axis
After: Bar chart with no third axis

  • The two figures above both compare housing tenure rates for the same set of areas.
  • "Before": A third axis is used to differentiate between the values of the three different datasets across each of the areas. The middle row effectively obscures the row behind, and the perspective makes it difficult to read the data values for those columns that are visible.
  • "After": A clustered bar-chart is used to present the data, with colours used to differentiate between the three different datasets across each of the areas. It is easier to compare the characteristics of the areas on each of the datasets.

10. When choosing colours, limit the number of colours used and ensure that different colours can be distinguished

"...avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no harm." (Envisioning Information, Edward Tufte, 1990).

Limit the number of colours shown on a visualisation. The human eye can only distinguish between a few different colour shades on a single visualisation. Explore using hue (colour), value (the lightness or shade of colour), and chroma (the intensity or saturation) to highlight differences in datasets. A useful source to look at is "Colors for Data Visualization", from http://www.perceptualedge.com/articles/b-eye/choosing_colors.pdf

When using colours in visualisations, ensure that they contrast sufficiently to distinguish between data values. For example, when comparing two values on a bar chart, it is important that the colours used are sufficiently distinct from one another (particularly if users may print in black and white). Contrast can be used to highlight certain aspects of the data, but works best when you are highlighting only one object. Roughly 8% of males and 0.5% of females are red-green colour-blind, so avoid using both of these colours in visualisations where colour is significant.

Before: Similar colours for different categories
After: More distinct colours for different categories

  • The two figures above both compare the qualification levels of males and females for a set of areas.
  • "Before": uses a similar colour scheme for both gender groups; as a result, it is difficult to distinguish between the two datasets being compared. This is likely to be a greater problem if the graph is printed out in black and white.
  • "After": the colours are more distinct and it is easier to compare between gender groups, even when printed in black and white.
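
One rough way to check whether two fill colours will stay distinct in a black-and-white print is to compare their lightness. The sketch below uses the relative luminance and contrast ratio formulas from the WCAG 2.x accessibility guidelines as a proxy; those formulas were designed for text legibility, so applying them to chart fills is an assumption, as are the example colours and the 1.5 cut-off. It measures lightness differences only and does not test for red-green confusion.

```python
def relative_luminance(hex_colour):
    """Relative luminance of an sRGB colour such as '#1f3b73' (WCAG 2.x formula)."""
    hex_colour = hex_colour.lstrip("#")
    channels = [int(hex_colour[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve so the channels are linear in light intensity.
    r, g, b = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(colour_a, colour_b):
    """Contrast ratio between two colours: 1 (identical) up to 21 (black vs white)."""
    lum_a, lum_b = relative_luminance(colour_a), relative_luminance(colour_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical palettes for two data series.
similar = ("#4a6fa5", "#5b7fb5")    # two mid blues
distinct = ("#1f3b73", "#f2a65a")   # dark navy and light orange

for name, pair in (("similar", similar), ("distinct", distinct)):
    ratio = contrast_ratio(*pair)
    verdict = "hard to tell apart" if ratio < 1.5 else "likely distinguishable"
    print(f"{name}: contrast {ratio:.2f} ({verdict} in greyscale)")
```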

11. Where colour is needed, use solid blocks of colour and avoid fill patterns

Avoid fill patterns (e.g. diagonal lines) because they can create disorientating visual effects.

Before: Disorientating fill pattern effects
After: Solid colour fills

  • The examples above both show the proportion of pupils achieving A grades in English, Maths and Science GCSEs for a set of areas.
  • "Before": uses fill patterns to distinguish between the performance in different exam subjects. The (extreme) fill patterns used distract the viewer, hiding the data. Even simple fill patterns such as diagonal lines can create moiré effects, causing the viewer to see a strobing effect and distracting from the data.
  • "After": uses blocks of solid colours to distinguish between the subject areas

12. Avoid using strong or bold colours for the background

Using strong, bold colours for the background of a visualisation can draw the user's eye to the background, and distract from the data being presented. Below is an extreme example, but there are many similar cases, e.g. charts designed with organisation logos as a backdrop.

Before: Bold background colours distract from the data
After: Plain background helps data comparison

  • The two examples both show the proportion of people aged 0-15 and aged 80 and over for a set of areas.
  • "Before": the background and the gridlines are bold, while the colours used in the bars are faint and fade into the background.
  • "After": the background and the gridlines are faint whereas the colours used in the bars are bolder, so helping the viewer compare data values (and sufficiently distinct when printed in black and white).

13. When creating choropleth maps, choose colours to help users identify patterns and relationships between areas

The colour scheme used on choropleth (and isoline) maps should help users identify patterns and relationships between areas. For example, if a map is used to show values on a scale, viewers should easily be able to tell which areas have high or low data values.

Before: Map with a random, high-contrast colour scheme
After: Map relating data to colour intensity

  • The two examples above show the projected increase in the numbers of older people in London and the surrounding area.
  • "Before": A mixed high-contrast colour scheme is used. This can be overwhelming for the user who cannot easily identify patterns in the data, and needs to check back against the key to see whether particular colours represent high or low data values
  • "After": uses a fading red-to-grey colour scheme. Areas can be compared, and patterns identified, once the simple relationship "more red means more older people in future" is understood.
  • However, the mixed high-contrast colour scheme map might be appropriate to use, where there is no rank relationship between areas - for example if the map was used to visualise categorical data such as ONS Local Authority classifications.
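
The "more red means more" relationship can be produced by interpolating each area's value along a single grey-to-red ramp. The sketch below is one possible approach, not the method used for the maps above: the function name, the two end colours and the area values are all hypothetical. Many maps group values into a few classes rather than using a continuous ramp; that choice is outside this sketch.

```python
def sequential_colour(value, low, high, low_rgb=(200, 200, 200), high_rgb=(200, 0, 0)):
    """Map a value onto a grey-to-red ramp and return a hex colour string."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Position of the value within the data range, clamped to [0, 1].
    t = min(max((value - low) / (high - low), 0.0), 1.0)
    rgb = [round(start + (end - start) * t) for start, end in zip(low_rgb, high_rgb)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical projected increase (%) in older people for three areas.
projected_increase = {"Area A": 0, "Area B": 50, "Area C": 100}

for area, value in projected_increase.items():
    print(area, sequential_colour(value, low=0, high=100))
```

Categorical data, such as area classifications, has no such ordering, so a ramp like this would wrongly suggest a ranking between classes.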

Footnotes:

[1] Edward Tufte (2001). The Visual Display of Quantitative Information

[2] Stephen Few (2004). Show Me the Numbers