Box plots show the distribution or variation of a measure across multiple categories, groups or time intervals.
Typical box plot
A box plot is typically drawn as a rectangle (like a bar), with a line within the box marking the centre of the distribution of data values, typically the median. In this simple form, the length of the box represents the range of values in the distribution: the upper bound equals the highest value and the lower bound equals the lowest value. A long box therefore indicates a wide distribution of data values for that measurement. Boxes can be drawn horizontally or vertically.
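As a rough illustration of the simple box described above, the three values needed to draw it can be computed as follows. This is a minimal sketch: the function name `simple_box` and the sample measurements are invented for the example and do not come from any particular plotting library.

```python
from statistics import median


def simple_box(values):
    """Return (lower bound, centre line, upper bound) for a simple box.

    Hypothetical helper: the box spans the lowest to the highest value,
    and the centre line is drawn at the median.
    """
    if not values:
        raise ValueError("need at least one data value")
    return min(values), median(values), max(values)


# Made-up measurements from one group, for illustration only.
trial = [4.1, 5.0, 5.6, 6.2, 9.3]
low, centre, high = simple_box(trial)

# The box length is the range of the data; a longer box means a wider spread.
box_length = high - low
```

The same three numbers apply whether the box is drawn horizontally or vertically; only the axis they are plotted against changes.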
Box plots are used in many scientific fields to show consistency (or inconsistency) across multiple trials of the same experiment, and to show changes, trends or patterns across multiple trials of different experiments.
Box plots may have lines, called whiskers, extending from the boxes. Box-and-whisker plots are a specific kind of box plot in which the box is defined by 3 points: the middle point of the data (the median) and the middle points of the 2 halves (the 25th and 75th percentiles). The whiskers then show other values, such as the minimum and maximum of all the data.
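The values behind a box-and-whisker plot can be sketched in a few lines. This is an illustrative example only: several conventions exist for calculating percentiles, and the choice here of `statistics.quantiles` with `method="inclusive"` is an assumption, not something the text prescribes. The function name `five_number_summary` and the data are also invented for the example.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, 25th percentile, median, 75th percentile, maximum).

    Hypothetical helper. The 3 middle values define the box; the minimum
    and maximum are where the whiskers end in this simple variant.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data values")
    # n=4 splits the data into quarters and returns the 3 cut points.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


# Made-up data values, for illustration only.
summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
minimum, lower_quartile, middle, upper_quartile, maximum = summary
```

Other percentile conventions can give slightly different box edges for the same data, so it helps to state which one was used when the plot is published.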
Alternatives to box plots
If the distribution or variation of a measure is not a key message of a graph, and mean or median values will sufficiently and accurately convey the necessary information (e.g. change over time or differences across groups), consider presenting these summary values in ways that readers understand more easily. Options include horizontal bar graphs for discrete groups, vertical bar graphs for short time series and line graphs for extended time series.
Box plot versus bar or line graph
Typically, box plots should only be considered when more than 1 data point is available for each group or time point, and when it would be misleading or inappropriate to display only summary data, such as mean or median values. Box plots can also be considered when the distribution or variation of data for a group or time point is of particular importance to the data message, for example, when data variability differs over time or across groups.
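A small worked example shows how summary values alone can hide differences in variability. The 2 groups below are invented for illustration, and the helper name `spread` is hypothetical.

```python
from statistics import median

# Made-up data: 2 groups with the same centre but very different spread.
group_a = [48, 49, 50, 51, 52]
group_b = [10, 30, 50, 70, 90]


def spread(values):
    """Return the range (highest minus lowest value) of a group."""
    return max(values) - min(values)


# A bar graph of medians would draw 2 identical bars for these groups.
same_centre = median(group_a) == median(group_b)

# A box plot would reveal that group_b varies far more than group_a.
spread_a = spread(group_a)
spread_b = spread(group_b)
```

If the message of the graph depends on that difference in spread, a box plot is likely the better choice; if only the centre matters, a bar or line graph of the medians may be enough.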