Mean and Median – Part 2
Back in December I wrote a post about using the arithmetic mean and median when analyzing data. This is a follow-up post that shares some insights from Naomi Robbins, author of Creating More Effective Graphs. The following paragraphs are Naomi’s response to my question, “What would you provide if asked what is a typical customer size?”
“I’d give the median together with the graph below. The graph, modeled after Figure 4.1 of Creating More Effective Graphs, shows all the data, as Jon suggested. I’d say something like:
"I provided the median (shown by the black line in the figure) rather than the mean (shown by the light cyan line) since as you can see from the figure, the mean is not a typical value. There are no actual customers who have revenues near to the mean value because customer 5 influences the mean so strongly since its revenues are so much higher than the others. Half of our customers have revenues less than the median while the other half have revenues greater than the median. The middle half of our customers have revenues that are between the dotted lines."
For a slightly larger customer base I’d jitter the points. For a much larger customer base I’d replace this strip plot (also called a one-dimensional scatter plot) with a box plot. By box plot, I mean a Tukey box plot. I object to every software program and every author redefining box plots so that the reader can’t read them without an explanation.
The figure was drawn with R. However, it is easy to reproduce it using Excel or other software. [Robbins]
I think this goes back to one of my original points: many people provide just the mean, and it can be very misleading. The graph Naomi provided illustrates this clearly. The light cyan line [mean] isn’t even close to the majority of the data points. The median is much more representative of a typical customer value, though it is not perfect either. Combine the median, mean, quartiles and actual values and now you’re providing real value. This chart clearly shows the grouping of typical customers, the outlier, and where the median and mean fall. Thank you, Naomi, for the insights!
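For readers who want to try this on their own numbers, here is a minimal sketch in Python. The revenue figures are made up for illustration (they are not the data behind Naomi's figure), with customer 5 set far above the rest to mimic the outlier she describes. The function name summarize and the choice of the "inclusive" quartile method are my assumptions; other tools may compute quartiles slightly differently.

```python
import statistics

# Hypothetical revenues (in $ thousands) for ten customers.
# Customer 5 (index 4) is deliberately much larger than the others.
revenues = [12, 15, 18, 20, 950, 22, 25, 27, 30, 35]


def summarize(values):
    """Return the mean, median, and quartiles of a list of numbers."""
    # quantiles(n=4) returns the three cut points: Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }


summary = summarize(revenues)

# How many customers fall below the mean? With one large outlier,
# nearly all of them do, so the mean is not a "typical" value.
below_mean = sum(1 for r in revenues if r < summary["mean"])

print(f"mean:   {summary['mean']}")
print(f"median: {summary['median']}")
print(f"middle half of customers: {summary['q1']} to {summary['q3']}")
print(f"customers below the mean: {below_mean} of {len(revenues)}")
```

With these made-up numbers the single large customer pulls the mean well above nine of the ten revenues, while the median and quartiles stay inside the main group. That is the same pattern the figure shows.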
January 19th, 2010 at 10:05 am
My black line appears gray here so please make the appropriate substitutions when reading this post.
February 2nd, 2010 at 5:23 am
Another reason I prefer medians to means is that means don’t lead anywhere very interesting, while medians lead you easily into the wonderful world of quartiles, percentiles, and other quantiles. The thing about quantiles in general is that they can provide an arbitrary level of complexity between the simple measure of central tendency, and the full account of every data point, in a way that is easy to explain in English (upper quartile = “a quarter of all points were greater than this and three quarters were less”).
I exaggerate slightly about means, since you can add standard deviation, but it gets very hard to explain the meaning of such things even at that level, let alone any extra measures of statistical distribution.
February 2nd, 2010 at 9:01 am
Hi Derek, thanks for commenting. I miss the posts on your site.
I don’t think you exaggerate all that much on the means. I may not always present standard deviation, but I always look at it to understand the data better. I think this has been a great discussion and hopefully opens people up to going beyond average/mean. Thanks again Derek!