Data Analysis – Do You Really Mean Average?
Thursday, December 17th, 2009

In the corporate world I see this issue quite frequently. Specifically, I hear requests whose wording doesn't match what the requestor is ultimately looking for. To illustrate, I have included an example below that shows ten different customers within a territory, with each customer's total year-to-date revenue. To make the point, I listed Customer 5 with revenue that is far higher than the rest.
Now here’s the question I typically hear:
"What is the average customer size (revenue) for Territory A?"
Here is what that really means most of the time:
"What is a typical customer size (revenue) for Territory A?"
You may think it’s semantics, but it’s really not. I don’t want to turn this into a statistics lesson, but average (mean) doesn’t always translate into typical. Because Customer 5 is such an outlier, the average (the sum of all customer revenue divided by the number of customers) is pulled well above what it would be if that customer fell in the typical range like the rest.
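To see how one outlier drags the mean upward, here is a minimal Python sketch using the standard-library statistics module. The revenue figures are hypothetical, chosen only to mimic ten customers with one very large one; they are not the numbers from the original table.

```python
from statistics import mean

# Hypothetical year-to-date revenue for ten customers in Territory A.
# Illustrative numbers only, not the figures from the original table.
revenues = [42_000, 55_000, 61_000, 48_000, 780_000,
            53_000, 59_000, 66_000, 47_000, 60_000]

# Mean = sum of all customer revenue / number of customers.
avg_all = mean(revenues)

# The same calculation with Customer 5 (list index 4) left out.
others = revenues[:4] + revenues[5:]
avg_without_outlier = mean(others)

print(f"Mean, all customers:      {avg_all:,.0f}")
print(f"Mean, without Customer 5: {avg_without_outlier:,.0f}")
```

With these made-up numbers the mean of all ten customers is 127,100, which is higher than nine of the ten customers, while the mean of the other nine is about 54,556.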
I have also included the median revenue for the ten customers, which I think is, in general, a better indicator of a typical customer than the mean. The median is simply the value in the middle once the amounts are sorted; with an even count, such as ten customers, it is the average of the two middle values. In reality, Customer 5’s revenue could be 875 zillion dollars and the median wouldn’t change. When there are thousands of records and you need to know the typical amount, it’s often safer to choose the median, unless you want to take the time to calculate the min, max, median, standard deviation and mean and compare them.
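The "number in the middle" idea can be sketched by hand in Python and checked against the standard library's statistics.median. The helper name middle_value and the revenue figures are hypothetical, used only for illustration; the sketch also shows that inflating the outlier leaves the median untouched.

```python
from statistics import median

def middle_value(values):
    """Hypothetical helper: median by hand.

    Sort the values and take the one in the middle. With an even count
    (like ten customers) there is no single middle value, so the median
    is the average of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Illustrative revenues for ten customers, not the original table.
revenues = [42_000, 55_000, 61_000, 48_000, 780_000,
            53_000, 59_000, 66_000, 47_000, 60_000]

# Make Customer 5 (index 4) absurdly large; the value is an arbitrary
# stand-in for "875 zillion".
huge = revenues.copy()
huge[4] = 875 * 10**18

print(middle_value(revenues), median(revenues))  # 57000.0 57000.0
print(middle_value(huge), median(huge))          # 57000.0 57000.0
```

Only the position of the outlier in the sorted list matters, not its size, which is why the median stays at 57,000 in both cases.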
"In probability theory and statistics, a median is described as the numeric value separating the higher half of a sample, a population, or a probability distribution, from the lower half." [source]
The real question, then, is whether a typical territory can have one very large customer, or whether this is a unique situation that should not be considered normal. The answer makes all the difference in which calculation to use. Most often I include both.
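Reporting both is easy to automate. The Python sketch below, assuming the standard-library statistics module and a hypothetical territory_summary helper, returns the min, max, median, mean and standard deviation for one territory so the mean and median can be read side by side. It uses stdev, the sample standard deviation; pstdev is the population version if the list is treated as the entire population.

```python
from statistics import mean, median, stdev

def territory_summary(revenues):
    """Hypothetical helper: summary statistics for one territory's customers."""
    return {
        "min": min(revenues),
        "max": max(revenues),
        "median": median(revenues),
        "mean": mean(revenues),
        "std_dev": stdev(revenues),  # sample standard deviation
    }

# Illustrative revenues for ten customers, not the original table.
revenues = [42_000, 55_000, 61_000, 48_000, 780_000,
            53_000, 59_000, 66_000, 47_000, 60_000]

for name, value in territory_summary(revenues).items():
    print(f"{name:>7}: {value:>12,.0f}")
```

Here the mean (127,100) is more than twice the median (57,000), which is the cue to ask whether one very large customer is normal for a territory.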
It’s my belief that most people are simply familiar with the term average because it’s so commonly used. Average is probably more prevalent in analysis because it’s very easy to calculate: add up the numbers and divide by the count. Before spreadsheet software automated the median calculation, getting a median was much more difficult, even with a calculator, because the values first had to be sorted.
As a data analyst, it’s prudent to know the difference between mean and median and when each is applicable. Telling the CEO/CFO that the typical customer is roughly $131,000 when one customer is atypical and a typical customer is closer to $57,000 can be a career changer.
