Analyzing a Stacked Bar Chart

March 8th, 2010

Below you will see a stacked column (vertical bars) chart that has nine different segments covering eight months.  I have absolutely nothing against HubSpot and actually think they have some great services, tools and products.  I am simply using their chart to illustrate the problems with stacked bar charts and some alternatives.

HubSpot_Reach Stacked Bar Chart 

I can think of a few reasons off the top of my head as to why people would use stacked bar charts. 

  1. To show how each segment changes over time
  2. To illustrate parts of the whole at any given time
  3. A combination of 1 and 2 above

The problem is that a stacked column chart is not good for any of these requirements.  My feelings on these charts are not really anything new, as I’ve stated before in this post and also this post.  In my opinion, after you get beyond two series with like scales, a stacked column chart is pretty and pretty useless.
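
One commonly cited reason a stacked column chart struggles with the first requirement is that only the bottom segment sits on a fixed baseline. The sketch below is a rough illustration with made-up numbers (not the HubSpot data); the segment names and the `stacked_baselines` helper are hypothetical.

```python
# Hypothetical monthly values for three segments (made-up numbers).
segments = {
    "A": [10, 30, 15, 40],
    "B": [20, 20, 20, 20],  # perfectly flat over time
    "C": [5, 10, 20, 25],
}

def stacked_baselines(segments):
    """Return the height at which each segment starts in each period."""
    baselines = {}
    running = None
    for name, values in segments.items():
        if running is None:
            running = [0] * len(values)
        baselines[name] = list(running)  # where this segment begins
        running = [r + v for r, v in zip(running, values)]
    return baselines

baselines = stacked_baselines(segments)
for name, starts in baselines.items():
    print(name, starts)
```

Segment B never changes, yet its bars start at a different height every month because segment A moves underneath it. The reader has to compare bar lengths that do not share a common starting point.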

Waterfall Charts

March 1st, 2010

The two charts below show the S&P 500 Net Income by Sector for both 2008 and 2009 and recently appeared in BusinessWeek.  What really caught my attention is that these column charts are a little bit like a waterfall with the only difference being that these start from zero.  Prior to using the Waterfall chart utility, I created these by hand.  The workaround to get a hidden or shaded set of bars is more difficult than it really needs to be.

2008 S&P Chart

2009 S&P Chart

[source]

Below you will see my versions of the charts using the same data and Excel.  I didn’t include the gray shaded series because I do not think that it adds any value.  Also, I do not have the text box calling out the title and final value because there is a column at the end that shows what the value is upon finish.
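
The hidden-bar workaround mentioned above usually comes down to a stacked column chart with two series: an invisible base that lifts each bar into place and a visible bar on top. Here is a minimal sketch of that arithmetic, assuming made-up sector values (not the BusinessWeek figures) and a hypothetical helper named `waterfall_series`.

```python
def waterfall_series(changes):
    """Return (bases, heights, total) for a waterfall drawn as stacked columns.

    bases   -> invisible series that lifts each visible bar into position
    heights -> visible bar sizes (always non-negative)
    total   -> running total after the last change (the final column)
    """
    bases, heights = [], []
    running = 0
    for change in changes:
        start, end = running, running + change
        bases.append(min(start, end))  # bottom edge of the visible bar
        heights.append(abs(change))    # bar length, regardless of direction
        running = end
    return bases, heights, running

# Hypothetical sector contributions, for illustration only.
changes = [120, -45, 60, -80, 30]
bases, heights, total = waterfall_series(changes)
print("bases:  ", bases)
print("heights:", heights)
print("total:  ", total)
```

This simple version assumes the running total never drops below zero. A bar that crosses the axis would need to be split into a positive and a negative piece.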

Support Analytics e-Store

February 22nd, 2010

Over the past few years I have accumulated a bunch of different resources that I find extremely valuable.  Most of these utilities/products are for Excel, but not all of them.  Historically, you could find an ad or link to these products scattered throughout my blog pages.  I have combined all of the affiliate products I support onto one page that is called e-Store.  Now the main pages of this blog should appear less cluttered.

estore

You can find this page by clicking here or by clicking on the e-Store link in the header of this blog.  There are some great utilities, Excel add-in products and e-books that you should check out.  Many of them will make life easier and some will enlighten.  Here is a summary list of what is available.

  1. Waterfall Chart (Excel Add-in)
  2. Box and Whisker Chart (Excel Add-in)
  3. Dot Plot Chart (Excel Add-in)
  4. Learn Excel Formulas (Tutorial)
  5. Dashboard Reporting with Excel (Tutorial)
  6. MicroCharts (Excel Add-in)

If you have any products that you would like to have included on this new page, please contact me.  The links above contain affiliate products that, if purchased, generate a small commission for Support Analytics.

Online Data Visualization: Tableau Public Released Today

February 11th, 2010

A special thank you goes to Elissa Fink of Tableau for providing me with a demo of Tableau Public.  Since I am a big fan of Tableau Desktop, it should be no surprise that Tableau Public impressed me for many of the same reasons I like the desktop version.

tableau-public-homepage-screen-shot

Today is the official release of Tableau Public and the latest version of Tableau Desktop 5.1.  Below you will find some quotes from today’s press release:

SEATTLE, WA, February 11, 2010 – Tableau Software today launched a new product that brings public data to life on the web. Tableau Public, available for free, lets anyone who posts content to the web easily create interactive visualizations and publish them to blogs, web sites, Twitter feeds or anywhere online. Instead of viewing static charts or tables, Tableau Public lets people answer questions and share data interactively on the web.

Current alternatives for sharing data online are clumsy. Typically, data is pasted into tables and lists, or posted as files or catalogs that are difficult to use. Available at Tableaupublic.com, Tableau Public is helping to solve this challenge – bringing data to life on the web for ordinary people. With its interactive visualizations and dashboards, Tableau Public helps people start conversations based on data that is useful, beautiful and shareable. No special plug-ins are required, all that’s needed to see and use the data is a web browser.

In conjunction with the general availability of Tableau Public, the company is also releasing today Version 5.1 of its Tableau Desktop and Tableau Server product suite. Version 5.1 provides more analytic richness, better publishing, and increased scalability and performance. Analytical features include reference bands that provide context to a user’s analysis, bullet charts to evaluate related data, and intelligent data labels to call out the most critical data. New publishing features include rich formatting, streamlined toolbar design, more filter options, and a flexible layout.

I have yet to try out the newest version, 5.1, but I did see that it will have the ability to produce Stephen Few’s famous Bullet Charts without any tricky workarounds.  I can tell you that I’m excited to start using version 5.1.

Data Analysis & Visualization on Facebook

February 9th, 2010

I wanted to take a minute and announce the Support Analytics Facebook Fan site.  The screen shot below shows the link on all of the blog pages and it also shows the fans.  Simply click on the section of the sidebar to become a fan.

SA Fan Box

Below is a screenshot from the fan site.  One of the reasons I created this site was to bridge the gap between Twitter and this blog.  By that I mean the following: Twitter only allows 140 characters, which limits any kind of meaningful interaction.  On the other end, this blog has much lengthier posts.  The Facebook fan page now offers another medium for exchanging ideas that can be short and quick.

In the screen shot below, you can see the tabs across the top of the fan page that contain a News tab and Twitter tab.  The News tab streams this blog’s RSS feed to Facebook while the Twitter feed goes to the Twitter tab.  Now all of the information is streamed to one source that many people are already using.

image

On the Fan page I will be sharing ideas, tips and insights that I may not post here on the blog.  I recently posted a link to a page on Tableau’s new product that will be released soon. Please take a minute to check out the Fan page and become a fan!

Information Visualization or Data Visualization, You Decide [Poll]

February 4th, 2010

In the February 1, 2010 edition of BusinessWeek, I found the illustration shown below.  It certainly contains some good information and data that can be easily read.  My question is, would you consider this an Information Visualization, Data Visualization, Information Graphic, Statistical Graphic, Chart/Graph or something else and why?

President Data

Sorry about all the polls lately, but I think this is an easy way to respond without having to comment. 

What would you consider this illustration from BusinessWeek?

Also, please feel free to comment with additional insights.

Note: RSS readers have to go to the website to view the poll.

Online Data Analysis and Visualization Tool [Poll]

February 2nd, 2010

Not too long ago, I got a tip from someone on Twitter with a link to a site called Verifiable.com.  Upon further investigation, I learned that this site is similar to Manyeyes.com in that you can upload a data set and, using the tools on the site, create data visualizations. 

At first glance, the site seems somewhat plain.  After digging into what the site is about, I quickly learned that they utilize sound and popular theory in the data visualization field.  On their about page, the first line that explains the features is:

“A clean, low-chartjunk philosophy — no shadows, no pie charts, no 3-D bar graphs, just the ink you need.” [verifiable.com]

Well, simply reading that piqued my interest because they use principles similar to the ones I follow when creating charts/graphs.  No frills.  I like the fact that you can create charts that don’t have excessive grid lines, shadows, weak labeling and limited charting options.  Below you will see a few examples from their site.  You can also follow the links to see the visualizations in an interactive environment.  As you will see, there is a lot of data (hover over), many different options and some good visuals.  Granted, I had no idea what some of the charts were trying to show, but in general this site gives you a seemingly good tool to apply charting/graphing best practices.

Major League Baseball Payroll Efficiency 2006-2008 

[Interactive version]

U.S. Unemployment Rates by Education, 1992-Latest

[Interactive version]

Verifiable also offers a Pro version of their tool where you can keep your data and visualizations private and receive premium support.  The cost is minimal with the Pro version going for $29.95/year. 

I didn’t try to upload a data set to give the site a full trial, but it definitely looks interesting.  I am not sure how much demand there is for online data visualization using a tool like Verifiable.

Mean and Median – Part 2

January 19th, 2010

Back in December I wrote a post about using the arithmetic mean and median when analyzing data.  This is a follow-up post that shares some insights from Naomi Robbins, author of Creating More Effective Graphs.  The following paragraph was Naomi’s response to my question, “What would you provide if asked what is a typical customer size?”

“I’d give the median together with the graph below. The graph, modeled after Figure 4.1 of Creating More Effective Graphs, shows all the data, as Jon suggested. I’d say something like:

Median-Mean Chart

"I provided the median (shown by the black line in the figure) rather than the mean (shown by the light cyan line) since as you can see from the figure, the mean is not a typical value. There are no actual customers who have revenues near to the mean value because customer 5 influences the mean so strongly since its revenues are so much higher than the others. Half of our customers have revenues less than the median while the other half have revenues greater than the median. The middle half of our customers have revenues that are between the dotted lines."

For a slightly larger customer base I’d jitter the points. For a much larger customer base I’d replace this strip plot (also called a one-dimensional scatter plot) with a box plot. By box plot, I mean a Tukey box plot. I object to every software program and every author redefining box plots so that the reader can’t read them without an explanation.

The figure was drawn with R. However, it is easy to reproduce it using Excel or other software.” [Robbins]

I think this goes back to one of my original points, which was that many people just provide the mean and it can be very misleading.  The graph above that Naomi provided illustrates this point clearly.  The light cyan line [mean] isn’t even close to the majority of the data points.  The median is much more representative of a typical customer value, but also not perfect.  Combine the median, mean, quartiles and actual values and now you’re providing real value.  Looking at this chart, it clearly shows the grouping of typical customers, the outlier and where the median and mean fall.  Thank you Naomi for the insights!
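
To make the "median, mean, quartiles and actual values" combination concrete, here is a small sketch using Python's standard `statistics` module. The revenue figures are hypothetical stand-ins (not the data behind Naomi's figure), and the choice of the "inclusive" quartile method is my assumption; other tools may compute quartiles slightly differently.

```python
from statistics import mean, median, quantiles

# Hypothetical customer revenues; the last value stands in for the outlier.
revenues = [42_000, 45_000, 48_000, 50_000, 52_000,
            55_000, 58_000, 61_000, 63_000, 900_000]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), Q3.
# method="inclusive" treats the data as the whole population, not a sample.
q1, q2, q3 = quantiles(revenues, n=4, method="inclusive")

summary = {
    "min": min(revenues),
    "q1": q1,
    "median": median(revenues),
    "q3": q3,
    "max": max(revenues),
    "mean": mean(revenues),
}
for name, value in summary.items():
    print(f"{name:>6}: {value:>12,.0f}")
```

With these made-up numbers the mean lands above the third quartile, so it sits outside the middle half of the customers entirely.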

If you are interested in providing a guest post, please contact me for more information or to submit a proposal.

Happy Holidays, Merry Christmas and Happy New Year!

December 23rd, 2009

There is quite a lot going on this Holiday season and I hope to be back with a more regular posting schedule after New Year’s.  I hope you have a safe and happy Holiday(s)! 

Data Analysis – Do You Really Mean Average?

December 17th, 2009

In the corporate world I see this issue quite frequently.  Specifically, I will hear a request where the wording doesn’t match what the requestor is ultimately looking for.  To illustrate, I have included an example below that shows ten different customers within a territory.  For each customer the total revenue year-to-date is listed.  To make the illustration relevant for this example, I listed Customer 5 with revenue that is dramatically higher than the rest.

Now here’s the question I typically hear:

"What is the average customer size (revenue) for Territory A?"

Here is what that really means most of the time:

"What is a typical customer size (revenue) for Territory A?"

You may think it’s semantics, but it’s really not.  I don’t want to turn this into a statistics lesson, but average (mean) doesn’t always translate into typical.  Because Customer 5 is such an outlier, the average (sum of all customer revenue divided by count of customers) will be higher than if that customer fell into the typical range like the rest.

I have included the median revenue amount for the ten customers, which I think is probably a better indicator of a typical value (in general) than the mean or average.  The median is simply defined as the number in the middle once the values are sorted; with an even count, such as these ten customers, it is the average of the two middle values.  In reality, Customer 5’s revenue could be 875 zillion dollars and the median amount wouldn’t change.  When there are thousands of records and you need to know what the typical amount is, it’s often safer to choose median unless you want to take the time to calculate min, max, median, standard deviation and mean to compare.
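
Here is a quick sketch of that claim in Python, using the standard `statistics` module. The revenue figures are hypothetical (they are not the numbers from my example chart); the fifth value plays the role of Customer 5.

```python
from statistics import mean, median

# Hypothetical year-to-date revenues for ten customers.
# The fifth value plays the role of the outlier, Customer 5.
revenues = [42_000, 55_000, 48_000, 61_000, 900_000,
            50_000, 58_000, 45_000, 63_000, 52_000]

def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)

base_mean, base_median = summarize(revenues)

# Make the outlier absurdly large; only the mean should move.
inflated = list(revenues)
inflated[4] = 875_000_000_000
big_mean, big_median = summarize(inflated)

print(f"mean:   {base_mean:,.0f} -> {big_mean:,.0f}")
print(f"median: {base_median:,.0f} -> {big_median:,.0f}")
```

The median stays put because it depends only on the two middle values of the sorted list, while the mean is pulled along with the outlier.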

"In probability theory and statistics, a median is described as the numeric value separating the higher half of a sample, a population, or a probability distribution, from the lower half." [source]

Now the real question that would need to be answered is, can a typical territory have one very large customer or is this a unique situation and should not be considered normal?  Answering the preceding question will make all the difference in what calculation to use.  Most often I will include both.

Median vs. Average Example

It’s my belief that most people are simply familiar with the term average because it’s so commonly used.  The underlying reason that average is more prevalent in analysis is probably due to the fact that it’s very easy to calculate.  Before spreadsheet software was available that automated the median calculation, it was much more difficult to get a median amount even with a calculator.

As a data analyst, it’s prudent to know the difference between mean and median and when each is applicable.  Telling the CEO/CFO that the typical customer is roughly $131,000 when one customer is atypical and the true amount is more like $57,000 can be a career changer.