Archive for the ‘Chart Review’ Category

Analyzing a Stacked Bar Chart

Monday, March 8th, 2010

Below you will see a stacked column (vertical bars) chart that has nine different segments covering eight months.  I have absolutely nothing against HubSpot and actually think they have some great services, tools and products.  I am simply using their chart to illustrate the problems with stacked bar charts and some alternatives.

HubSpot_Reach Stacked Bar Chart 

I can think of a few reasons off the top of my head as to why people would use stacked bar charts. 

  1. To show how each segment changes over time
  2. To illustrate parts of the whole at any given time
  3. A combination of 1 and 2 above

The problem is that a stacked column chart is not good for any of these requirements. My feelings on these charts are nothing new, as I’ve stated before in earlier posts. In my opinion, once you get beyond two series with like scales, a stacked column chart is pretty and pretty useless.

Waterfall Charts

Monday, March 1st, 2010

The two charts below show the S&P 500 Net Income by Sector for both 2008 and 2009 and recently appeared in BusinessWeek.  What really caught my attention is that these column charts are a little bit like a waterfall with the only difference being that these start from zero.  Prior to using the Waterfall chart utility, I created these by hand.  The workaround to get a hidden or shaded set of bars is more difficult than it really needs to be.
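The hidden-bar workaround mentioned above comes down to simple arithmetic: each visible column floats on an invisible base equal to the running total before it. Here is a minimal Python sketch of that calculation. It is not the utility's or Excel's actual method, and the function name `waterfall_bars` and the sample figures are made up for illustration.

```python
def waterfall_bars(changes):
    """Return ([(base, height), ...], final_total) for floating columns.

    base   = top of the invisible (hidden) bar, where the visible bar begins
    height = size of the visible bar
    """
    bars = []
    running_total = 0.0
    for change in changes:
        start = running_total
        end = running_total + change
        # A negative change hangs down from the previous total,
        # so the visible bar sits on the lower of the two levels.
        bars.append((min(start, end), abs(change)))
        running_total = end
    return bars, running_total


# Hypothetical sector contributions, NOT the BusinessWeek figures.
sample_changes = [120.0, -45.0, 30.0, -60.0]
bars, total = waterfall_bars(sample_changes)
for base, height in bars:
    print(f"hidden base: {base:6.1f}   visible bar: {height:6.1f}")
print(f"final column: {total:.1f}")
```

In a stacked column chart, the base values would go in a series formatted with no fill, with the visible values stacked on top. This sketch assumes the running total never crosses zero, which is where the manual approach gets harder.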

2008 S&P Chart

2009 S&P Chart

[source]

Below you will see my versions of the charts, built in Excel from the same data. I didn’t include the gray shaded series because I do not think it adds any value. I also left out the text box calling out the title and final value, because the column at the end already shows the final value.

Rainbow Chart – Twitter Messages Per Day

Monday, December 14th, 2009

Below is a great example of the wrong use of color in a column chart. Use color to differentiate between segments, but don’t use it to tell individual days apart when time is already on the x-axis.

A better use of color may be for each quarter within the year. Using the chart below, it would make more sense to have every first week of the month always in one color, like blue. Then, at least, you could quickly compare the first week of each month. I’m not even going to touch the chart title not matching what is actually displayed in the graph (days vs. weeks).

You really can’t make the color mistake if you use a line graph, just saying.

image

[Source]

Business Intelligence Vendor Size is Important

Tuesday, December 1st, 2009

The most recent copy of Information Management had the image below on page 8.  What’s funny is the person figure on the left looks like it’s wearing pants.  Oh wait, those aren’t pants, the blue is part of the data visualization.  The person on the right looks to be wearing orange work boots or ski boots for that matter.  The article by Julie Langenkamp is interesting and discusses how small vendors tend to rank much higher than large vendors in product support and other areas.

Person chart

[image source]

112009_pendse_fig2 

[image source]

It appears that small vendors scored better than large vendors in every single category of complaints as shown in the chart above.  In the chart below, you will see that small vendors appeared to provide more benefit to the customer/client than large or medium vendors.

Benefits

[image source]

There’s a lot more to the article if you are interested in business intelligence. 

Investment Growth Chart

Tuesday, November 24th, 2009

One of the benefits I truly enjoy is having USAA as my insurance company. It took only one phone call to their customer service center to understand why they consistently rank among the top companies for customer service. I can think of a few big companies that could learn a lot from how USAA treats its customers and policyholders.

While flipping through their recent magazine, I quickly noticed the chart below, which is called "The Snowball Effect." The heading that was cut off states the following:

"What’s the hardest-working investment tool you can use? The power of time. Beth, Bob and Bridget all invested $2,500 at the same 6 percent rate of return.  But see how compounding made Beth’s account grow? That’s the value of starting early."

One of the first lessons from high school and college is the time value of money and the effect of compounding. I won’t get into that, but I did want to touch on the chart below, which left me speechless for a bit. There are so many things wrong with it that it wasn’t even worth taking out my red pen.

Investment Growth Chart 

I recreated the data from the chart in Excel (shown below) and used fictitious numbers for the middle of the graph. Each person starts with the same amount of money and each has an ending amount, so I basically filled in the blanks. I know my chart doesn’t have decimals or Beth with her arms raised in celebration, but it’s definitely a lot cleaner and easier to understand. This isn’t rocket science, is it?

Investment Growth Chart
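For anyone who wants to fill in the middle of a chart like this with calculated values rather than made-up ones, here is a rough Python sketch of the compounding. It rests on assumptions the magazine text does not confirm: a single $2,500 deposit, interest compounded once a year, and holding periods of 40, 30 and 20 years that I picked purely for illustration. The function name `yearly_balances` is also my own.

```python
def yearly_balances(principal, annual_rate, years):
    """Balance at the end of each year, starting with the initial deposit.

    Assumes one lump-sum deposit and annual compounding (an assumption,
    not something stated in the original chart).
    """
    balances = [principal]
    for _ in range(years):
        balances.append(balances[-1] * (1 + annual_rate))
    return balances


# Hypothetical holding periods; the source only says Beth started earliest.
hypothetical_years = {"Beth": 40, "Bob": 30, "Bridget": 20}

for name, years in hypothetical_years.items():
    final_balance = yearly_balances(2500.0, 0.06, years)[-1]
    print(f"{name:8s} {years:2d} years -> ${final_balance:,.0f}")
```

Each list can be pasted into a spreadsheet column and plotted as one line per person, which is all the original graphic needed to show.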

Gradient Fill and Deception with Charts and Graphs

Tuesday, November 10th, 2009

Below you will see a column chart that appeared in the weekend’s print edition of the Baltimore Sun.  It’s no secret that they used a gradient fill on the columns to give it the fading appearance.  I’m not a big fan of the gradient fill on the 2009 columns, but this could work for the previous year’s numbers (2008) if the intent was to minimize the prior year.  I doubt that was the case as I’m sure they were trying to make the chart "pretty" or different than the default setup.

BS Unemployment Chart 

Below you will see a replica that I made using Excel and the fill effects formatting option.  It looks alright, but something still isn’t right.  What is the problem with this chart?

BS Chart Replica

The problem is the y-axis and the scale that was used. I don’t think this is an outright misrepresentation meant to mislead, but it could be. That’s the risk you face when manipulating the axis. Yes, the columns take up a lot of space when the axis starts at zero, but that’s the correct method here. To help illustrate my point, check out the exact same chart (below) with the y-axis starting at zero.

BS Chart Replica - Axis

This version, with the correct axis setting, accurately shows that the October value is not three times the prior year’s, but only about 1.5 times as large. Also, look at the trend in the first replica chart. The upward trend looks much steeper than in the replica with the correct axis. To show this visually, check out the side-by-side comparison below, which adds a trendline to each chart. The slope in the chart on the left appears much greater than in the one on the right. If you were presenting this data in something like PowerPoint or SlideShare and quickly moved to the next slide, the audience might not catch that the axis starts at 5, and the steep trendline would be the point they took from the data.

BS Chart Replica - Slope
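The distortion is easy to quantify. The short Python sketch below compares the ratio of the drawn column heights with the ratio of the actual values. The two values (6.5 and 9.5) are hypothetical stand-ins chosen to behave like the chart, not the Baltimore Sun's numbers, and `apparent_ratio` is just an illustrative name.

```python
def apparent_ratio(older, newer, axis_min=0.0):
    """Ratio of drawn column heights when the y-axis starts at axis_min."""
    if older <= axis_min:
        raise ValueError("the older value must sit above the axis minimum")
    return (newer - axis_min) / (older - axis_min)


# Hypothetical values, NOT taken from the newspaper chart.
last_year, this_year = 6.5, 9.5

truncated = apparent_ratio(last_year, this_year, axis_min=5.0)
from_zero = apparent_ratio(last_year, this_year, axis_min=0.0)

print(f"axis starts at 5: newer column looks {truncated:.2f}x as tall")
print(f"axis starts at 0: newer column looks {from_zero:.2f}x as tall")
```

With these stand-in values the truncated axis makes the newer column three times as tall, while the zero-based axis shows the real ratio of roughly 1.5.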

Furthermore, forget the gradient fill and go with something like the chart below if you want to highlight the current year.

BS Chart Replica - Color 2

Visualizing Multiple Data Series in a Chart

Wednesday, October 28th, 2009

A little while back I featured the stacked column chart below in a post that discussed the ineffectiveness of its design. More often than not, a simple multi-series line graph does a better job of visualizing data than a stacked column chart. Another option is a panel chart, also known as small multiples. In R, this type of visualization is more formally called a Trellis Display.

Besides the overpowering and inconsistent labels, I think the Baltimore Sun did a good job with its recent display of the percent change in the number of passengers from 2008 to 2009. It may have been better to stick with the airport codes, like BWI, instead of writing out the airport names, but let’s not nitpick. I bring this topic up again because I think small multiples or panel charts can be much more effective at visualizing data and, in my opinion, are underutilized in the business world.

stacked-bars

[source]

BWI Chart

[source]

There has been some great work done by a few experts in the Excel and R fields on creating panel charts.  Here are a few resources that have examples and information on how to create panel charts in Excel and Trellis Displays in R.

Data Visualization – Don’t Overcomplicate Charts & Graphs

Thursday, September 24th, 2009

One blog that I visit regularly deals with creating better PowerPoint presentations and is written by Jan Schultink. I am in the data analysis and visualization business, but I typically use presentation software to run meetings and webinars or to communicate data analysis results. Creating effective and powerful presentations matters, so that the hard work that goes into the analytics isn’t lost in the communication.

Below you will find the data visualizations that Jan showed on his site. I agree that the second image shows the difference better than the first one, but not by much. The probability of 1 in 76 is definitely more meaningful (it puts the number into context) than the 1.3% data point. What I have an issue with is that the area of both columns (second image below) is almost the same, with the one on the right being slightly larger. What makes the two columns represent different values is the number of spheres within each column. The one on the left contains 8,000 circles (I’m assuming, because they are not very countable), while the one on the right has 76 circles.

Honestly, I’ve never created a data visualization like this, so I don’t even know how you would make a chart with so many microscopic spheres. My guess is that it could be done using Adobe Illustrator or similar software.

Risk of maternal death as a percentage:

Data-point-percentages

Risk of maternal death as 1 in x number:

Data-point-ratios 

[source]

Here’s where I think we can improve the visualization without losing effectiveness. In a fairly popular post that I did a while back, I suggested that the best method for showing a single data point is the one in the following image. I think the 1 in 8,000 and 1 in 76 data points are powerful enough on their own and take up very little space to convey the message. Another option would be to compare the two data points and show only the difference as a single metric. It would go something like this:

Illustration for showing a single data point:

Single Comparison Point
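For reference, the arithmetic behind the two framings, and behind a single comparison metric, fits in a few lines of Python. This is only a sketch of the conversion; the function names are my own.

```python
def one_in_n_as_percent(n):
    """Convert odds of '1 in n' to a percentage."""
    return 100.0 / n


def risk_multiple(n_high_risk, n_low_risk):
    """How many times larger '1 in n_high_risk' is than '1 in n_low_risk'."""
    return n_low_risk / n_high_risk


print(f"1 in 76    = {one_in_n_as_percent(76):.2f}%")
print(f"1 in 8,000 = {one_in_n_as_percent(8000):.4f}%")
print(f"single metric: about {risk_multiple(76, 8000):.0f} times the risk")
```

The last line is the kind of single number that could stand in for both columns of spheres.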

So my question would be – which method do you think would be more effective in a presentation?

Here’s a Quick Way to Improve Your Charts

Friday, September 4th, 2009

In a recent BusinessWeek article, the chart below was shown to visualize the difference between Q2 ‘08 and Q2 ‘09. Looking at the chart, a reader can do a pretty good job of seeing the difference between Q2 from one year to the next. Logically, the more recent value is on the right and in green, with the older value on the left in black. There is a bit of gray highlighting to illustrate the overall S&P 500 value and its change from year to year. So far so good, right?

One question that I often ask myself when performing a year-over-year or period-over-period analysis is whether the values really matter or just the change. Sometimes the values for Q2 of this year and last year are important, and sometimes only the change matters. A good example of when the values do matter is when showing market value changes. It’s often important to know where a company falls in the overall market as well as how it is trending or changing over time. Many times showing the values will make the most sense and give the reader a full picture.

On the other hand, sometimes the total values don’t matter as much and you just need to highlight the changes.  In this scenario, it’s often cleaner and more effective to simply show the change from year to year as a percentage or value.  This removes the extra data series and also gives a clear picture of the change where the graph below does not.  For example, can you tell if the year-over-year change for Information Technology is the same as Energy in the chart below?

2 Series Column Chart

[Source]

Now, if only the change were shown, the values would be both positive and negative. You could then sort in descending order to see which sectors did the best and which did the worst. I often see charts like the one above, which are well designed. I rarely see a chart with only the change. As a middle ground, I also see charts like this one, but with the percent change added as a label for each sector.

Below the chart in BusinessWeek appeared this text:

“Profit margins at health-care, utility, consumer staples, and consumer discretionary companies have actually improved in the past year.”

By showing only the change in descending order, the sectors highlighted by the above text would be much easier to see and would include the change amounts. One other note: I think the wider gap between Materials and Telecom is just an oversight.
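The change-only view is quick to prepare. The Python sketch below computes the year-over-year change for each sector and sorts it in descending order, ready to plot as a single series. The sector names come from this post, but the margin values are placeholders I invented, not BusinessWeek's data, and `changes_descending` is an illustrative name.

```python
def changes_descending(values_by_sector):
    """Return (sector, change) pairs sorted from best to worst.

    values_by_sector maps a sector name to (older_value, newer_value).
    """
    changes = {
        sector: newer - older
        for sector, (older, newer) in values_by_sector.items()
    }
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)


# Placeholder profit margins in percent (Q2 '08, Q2 '09); invented numbers.
placeholder_margins = {
    "Energy": (12.0, 5.0),
    "Health Care": (8.0, 10.0),
    "Information Technology": (11.0, 9.0),
}

ranked = changes_descending(placeholder_margins)
for sector, change in ranked:
    print(f"{sector:24s} {change:+.1f} pts")
```

Sorted this way, the sectors that improved land at the top with positive values and the decliners fall to the bottom, so the reader no longer has to estimate the gap between two columns.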

What do you think – show only the change, show the values or show both the values and change?

Think Before You Stack

Wednesday, June 24th, 2009

I hate to beat this to death, but I continue to see charts like the one below featured in excellent periodicals like BusinessWeek.  As I said in a previous post, I really dislike stacked bar charts, ESPECIALLY when time is on the x-axis.  To help illustrate my point, here are a few questions:

1) How much did the Subprime (yellow) dollars change from month to month or between any points?

2) Is there even a value for Subprime after about March 2013?

3) How many times do you have to reference the legend to figure out which color is what label?

The only value that is easy to visualize and analyze is the Agency data, which is in black and is the first data series. Beyond Agency, this data visualization is useless. I tried to get the data to make a better chart, but couldn’t locate it. Instead, I ask that you visualize 7 lines in a line chart, one for each value and the 7th one for the total. Using that chart, you could easily see the trend in each value and also the overall change.
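If the data were available, preparing it for that line chart would take one extra step: adding a total line alongside the individual series. Here is a small Python sketch of that step. The numbers and the "Other" series are invented, since I couldn't locate the real data, and `with_total` is just an illustrative name.

```python
def with_total(series_by_name):
    """Return a copy of the series dict with an added 'Total' line."""
    lengths = {len(values) for values in series_by_name.values()}
    if len(lengths) > 1:
        raise ValueError("all series must cover the same periods")
    lines = dict(series_by_name)
    # Sum across series for each period to get the overall line.
    lines["Total"] = [sum(period) for period in zip(*series_by_name.values())]
    return lines


# Invented values for three periods; NOT the data behind the chart.
invented_series = {
    "Agency": [50, 48, 45],
    "Subprime": [20, 15, 9],
    "Other": [10, 12, 11],
}

lines = with_total(invented_series)
for name, values in lines.items():
    print(f"{name:9s} {values}")
```

Each entry then becomes one line in the chart, so every series is read against the same zero baseline instead of sitting on top of the others.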

Yes, this chart is colorful and caught my eye, but it’s also worthless. 

stacked bars

[source]

Below is another example using similar data where the data visualization is completely ineffective. 

CS_Default

[source]