Archive for the ‘Chart Review’ Category

Visualizing Multiple Data Series in a Chart

Wednesday, October 28th, 2009

A little while back I featured the stacked column chart below in a post that discussed the ineffectiveness of its design.  More often than not, a simple multi-series line graph does a better job of visualizing data than a stacked column chart.  The other option is to go with a panel chart, also known as small multiples.  In R, this type of visualization is more formally called a Trellis Display.

Besides the overpowering and inconsistent labels, I think the Baltimore Sun did a good job with its recent display of the percent change in the number of passengers from 2008 to 2009.  It may have been better to stick with the airport codes, like BWI, instead of writing out the airport names, but let’s not nitpick.  I bring this topic up again because I think small multiples or panel charts can be much more effective at visualizing data and, in my opinion, are underutilized in the business world.

stacked-bars

[source]

BWI Chart

[source]

There has been some great work done by a few experts in the Excel and R fields on creating panel charts.  Here are a few resources that have examples and information on how to create panel charts in Excel and Trellis Displays in R.

Data Visualization – Don’t Overcomplicate Charts & Graphs

Thursday, September 24th, 2009

One blog that I visit regularly deals with creating better PowerPoint presentations and is written by Jan Schultink.  I am in the data analysis and visualization business, but typically use presentation software to run meetings and webinars or to communicate data analysis results.  Creating effective and powerful presentations matters, so that the hard work that goes into the analytics isn’t lost in the communication.

Below you will find the data visualizations that Jan showed on his site.  I would agree that the second image is better than the first at showing the difference, but not by much.  The probability of 1 in 76 is definitely more meaningful (it puts the risk into context) than the 1.3% data point.  What I have an issue with is that the area of both columns (2nd image below) is almost the same, with the one on the right being slightly larger.  What makes the two columns represent different values is the number of spheres within each column.  The one on the left contains 8,000 circles (I’m assuming, because they are not very countable) while the one on the right has 76 circles.

Honestly, I’ve never created a data visualization like this, so I don’t even know how to make a chart with so many microscopic spheres.  My guess is that it could be done using Adobe Illustrator or similar software.

Risk of maternal death as a percentage:

Data-point-percentages

Risk of maternal death as 1 in x number

Data-point-ratios 

[source]

Here’s where I think we can improve the visualization without losing effectiveness.  In a fairly popular post that I did a while back, I suggested the best method for showing a single data point is the following image.  I think that the 1 in 8,000 compared to 1 in 76 data points are powerful enough and take up very little space to convey the message.  Another option would be to compare the two data points and only show the difference as a single metric.   It would go something like this:

Illustration for showing a single data point:

Single Comparison Point
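
As a rough sketch of the single-metric idea, the two "1 in x" figures can be converted to percentages and then collapsed into one ratio.  Only the 1 in 76 and 1 in 8,000 values come from the images above; the function names are made up for illustration.

```python
def one_in_n_to_percent(n):
    """Convert a '1 in n' risk into a percentage."""
    return 100.0 / n


def relative_risk(high_risk_n, low_risk_n):
    """How many times larger the '1 in high_risk_n' risk is than the '1 in low_risk_n' risk."""
    return low_risk_n / high_risk_n


high = 76     # 1 in 76, from the second image
low = 8000    # 1 in 8,000, from the second image

print(f"1 in {high} = {one_in_n_to_percent(high):.1f}%")
print(f"1 in {low:,} = {one_in_n_to_percent(low):.4f}%")
print(f"The higher risk is about {relative_risk(high, low):.0f} times the lower one")
```

The last line is the kind of single comparison point that could stand in for both columns of spheres.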

So my question would be – which method do you think would be more effective in a presentation?

Here’s a Quick Way to Improve Your Charts

Friday, September 4th, 2009

In a recent BusinessWeek article, the chart below was shown to visualize the difference between Q2 ’08 and Q2 ’09.  Looking at the chart, a reader can do a pretty good job at seeing the difference between Q2 from one year to the next.  Logically, the more recent value is on the right and in green with the older value on the left in black.  There is a bit of highlighting in gray done to illustrate the overall S&P500 value and change from year to year.  So far so good, right?

One question that I often ask myself when performing a year-over-year or period-over-period analysis is whether the values really matter or just the change.  Sometimes the values for Q2 of this year and last year are important and sometimes only the change matters.  A good example of when the values do matter is typically when showing market value changes.  It’s often important to know where a company falls in the overall market as well as how it is trending or changing over time.  Many times showing the values will make the most sense and give the reader a full picture.

On the other hand, sometimes the total values don’t matter as much and you just need to highlight the changes.  In this scenario, it’s often cleaner and more effective to simply show the change from year to year as a percentage or value.  This removes the extra data series and also gives a clear picture of the change where the graph below does not.  For example, can you tell if the year-over-year change for Information Technology is the same as Energy in the chart below?

2 Series Column Chart

[Source]

Now, if only the change was shown, the values would be both positive and negative.  You could then sort in descending order to see which sectors did the best and those that did the worst.  I often see charts like the one above, which are well designed.   I rarely see a chart with only the change.  To meet somewhere in the middle, I see charts like the one here, but with the percent change as a label for each sector.
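
To make the change-only idea concrete, here is a minimal sketch that computes the year-over-year change per sector and sorts it in descending order.  The sector names and margin values are hypothetical placeholders, not the BusinessWeek data, and the change is expressed in percentage points.

```python
# Hypothetical profit margins (percent) by sector; NOT the BusinessWeek data.
# Each tuple is (Q2 '08 value, Q2 '09 value).
margins = {
    "Sector A": (8.0, 6.5),
    "Sector B": (5.0, 5.8),
    "Sector C": (12.0, 9.0),
    "Sector D": (7.0, 7.4),
}


def changes_sorted(data):
    """Return (sector, change) pairs sorted from best to worst change."""
    changes = [(sector, round(new - old, 1)) for sector, (old, new) in data.items()]
    return sorted(changes, key=lambda pair: pair[1], reverse=True)


for sector, change in changes_sorted(margins):
    # The explicit sign makes positive and negative changes easy to tell apart.
    print(f"{sector:<10} {change:+.1f}")
```

Sorted this way, the sectors that improved sit at the top and the ones that declined at the bottom, which is the ordering a change-only chart would use.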

Below the chart in BusinessWeek appeared this text:

“Profit margins at health-care, utility, consumer staples, and consumer discretionary companies have actually improved in the past year.”

By only showing the change in descending order, the sectors highlighted by the above text would be much easier to see and would include change amounts.  One other note – I think the wider gap between Materials and Telecom is just an oversight.

What do you think – show only the change, show the values or show both the values and change?

Think Before You Stack

Wednesday, June 24th, 2009

I hate to beat this to death, but I continue to see charts like the one below featured in excellent periodicals like BusinessWeek.  As I said in a previous post, I really dislike stacked bar charts, ESPECIALLY when time is on the x-axis.  To help illustrate my point, here are a few questions:

1) How much did the Subprime (yellow) dollars change from month to month or between any points?

2) Is there even a value for Subprime after about March 2013?

3) How many times do you have to reference the legend to figure out which color is what label?

The only value that is easy to visualize and analyze is the Agency data that is in black and the first data series.  Beyond Agency, this data visualization is useless.  I tried to get the data to make a better chart, but couldn’t locate it.  Instead I ask that you visualize 7 lines in a line chart, one for each value and the 7th one for the total.  Using that chart, you could easily see the trend in each value and also the overall change. 
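
Since the real data couldn’t be located, the sketch below uses made-up numbers to show how the extra "total" line could be derived before plotting.  Only the Agency and Subprime labels come from the chart; the third series and all values are hypothetical, and three series are used instead of six to keep it short.

```python
# Hypothetical values per period; placeholders only, not the chart's data.
series = {
    "Agency":   [10, 12, 11, 9],
    "Subprime": [4, 3, 2, 0],
    "Other":    [6, 7, 8, 8],
}


def with_total(data):
    """Return a copy of the series dict with an extra 'Total' line."""
    result = {name: list(values) for name, values in data.items()}
    # zip(*...) walks the series period by period so each period can be summed.
    result["Total"] = [sum(period) for period in zip(*data.values())]
    return result


lines = with_total(series)
for name, values in lines.items():
    print(f"{name:<9} {values}")
```

Each key in `lines` would become one line in the line chart, with the total plotted alongside the individual series.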

Yes, this chart is colorful and caught my eye, but it’s also worthless. 

stacked bars

[source]

Below is another example using similar data where the data visualization is completely ineffective. 

CS_Default

[source]

Who Ate My Slice? [Chart Review]

Monday, April 13th, 2009

This isn’t your typical blog post bashing pie-charts.  The post here is an assessment of a chart that literally doesn’t add up.  A while back, I noticed something similarly wrong with the numbers in a different chart.  If you add up all of the numbers in the pie chart below, you don’t get to 100 or 100%.  This is the whole (no pun intended) point of a pie chart: parts of a whole.

0409_chart3

[source]

Below, I added up the six slices of the pie and only came to 99 or 99%.  Am I the only one who does this for fun?  For whatever reason, I can’t pass up on doing these checks and math when seeing a chart like the one above.  Now, I’m sure the reason for the missing 1% is rounding.  Instead of 41%, I’m sure the actual number is 41.4% and so on.  Whenever making data visualizations, be sure to check the details and do the simple math to ensure everything adds up correctly.  Something as simple as the pie chart slices not adding up to 100% could lead to questioning the competency of the creator.

pie 99 
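
The check above is easy to automate.  The sketch below uses hypothetical unrounded shares (only the 41.4 guess comes from this post) to show how plain rounding can lose a point, and one possible fix, largest-remainder rounding, which is my suggestion and not something the chart’s creator is known to have used.

```python
import math

# Hypothetical unrounded shares for six slices that sum to exactly 100.
# Only the 41.4 guess comes from the post; the rest are made up.
shares = [41.4, 22.3, 14.3, 10.0, 7.0, 5.0]


def round_to_100(values):
    """Largest-remainder rounding: whole numbers that still sum to 100.

    Assumes the inputs themselves sum to 100.
    """
    floors = [math.floor(v) for v in values]
    shortfall = 100 - sum(floors)
    # Hand the missing points to the slices with the largest fractional parts.
    by_remainder = sorted(
        range(len(values)), key=lambda i: values[i] - floors[i], reverse=True
    )
    for i in by_remainder[:shortfall]:
        floors[i] += 1
    return floors


naive = [round(v) for v in shares]
adjusted = round_to_100(shares)
print("naive labels:   ", naive, "sum =", sum(naive))
print("adjusted labels:", adjusted, "sum =", sum(adjusted))
```

The trade-off is that one label ends up rounded in the "wrong" direction, so a footnote about rounding may be the simpler choice.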

Is this being too critical or do you think the minor details matter?

Liquidations – Map Visualization [Chart Review]

Tuesday, April 7th, 2009

Maps with data represented using bubbles are all the rage these days.  Add some automation to show how they grow over time and you can really capture an audience.  

popup_44must_go

[source]

Update: Here is the story that accompanied this data visualization in BusinessWeek.

If you click on the source or image above you will get a larger view that is a bit easier to read.  Let’s start with the map on the left, which shows a flaw in the design of these maps in terms of accuracy.  Specifically, let’s focus on the orange bubble in the middle of Florida.  Does this large bubble represent the state or the actual region (city) within Florida that has a large number of Circuit City closings?  Honestly, I don’t know.  By looking at all of the states, I guess I can logically deduce that the bubbles appear in the middle of each state, meaning the value is for the entire state.

Now, look at the value for Texas and tell me which value in the legend it matches up with.  It looks like it’s about the same size as the 56+ bubble, but what would California’s value be?  It’s clearly the largest bubble on the map and must represent the largest value in the legend, right?  Something isn’t right.  Also, the smaller bubbles are nearly impossible to quickly tell apart on the map.

Another good example of a flaw is the map on the right that shows the number of KB Toys store closings.  Check out the Northeast region and tell me which value (bubble size) represents Massachusetts.  It’s pretty hard to tell, huh?  The map in the middle is actually the easiest to read because there aren’t any states with a large number of closings leading to a large bubble.

Now let’s put this data visualization in context.  These three maps appeared in a section of BusinessWeek a few weeks back.  What I do like about these maps is the ability to quickly visualize regions or states with large values, to some extent (Northeast).  For KB Toys, I can only really tell that the Northeast has a lot of stores.  I don’t think there’s much in-depth analysis that can be done with these static bubble maps.

Here’s where I contradict myself.

Benefit – Quickly see regions and states with large values making them the outliers.

Drawback – Can’t quickly differentiate the small bubbles from state to state and in reference to the legend.

Now a question – do you think these bubble maps are more valuable than the more traditional color shading (heat) maps? 

Highlighting Segments of Recession Data

Wednesday, March 25th, 2009

Below you will find some interesting ways to highlight sections in charts or graphs.  I typically like to know more about data visualizations when I see certain trends in the data.  This method of "calling out" certain sections is nothing new or earth shattering.  One aspect that I really like about the area chart below is that the sub charts utilize the unused space. 

popup_26bears

[source]

The series of area charts below also use a method of highlighting sections of the chart that denote specific events that took place in the history of the stock market. I’m not sure I like the color choices or texture, but the ability to highlight sections can help a reader get to the point of the chart(s) a little quicker.

popup_28rebound

[source]

Once a chart is created in Excel, these highlights can be done easily by using text boxes and setting the transparency higher.  In the first example above, these callouts can be created using the drawing features.  It’s fairly manual, but you can accomplish these feats without a graphic design program.

Do you think this is an effective method for "calling out" data?

Data Versus Information – Financial Bailout (Part 2 of 2)

Thursday, March 19th, 2009

In the first post on Tuesday, I discussed that the original set of data (shown below) didn’t go into some of the basic things I would do mentally when first seeing the matrix. 

Financial Lobbying

[source]

The figure below shows the same data set with the 2007 revenue spent on lobbying removed.  I don’t think there’s too much value in the trend from what I can see, especially with only two years’ worth of data.  What I think may be valuable is the ratio of money spent on lobbying to the bailout awards.

Financial 3

To better see how the bailout awards were given, I looked at the money spent by these companies in 2008 on lobbying compared to the awards.  I computed a lobbying return, which is highlighted below and sorted in descending order.  Bank of America fared the best by spending just over $4 million on lobbying and getting a $35 billion award.  I’m sure there are many other factors that come into play and more data needed to do a full analysis, but these are just the types of things I do when looking at data.

Financial 4
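
For anyone who wants to reproduce the lobbying return column, here is a minimal sketch.  It assumes the return is simply the award divided by the 2008 lobbying spend.  The Bank of America row uses the rounded figures quoted above; the other two rows are hypothetical placeholders, not values from the matrix.

```python
# Figures in millions of dollars. The Bank of America row uses the rounded
# numbers quoted in the post; the other rows are hypothetical placeholders.
companies = [
    {"name": "Company Y (hypothetical)", "lobbying_2008": 2.0, "award": 3_000.0},
    {"name": "Bank of America", "lobbying_2008": 4.0, "award": 35_000.0},
    {"name": "Company X (hypothetical)", "lobbying_2008": 5.0, "award": 25_000.0},
]


def lobbying_return(row):
    """Bailout dollars received per dollar spent on lobbying (assumed definition)."""
    return row["award"] / row["lobbying_2008"]


# Sort in descending order so the best return is at the top of the table.
ranked = sorted(companies, key=lobbying_return, reverse=True)
for row in ranked:
    print(f"{row['name']:<26} {lobbying_return(row):>8,.0f}x")
```

Swapping the sort key to `lambda row: row["award"]` gives the ordering by bailout award instead.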

I think sorting is one of the most useful formatting tools you can use when displaying data.  Yet I often see it used incorrectly.  In the original matrix, the sort is descending based on the 2008 lobbying spend.  Maybe it’s just me, but the first thing I did was visually sort the data by bailout award in descending order.  That’s really what we’re after, right?  I want to know who got the largest bailout.  Now, the context of the article plays an important part in the way the data should be displayed, which I’m not going to get into.  The overall message is to be conscientious of the sort and ensure it makes sense for what you are trying to depict.

Data Versus Information – Financial Bailout (Part 1 of 2)

Tuesday, March 17th, 2009

The Financial Lobbying information below is a great example of the difference between giving someone data and providing them with information.  The designer stopped far too short when putting this matrix together because they left all the work for me to do.  If you’re like me and you see this grid, what are the first few things you do?

Financial Lobbying

[source]

When I saw this, I immediately did these things:

  1. Quickly read the title and sub title
  2. Scanned the companies looking for a familiar one
  3. Started calculating percentages of each to the total
  4. Thought about how much these bailouts are of the total bailout package

I am only looking for some basic statistics and context for this data.  I need to put it into perspective and try to tell a story.  I recreated this data in Excel and added a few simple columns to illustrate my points.  Also, we aren’t even talking about charts or graphs, just a simple matrix.

First, I have the same matrix with one additional column for the percent each company is of the total financial bailout spend.  Also, you’ll notice I abbreviated the numbers in the millions to save space.  Finally, I removed the zebra striping because it really isn’t needed in such a small data set.

Financial 1

In the next example below, I added an additional column that represents the percent each company is of the total bailout package.  Now I can see that these eight large financial companies make up 26 percent of the total bailout spend assuming a $700 billion total.  What this does, is put the data in some perspective versus just showing a bunch of numbers. 

Financial 2
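
The two added columns boil down to simple division.  The sketch below assumes the same $700 billion total mentioned above; the company names and award amounts are hypothetical placeholders, not the figures from the matrix.

```python
TOTAL_PACKAGE_BN = 700.0  # the $700 billion total assumed in the post

# Hypothetical awards in billions of dollars; NOT the figures from the matrix.
awards_bn = {"Company A": 50.0, "Company B": 30.0, "Company C": 20.0}


def add_shares(awards, package_total):
    """Compute each company's share of the listed awards and of the full package."""
    listed_total = sum(awards.values())
    rows = []
    for name, award in awards.items():
        rows.append({
            "name": name,
            "award_bn": award,
            "pct_of_listed": round(100 * award / listed_total, 1),
            "pct_of_package": round(100 * award / package_total, 1),
        })
    return rows


rows = add_shares(awards_bn, TOTAL_PACKAGE_BN)
for row in rows:
    print(f"{row['name']:<10} {row['award_bn']:>6.1f}  "
          f"{row['pct_of_listed']:>5.1f}%  {row['pct_of_package']:>5.1f}%")
```

As a sanity check on the real figure, 26 percent of a $700 billion package works out to roughly $182 billion across the eight companies.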

In part 2, I will show you a few more changes that I made to the matrix that speak to the revenue columns.

Presenting Data on a Map

Thursday, March 5th, 2009

Below you will find three different data series for countries across Europe.  The legend in the upper right corner states that:

  1. The orange data point is for the percent of unemployment
  2. The yellow data point is for the percent change in GDP
  3. The red data point shows the percent auto sales, which is atrocious

Typically, I find that overlaying data on a map doesn’t really add a lot of value and tends to dilute the message.  This chart is one exception that I found.  If you had asked me to name these countries without the labels back in the 8th grade, I may have gotten most right.  To be honest, I doubt I could name all of the ones on the right (Denmark to Bulgaria) besides maybe Poland, had the labels not been listed.

This chart takes up a lot of room, but I think it’s important to show how far these countries are from each other.  Looking at the data, I wonder why Poland has a fairly high unemployment rate with the best auto sales.  Then, looking at Poland’s neighbors, it raises further questions about why the numbers are so different.  That conclusion would be much harder to reach had the map not been shown.

Yes, you could list these countries out in a bar chart, but showing their proximity to each other adds a great deal of value, in my opinion.

Europe Chart

[source]