Archive for the ‘Reporting’ Category

Information Visualization or Data Visualization, You Decide [Poll]

Thursday, February 4th, 2010

In the February 1, 2010 edition of BusinessWeek, I found the illustration shown below.  It certainly contains some good information and data that can be easily read.  My question is, would you consider this an Information Visualization, Data Visualization, Information Graphic, Statistical Graphic, Chart/Graph or something else, and why?

President Data

Sorry about all the polls lately, but I think this is an easy way to respond without having to comment. 


Also, please feel free to comment with additional insights.

Note: RSS readers have to go to the website to view the poll.

Online Data Analysis and Visualization Tool [Poll]

Tuesday, February 2nd, 2010

Not too long ago, I got a tip from someone on Twitter with a link to a site called Verifiable.com.  Upon further investigation, I learned that this site is similar to Manyeyes.com in that you can upload a data set and, using the tools on the site, create data visualizations. 

At first glance, the site seems somewhat plain.  After digging into what the site is about, I quickly learned that they utilize sound and popular theory in the data visualization field.  On their about page, the first line that explains the features is:

“A clean, low-chartjunk philosophy — no shadows, no pie charts, no 3-D bar graphs, just the ink you need. [verifable.com]”

Well, simply reading that piqued my interest because they use principles similar to the ones I follow when creating charts/graphs.  No frills.  I like the fact that you can create charts that don’t have excessive grid lines, shadows, weak labeling and limited charting options.  Below you will see a few examples from their site.  You can also follow the links to see the visualizations in an interactive environment.  As you will see, there is a lot of data (hover over), many different options and some good visuals.  Granted, I had no idea what some of the charts were trying to show, but in general this site gives you a seemingly good tool to apply charting/graphing best practices.

Major League Baseball Payroll Efficiency 2006-2008 

[Interactive version]

U.S. Unemployment Rates by Education, 1992-Latest

[Interactive version]

Verifiable also offers a Pro version of their tool where you can keep your data and visualizations private and receive premium support.  The cost is minimal with the Pro version going for $29.95/year. 

I didn’t try to upload a data set to give the site a full trial, but it definitely looks interesting.  I am not sure how much demand there is for online data visualization using a tool like Verifiable.

Mean and Median – Part 2

Tuesday, January 19th, 2010

Back in December I wrote a post about using the arithmetic mean and median when analyzing data.  This is a follow-up post that shares some insights from Naomi Robbins, author of Creating More Effective Graphs.  The following paragraph was Naomi’s response to my question, “what would you provide if asked what is a typical customer size?”

“I’d give the median together with the graph below. The graph, modeled after Figure 4.1 of Creating More Effective Graphs, shows all the data, as Jon suggested. I’d say something like:

Median-Mean Chart

"I provided the median (shown by the black line in the figure) rather than the mean (shown by the light cyan line) since as you can see from the figure, the mean is not a typical value. There are no actual customers who have revenues near to the mean value because customer 5 influences the mean so strongly since its revenues are so much higher than the others. Half of our customers have revenues less than the median while the other half have revenues greater than the median. The middle half of our customers have revenues that are between the dotted lines."

For a slightly larger customer base I’d jitter the points. For a much larger customer base I’d replace this strip plot (also called a one-dimensional scatter plot) with a box plot. By box plot, I mean a Tukey box plot. I object to every software program and every author redefining box plots so that the reader can’t read them without an explanation.

The figure was drawn with R. However, it is easy to reproduce it using Excel or other software. [Robbins]

I think this goes back to one of my original points, which was that many people just provide the mean and it can be very misleading.  The graph above that Naomi provided illustrates this point clearly.  The light cyan line [mean] isn’t even close to the majority of the data points.  The median is much more representative of a typical customer value, but also not perfect.  Combine the median, mean, quartiles and actual values and now you’re providing real value.  Looking at this chart, it clearly shows the grouping of typical customers, the outlier and where the median and mean fall.  Thank you, Naomi, for the insights!
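To make the point concrete, here is a minimal Python sketch.  The revenue figures are made up for illustration (they are not the data behind Naomi’s figure), and the fifth customer is deliberately set up as the outlier.  It assumes Python 3.8 or later for statistics.quantiles.

```python
import statistics

# Hypothetical customer revenues (in $ thousands); customer 5 is the outlier.
revenues = [12, 15, 18, 22, 250, 14, 20, 17, 16]

mean = statistics.mean(revenues)
median = statistics.median(revenues)

# quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
# The "inclusive" method treats the list as the whole customer base.
q1, _, q3 = statistics.quantiles(revenues, n=4, method="inclusive")

# Customers whose revenue is within 25% of the mean (an arbitrary band
# chosen only to show that nobody sits near the mean).
near_mean = [r for r in revenues if abs(r - mean) <= 0.25 * mean]

print(f"mean   = {mean:.1f}")
print(f"median = {median}")
print(f"middle half of customers: {q1} to {q3}")
print(f"customers near the mean: {near_mean}")
```

With these made-up numbers, the single large customer pulls the mean well above every other customer, while the median and quartiles still describe the typical group.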

If you are interested in providing a guest post, please contact me for more information or to submit a proposal.

Rainbow Chart – Twitter Messages Per Day

Monday, December 14th, 2009

Below is a great example of the wrong use of color in a column chart.  Use color to differentiate between segments, but don’t use it when time is on the x-axis for the different days.

A better use of color may be for each quarter within the year.  Using the chart below, it would make more sense to have every first week of the month always in one color, like blue.  Then, at least you could easily compare the first week of each month quickly.  I’m not even going to touch the chart title not matching what is actually being displayed in the graph – days vs. weeks.

You really can’t make the color mistake if you use a line graph, just saying.

image

[Source]

iPhone Data Visualization Application?

Thursday, November 5th, 2009

I recently came across a few iPhone applications (Apps) that allow a user to view or edit spreadsheets in Excel.  Some have pretty good reviews and others, well, not so good.  I think there is some benefit to being able to view data visualizations, charts, graphs, spreadsheets and reports on your phone.  I think the capability is probably limited, as it would be nearly impossible to do large-scale spreadsheets on a phone.  Also, the screen size would limit the size and amount of data that could be displayed.

  Excel iphone app

Here are a few spreadsheet type apps for the iPhone:

In my opinion, the best option is still to view web-published visualizations from a company like Spotfire or Tableau to see near real-time data, trends and visualizations.  Let’s forget about trying to build spreadsheets on your phone, because that isn’t going to happen.

Visualizing Multiple Data Series in a Chart

Wednesday, October 28th, 2009

A little while back I featured the stacked column chart below in a post that discussed the ineffectiveness of its design.  More often than not, a simple multi-series line graph can do a better job at visualizing data compared to a stacked column chart.  The other option is to go with a panel chart, also known as small multiples. In the R program, this type of visualization is more formally called a Trellis Display.

Besides the overpowering and inconsistent labels, I think the Baltimore Sun did a good job with its recent display of the percent change in the number of passengers from 2008 to 2009.  It may have been better to stick with the airport codes, like BWI, instead of writing out the airport names, but let’s not nitpick.  I bring this topic up again because I think small multiples or panel charts can be much more effective at visualizing data and, in my opinion, are underutilized in the business world.

stacked-bars

[source]

BWI Chart

[source]

There has been some great work done by a few experts in the Excel and R fields on creating panel charts.  Here are a few resources that have examples and information on how to create panel charts in Excel and Trellis Displays in R.

Data Visualization – Don’t Overcomplicate Charts & Graphs

Thursday, September 24th, 2009

One blog that I visit regularly deals with creating better PowerPoint presentations and is written by Jan Schultink.  I am in the data analysis and visualization business, but typically use presentation software to run meetings, webinars or communicate data analysis results.  So, creating effective and powerful presentations is important so the hard work that goes into the analytics isn’t lost in the communication.

Below you will find the data visualizations that Jan showed on his site.  I would agree that the second image is better at showing the difference than the first one, but not by much.  The probability of 1 in 76 is definitely more meaningful (puts it into context) than the 1.3% data point.  What I have an issue with is that the area of both columns (2nd image below) is almost the same, with the one on the right being slightly larger.  What makes the two columns represent different values is the number of spheres within each column.  The one on the left contains 8,000 circles (I’m assuming, because they are not very countable) while the one on the right has 76 circles.

Honestly, I’ve never created a data visualization like this, so I don’t even know how to make a chart with so many microscopic spheres.  My guess is that it could be done using Adobe Illustrator or similar software.

Risk of maternal death as a percentage:

Data-point-percentages

Risk of maternal death as 1 in x number

Data-point-ratios 

[source]

Here’s where I think we can improve the visualization without losing effectiveness.  In a fairly popular post that I did a while back, I suggested the best method for showing a single data point is the following image.  I think that the 1 in 8,000 compared to 1 in 76 data points are powerful enough and take up very little space to convey the message.  Another option would be to compare the two data points and only show the difference as a single metric.   It would go something like this:

Illustration for showing a single data point:

Single Comparison Point
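For anyone who wants to check the arithmetic behind the two framings, here is a small Python sketch.  The 1 in 76 and 1 in 8,000 figures come from the charts above; the function and variable names are my own, and the "times higher" ratio is just one possible way to state the single comparison metric.

```python
def as_percent(one_in_n: int) -> float:
    """Convert a '1 in N' risk into a percentage."""
    return 100 / one_in_n

high_risk_n = 76     # 1 in 76
low_risk_n = 8000    # 1 in 8,000

# How many times higher the 1-in-76 risk is than the 1-in-8,000 risk.
ratio = low_risk_n / high_risk_n

print(f"1 in {high_risk_n} is {as_percent(high_risk_n):.1f}%")
print(f"1 in {low_risk_n} is {as_percent(low_risk_n):.4f}%")
print(f"The higher risk is about {ratio:.0f} times the lower one")
```

The two percentages are hard to compare at a glance, which is why the "1 in x" framing, or a single "about 105 times higher" figure, carries the message in so little space.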

So my question would be – which method do you think would be more effective in a presentation?

Think Before You Stack

Wednesday, June 24th, 2009

I hate to beat this to death, but I continue to see charts like the one below featured in excellent periodicals like BusinessWeek.  As I said in a previous post, I really dislike stacked bar charts, ESPECIALLY when time is on the x-axis.  To help illustrate my point, here are a few questions:

1) How much did the Subprime (yellow) dollars change from month to month or between any points?

2) Is there even a value for Subprime after about March 2013?

3) How many times do you have to reference the legend to figure out which color is what label?

The only value that is easy to visualize and analyze is the Agency data that is in black and the first data series.  Beyond Agency, this data visualization is useless.  I tried to get the data to make a better chart, but couldn’t locate it.  Instead I ask that you visualize 7 lines in a line chart, one for each value and the 7th one for the total.  Using that chart, you could easily see the trend in each value and also the overall change. 
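Since I couldn’t get the real data, here is a rough Python sketch of the idea using made-up numbers and only three of the categories.  The category values are hypothetical, not the BusinessWeek figures.  It shows the two things the line chart would make easy to read: the total as its own series, and the change from one point to the next for any series.

```python
# Hypothetical monthly values per category (made-up numbers).
series = {
    "Agency":   [50, 52, 55, 53],
    "Subprime": [20, 18, 12, 5],
    "Other":    [10, 11, 11, 12],
}

# The total becomes one more line instead of the height of a stack.
total = [sum(month) for month in zip(*series.values())]

def changes(values):
    """Point-to-point change, the thing a stacked bar makes hard to see."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

for name, values in {**series, "Total": total}.items():
    print(f"{name:<9} values={values} change={changes(values)}")
```

In a stacked chart, every series except the bottom one sits on a moving baseline, so you would have to do this subtraction in your head.  Plotting each list as its own line does it for you.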

Yes, this chart is colorful and caught my eye, but it’s also worthless. 

stacked bars

[source]

Below is another example using similar data where the data visualization is completely ineffective. 

CS_Default

[source]

Heat Map for Excel

Thursday, June 4th, 2009

Question: Would you pay $1,295 to be able to create this heat map (below) from an Excel spreadsheet?   If so, a company called Lab Escape has a product that will do the trick.  Oh yeah, if you want to be able to view [interactively] the output of the standard version software, you need the viewer version that runs a mere $495.

Taken directly from their site, they claim the benefits of heat maps are:

  • Increase Agility – Improve business agility through quicker analysis, better decisions and more effective communication.
  • Reduce Risk – Rapidly identify trouble spots, before they are out of control
  • Maximize Value – Ensure that attention and resources go where they bring the best return.
  • Identify Opportunities – Discover underlying trends that point to high-value opportunities.

Heat Map

[source]

I don’t really mean to pick on this company; it’s just the one that I got an email about today.  I think the power of heat maps is actually part of their ineffectiveness, which is too much data.  The benefit of a heat map is suggested to be that you can fit a lot of information in a relatively small visualization.  If you were to show the same data using a bar chart, it would take up a few pages.  The downside of this heat map is there is too much going on with it.  The only information I can make out of it is the large outliers.  Ironically, the same is probably true if you were to create a visualization using a bar chart.  At least you would save $1,295 by using a standard bar chart in Excel versus this software.

This heat map shows a ton of data where size and color of the boxes matter.  Intuitively, the bigger the box, the larger the number must be, right?  But what the heck do the colors stand for?  I cannot tell.  Also, note the logo images within the boxes – they make the label and value very hard to read.

Do you see value in heat maps?  Is this just a bad example that uses too many data points?  Please share your thoughts.

Where To Draw The Line: Line Graph vs. Area Chart

Monday, May 4th, 2009

I can honestly say that I have seldom, if ever, used an area chart in the business world.  Both a line graph and an area chart show the same type of information, which is data over time.  The way Excel 2007 describes the two is by noting:

Line Graph: "display trend over time…. useful when there are many data points and the order is important"

Area Chart: "display the trend of values over time"

Below I have taken two separate data visualizations showing the same type of data.  The first is an area chart with the second being a line graph.  The data sets are different, but the use is nearly the same – showing a trend of a percent over months or years.  There is the Tufte concept of the data-ink ratio to consider if you really want to get technical.  Me, I just stick to a simple, well-designed line graph.

area chart

[source]

Line Graph

[source]

Simply put, I think you can’t go wrong with either of the two above if your intention is to visualize the trend of data over time.  Even though I rarely use area charts, I would much rather have someone use one of these over the misuse of trying to fit a column chart in its place.  One use of a variation of the area chart that I typically see is to show the range of average temperatures throughout the year, where the area represents the average high and average low temperature.  Conversely, I’ve also seen the temperature data represented with two line graphs for the average high/low (www.weather.com uses this method and can be seen here).  You can also see an example of an area combination chart on Jon Peltier’s site found here.

Which chart do you prefer when showing data over time, the line graph, area chart or other?  Does one sector tend to use area charts over line graphs?  My experience in the business world leads me to think lines are preferred over area charts.