Archive for the ‘Excel’ Category

Analyzing a Stacked Bar Chart

Monday, March 8th, 2010

Below you will see a stacked column (vertical bars) chart that has nine different segments covering eight months.  I have absolutely nothing against HubSpot and actually think they have some great services, tools and products.  I am simply using their chart to illustrate the problems with stacked bar charts and some alternatives.

HubSpot_Reach Stacked Bar Chart 

I can think of a few reasons off the top of my head as to why people would use stacked bar charts. 

  1. To show how each segment changes over time
  2. To illustrate parts of the whole at any given time
  3. A combination of 1 and 2 above

The problem is that a stacked column chart is not good for either of these requirements.  My feelings on these charts is not really anything new, as I’ve stated before on this post and also this post.  In my opinion, after you get beyond two series with like scales, a stacked column chart is pretty and pretty useless. 

(more…)

Waterfall Charts

Monday, March 1st, 2010

The two charts below show the S&P 500 Net Income by Sector for both 2008 and 2009 and recently appeared in BusinessWeek.  What really caught my attention is that these column charts are a little bit like a waterfall with the only difference being that these start from zero.  Prior to using the Waterfall chart utility, I created these by hand.  The workaround to get a hidden or shaded set of bars is more difficult than it really needs to be.

2008 S&P Chart

2009 S&P Chart

[source]

Below you will see my versions of the charts using the same data and Excel.  I didn’t include the gray shaded series because I do not think that it adds any value.  Also, I do not have the text box calling out the title and final value because there is a column at the end that shows what the value is upon finish.

(more…)

Support Analytics e-Store

Monday, February 22nd, 2010

Over the past few years I have accumulated a bunch of different resources that I find extremely valuable.  Most of these utilities/products are for Excel, but not all of them.  Historically, you could find an ad or link to these products scattered throughout my blog pages.  I have combined all of the affiliate products I support onto one page that is called e-Store.  Now the main pages of this blog should appear less cluttered.

estore

You can find this page by clicking here or by clicking on the e-Store link in the header of this blog.  There are some great utilities, Excel add-in products and e-books that you should check out.  Many of them will make life easier and some will enlighten.  Here is a summary list of what is available.

  1. Waterfall Chart (Excel Add-in)
  2. Box and Whisker Chart (Excel Add-in)
  3. Dot Plot Chart (Excel Add-in)
  4. Learn Excel Formulas (Tutorial)
  5. Dashboard Reporting with Excel (Tutorial)
  6. MicroCharts (Excel Add-in)

If you have any products that you would like to have included on this new page, please contact me.  The links above contain affiliate products that, if purchased, generate a small commission for Support Analytics.

Data Analysis – Do You Really Mean Average?

Thursday, December 17th, 2009

In the corporate world I see this issue quite frequently.  Specifically, I will hear a request where the verbiage doesn’t align to what the requestor is ultimately looking for.  To illustrate, I have included an example below that shows ten different customers within a territory.  For each customer the total revenue year-to-date is listed.  To make the illustration relevant for this example, I listed Customer 5 with revenue that is exponentially higher than the rest. 

Now here’s the question I typically hear:

"What is the average customer size (revenue) for Territory A?"

Here is what that really means most of the time:

"What is a typical customer size (revenue) for Territory A?"

You may think it’s semantics, but it’s really not.  I don’t want to turn this into a statistics lesson, but average (mean) doesn’t always translate into typical.  Because Customer 5 is such an outlier, the average (sum of all customer revenue divided by count of customers) will be higher than if that customer fell into the typical range like the rest.

I have included the median revenue amount for the ten customers, which I think is probably a better predictor (in general) than the mean or average.  The median is simply defined as the number in the middle.  In reality, Customer 5’s revenue could be 875 zillion dollars and the median amount wouldn’t change.  When there are thousands of records and you need to know what the typical amount is, it’s often safer to choose median unless you want to take the time to calculate min, max, median, std deviation and mean to compare.

"In probability theory and statistics, a median is described as the numeric value separating the higher half of a sample, a population, or a probability distribution, from the lower half." [source]

Now the real question that would need to be answered is, can a typical territory have one very large customer or is this a unique situation and should not be considered normal?  Answering the preceding question will make all the difference in what calculation to use.  Most often I will include both.

Median vs. Average Example

It’s my belief that most people are simply familiar with the term average because it’s so commonly used.  The underlying reason that average is more prevalent in analysis is probably due to the fact that it’s very easy to calculate.  Before spreadsheet software was available that automated the median calculation, it was much more difficult to get a median amount even with a calculator.

As a data analyst, it’s prudent to know the difference between mean and median and when each is applicable.  Telling the CEO/CFO that the typical customer is roughly $131,000 when one customer is atypical and the true amount is more like $57,000 can be a career changer.

Gradient Fill and Deception with Charts and Graphs

Tuesday, November 10th, 2009

Below you will see a column chart that appeared in the weekend’s print edition of the Baltimore Sun.  It’s no secret that they used a gradient fill on the columns to give it the fading appearance.  I’m not a big fan of the gradient fill on the 2009 columns, but this could work for the previous year’s numbers (2008) if the intent was to minimize the prior year.  I doubt that was the case as I’m sure they were trying to make the chart "pretty" or different than the default setup.

BS Unemployment Chart 

Below you will see a replica that I made using Excel and the fill effects formatting option.  It looks alright, but something still isn’t right.  What is the problem with this chart?

BS Chart Replica

The problem is the y-axis and the scale that was used.  I don’t think this is a straight out misrepresentation in order to mislead, but it could be.  That’s the risk you face when manipulating the axis.  Yes, the columns take up a lot of space when the axis starts at zero, but that’s the correct method here.  To help illustrate my point, check out the exact same chart (below) with the y-axis starting at zero.

BS Chart Replica - Axis

This version using the correct axis setting accurately shows that October, year-over-year, is not three times as much, but only about 1.5 times greater.  Also, look at the trend of the first replica chart.  The upward trend definitely has a greater slope compared to the replica with the correct axis.  To help prove this visually, check out the side-by-side comparison below using a trendline in the chart.  The slope of the chart on the left is much greater than the one on the right.  If you were presenting this data in something like PowerPoint or SlideShare, and quickly went to the next slide, the audience might not catch the axis starting at 5 and the steep trendline would be the point taken from the data.

BS Chart Replica - Slope

Furthermore, forget the gradient fill and go with something like the chart below if you want to highlight the current year.

BS Chart Replica - Color 2

Visualizing Multiple Data Series’ in a Chart

Wednesday, October 28th, 2009

A little while back I featured the stacked column chart below in a post that discussed the ineffectiveness of its design.  More often than not, a simple multi-series line graph can do a better job at visualizing data compared to a stacked column chart.  The other option is to go with is a panel chart, also known as small multiples. In the R program, this type of visualization is more formally called a Trellis Display. 

Besides the overpowering and inconsistent labels, I think the Baltimore Sun did a good job with its recent display of the percent change in the number of passengers from 2008 to 2009.  It may have been better to stick with the airport codes, like BWI, instead of writing out the airport names, but let’s not nit pick.  I bring this topic up again because I think small multiples or panel charts can be much more effective at visualizing data and, in my opinion, are under utilized in the business world.

stacked-bars

[source]

BWI Chart

[source]

There has been some great work done by a few experts in the Excel and R fields on creating panel charts.  Here are a few resources that have examples and information on how to create panel charts in Excel and Trellis Displays in R.

Excel Formulas From a Microsoft MVP!

Tuesday, June 9th, 2009

Chandoo, who writes a blog called Pointy Haired Dilbert for charting and Excel tips, created a tutorial that makes learning Excel (2003 and 2007) formulas fast, easy and fun.  This workbook contains 75 of the most frequently used formulas that are explained in plain wording.  If you are new to Excel, want to brush up on some rusty formulas or want to learn some new ways to look at data, buy this ebook today.  For just $10, you can be using formulas that will save you hours worth of manual work in Excel.

Click here to view more details

excel-formula1-aff-ad-1

I personally use 63 or 84% of these formulas on a regular basis (yes, I counted).  It’s amazing to see the results when I show someone how to use the NETWORKDAYS, VLOOKUP. CONCATENATE or TRIM formula.

This post contains an affiliate link.

Heat Map for Excel

Thursday, June 4th, 2009

Question: Would you pay $1,295 to be able to create this heat map (below) from an Excel spreadsheet?   If so, a company called Lab Escape has a product that will do the trick.  Oh yeah, if you want to be able to view [interactively] the output of the standard version software, you need the viewer version that only runs a mere $495.

Taken directly from their site, they claim the benefits of heat maps are:

  • Increase Agility – Improve business agility through quicker analysis, better decisions and more effective communication.
  • Reduce Risk – Rapidly identify trouble spots, before they are out of control
  • Maximize Value – Ensure that attention and resources go where they bring the best return.
  • Identify Opportunities – Discover underlying trends that point to high-value opportunities.

Heat Map

[source]

I don’t really mean to pick on this company; it’s just the one that I got an email about today.  I think the power of heat maps is actually part of its ineffectiveness, which is too much data.  The benefit of a heat map is suggested to be that you can fit a lot of information in a relatively small visualization.  If you were to create the same data using a bar chart, it would take up a few pages.  The downside of this heat map is there is too much going on with it.  The only information I can make out of it are the large outliers.  Ironically, the same is probably true if you were to create a visualization using a bar chart.  At least you would save $1295 by using a standard bar chart in Excel versus this software.

This heat map shows a ton of data where size and color of the boxes matter.  Intuitively, the bigger the box, the larger the number must be, right?  But what the heck to the colors stand for?  I cannot tell.  Also, note the logo images within the boxes – they make the label and value very hard to read.

Do you see value in heat maps?  Is this just a bad example that uses too many data points?  Please share your thoughts.

Excel and Project Gemini

Thursday, May 7th, 2009

Project Gemini is due to be released with the 2010 version of Excel as a free add-on.  The two videos below are a little dry and hard to see.  But there are a few things that caught my attention very quickly.  Also, Mr.Excel did a podcast on this a little while back in episode #975.  The things I liked were:

  • Ability to slice and dice using 20 million rows of data on a fairly basic PC (not server)
  • The other was how quickly you can filter on 20 million rows
  • Create Pivot tables quickly with 20 million rows and on the fly
  • It’s free and will be included in Excel 2010

Here is another demo using Gemini and SharePoint.

[thanks to SmartData Collective]

What do you think? Do you think this is a huge breakthrough or just Microsoft/Excel getting with the times?

Data Versus Information – Financial Bailout (Part 1 of 2)

Tuesday, March 17th, 2009

The Financial Lobbying information below is a great example of the difference between giving someone data and providing them with information.  The designer stopped far too short when putting this matrix together because they left all the work for me to do.  If you’re like me and you see this grid, what are first few things you do?

Financial Lobbying

[source]

When I saw this, I immediately did these things:

  1. Quickly read the title and sub title
  2. Scanned the companies looking for a familiar one
  3. Started calculating percentages of each to the total
  4. Thought about how much these bailouts are of the total bailout package

I am only looking for some basic statistics and context for this data.  I need to put it into perspective and try to tell a story.  I recreated this data in Excel and added a few simple columns to illustrate my points.  Also, we aren’t even talking about charts or graphs, just a simple matrix.

First, I have the same matrix with one additional column for the percent each company is of the total financial bailout spend.  Also, you’ll notice I abbreviated the numbers in the millions to save space.  Finally, I removed the zebra striping because it really isn’t needed in such a small data set.

Financial 1

In the next example below, I added an additional column that represents the percent each company is of the total bailout package.  Now I can see that these eight large financial companies make up 26 percent of the total bailout spend assuming a $700 billion total.  What this does, is put the data in some perspective versus just showing a bunch of numbers. 

Financial 2

In part 2, I will show you a few more changes that I made to the matrix that speaks to the revenue columns.