Sunday, April 05, 2009

Dealing with large spreadsheets

Special techniques are required to cope with large volumes of data and/or to have many Excel Reports based on one spreadsheet.

Introduction
What is a large spreadsheet?

Is it the sheer size, the number of used rows in a particular sheet or having lots of worksheets?

All and any of these make for large unwieldy spreadsheets prone to error and known to consume lots of a user’s time and effort.

This article discusses techniques to deal with each of these situations and the general problems that they are prone to.

Dealing with large Spreadsheets

Why do spreadsheets grow very large?
Often they are used as databases, and with Excel 2007 now being able to cater for a millions rows, they can hold quite a lot of data.

The typical scenario is to have a worksheet with the data and other worksheets which use pivot tables and lookup functions like vlookup to analyse and report on the data.

Another situation that is quite common is monthly reporting. You start off with reporting for a number of branches for one month and then you end up with reporting month after month for a year or two and you now have a very complex set of linked workbooks.
e.g. 10 branches by 24 months  240 linked workbooks plus the 24 data sheets….!

Ghost rows can also bloat a workbook. Especially if it came from a very early version of excel….There are techniques to eliminate these.

Vlookup versus Index/Match

Let’s examine this vlookup formula.
=VLOOKUP(E2,A:C,2,FALSE)
There are four parts:

1: Cell Value to use – on some versions of excel there were major issues with dashes in the value to lookup.

2: The range to look-in – Note: How we used the columns, rather than a selected range.
e.g. We could have used
=VLOOKUP(E2,A1:C6,2,FALSE) instead with the range A1 to C6 specified

3: The Column with the value. In this case the 2nd column of the range is specified

4: Exact or Approximate match…beware this can cause major issues!
Setting this to false makes it do an exact match.
* One trick to getting values is to do an approximate match on a list with ever increasing values and it stops when the value is larger than the one it is trying to find. This enables you to use vlookup to get a value with a specific range.

Often people will use the more complicated index/offset and match formulae to get the same or a more accurate result.

e.g. =INDEX(B:C,MATCH(E2,A:A,0),1)
or =OFFSET(B:C,MATCH(E2,A:A,0)-1,0,1,1)

Noting that the index/match combination is simpler and easier than the offset one.

The match part looks up the column with the value and returns the row number.
The index formula then uses the row number and simply retrieves the value from the column specified in the range selected.

Here’s an example doing it across sheets:
=VLOOKUP(A2,Sheet1!A:C,2,FALSE)
=INDEX(Sheet1!B:C,MATCH(A2,Sheet1!A:A,0),2)

Or
=VLOOKUP(A2,'Large Data Worksheet'!A:C,2,FALSE)
=INDEX('Large Data Worksheet'!B:C,MATCH(A2,'Large Data Worksheet'!A:A,0),1)

We often have had to replace every formula for a client changing the vlookup to an index / match for better accuracy.


Linked Workbooks

The story gets even better when you have a linked workbook!
What if the workbook gets moved? You get those nice messages says that it cannot update the links, when you open the workbook.

=VLOOKUP(A2,'[My Data.xlsx]Large Data Worksheet'!$A:$C,2,FALSE)
=INDEX('[My Data.xlsx]Large Data Worksheet'!$B:$C,MATCH(A2,'[My Data.xlsx]Large Data Worksheet'!$A:$A,0),1)

You can select “Updated” and “Edit Links” when it cannot find the source workbook. This then allows you to select the new workbook.
If you don’t select the workbook and just say Continue, then you see the last set of values that it remembered.

In most cases, it is the monthly reporting scenario were most of these large spreadsheet systems are first created and then the whole system becomes a very complex and prone to error. E.g. The example above of 10 branches by 24 months  240 linked workbooks plus the 24 data sheets….!

What should you do?

Use Index/Match rather than vlookup if looking up text values.

Copy and past values into your monthly reports, so you don’t have links everywhere.

Get a developer in to assist you to get the data directly out of a database using a little vba.

Pump the data back into a database for monthly reporting.
* Makes comparing months and years a lot easier.


Summary
Often spreadsheets start small and grow to become very large and hard to use. By using inbuilt lookup functions and knowing when to separate large lists, you can save a lot of time and improve the usability and accuracy of your data.

Being aware of issues with certain lookup functions such as vlookup and linked workbooks, is half the battle with gaining control again.

Know when to use a database for your data and not excel.

Questions?
If you have any questions about ways to deal with huge spreadsheets, let us know.

http://www.macroview.com.au