Drive Time Calculations in Excel

Building on my most popular post, Getting Information From the Web Using Excel VBA, I had the need in a recent project to calculate drive times on many rows of data, and decided to build a function in Excel to handle the dirty work for me using Google Maps.

Essentially, we had 200 or so latitude/longitude points and needed to see which of six addresses were closer from a drive time perspective. Luckily, Google came to the rescue once again, because one can use latitude/longitude coordinates in lieu of an address in order to get directions (which includes drive time) to a physical address (or, I suppose, a second set of latitude/longitude coordinates). What’s more, the URL for google maps to give you this information is fairly simple:

http://maps.google.com/maps?q=from: [PointA] to: [PointB]

So, once I dug through the HTML code behind the google maps directions results to find the div element containing the drive time of the quickest route, it was easy enough to create the following function. The two parameters are the addresses, zip codes, coordinates, or whatever else Google will allow you to use to approximate the starting and ending points.

Function DriveTime(PointA As String, PointB As String)

  Dim myURL As String
  myURL = _
    "http://maps.google.com/maps?" & _
    "&q=from: " & PointA & " to: " & PointB

  Dim inet1 As Inet
  Dim mypage As Variant

  Set inet1 = New Inet
  With inet1
    .Protocol = icHTTP
    .URL = myURL
    mypage = .OpenURL(.URL, icString)
  End With
  Set inet1 = Nothing

  Dim intStart As Double, intEnd As Double
  intStart = InStr(mypage, "<div class=""altroute-rcol altroute-info"">") + 41
  intEnd = InStr(intStart, mypage, "</div>") - intStart
  DriveTime = Mid(mypage, intStart, intEnd)

End Function

It took about 45 seconds to calculate the drive time for appx. 6 * 200 or 1,200 routes. My only other option was to copy and paste those 1,200 coordinates one-by-one into Google maps and retype the drive time. Which probably would have taken at least half a day. I call that a win.

Feel free to use this function for any number of drive time calculations, but I would assume that at some point Google will pitch a fit that your IP is sending so many requests to its map server. However, it didn’t bat an eye at my 1,200 queries, so who knows? Enjoy!

Next Unique and Previous Unique

Why this isn’t already a function of Excel baffles me.  Perhaps my work is somewhat unique.  I generally work with very large datasets (over 100,000 rows, often times nearing the 1.08 million row limit of Excel 2007), and I often find myself needing to scroll through the data in order to find the next value in a series.

Consider a spreadsheet with three columns: Fruit, Name, and Score.  Imagine this is populated with the data of a survey of 100,000 people and their opinions of Apples, Bananas, and Oranges.  You can assume that with three fruit types, there would be 300,000 total rows.  Now imagine opening this file in Excel, and imagine it is already sorted by Fruit.  How would you go about locating the first entry of Bananas?

The obvious method is to simply use the scroll bar and drag down until Bananas appear.  A second method would be to press and hold Page Down until this section appears.  In either case, it is far too easy to surpass the intended row, requiring a similar method in the opposite direction.  This can sometimes force you to go back and forth a few times until homing in on that row.

Now imagine that you just want to access the last row in the data, regardless of the contents of the cells.  Just type Ctrl + Down Arrow, and you’re there.  Wouldn’t it be nice if there were a similar function for skipping down to the next unique value?  Well, there can be!

Consider the following VBA functions:

Sub findFirst()
   Dim targetString as String
   targetString = ActiveCell.Text

   If ActiveCell.row = 1 Then
      Exit Sub    
   ElseIf Cells(ActiveCell.row - 1, ActiveCell.Column).Text <> targetString Then
      Cells(ActiveCell.row - 1, ActiveCell.Column).Select
   Else
      ActiveSheet.Columns(ActiveCell.Column).Find(targetString, 
        LookIn:=xlValues).Select
   End If
End Sub
Sub findLast()
   Dim targetString As String
   targetString = ActiveCell.Text

   If Cells(ActiveCell.row + 1, ActiveCell.Column).Text <> targetString Then
      Cells(ActiveCell.row + 1, ActiveCell.Column).Select
   Else
      ActiveSheet.Columns(ActiveCell.Column).Find(targetString,
        SearchDirection:=xlPrevious, LookIn:=xlValues).Select
   End If
End Sub

In both instances, we’re using the built-in function Excel uses to find values, similar to using Find or Find/Replace.  In the first function, we first check to make sure we’re not already at the first row, and if not, we use the find function to locate the first instance of the current value in the current column.  So, using our original example, if we were midway through the entries of Apples and wanted to return to the first Apple entry, this function would look at the Fruit column and search for the first instance of “Apple”.

The second function does a similar act, but uses the xlPrevious search direction.  Therefore, it starts at the first cell in the current column, and searches backwards for the current value.  This requires Excel to start from the bottom of the spreadsheet and search upwards.  So, if we are halfway through the “Apple” entries and use this function, Excel will start at cell A1048576 and search upward until reaching cell A200001, which would be the last hypothetical instance of “Apple”.

In either function, if the cell directly above or below (respectively) the active cell has a different value than the current value, that cell is simply selected.

Create Your Own Excel Add-Ins

Yesterday I showed you how to create your own functions using Excel VBA, or Visual Basic for Applications. Today’s post will take this to the next level.

If you’ve played with creating and using your own functions, you may have noticed that once you close that workbook, you lose the ability to use that function. This is because Excel uses its built-in functions first, then looks to what is saved in the workbooks that are open. Excel doesn’t by nature save VBA code into itself; it only reads what’s been saved into the individual files.

Here’s an example. Say you’re working on a physics project for school. You’re working out of your spiral notebook from which you’ve been working all year. When you need to remember the formula for momentum, you can just flip through your notebook, find where you wrote it down, and use it to calculate the problem you’re currently working on. If you were working in a different book, you’d have to go back to this book to flip through and find the formula. If you didn’t have the book with the formula written on it (assuming you lack the cognitive capacity to memorize it), you wouldn’t be able to solve the problem.

So the excel file in which we saved our CAGR formula yesterday is the spiral notebook you’ve been taking to class every day, and in order to remember how to solve that formula, Excel needs to keep the spiral notebook open.

But, there is a way to make Excel memorize the formula. It is done with add-ins.

Generally speaking, when a program accepts add-ins, they have to be programmed using a higher-level programming code than VBA, and they have to be compiled in a very specific manor such that the host program is able to read and employ them. In the case of Excel, you are able to create your own add-ins just by saving a certain Excel file as a “Microsoft Excel Add-In (*.xla)”, which is the last type in the “Save As Type” drop-down when saving a file.

The methodology here is that any VBA code you write – be it custom functions or macros – will be saved in this .xla file, and when you include it as an add-in for Excel…wait, I’m getting ahead of myself.

Let’s start from the beginning. Go ahead an open up a new spreadsheet in Excel. Don’t worry about the cells within this worksheet. You can type whatever you want there, or you can type nothing at all. What you want to focus on is the VBA window.

On the menu, select Tools -> Macro -> Visual Basic Editor. You want a spot to type stuff, right? So click Insert -> Module. Here is our blank canvas where we enter in our custom formulas and whatnot. We’ll use a simple formula for today’s example. Type this into your module:

Function stupid_formula (myinput as integer)
   stupid_formula = myinput ^ 3
End Function

If you can’t tell, all this function will do is cube what you pass to it. If you were to close this window out, go back into your spreadsheet and type “=stupid_formula(2)” into a cell and hit enter, the cell’s value would be 8. However, if you were to close this workbook, open a new workbook, and type that formula in, the cell’s value would be “#NAME?” because Excel already forgot what you wrote in it’s spiral notebook.

Ok, so you’ve written “stupid_formula” in your module, and you haven’t closed your Excel file. Here’s where the fun starts. Close your module window to get back to the spreadsheet with which you began this journey. Save it (via File -> Save) as a “Microsoft Excel Add-In”. Once you select that as the file type, Excel will default to the “Add-Ins” directory, which is probably a pretty good place to save these. For this post, I’m going to save mine as “guj_formulas.xla”.

Now we just need to register our file as an add-in. This will tell Excel to open this file each time Excel is started. So to follow through with the previous metaphor, every time Excel gets ready to work, it grabs its trusty spiral notebook with all of your formulas in it.

To do this, go to Tools -> Add-Ins…

Click “Browse”, and double click your file. It adds it to the list of Add-Ins, and goes ahead and checks it for you.

And that’s it! Every time you open excel you can use the “stupid_forumla” formula. Alternatively, you can enter the VBA window at any time to view the formulas you have saved as an add-in, and alter or add to them. You will need to save your work using the save function on the VBA window in order to realize these changes in future instances of Excel.

Have fun with it, and remember, Excel works for you, not the other way around.

Create Your Own Excel Formulas

Having just completed a very large project at work involving six horrendous weeks of brutal data pulling, formatting, and analysis, and resulting in two 400+ page books and two 150+ page books of stuff that nobody will ever really care about, I am now free to write a new post! Today’s topic is creating custom formulas for Excel.

Excel is nothing if not configurable, but few people have the technical acumen, the desire, or the time to spend customizing it. Hopefully, this post will help to boost the technical acumen of the reader, but as for the desire and time, you will have to be the judge. I think you will find that performing some simple tasks such as this will free up much more time in the future, and such should feed your desire.

Ok, let’s get crackin’. The example I’m going to use throughout this tutorial is for a formula which we use frequently around the office, but of which few people are aware. It is for the Compounded Annual Growth Rate, or CAGR. When you have two figures occuring in different years, say, the population of a county in Florida, and you want to know how much this figure grew or will grow in each year in between, you use the CAGR.

Now I know what you’re thinking. The growth rate should be as simple as taking the quotient of the the difference between the first and last numbers over the first number, i.e. (xy)/x, and then dividing the result by the number of years in between.

What that will give you is, in fact, a simple solution to this problem. If the population of Orange County is 100,000 in 2007 and is slated to be 200,000 in 2012, then it will grow (200,000-100,000)/100,000, or 100%.  Divide this by the number of years, and you get 100%/5, or 20% per year. However, if the population truly grew 20% per year, then the interest would compound each year, resulting in a final population of 248,832. This is the same way credit card companies make money. This is also the way anyone can make money just by having money.

Anyway, what we want to get at is what percentage of the total population would grow each year in order to reach 100% in five years. This is calculated using the CAGR. The formula looks like this:

CAGR(t0,tn) = (V(tn) / V(t0))^(1 / (tn – t0)) – 1

Where V(t0) is the start value, V(tn) is the finish value, and (tn – t0) is the number of years in between.

Written as a function, accepting start_value, finish_value, and number_years, it would look like this:

CAGR = ((finish_value / start_value)^(1 / number_years)) – 1

Sort of a mouthful, and quite a bit to remember. Even when memorized, it’s quite a bit to have to type every time you want to use the formula. It would be nice of Excel to include this as a standard formula, but they don’t, and that’s probably because only a small amount of users would ever need it. There are probably thousands of formulas that some people use every day that I will never even need to know. That’s why Excel is customizable. So, on to the point of all of this.

Wouldn’t it be nice if instead of writing that formula over and over again, one could just type

=CAGR(A1,B1,5)

into a cell and get the yearly growth rate? Well, you can! And you do it using Excel VBA, which stands for Visual Basic for Applications. Visual Basic is a programming language, but as the name implies, it is very simple and intended for the average or slightly above-average user. Once you understand some of the basic syntax, you can do just about anything.

You get to the VBA section of Excel by clicking on Tools -> Macro -> Visual Basic Editor. Once there, you need to create a module, or a file containing programming code, in your workbook. Do this by clicking Insert -> Module. You now have a blank canvas in front of you, and you are ready to create magic.

For our CAGR formula, type the following into this blank canvas:

Function CAGR (start_value as double, end_value as double, num_years as integer) AS double
   CAGR=((end_value / start_value)^(1/num_years))-1
End Function

What does this mean? Well, the first line tells VBA what the name of your function is, and what parameters, or variables, to expect. We know that in our calculation of CAGR, we need to know both values and the number of years that separate them. So we tell VBA that we will be using these values. The first two are accepted as double, which means they are registered as double-precision floating point value, which is a fancy way of saying the number can be really big, or with a lot of numbers after the decimal point. It’s the biggest number type that VBA can work with, and it’s probably way more than most people will ever need. The number of years is registered as an integer, which means it can be any number up to 32,767 and can only contain whole numbers (it will be rounded if necessary). This should be more than sufficient for any CAGR calculations we will do.

The second line uses the name of the function (CAGR) and defines its value based on the parameters. Since it is the only line within our function, the last line (“End Function”) tells VBA to stop working, and just return the value of CAGR to the cell in which the function was entered.

Just close your VBA window to get back into Excel and start to have fun! Create a spreadsheet like this:

Then type in the formula to calculate the five-year growth rate (btw, this could be stored as a function in VBA as well, but you will have to name it something other than “growth” because Excel has a built-in function with that name):

Then type in our CAGR function in the adjactent cell:

And hit enter. Ta-da! We have our calculation!

Turns out it would actually grow 14.9% per year, and after five years, it will have grown 100%.

That’s it for now. The thing to remember about these saved formulas is that they’re only available in the workbook in which they are saved (or in other workbooks so long as that workbook is open), so you’ll have to recreate the VBA code each time you want to use it (however, once it is calculated, the value will remain in the cell whether or not the function is present). In the next post, I’ll show you how to save your VBA code as an Excel plug-in, so it will be available every time you open Excel.

Calculated Items in Pivot Tables

Work with Excel pivot tables long enough and you will find yourself suffering many frustrations. For starters, there’s no way to create your own AutoFormats, so unless your company wants to use Microsoft’s idea of good-looking tables as its standard format, you’re prett much stuck with copy/pasting and doing your own formatting. This is as of Excel 2003, of course.

But one of my biggest pet peeves, and one that I’ve actually been able – to some extent – to overcome, is the idea of creating my own formulas that will pivot and calculate the way any other data element in the report would. That’s a little vague, so I’ll give an example.

Right now my department is working on yearly assesments of the market shifts and demographics for 2007 (yes, it’s half-way through 2008, but that’s how long it takes to get the needed data from the state). There are a few calculations that have to be done on this data, including, but not limited to:

  1. Since the data is still only current through the third quarter of 2007, the year of 2007 needs to be multiplied by 4/3 to annualize it for proper trending.
  2. Each campus has its own overview, and groups competing hospitals differently. For my books, I need to look at our main campus, all of our campuses as a group, the three main competing hospitals as one group, and all the competition as one group.
  3. For many of the campuses, we are looking at the tri-county and seven-county regions surrounding our hospital as a grouping for high-level analyses. The smaller campuses define primary and secondary service areas by zip codes (far more annoying).

All of these caluculations and groupings can be done using “Calculated Fields” and “Calculated Items”. Seems simple enough, but this feature hid itself from me for a long time.

Ok, so lets say you have a pivot table, created from a very simple data set of foodstuffs (carrots, milk, oranges, peas, tofu, and yogurt), some states where people eat these foodstuffs, and the number of people from each state who eat each foodstuff. It might look like this:

A very simple pivot table.

Sample Pivot Table

Now, say you want to group these states differently. You want to see FL, GA, AL, and LA grouped as the southeastern states. If your dataset is small enough and is already housed in excel, you can just overwrite the values of the states FL, GA, AL, and LA as “Southeast”, or create another column titled “Region” and fill in “Southeast” wherever applicable. But if you’re using a pivot table with over 65k records, this isn’t an option (Excel 2007 features an unrestricted number of rows, which will make a lot of this obsolete).

Pivot Table Toolbar

PivotTable Toolbar

Ever seen this window before? It’s the pivot table toolbar. It’s shown here as a modal window, but it can also be embedded into the main toolbar, of course, which is where I usually have it. If you don’t have this showing, just right-click on your pivot table and select “Show PivotTable Toolbar”.

Even if you have used this toolbar before, you might not have noticed that “PivotTable” drop-down on the left. There are some interesting things in there, like this:

PivotTable drop-down

You may have to click on appropriate fields in your pivot table first in order to access both Calculated Fields and Calculated Items. In my case, I clicked on one of the states first. Once you select Calculated Item, you get the following dialog box:

Calculated Field options

This shows the fields in my pivot table on the left, and the values available for the selected field on the right. This is not the same dialog box you get for a calculated item, because calculated fields assume you are using the values of a field to create a calculation, whereas calculated items use the items themselves. It’s a very fine line between the two, and I urge you to examine for yourself the subtle differences.

Anyway, on with the show. This is how I would create a grouping of the southeastern states:

Modified Calculated Item options

Essentially, the field “Southeast” will contain the added values of whatever would have shown for AL, FL, GA, and LA. This remains true no matter how the table is pivotted (another distinction between this and calculated items). Once this is created, it shows as a regular item under the “State” field, as such:

Completed pivot table

Updated PivotTable

Now I am free to pivot as I see fit, and the “Southeast” value will always act as the simple sum of those four states, whether nested or not.

You may have noticed that adding a calculated item causes the totals to be “double-dipped”. This is yet another annoyance of pivot tables, but, really, how else would it be done? You are creating basically another item, and it needs to be included in the “Grand Total”. In our situation, we don’t need to see the southeastern states twice, we only need to see their grouped value. Simple enough, all we need to do is filter out the individual states by clicking the “State” drop-down and unchecking them, as such:

Pivot Table Filter

Now your pivot table will only show the non-southeast states and the Southeast grouping, ensuring that your totals will be correct, and all will be right with the universe.