Getting Information From the Web Using Excel VBA

What we’re trying to do today is get a value from a webpage and return it to a custom function within Excel. I’m going to use a very simple example, and it should be inferred that this is only for very simple tasks. The information you seek to return to your function will need to be presented on the webpage the same way every time you visit it, because all you’re doing is skipping the web browser, bringing the background contents of the webpage into a string, and parsing the string to pull out the data you want.

What I mean by the “background contents” is the code behind all webpages. When your browser calls up a URL, it receives a bunch of code, which it uses to render the page to you. If you’re unfamiliar with HTML code, you may have a bit more difficulty with this, but all you really need to be able to do is recognize where your information is always kept within this code. This sounds more complicated than it is, really.

For my example, I want to create a function called “temperature” to which I can pass a city and state, and from which I can gather the current temperature of that city. I live in Orlando, FL, so that is the temperature I want to know.

Important: The first stumbling block in this project is the need for a special reference from within Excel VBA — the Microsoft Internet Transfer Control reference. You may need to download msinet.ocx (you can find it pretty easily through popular search engines) in order to add this reference. To add a reference, click Tools -> References from within the VBA window, locate the reference, and select it. If the reference is not available, download msinet.ocx, put it somewhere easy like \Windows\System, and then locate the file via “Browse…” from the references window.

The code for my example function follows, and I will explain everything in the following paragraphs:

Function temperature(strCity as string)

strCity = Replace(strCity, ", ", "%2C+")

  Dim myURL As String
myURL="http://search.msn.com/results.aspx?q=current+temperature+" _
& strCity

  Dim inet1 as inet
Dim mypage As String

  Set inet1 = New Inet
With inet1
.Protocol = icHTTP
.URL = myURL
mypage = .OpenURL(.URL, icString)
End With
Set inet1 = nothing

  Dim intStart As Integer, intEnd As Integer
intStart = InStr(mypage, "<span class=""wea_temp""></span>") + 22
intEnd = InStr(intStart, mypage, "&")

  temperature = mid(mypage, intStart, (intEnd - intStart))

End Function

I’ll take you line-by-line through this somewhat lengthy example function. To begin, the name “temperature” and the parameter strCity means that when you input this function into a cell within Excel, you would type “=temperature” followed by the city and state in quotes and parenthesis, i.e. ‘=temperature(“Orlando, FL”)’.

The second line takes strCity and replaces the comma with “%2C+”. The reasoning for this is to properly construct the URL. In the language of a URL, %2C means “comma”, and the + denotes a space. You can’t have spaces or punctuation marks in URLs.

Then we declare myURL, which is what we would have typed into the web browser. If you were to go to msn.com and search for something (like ‘current temperature Orlando, FL’), then looked at the address bar, you would see something like this. In fact, that’s how I constructed the string, and that’s how I urge you to go about this as well. We add our city and state – gathered from the function parameter – to the end of the string.

The next part gets a little vague for me, because I haven’t had much experience with the Internet Transfer Control. Like most controls, you need to first point a variable to it (“Dim inet1 as inet”), and then initialize it (“Set inet1 = New Inet”). In our example, we need to worry about three aspects of the Inet control – Protocol, URL, and OpenURL. The first two are parameters, and in our case will be icHTTP for the protocol, and the URL we have already constructed as the URL.

The third (OpenURL) is the method we’re using to hit the URL. The two parameters you need to pass to it are the URL to hit and what to do with the result. Since we’re only interested in the webpage itself and not any files that might be downloaded, we just want to return the value to a string, hence, we tell it to use the icString data type. icString is a data type unique to the Internet Transfer Control, and unfortunately, I don’t know much about it except that it can be transferred to a standard string data type.

We complete the web hit with the statement “Set inet1 = nothing” in order to clear out our memory. This is just good practice.

This is the fun and somewhat difficult part. If you search in msn for the temperature in Orlando, FL and then take a look at the source of the website through a standard browser, you’ll see a lot of stuff that means nothing to you. Buried in all of this stuff is the current temperature. The easiest thing to do in this case was to search msn, get the actual temperature, and then search for that number in the source. I then noticed that the temperature occurs directly after the first instance of the phrase ‘<span class=”wea_temp”>’. The InStr function tells me what character within the HTML code begins that phrase, and as the phrase is 23 characters long, we add 22 to that number. Immediately after the temperature was the character “&”, so we perform another InStr function (passing our first value as the start point for that function) to get the first instance of “&” following the temperature. The temperature is then recorded using the mid function, which asks for the string, the start value, and the length of the string you’re asking for (which would be the position of the “&” minus the position of the first character of the temperature).

Like I said, that’s the difficult part. You have to be able to read through all that code to find your value, and you have to play around with the InStr and mid functions to get the right numbers. This is also why it’s vital that you only do this with static pages. I know msn.com will always return the temperature of the city and state I search for at the top of the search results, so I am confident that this function won’t break. But if msn.com put the temperature arbitrarily in the list of search results, this would be substantially more difficult.

UPDATE: For a second example of how this might be put to use in order to calculate drive times between two addresses, latitude and longitude pairings, zip codes, etc., see my post http://guj800.wordpress.com/2011/04/27/drive-time-calculations-in-excel/

About these ads

10 thoughts on “Getting Information From the Web Using Excel VBA

  1. Hey JD, I noticed the following line of code:

    Dim intStart, intEnd As Integer

    This is translated by VBA as

    Dim intStart As Variant, intEnd As Integer

    You probably meant

    Dim intStart As Integer, intEnd As Integer

    You can share a ‘Dim’ statement but not a declaration.

    HTH,
    JP

  2. Pingback: Getting Information From the Web Using Excel VBA - Learn Excel

  3. Hi J.D.

    You’ve mixed your variables a little – creating myhttp but using mypage in your inStr()

    Also, they may have changed the site or it’s different for Brits:””
    works for me:

    Function Temp(strCity As String)
    strCity = Replace(strCity, “,”, “&2C+”)

    Dim myURL As String
    myURL = “http://search.msn.com/results.aspx?q=current+temperature+” & strCity

    Dim inet1 As Inet
    Dim myhttp As String

    Set inet1 = New Inet
    With inet1
    .Protocol = icHTTP
    .URL = myURL
    myhttp = .OpenURL(.URL, icString)
    End With
    Set inet1 = Nothing

    Dim intStart As Integer, intEnd As Integer
    intStart = InStr(myhttp, “”) + 39
    intEnd = InStr(intStart, myhttp, “&”)

    Temp = Mid(myhttp, intStart, (intEnd – intStart))

    End Function

  4. @JP, @Al, thank you for your input. I have changed the code in the post per your suggestions.

    @Al, you were right about the variable mismatch, but the function still will not work from my office computer. It just exits the function after reaching the line “Set inet1 = New Inet”. They lock these computers down pretty hard, so it may be a user restriction, or it could be a firewall issue. As you know, the part that you found different on the Brittish msn site did not post, and I am interested to know the difference. If you would be so kind as to repost, you can get the html string to post as text – rather than hypertext – by using { } instead of < >. It will not be a literal representation, but we will get the point.

  5. Hi JD,

    Thanks for this example. I was writing some similar code in MS Access (VBA) and was getting the error “Error 429 – cannot create Activex object” when I issue the command:

    Set inet1 = New Inet

    I then found and decided to test your sample code, and the same statement resulted in the 429 error.

    I do have the reference set up to msinet.ocx in VB references.

    Do you have any idea what is missing here?

    Thanks in advance,

    Dennis

  6. Hi,
    Following your instruction, I downloaded the MSINET.OCX file, and added it to VBA. Now I can see the Inet class in object library but when using it in the code, it gives me error ‘429’, not being able to create a new inet object!
    I’ve checked many threads on this, but I have no idea what they say about class registry or .dll files, etc.

    What should I do? Could you explain me this problem in a simple, new user word?

  7. I found this post and adjusted your code to create a VBA call to a Web API service for SEO purposes. It worked perfectly… just had to adjust the code for parsing my specific results. In my case, I encountered an issue where the results were all one long text string of key-value pairs, with each pair in quotation marks!

    Because VBA uses quotation marks to identify text strings, I had to build in a workaround that replaced all quotation marks with a separate character (I used the ~ character as it isn’t ever included in the results). The side benefit is that these made it much easier to parse the data based on the unique identifiers.

    Thanks for sharing this with us!

    • I am a rank novice with VBA functions but it seems as if a well crafted function is what I am looking for.

      I need to grab specific info from tables in a webpage and move it to excel so that I can then ultimately import into a database.

      The unfortunate thing is that the data that I am looking to import sits in a single column which needs to be moved to rows for my database export.

      I am eager enough to see if this can be done that I would be glad to retain an expert to write a macro and do a little instruction so that I can import information as needed.

      Any takers?

      If so please contact me.

      Tom

    • Michael, could I get a copy of what you did, code I’ve used with inet for years is not working now that I downgraded to Windows 7 and Internet 8.

      I get the following error when I do .openurl()

      runtime error 35750
      Unable to open internet handle

      Perhaps your work around would work for me.

      candkrus@yahoo.com

  8. Pingback: Drive Time Calculations in Excel « Guj800

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s