Just a short, simple blog for Bob to share his thoughts.
20 November 2013 • by Bob • Classic ASP, IIS, SEO, URL Rewrite
Last year I wrote a blog titled Using Classic ASP and URL Rewrite for Dynamic SEO Functionality, in which I described how you could combine Classic ASP and the URL Rewrite module for IIS to dynamically create Robots.txt and Sitemap.xml files for your website, thereby helping with your Search Engine Optimization (SEO) results. A few weeks ago I had a follow-up question which I thought was worth answering in a blog post.
Here is the question that I was asked:
"What if I don't want to include all dynamic pages in sitemap.xml but only a select few or some in certain directories because I don't want bots to crawl all of them. What can I do?"
That's a great question, and it wasn't tremendously difficult for me to update my original code samples to address this request. First of all, the majority of the code from my last blog will remain unchanged - here's the file by file breakdown for the changes that need made:
| Filename | Changes | 
|---|---|
| Robots.asp | None | 
| Sitemap.asp | See the sample later in this blog | 
| Web.config | None | 
So if you are already using the files from my original blog, no changes need to be made to your Robot.asp file or the URL Rewrite rules in your Web.config file because the question only concerns the files that are returned in the the output for Sitemap.xml.
The good news it, I wrote most of the heavy duty code in my last blog - there were only a few changes that needed to made in order to accommodate the requested functionality. The main difference is that the original Sitemap.asp file used to have a section that recursively parsed the entire website and listed all of the files in the website, whereas this new version moves that section of code into a separate function to which you pass the unique folder name to parse recursively. This allows you to specify only those folders within your website that you want in the resultant sitemap output.
With that being said, here's the new code for the Sitemap.asp file:
<%
    Option Explicit
    On Error Resume Next
    
    Response.Clear
    Response.Buffer = True
    Response.AddHeader "Connection", "Keep-Alive"
    Response.CacheControl = "public"
    
    Dim strUrlRoot, strPhysicalRoot, strFormat
    Dim objFSO, objFolder, objFile
    strPhysicalRoot = Server.MapPath("/")
    Set objFSO = Server.CreateObject("Scripting.Filesystemobject")
    
    strUrlRoot = "http://" & Request.ServerVariables("HTTP_HOST")
    
    ' Check for XML or TXT format.
    If UCase(Trim(Request("format")))="XML" Then
        strFormat = "XML"
        Response.ContentType = "text/xml"
    Else
        strFormat = "TXT"
        Response.ContentType = "text/plain"
    End If
    ' Add the UTF-8 Byte Order Mark.
    Response.Write Chr(CByte("&hEF"))
    Response.Write Chr(CByte("&hBB"))
    Response.Write Chr(CByte("&hBF"))
    
    If strFormat = "XML" Then
        Response.Write "<?xml version=""1.0"" encoding=""UTF-8""?>" & vbCrLf
        Response.Write "<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">" & vbCrLf
    End if
    
    ' Always output the root of the website.
    Call WriteUrl(strUrlRoot,Now,"weekly",strFormat)
    ' Output only specific folders.
    Call ParseFolder("/marketing")
    Call ParseFolder("/sales")
    Call ParseFolder("/hr/jobs")
    ' --------------------------------------------------
    ' End of file system loop.
    ' -------------------------------------------------- 
    If strFormat = "XML" Then
        Response.Write "</urlset>"
    End If
    
    Response.End
    ' ======================================================================
    '
    ' Recursively walks a folder path and return URLs based on the
    ' static *.html files that it locates.
    ' 
    ' strRootFolder = The base path for recursion
    '
    ' ======================================================================
    Sub ParseFolder(strParentFolder)
        On Error Resume Next
        Dim strChildFolders, lngChildFolders
        Dim strUrlRelative, strExt
        ' Get the list of child folders under a parent folder.
        strChildFolders = GetFolderTree(Server.MapPath(strParentFolder))
        ' Loop through the collection of folders.
        For lngChildFolders = 1 to UBound(strChildFolders)
            strUrlRelative = Replace(Mid(strChildFolders(lngChildFolders),Len(strPhysicalRoot)+1),"\","/")
            Set objFolder = objFSO.GetFolder(Server.MapPath("." & strUrlRelative))
            ' Loop through the collection of files.
            For Each objFile in objFolder.Files
                strExt = objFSO.GetExtensionName(objFile.Name)
                If StrComp(strExt,"html",vbTextCompare)=0 Then
                    If StrComp(Left(objFile.Name,6),"google",vbTextCompare)<>0 Then
                        Call WriteUrl(strUrlRoot & strUrlRelative & "/" & objFile.Name, objFile.DateLastModified, "weekly", strFormat)
                    End If
                End If
            Next
        Next
    End Sub
    ' ======================================================================
    '
    ' Outputs a sitemap URL to the client in XML or TXT format.
    ' 
    ' tmpStrFreq = always|hourly|daily|weekly|monthly|yearly|never 
    ' tmpStrFormat = TXT|XML
    '
    ' ======================================================================
    Sub WriteUrl(tmpStrUrl,tmpLastModified,tmpStrFreq,tmpStrFormat)
        On Error Resume Next
        Dim tmpDate : tmpDate = CDate(tmpLastModified)
        ' Check if the request is for XML or TXT and return the appropriate syntax.
        If tmpStrFormat = "XML" Then
            Response.Write " <url>" & vbCrLf
            Response.Write " <loc>" & Server.HtmlEncode(tmpStrUrl) & "</loc>" & vbCrLf
            Response.Write " <lastmod>" & Year(tmpLastModified) & "-" & Right("0" & Month(tmpLastModified),2) & "-" & Right("0" & Day(tmpLastModified),2) & "</lastmod>" & vbCrLf
            Response.Write " <changefreq>" & tmpStrFreq & "</changefreq>" & vbCrLf
            Response.Write " </url>" & vbCrLf
        Else
            Response.Write tmpStrUrl & vbCrLf
        End If
    End Sub
    ' ======================================================================
    '
    ' Returns a string array of folders under a root path
    '
    ' ======================================================================
    Function GetFolderTree(strBaseFolder)
        Dim tmpFolderCount,tmpBaseCount
        Dim tmpFolders()
        Dim tmpFSO,tmpFolder,tmpSubFolder
        ' Define the initial values for the folder counters.
        tmpFolderCount = 1
        tmpBaseCount = 0
        ' Dimension an array to hold the folder names.
        ReDim tmpFolders(1)
        ' Store the root folder in the array.
        tmpFolders(tmpFolderCount) = strBaseFolder
        ' Create file system object.
        Set tmpFSO = Server.CreateObject("Scripting.Filesystemobject")
        ' Loop while we still have folders to process.
        While tmpFolderCount <> tmpBaseCount
            ' Set up a folder object to a base folder.
            Set tmpFolder = tmpFSO.GetFolder(tmpFolders(tmpBaseCount+1))
              ' Loop through the collection of subfolders for the base folder.
            For Each tmpSubFolder In tmpFolder.SubFolders
                ' Increment the folder count.
                tmpFolderCount = tmpFolderCount + 1
                ' Increase the array size
                ReDim Preserve tmpFolders(tmpFolderCount)
                ' Store the folder name in the array.
                tmpFolders(tmpFolderCount) = tmpSubFolder.Path
            Next
            ' Increment the base folder counter.
            tmpBaseCount = tmpBaseCount + 1
        Wend
        GetFolderTree = tmpFolders
    End Function
%>
It should be easily seen that the code is largely unchanged from my previous blog.
One last thing to consider, I didn't make any changes to the Robots.asp file in this blog. But that being said, when you do not want specific paths crawled, you should add rules to your Robots.txt file to disallow those paths. For example, here is a simple Robots.txt file which allows your entire website:
# Robots.txt # For more information on this file see: # http://www.robotstxt.org/ # Define the sitemap path Sitemap: //sitemap.xml # Make changes for all web spiders User-agent: * Allow: / Disallow:
If you were going to deny crawling on certain paths, you would need to add the specific paths that you do not want crawled to your Robots.txt file like the following example:
# Robots.txt # For more information on this file see: # http://www.robotstxt.org/ # Define the sitemap path Sitemap: //sitemap.xml # Make changes for all web spiders User-agent: * Disallow: /foo Disallow: /bar
With that being said, if you are using my Robots.asp file from my last blog, you would need to update the section of code that defines the paths like my previous example:
Response.Write "# Make changes for all web spiders" & vbCrLf Response.Write "User-agent: *" & vbCrLf Response.Write "Disallow: /foo" & vbCrLf Response.Write "Disallow: /bar" & vbCrLf
I hope this helps. ;-]
Note: This blog was originally posted at http://blogs.msdn.com/robert_mcmurray/
Tags: IIS, ASP, Classic ASP, SEO, URL Rewrite