31 December 2012 • by Bob • IIS, URL Rewrite, SEO, Classic ASP
I had another interesting situation present itself recently that I thought would make a good blog: how to use Classic ASP with the IIS URL Rewrite module to dynamically generate Robots.txt and Sitemap.xml files.
Here's the situation: I host a website for one of my family members, and like everyone else on the Internet, he wanted some better SEO rankings. We discussed a few things that he could do to improve his visibility with search engines, and one of the suggestions that I gave him was to keep his Robots.txt and Sitemap.xml files up-to-date. But there was an additional caveat - he uses two separate DNS names for the same website, and that presents a problem for absolute URLs in either of those files. Before anyone points out that it's usually not a good idea to host multiple DNS names on the same content, there are times when this is acceptable; for example, if you are trying to decide which of several DNS names is the best to use, you might want to bind each name to the same IP address and parse your logs to find out which address is getting the most traffic.
In any event, the syntax for both Robots.txt and Sitemap.xml files is pretty easy, so I wrote two simple Classic ASP pages, Robots.asp and Sitemap.asp, that output the correct syntax and DNS-specific URLs for each domain name. I also wrote some simple URL Rewrite rules that rewrite inbound requests for the Robots.txt and Sitemap.xml files to the ASP pages, while blocking direct access to the Classic ASP pages themselves.
All of that being said, there are a couple of quick things that I would like to mention before I get to the code:
That being said, let's move on to the actual code.
There are three files that you will need to create for this example: Robots.asp, Sitemap.asp, and Web.config.
You need to save the following code sample as Robots.asp in the root of your website; this page will be executed whenever someone requests the Robots.txt file for your website. This example is very simple: it checks for the requested hostname and uses that to dynamically create the absolute URL for the website's Sitemap.xml file.
<%
    Option Explicit
    On Error Resume Next

    Dim strUrlRoot
    Dim strHttpHost
    Dim strUserAgent

    Response.Clear
    Response.Buffer = True
    Response.ContentType = "text/plain"
    Response.CacheControl = "public"

    Response.Write "# Robots.txt" & vbCrLf
    Response.Write "# For more information on this file see:" & vbCrLf
    Response.Write "# http://www.robotstxt.org/" & vbCrLf & vbCrLf

    strHttpHost = LCase(Request.ServerVariables("HTTP_HOST"))
    strUserAgent = LCase(Request.ServerVariables("HTTP_USER_AGENT"))
    strUrlRoot = "http://" & strHttpHost

    Response.Write "# Define the sitemap path" & vbCrLf
    Response.Write "Sitemap: " & strUrlRoot & "/sitemap.xml" & vbCrLf & vbCrLf

    Response.Write "# Make changes for all web spiders" & vbCrLf
    Response.Write "User-agent: *" & vbCrLf
    Response.Write "Allow: /" & vbCrLf
    Response.Write "Disallow: " & vbCrLf

    Response.End
%>
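The Robots.asp page above always builds its URLs with "http://". If your site also answers on HTTPS, a minimal sketch like the following might help. GetUrlRoot is a hypothetical helper name that is not part of the original pages, and the sketch assumes that IIS is populating the standard HTTPS server variable for the request.

```vbscript
' Hypothetical helper (not part of the original Robots.asp):
' builds the URL root from the scheme and host name of the current request.
Function GetUrlRoot()
    Dim strScheme
    ' IIS sets the HTTPS server variable to "on" for secure requests.
    If LCase(Request.ServerVariables("HTTPS")) = "on" Then
        strScheme = "https://"
    Else
        strScheme = "http://"
    End If
    GetUrlRoot = strScheme & LCase(Request.ServerVariables("HTTP_HOST"))
End Function
```

If you add this function to the page, the line that assigns strUrlRoot could call GetUrlRoot() instead of concatenating "http://" with the host name.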
The following example file is also pretty simple, and you would save this code as Sitemap.asp in the root of your website. There is a section in the code where it loops through the file system looking for files with the *.html file extension and only creates URLs for those files. If you want other files included in your results, or you want to change the code from static to dynamic content, this is where you would need to update the file accordingly.
<%
    Option Explicit
    On Error Resume Next

    Response.Clear
    Response.Buffer = True
    Response.AddHeader "Connection", "Keep-Alive"
    Response.CacheControl = "public"

    Dim strFolderArray, lngFolderArray
    Dim strUrlRoot, strPhysicalRoot, strFormat
    Dim strUrlRelative, strExt
    Dim objFSO, objFolder, objFile

    strPhysicalRoot = Server.MapPath("/")
    Set objFSO = Server.CreateObject("Scripting.Filesystemobject")

    strUrlRoot = "http://" & Request.ServerVariables("HTTP_HOST")

    ' Check for XML or TXT format.
    If UCase(Trim(Request("format")))="XML" Then
        strFormat = "XML"
        Response.ContentType = "text/xml"
    Else
        strFormat = "TXT"
        Response.ContentType = "text/plain"
    End If

    ' Add the UTF-8 Byte Order Mark.
    Response.Write Chr(CByte("&hEF"))
    Response.Write Chr(CByte("&hBB"))
    Response.Write Chr(CByte("&hBF"))

    If strFormat = "XML" Then
        Response.Write "<?xml version=""1.0"" encoding=""UTF-8""?>" & vbCrLf
        Response.Write "<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">" & vbCrLf
    End if

    ' Always output the root of the website.
    Call WriteUrl(strUrlRoot,Now,"weekly",strFormat)

    ' --------------------------------------------------
    ' This following section contains the logic to parse
    ' the directory tree and return URLs based on the
    ' static *.html files that it locates. This is where
    ' you would change the code for dynamic content.
    ' --------------------------------------------------
    strFolderArray = GetFolderTree(strPhysicalRoot)

    For lngFolderArray = 1 to UBound(strFolderArray)
        strUrlRelative = Replace(Mid(strFolderArray(lngFolderArray),Len(strPhysicalRoot)+1),"\","/")
        Set objFolder = objFSO.GetFolder(Server.MapPath("." & strUrlRelative))
        For Each objFile in objFolder.Files
            strExt = objFSO.GetExtensionName(objFile.Name)
            If StrComp(strExt,"html",vbTextCompare)=0 Then
                If StrComp(Left(objFile.Name,6),"google",vbTextCompare)<>0 Then
                    Call WriteUrl(strUrlRoot & strUrlRelative & "/" & objFile.Name, objFile.DateLastModified, "weekly", strFormat)
                End If
            End If
        Next
    Next

    ' --------------------------------------------------
    ' End of file system loop.
    ' --------------------------------------------------

    If strFormat = "XML" Then
        Response.Write "</urlset>"
    End If

    Response.End

    ' ======================================================================
    '
    ' Outputs a sitemap URL to the client in XML or TXT format.
    '
    ' tmpStrFreq = always|hourly|daily|weekly|monthly|yearly|never
    ' tmpStrFormat = TXT|XML
    '
    ' ======================================================================

    Sub WriteUrl(tmpStrUrl,tmpLastModified,tmpStrFreq,tmpStrFormat)
        On Error Resume Next
        Dim tmpDate : tmpDate = CDate(tmpLastModified)
        ' Check if the request is for XML or TXT and return the appropriate syntax.
        If tmpStrFormat = "XML" Then
            Response.Write " <url>" & vbCrLf
            Response.Write " <loc>" & Server.HtmlEncode(tmpStrUrl) & "</loc>" & vbCrLf
            Response.Write " <lastmod>" & Year(tmpLastModified) & "-" & Right("0" & Month(tmpLastModified),2) & "-" & Right("0" & Day(tmpLastModified),2) & "</lastmod>" & vbCrLf
            Response.Write " <changefreq>" & tmpStrFreq & "</changefreq>" & vbCrLf
            Response.Write " </url>" & vbCrLf
        Else
            Response.Write tmpStrUrl & vbCrLf
        End If
    End Sub

    ' ======================================================================
    '
    ' Returns a string array of folders under a root path
    '
    ' ======================================================================

    Function GetFolderTree(strBaseFolder)
        Dim tmpFolderCount,tmpBaseCount
        Dim tmpFolders()
        Dim tmpFSO,tmpFolder,tmpSubFolder
        ' Define the initial values for the folder counters.
        tmpFolderCount = 1
        tmpBaseCount = 0
        ' Dimension an array to hold the folder names.
        ReDim tmpFolders(1)
        ' Store the root folder in the array.
        tmpFolders(tmpFolderCount) = strBaseFolder
        ' Create file system object.
        Set tmpFSO = Server.CreateObject("Scripting.Filesystemobject")
        ' Loop while we still have folders to process.
        While tmpFolderCount <> tmpBaseCount
            ' Set up a folder object to a base folder.
            Set tmpFolder = tmpFSO.GetFolder(tmpFolders(tmpBaseCount+1))
            ' Loop through the collection of subfolders for the base folder.
            For Each tmpSubFolder In tmpFolder.SubFolders
                ' Increment the folder count.
                tmpFolderCount = tmpFolderCount + 1
                ' Increase the array size
                ReDim Preserve tmpFolders(tmpFolderCount)
                ' Store the folder name in the array.
                tmpFolders(tmpFolderCount) = tmpSubFolder.Path
            Next
            ' Increment the base folder counter.
            tmpBaseCount = tmpBaseCount + 1
        Wend
        GetFolderTree = tmpFolders
    End Function
%>
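Note: There are two helper methods in the preceding example that I should call out: the WriteUrl() subroutine, which outputs a single sitemap URL to the client in either XML or TXT format, and the GetFolderTree() function, which returns a string array of all the folders under a root path.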
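If you want the file system loop in Sitemap.asp to pick up more than just *.html files, one option is to move the extension check into a small function. The following is only a sketch: IsSitemapExtension is a hypothetical name that is not in the original page, and the extension list is an assumption that you would adjust for your own content.

```vbscript
' Hypothetical helper (not part of the original Sitemap.asp):
' returns True when strExt appears in a pipe-delimited list of extensions.
Function IsSitemapExtension(strExt)
    ' Assumed list of extensions; the leading and trailing pipes
    ' ensure that only whole extensions are matched.
    Const strExtensions = "|html|htm|"
    IsSitemapExtension = (InStr(1, strExtensions, "|" & strExt & "|", vbTextCompare) > 0)
End Function
```

With this function added to the page, the test inside the loop that compares strExt to "html" with StrComp could be replaced with a call to IsSitemapExtension(strExt). I would avoid adding "asp" to the list, because the rewrite rules for this example return HTTP 404 errors for direct requests to Robots.asp and Sitemap.asp.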
The last step is to add the URL Rewrite rules to the Web.config file in the root of your website. The following example is a complete Web.config file, but you could merge the rules into your existing Web.config file if you have already created one for your website. These rules are pretty simple: they rewrite all inbound requests for Robots.txt to Robots.asp, and they rewrite all requests for Sitemap.xml to Sitemap.asp?format=XML and requests for Sitemap.txt to Sitemap.asp?format=TXT. This allows requests for both the XML-based and text-based sitemaps to work, even though the Robots.txt file contains the path to the XML file. The last part of the URL Rewrite syntax returns HTTP 404 errors if anyone tries to send direct requests for either the Robots.asp or Sitemap.asp files; this isn't absolutely necessary, but I like to mask what I'm doing from prying eyes. (I'm kind of geeky that way.)
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rewriteMaps>
        <clear />
        <rewriteMap name="Static URL Rewrites">
          <add key="/robots.txt" value="/robots.asp" />
          <add key="/sitemap.xml" value="/sitemap.asp?format=XML" />
          <add key="/sitemap.txt" value="/sitemap.asp?format=TXT" />
        </rewriteMap>
        <rewriteMap name="Static URL Failures">
          <add key="/robots.asp" value="/" />
          <add key="/sitemap.asp" value="/" />
        </rewriteMap>
      </rewriteMaps>
      <rules>
        <clear />
        <rule name="Static URL Rewrites" patternSyntax="ECMAScript" stopProcessing="true">
          <match url=".*" ignoreCase="true" negate="false" />
          <conditions>
            <add input="{Static URL Rewrites:{REQUEST_URI}}" pattern="(.+)" />
          </conditions>
          <action type="Rewrite" url="{C:1}" appendQueryString="false" redirectType="Temporary" />
        </rule>
        <rule name="Static URL Failures" patternSyntax="ECMAScript" stopProcessing="true">
          <match url=".*" ignoreCase="true" negate="false" />
          <conditions>
            <add input="{Static URL Failures:{REQUEST_URI}}" pattern="(.+)" />
          </conditions>
          <action type="CustomResponse" statusCode="404" subStatusCode="0" />
        </rule>
        <rule name="Prevent rewriting for static files" patternSyntax="Wildcard" stopProcessing="true">
          <match url="*" />
          <conditions>
            <add input="{REQUEST_FILENAME}" matchType="IsFile" />
          </conditions>
          <action type="None" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
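Once the rules are in place, you might want a quick way to confirm that they behave the same for each DNS name. The following is a rough sketch of a Windows Script Host script for that purpose; the script name and the two host names are placeholders that I made up for illustration, and it assumes that the MSXML 6.0 ServerXMLHTTP component is available on the computer where you run it.

```vbscript
' Hypothetical check script; run with: cscript //nologo CheckSitemaps.vbs
' The host names below are placeholders; replace them with your own DNS names.
Option Explicit

Dim arrHosts, arrPaths, strHost, strPath

arrHosts = Array("www.example.com", "www.example.net")

' The first three paths should be rewritten to the ASP pages (HTTP 200 if
' everything is working); the last two should be blocked with HTTP 404.
arrPaths = Array("/robots.txt", "/sitemap.xml", "/sitemap.txt", "/robots.asp", "/sitemap.asp")

For Each strHost In arrHosts
    For Each strPath In arrPaths
        WScript.Echo strHost & strPath & " -> " & GetStatus("http://" & strHost & strPath)
    Next
Next

' Sends a synchronous GET request and returns the HTTP status code.
Function GetStatus(strUrl)
    Dim objHttp
    Set objHttp = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    objHttp.open "GET", strUrl, False
    objHttp.send
    GetStatus = objHttp.status
End Function
```

The script only reports status codes; to confirm that the Sitemap line in each Robots.txt response contains the matching host name, you could also echo the responseText property for the /robots.txt requests.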
That sums it up for this blog; I hope that you get some good ideas from it.
For more information about the syntax in Robots.txt and Sitemap.xml files, see the following URLs:
Note: This blog was originally posted at http://blogs.msdn.com/robert_mcmurray/
Tags: IIS, SEO, URL Rewrite, ASP, IIS 7, IIS 8, URL, Classic ASP