Many web sites now include a simple way to autodiscover the RSS feed for the site. This is done via a simple LINK tag and is supported by all modern browsers. You should see - for example - an RSS icon in the address bar at this blog because I have the following HTML in my HEAD block:
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://feedproxy.google.com/RaymondCamdensColdfusionBlog" />
I was talking to Todd Sharp today about how ColdFusion could look for this URL and I came up with the following snippet.
<cfset urls = ["http://www.raymondcamden.com", "http://www.coldfusionbloggers.org", "http://www.androidgator.com", "http://www.cfsilence.com/blog/client"]>
<cfloop index="u" array="#urls#">
<cfoutput>Checking #u#<br/></cfoutput>
<cfhttp url="#u#">
<cfset body = cfhttp.fileContent>
<cfset linkTags = reMatch("<link[^>]+type=""application/rss\+xml"".*?>",body)>
<cfif arrayLen(linkTags)>
<cfset rssLinks = []>
<cfloop index="ru" array="#linkTags#">
<cfif findNoCase("href=", ru)>
<cfset arrayAppend(rsslinks, rereplaceNoCase(ru,".*?href=""(.*?)"".*", "\1"))>
</cfif>
</cfloop>
<cfdump var="#rsslinks#" label="RSS Links">
<cfelse>
None found.
</cfif>
<p/>
</cfloop>
The snippet begins with a few sample URLs I used for testing. We then loop over each one and perform an HTTP GET. From the response body we use a regex to find LINK tags whose type is application/rss+xml. A page can have more than one, so I create an array for my results and append the URL found in each tag's href. Nice and simple, right? You could also turn this into a simple UDF:
<cfscript>
function getRSSUrl(u) {
	// Fetch the page; resolveURL makes relative links in the body absolute
	var h = new com.adobe.coldfusion.http();
	h.setURL(arguments.u);
	h.setMethod("get");
	h.setResolveURL(true);
	var result = h.send().getPrefix().fileContent;
	var rssLinks = [];
	// Find every LINK tag that advertises an RSS feed
	var linkTags = reMatch("<link[^>]+type=""application/rss\+xml"".*?>",result);
	if(arrayLen(linkTags)) {
		for(var ru in linkTags) {
			// Keep only the href value of each matching tag
			if(findNoCase("href=", ru)) arrayAppend(rssLinks, rereplaceNoCase(ru,".*?href=""(.*?)"".*", "\1"));
		}
	}
	return rssLinks;
}
</cfscript>
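If you want to see what the two regexes do without making a network call, here is a rough sketch that runs them against an HTML string. The helper name extractRSSLinks and the sample variable are my own inventions for illustration, not part of the code above, and the sample markup is just the LINK tag from this blog wrapped in a HEAD block.

```cfml
<cfscript>
// Hypothetical helper (name is an assumption, not from the original snippet):
// applies the same two regexes to an HTML string instead of a fetched page.
function extractRSSLinks(html) {
	var rssLinks = [];
	// Step 1: grab each LINK tag whose type is application/rss+xml
	var linkTags = reMatch("<link[^>]+type=""application/rss\+xml"".*?>", arguments.html);
	for(var linkTag in linkTags) {
		// Step 2: replace the whole tag with just the value of its href attribute
		if(findNoCase("href=", linkTag)) {
			arrayAppend(rssLinks, rereplaceNoCase(linkTag, ".*?href=""(.*?)"".*", "\1"));
		}
	}
	return rssLinks;
}

// Sample input: the LINK tag shown at the top of this post
sample = '<head><link rel="alternate" type="application/rss+xml" title="RSS" href="http://feedproxy.google.com/RaymondCamdensColdfusionBlog" /></head>';
writeDump(var=extractRSSLinks(sample), label="RSS Links");
</cfscript>
```

The dump should show a one-element array holding the feedproxy URL. Keep in mind that reMatch is case sensitive and the pattern expects double-quoted attributes, so a page that writes the tag in uppercase or uses single quotes would be missed; reMatchNoCase would be the first thing to try if that matters to you.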
Not sure how useful this is - but enjoy!