Hi,
I’m currently specifying a build and I’m wondering if it is possible to crawl an RSS feed. I’ve previously crawled XML and I just want to make sure that the same method can be used as RSS is a similar format to XML.
Thanks, Michael
Hi Michael,
RSS is a dialect of XML, so Funnelback treats it exactly the same as XML. You can use the same process you used for XML, via the ‘XML Processing’ and ‘Configure Metadata Mappings’ options in the Admin Interface.
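To illustrate the point that RSS is just XML, here is a minimal sketch in Python (not Funnelback code) that reads a feed with a generic XML parser. The sample feed, the `SAMPLE_RSS` name and the `list_items` helper are made up for the example; the element names follow the usual RSS 2.0 layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, using the common RSS 2.0 element layout.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>https://example.com/</link>
    <item>
      <title>First post</title>
      <link>https://example.com/posts/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/posts/2</link>
    </item>
  </channel>
</rss>"""


def list_items(rss_text):
    """Parse the feed as ordinary XML and return each item's title and link."""
    root = ET.fromstring(rss_text)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


for entry in list_items(SAMPLE_RSS):
    print(entry["title"], entry["link"])
```

The elements that show up here (`item`, `title`, `link`) are the sort of thing you would expect to reference when setting up the metadata mappings, although the exact mapping syntax is whatever the Admin Interface asks for.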
In addition to the previous comments, you will also need to make sure that crawler.parser.mimeTypes includes the appropriate MIME types for your RSS, and that you update the crawler.link_extraction_regular_expression and crawler.link_extraction_group settings to also identify URLs stored in the RSS (usually in a <url> element). However, I’m not sure how easy it will be to update the link regex so that a single pattern matches both standard links and the ones in an RSS feed.
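As a rough idea of what a combined pattern could look like, here is a sketch in Python. This is not the Funnelback default and I have not tested it in the crawler; the pattern, the `LINK_PATTERN` and `LINK_GROUP` names and the sample strings are all assumptions for illustration. The trick is to put the two prefixes in a non-capturing alternation so the URL always ends up in the same capture group, which is the number you would then give as the link extraction group.

```python
import re

# Assumed pattern: either an href attribute or a <link>/<url> element
# introduces the URL, and the URL itself is always capture group 1.
LINK_PATTERN = re.compile(
    r"""(?:href\s*=\s*["']?|<(?:link|url)>\s*)([^"'\s<>]+)""",
    re.IGNORECASE,
)
LINK_GROUP = 1  # the group index that holds the URL


def extract_links(text):
    """Return every URL the pattern finds, in document order."""
    return [m.group(LINK_GROUP) for m in LINK_PATTERN.finditer(text)]


html = '<a href="https://example.com/a">A</a> <a href=\'/b\'>B</a>'
rss = (
    "<item><link>https://example.com/posts/1</link></item>"
    "<image><url>https://example.com/logo.png</url></image>"
)

print(extract_links(html))
print(extract_links(rss))
```

The crawler's regex engine may differ from Python's in places, so treat this as a starting point and check it against a few real pages and feeds before relying on it.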