I am indexing a number of external websites, and some of the start URLs redirect to other URLs. We need to be able to populate the start URLs with a URL such as “mysite.com”, even though the crawler will be redirected to a different URL (e.g. “redirect.com”) when it runs. That is fine, but in the external_metadata configuration we would ideally like to list the known URL, mysite.com, rather than the redirected URL. Is there any way to configure the crawler to handle redirects?
For example, if my external_metadata.cfg contains:
mysite.com disclaimer:this is a third party site
I would want this disclaimer to be applied to mysite.com and also to any site the crawler is redirected to from it (as opposed to sites it merely links to). Is this possible, or would we need to explicitly list each redirect URL in external_metadata.cfg?
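If explicit listing does turn out to be necessary, I assume the extra entries could at least be generated rather than maintained by hand. Below is a rough Python sketch of that idea, not anything the crawler provides: `expand_metadata` and the `redirects` mapping are hypothetical names, the mapping of start URL to redirect target is assumed to have been collected separately, and each line is assumed to follow the "URL, whitespace, metadata" layout of the example above.

```python
def expand_metadata(lines, redirects):
    """Return the metadata lines, plus a copy of each line for its redirect target.

    lines     -- entries in the "<url> <metadata>" layout shown above
    redirects -- hypothetical mapping of start URL -> URL it redirects to
    """
    expanded = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        expanded.append(line)
        # Split on the first run of whitespace: URL first, metadata after it.
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # no metadata on this line, nothing to copy
        url, metadata = parts
        target = redirects.get(url)
        if target and target != url:
            expanded.append(f"{target} {metadata}")
    return expanded


entries = ["mysite.com disclaimer:this is a third party site"]
redirects = {"mysite.com": "redirect.com"}  # assumed, gathered by hand or by a script

for entry in expand_metadata(entries, redirects):
    print(entry)
```

This only covers a single redirect hop per start URL, and the output would still need to be written back to external_metadata.cfg before each update, so a crawler-side setting would be preferable if one exists.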