Hi all,
I've got some web pages that display different content depending on whether a query string parameter is present (and what its value is), and I need to crawl and index both versions with a web collection. I've created a seed page that lists each URL both without and with the query string, e.g.:
https://www.somewhere.com/something/mycontent-1
https://www.somewhere.com/something/mycontent-1?year=2023
https://www.somewhere.com/something/mycontent-2
https://www.somewhere.com/something/mycontent-2?year=2023
etc etc
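In case the pattern isn't clear from the examples, here is a minimal Python sketch of how the seed list is structured. The function name `seed_urls` and the slug list are hypothetical, purely for illustration; this is not how the seed page is actually produced.

```python
# Illustrative only: builds the yearless / with-year URL pairs listed on the seed page.
BASE = "https://www.somewhere.com/something"


def seed_urls(slugs, year=2023):
    """Return each page URL twice: first without, then with the ?year= query string."""
    urls = []
    for slug in slugs:
        urls.append(f"{BASE}/{slug}")               # yearless version (page shows 2024 data)
        urls.append(f"{BASE}/{slug}?year={year}")   # explicit-year version
    return urls


if __name__ == "__main__":
    for url in seed_urls(["mycontent-1", "mycontent-2"]):
        print(url)
```

So for N pages I expect 2N documents in the index, one per URL.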
The year parameter is set to 2024 when not specified in the query string, and this value is saved as metadata for use in search facets.
When I update the collection, only the yearless URLs appear in the index, with the year metadata as 2024. If I manually add a query string URL then it updates the data in the index (including changing the year metadata value to 2023) but the URL in the index remains yearless and there is still only one page indexed instead of two.
If I create rewrite rules that translate the query string URLs into the format:
https://www.somewhere.com/something/mycontent-1/2023
and update the seed page with the new URLs, the same thing happens: I only get the yearless URLs indexed. If I remove all the yearless URLs so that only the /2023 URLs are on the seed page, the correct data is indexed (the year metadata is 2023 for all pages crawled) but the URLs in the index remain yearless, i.e. when you click on a link in the index it takes you to the wrong page, with the 2024 data.
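To be clear about what the rewrite does, the mapping between the two URL formats is roughly the following. This is a Python sketch of the mapping only, not the actual rewrite rules on the server; the function name `to_path_form` is made up for illustration, and it assumes `year` is the only query parameter that matters.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def to_path_form(url):
    """Map .../mycontent-1?year=2023 to .../mycontent-1/2023; leave yearless URLs alone."""
    parts = urlsplit(url)
    year = parse_qs(parts.query).get("year")
    if not year:
        return url  # no year parameter, nothing to rewrite
    # Append the first year value as a trailing path segment and drop the query string.
    # Assumption: no other query parameters need to be preserved.
    path = parts.path.rstrip("/") + "/" + year[0]
    return urlunsplit((parts.scheme, parts.netloc, path, "", parts.fragment))


if __name__ == "__main__":
    print(to_path_form("https://www.somewhere.com/something/mycontent-1?year=2023"))
    print(to_path_form("https://www.somewhere.com/something/mycontent-1"))
```

Either way, the two URLs for a page are distinct strings, so I would have expected them to be treated as two documents.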
Is there some way I can tell Funnelback that these URLs are different (though similar) pages, and that I want them all listed in the index?