Adding /page to exclude_patterns is blocking the crawl of other web documents

After adding /page to exclude_patterns in the collection configuration, the crawler also stopped crawling other URLs that do not contain /page.

Is /page some reserved keyword or constant? What should we use in exclude_patterns instead?

This is the list of URLs we need to exclude from crawling; any suggestions are greatly appreciated.

https://domainname.com/page/1/
https://domainname.com/page/2/
https://domainname.com/page/3/
https://domainname.com/page/4/
https://domainname.com/page/5/

You can’t use an exclude pattern for a page you need to crawl through. The /page/N/ listing pages contain the links to the rest of the site, so excluding them stops the crawler from ever discovering those other URLs.

Take a look at the following article: you should ideally specify robots meta tags in your page content, or, failing that, use the kill configuration to remove the pages from the search index after they have been indexed.
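For example, if you can edit the templates that generate those listing pages, something like the following robots meta tag keeps each /page/N/ page out of the index while still letting the crawler follow its links (the domain is just the example from the question):

```html
<!-- Placed in the <head> of each listing page, e.g. https://domainname.com/page/1/ -->
<!-- noindex: keep this listing page out of the search index -->
<!-- follow: still crawl the links on the page so the other documents are discovered -->
<meta name="robots" content="noindex, follow">
```

If you can’t change the page markup, the kill configuration achieves a similar end result, but after indexing rather than during the crawl; check your product documentation for the exact file name and matching rules before relying on it.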
