I am encountering a collection where the crawler is stopping at exactly 10,000 pages.
What kinds of settings would prevent the crawler from continuing? I believe the collection is licensed for 25k.
The only setting I can think of is:
crawler.overall_crawl_timeout
This is currently set to 240 minutes, but I would not have thought it would consistently cut the crawl phase off at a clean 10k.
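For reference, the relevant line in our collection.cfg looks roughly like this (I'm assuming the value is interpreted in minutes):

```
# Overall crawl timeout, currently 240 (minutes)
crawler.overall_crawl_timeout=240
```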
It could be a few things:

- Check the server licence and confirm the size (you mentioned you think it’s 25k).
- Check collection.cfg for the following settings:
  - crawler.max_files_per_server
  - crawler.max_files_stored
- A -maxdocs option set as an indexer_option.
- You could be hitting the crawler trap limiter and may need to increase crawler.max_files_per_area, which defaults to 10,000 documents.
- A max files limit set per site in site_profiles.cfg.

There's a rough sketch of how these look in collection.cfg below.
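The key names above are the real settings; the values (and the exact -maxdocs syntax) are placeholders you would adjust to your licence, so check them against your version's documentation:

```
# Hypothetical collection.cfg excerpt -- values are examples, not recommendations
# Per-server cap on gathered files
crawler.max_files_per_server=25000
# Overall cap on files stored by the crawl
crawler.max_files_stored=25000
# Crawler trap limiter (defaults to 10000)
crawler.max_files_per_area=25000
# If a -maxdocs indexer option is in play it would appear on the indexer_options
# line, e.g. indexer_options=-maxdocs25000 (check the exact syntax for your version)
```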
My gut feeling is that this is the reason. This collection is crawled from a seed URL containing a generated list of assets, and that list is over 10k entries. I will up this limit.
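For anyone hitting the same thing, the change I'm planning in collection.cfg is roughly this (25000 is just a value comfortably above our seed list size; adjust to your own licence):

```
# Raise the per-area limit above the size of the generated seed list
crawler.max_files_per_area=25000
```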
Cheers Peter 