Collection stopping crawl at 10k

I am encountering a collection where the crawler is stopping at exactly 10,000 pages.

What kinds of settings would prevent the crawler from continuing? I believe the licence is for 25k.

The only setting I can think of is:
crawler.overall_crawl_timeout
This is currently set to 240 min, but I wouldn't have thought a timeout would consistently cut the crawl phase off at a clean 10k.

It could be a few things:

  • Check the server licence and confirm the size (you mentioned you think it’s 25k)
  • Check collection.cfg for the following settings (example lines after this list):
    crawler.max_files_per_server
    crawler.max_files_stored
    -maxdocs option set as an indexer_option
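
For reference, those keys sit in collection.cfg as plain key=value lines. Something like the following is what you'd be looking for; the values here are purely illustrative (not defaults), and I'm assuming the usual -maxdocsN form for the indexer option:

  # collection.cfg -- illustrative values only
  crawler.max_files_per_server=10000
  crawler.max_files_stored=25000
  # -maxdocs (if present) caps how many documents the indexer will index
  indexer_options=-maxdocs10000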

You could also be hitting the crawler trap limiter and may need to increase crawler.max_files_per_area, which defaults to 10,000 documents.
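
If that is what's happening, raising it above your asset count in collection.cfg should let the crawl continue (25000 below is just an example figure to match the licence size you mentioned):

  crawler.max_files_per_area=25000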

Also check whether a max files stored limit is set as an option in site_profiles.cfg.


My gut feeling is that this is the reason. This collection is crawled from a seed URL containing a generated list of assets, and that list is over 10k, so I will raise this limit.
Cheers Peter 🙂