Hi Jim,
Standard disclaimer -- every collection is different, with different HTML page sizes and server response times.
That being said, 48 hours for 15k documents is too long in my opinion.
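To put that figure in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes the 48 hours is wall-clock time for the whole crawl, and it ignores the fact that a crawler may fetch several documents in parallel.

```python
# Rough average wall-clock time per document for the crawl described above.
crawl_hours = 48
documents = 15_000

seconds_per_document = crawl_hours * 3600 / documents
print(f"{seconds_per_document:.2f} seconds per document")  # prints "11.52 seconds per document"
```

An average of more than 11 seconds per document suggests that something is slowing the crawl down, rather than the collection simply being large.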
To identify the bottleneck, I would recommend checking out the Content Auditor, which is located in the Funnelback Marketing dashboard. If the web collection is part of a meta collection, you would view the documents in the meta collection; if the web collection is stand-alone, you can view the documents directly for that collection.
In the Content Auditor, you could look for two factors that can have an impact on crawl time:
- Response time: This shows how long the Funnelback web crawler had to wait for the server to send the document
- Document size: If there are a lot of large documents (say PDFs), then a long crawl time would be expected
There are some configuration options that could be used to speed things up, but it depends on the environment that Funnelback is running in and it would be advisable to identify the source of the slowness before fiddling with settings.
If you want a log file to check how long the crawl took, crawl.log may be the best choice. It has the 'Started At' time near the top and the 'Finished At' time near the bottom. There is a helpful log reference article here that you may find useful.
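If the log is long, a small script can pull out just those two lines for you. This is a minimal sketch, not a Funnelback tool: the function name `find_crawl_markers` is made up for illustration, the path to crawl.log is whatever it is on your server, and the only assumption about the log is that the relevant lines contain the text 'Started At' and 'Finished At' as described above.

```python
import sys
from pathlib import Path

# Labels to look for; matched case-insensitively in case the log's
# capitalisation differs slightly from what is quoted above.
MARKERS = ("started at", "finished at")


def find_crawl_markers(log_path):
    """Return the lines of a crawl log that mention the start or finish time."""
    matches = []
    with Path(log_path).open(encoding="utf-8", errors="replace") as log:
        for line in log:
            if any(marker in line.lower() for marker in MARKERS):
                matches.append(line.rstrip("\n"))
    return matches


# Usage: python find_markers.py /path/to/crawl.log
# (the script name and path are placeholders)
if __name__ == "__main__" and len(sys.argv) > 1:
    for marker_line in find_crawl_markers(sys.argv[1]):
        print(marker_line)
```

The script only prints the matching lines and does not try to parse the timestamps, since the exact timestamp format is not assumed here.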
Rather than the log, you may prefer a graphical representation of the update times. This can be found in the Administration dashboard for the particular collection --> "Analyse" tab --> "View Collection Update History" button.