Thanks @plevan
Current workflow is as follows:
- The source system exports 6 XML files, each ~150-300MB in size (around 210K records in total)
- A script performs an XSLT transformation on the XML files and then copies them from the source server to /var/tmp on the Funnelback server (a rough sketch of this step is below the list)
- Funnelback then indexes them locally from /var/tmp
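
For illustration, the transform-and-copy step looks roughly like the sketch below. This isn't the exact script we run; the paths, hostname and stylesheet name are placeholders, and it uses lxml plus scp just to show the shape of the process:

```python
# Sketch of the transform-and-copy step (placeholder paths/hostnames).
from pathlib import Path
import subprocess
from lxml import etree

XSLT_PATH = Path("transform.xsl")             # placeholder stylesheet
EXPORT_DIR = Path("/data/exports")            # placeholder: where the 6 XML exports land
STAGING_DIR = Path("/data/staging")           # placeholder: transformed output
FUNNELBACK_DEST = "funnelback-host:/var/tmp"  # placeholder Funnelback server

transform = etree.XSLT(etree.parse(str(XSLT_PATH)))

for xml_file in sorted(EXPORT_DIR.glob("*.xml")):
    # Parse and transform one ~150-300MB export at a time
    result = transform(etree.parse(str(xml_file)))
    out_path = STAGING_DIR / xml_file.name
    result.write(str(out_path), encoding="utf-8", xml_declaration=True)

# Copy the transformed files to /var/tmp on the Funnelback server
subprocess.run(
    ["scp"] + [str(p) for p in sorted(STAGING_DIR.glob("*.xml"))] + [FUNNELBACK_DEST],
    check=True,
)
```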
We ran into difficulty using the web crawler (the files were too big to load/crawl) and the filecopier (Java heap issues when copying the files, even after we massively increased the heap size).
Best wishes,
Andrew