Thanks for the clarification Peter.
The separate collection is so I can update recent content without having to crawl the 20,000 urls in the main website.
Big crawls can take a long time and I need recent content to be available as soon as it is published.
We used to do instant update feeds - but often ran into issues with lock files while the main crawl was taking place or multiple instant updates tried to run at once.
I don't really want to use a push collection - as that's a big workload to get the API up and running and configure Matrix to act on all of the different events.
I should be able to work out a way to ensure the content in each collection is unique - and will also use the result collapsing option as a failsafe backup.
Thanks again for your help
Karl