How are the files currently obtained?
There are some other options. A web collection will usually work for this, but a custom collection may be more appropriate in this instance: it can connect to whatever the arbitrary source is and fetch the file, and it can then be hooked up to a standard filter chain in the same way as other collections.
You may also be able to use a filecopy collection in place of the local collection (this will enable you to filter), but it’s not ideal: filecopy collections have some quirks that may need to be worked around, depending on what your collection is doing. This is probably the quicker solution, but it is not ‘best practice’.
Heap space issues can often be worked around by increasing the relevant heap size setting (e.g. gather.max_heap_size), if that’s the stage where you are running out of memory.
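As a rough illustration of the heap adjustment, the setting would go in the collection's collection.cfg along these lines. The figure 2048 is only a placeholder, and I'm assuming the value is given in megabytes, so check the documentation for your version before relying on it:

```
# collection.cfg
# Assumption: value is in MB; 2048 is an illustrative figure, not a recommendation
gather.max_heap_size=2048
```

Raise it gradually and re-run the update, rather than jumping straight to a very large value, so you can see whether memory in the gather phase really is the limiting factor.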