pre_index_command: dynamically curl paginated XML URLs

Is it possible to loop through multiple URLs dynamically and curl them in the pre_index_command of a local collection?

Here is our current command; we manually add the URLs as more pages appear. I have added some line breaks between the curl commands here.

pre_index_command=

curl --retry 5 --retry-delay 1 --retry-max-time 600 -Lm600 https://www.ulster.ac.uk/tester/_web_services/phds-xml/_nocache?result_40001_result_page=1 -o $SEARCH_HOME/conf/$COLLECTION_NAME/_default_preview/web/opps1.xml;

curl --retry 5 --retry-delay 1 --retry-max-time 600 -Lm600 https://www.ulster.ac.uk/tester/_web_services/phds-xml/_nocache?result_40001_result_page=2 -o $SEARCH_HOME/conf/$COLLECTION_NAME/_default_preview/web/opps2.xml;

etc…

There are up to 12 pages now, and it is becoming hard to update them manually.

You could write a shell script to do that; however, best practice is to implement a custom collection.
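If you do go the shell-script route, a minimal sketch might look like the following. The page limit (`MAX_PAGES=12`) and the per-page output filenames (`opps$page.xml`) are assumptions based on the commands above; adjust them to match your collection.

```shell
#!/bin/sh
# Hypothetical sketch: fetch each paginated XML result page in a loop
# instead of listing every curl command by hand.
# BASE_URL and MAX_PAGES are assumptions, not values from any product docs.
BASE_URL="https://www.ulster.ac.uk/tester/_web_services/phds-xml/_nocache"
MAX_PAGES=12
OUT_DIR="$SEARCH_HOME/conf/$COLLECTION_NAME/_default_preview/web"

page=1
while [ "$page" -le "$MAX_PAGES" ]; do
  # Same curl options as the original manual commands; failures are
  # logged but do not stop the remaining pages from being fetched.
  curl --retry 5 --retry-delay 1 --retry-max-time 600 -Lm600 \
    "$BASE_URL?result_40001_result_page=$page" \
    -o "$OUT_DIR/opps$page.xml" \
    || echo "fetch failed for page $page" >&2
  page=$((page + 1))
done
```

You could then point `pre_index_command` at this script rather than at an ever-growing chain of inline curl calls; bumping `MAX_PAGES` is the only change needed when more pages appear.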

Local collections are not recommended for this sort of thing (they are really designed to index a bunch of locally stored files in place and do not do any filtering).

There’s a worked example on using a custom collection to connect to an API here: http://training-search.clients.funnelback.com/training/FUNL204.html#_indexing_systems_via_an_api