Prevent Funnelback from showing XML files

Hi,

I’m using a local collection to index a single XML file which contains a large number of records. If I search for !nullquery, all the records show (as they should), but the original XML file itself is also listed as a result.

Is there a way I can stop it from being counted as a result?

Thanks,
Robin

The safest way would be to add the source XML file to the kill_exact.cfg configuration file. During the next collection update, files exactly matching those URLs will be removed after the index phase.

See also:
https://docs.funnelback.com/kill_exact_cfg.html
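
To illustrate what "exactly matching" means, here is a rough Python model of the behaviour. It is not Funnelback's implementation: the function name apply_kill_lists and the URLs are made up, and the prefix rule used for the sibling kill_partial.cfg file is an assumption, so check the documentation for your version.

```python
def apply_kill_lists(indexed_urls, kill_exact=(), kill_partial=()):
    """Return the URLs that would stay in the index (illustrative model only)."""
    exact = set(kill_exact)
    # Assumption: a partial pattern matches when the URL starts with it.
    prefixes = tuple(kill_partial)
    kept = []
    for url in indexed_urls:
        if url in exact:
            continue  # removed: identical to a kill_exact entry
        if prefixes and url.startswith(prefixes):
            continue  # removed: begins with a kill_partial entry
        kept.append(url)
    return kept


# Hypothetical URLs: the source file plus two records split out of it.
urls = [
    "file:///data/records.xml",
    "file:///data/records.xml/1",
    "file:///data/records.xml/2",
]
only_records = apply_kill_lists(urls, kill_exact=["file:///data/records.xml"])
```

Under that assumption, putting the same URL in kill_partial.cfg could also remove any records whose URLs begin with the file's URL, which is one reason an exact match is the more conservative choice.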

How would I do that? Should I:

  1. Create a new notepad document and put the exact URL as the first line then save that as kill_exact.cfg

  2. Go to Administer > Browse Collection Configuration Files

  3. Under config, click browse, select kill_exact.cfg file then click Upload file?

Then update the collection again? Do I need to edit collection.cfg?

You’re very close. The process in v14+ looks like:

  1. Administer > Browse Collection Configuration Files
  2. From the ‘Config’ tier, you’ll see a drop-down of configuration files to add. This should include kill_partial.cfg and kill_exact.cfg. Selecting one of them will take you to the editing screen for kill_*.cfg
  3. Add URLs (or URL patterns) to your kill_* file
  4. Reindex: Administer > Update > Advanced Update > Reindex Live view

See also:
https://docs.funnelback.com/document_flags.html

If you’re using a version older than 14.0, you’ll need to add some additional workflow steps, as described at:

Ah we only have 13.2.0…

Is there any way of getting to the bin directory from the admin page, or do you need backend directory access?

You’ll still be able to achieve this without backend (SSH) access. I’d suggest:

  1. Creating a kill_exact.cfg / kill_partial.cfg locally, using the syntax described in the Document Flags documentation (https://docs.funnelback.com/document_flags.html).
  2. Uploading the configuration files via the Admin UI (Administer > Browse Collection Configuration Files > Config > Upload)
  3. Adding the following line to collection.cfg:

post_index_command=$SEARCH_HOME/bin/padre-fl $SEARCH_HOME/data/COLLECTION/live/idx $SEARCH_HOME/conf/COLLECTION/kill_partial.cfg -kill; $SEARCH_HOME/bin/padre-fl $SEARCH_HOME/data/COLLECTION/live/idx $SEARCH_HOME/conf/COLLECTION/kill_exact.cfg -kill -exactmatch

Replace COLLECTION with your collection’s name. Triggering a re-index via the Admin UI will then run the commands in post_index_command.
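
Since that line is long and easy to mistype, here is a small Python sketch that fills in the collection name for you. The helper name build_post_index_command and the example collection name "xml-records" are hypothetical; the paths and padre-fl flags are copied from the line above, not verified against other versions.

```python
def build_post_index_command(collection, search_home="$SEARCH_HOME"):
    """Build the collection.cfg line shown above for one collection."""
    if not collection or "/" in collection or any(c.isspace() for c in collection):
        raise ValueError("collection name must be non-empty, without spaces or slashes")
    padre_fl = f"{search_home}/bin/padre-fl"
    idx = f"{search_home}/data/{collection}/live/idx"
    conf = f"{search_home}/conf/{collection}"
    # Partial patterns first, then exact matches, as in the original line.
    partial = f"{padre_fl} {idx} {conf}/kill_partial.cfg -kill"
    exact = f"{padre_fl} {idx} {conf}/kill_exact.cfg -kill -exactmatch"
    return f"post_index_command={partial}; {exact}"


if __name__ == "__main__":
    # "xml-records" is a made-up collection name.
    print(build_post_index_command("xml-records"))
```

Paste the printed line into collection.cfg as a single line.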

When you do upgrade to v14+, these file names, locations and contents can remain unchanged, but the manual workflow steps will no longer be necessary.
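
For step 1, if you would rather script the file than type it in Notepad, something like the Python sketch below should do. It assumes the kill files are plain text with one URL per line (check the Document Flags page for the exact syntax); the helper name write_kill_file and the example URL are hypothetical.

```python
from pathlib import Path


def write_kill_file(path, urls):
    """Write one URL per line, dropping blanks and duplicates, keeping order."""
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines


if __name__ == "__main__":
    # Made-up URL: use the URL of your XML file exactly as it appears in results.
    write_kill_file("kill_exact.cfg", ["file:///data/records/all-records.xml"])
```

The resulting kill_exact.cfg is what you would upload in step 2.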

Thanks for your help Gordon, I’ll give that a go!

When I try to upload the configuration file kill_exact.cfg I get the following error:

Processing errors
The following errors were encountered:

  • Security violation
  • You cannot upload a file called 'kill_exact.cfg'

Is this due to my account type?

Quite possibly. You may not have the necessary permissions to upload via the Admin UI. Some of those controls are managed via the File Manager configuration and user account permissions.

I’d suggest getting in touch with your Search Administrator or Funnelback Support for further help, citing the details of this scenario and the discussion above.