Funnelback not recognising robots.txt rules

Hi,

I’m trying to prevent certain sections of the site from being indexed.

I’ve updated the robots.txt file with a few Disallow rules for the Funnelback user-agent, which you can see here: https://www.fife.gov.uk/robots.txt

I’ve updated the collections several times, but I still see URLs containing “/kb/docs/articles/privacy-notices/” in the results.

Do you know why Funnelback would seemingly ignore this? Or have I not formatted the robots.txt properly?
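For reference, the rules follow the usual robots.txt format, roughly like this (reproduced from memory with the path from above, so the live file may differ slightly):

```
# Rules aimed at the Funnelback crawler, which identifies
# itself with the user-agent token "Funnelback"
User-agent: Funnelback
Disallow: /kb/docs/articles/privacy-notices/
```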

Hi,

What version of Funnelback are you using, if I may ask? The Disallow rules set for the Funnelback bot look valid as far as I can see.
However, would you mind providing the information below, please?

a) collection.cfg contents
b) collection.cfg.start.urls contents
c) Funnelback version
d) A sample URL (full) that got indexed despite the robots directive.

If you cannot post the content asked for in a) and b) due to security concerns, c) and d) alone are fine, so I can check a little further.
Thanks.

Thanks for replying. We have a few different collections, and I’m not 100% sure which is the “main” collection for the site:

  • WSS Search Local
  • WSS Agent Desktop
  • WSS Agent Desktop Local
  • WSS Search Meta

I’m new to the team and wasn’t around when this was set up.

Funnelback Version: 15.18.0

Sorry, I can’t find collection.cfg.start.urls anywhere.

Example URLs:

collection.cfg from WSS Search Local:

admin_email=team.intranet@fife.gov.uk
analytics.scheduled_database_update=false
auto-completion.source=csv,internal
changeover_percent=10
collection=$COLLECTION_NAME
collection_group=Verint Web Self Service
collection_type=local
data_report=false
data_root=$SEARCH_HOME/data/$COLLECTION_NAME/offline/data/
filter=false
gather=false
group.project_id=Prod Verint WSS
indexer_options=-ifb -facet_item_sepchars=|, -utf8input -forcexml
post_update_command=$SEARCH_HOME/conf/@workflow/post_update_operations.sh $COLLECTION_NAME
pre_index_command=$SEARCH_HOME/conf/@workflow/pre_gather_operations.sh wss-search-local;$SEARCH_HOME/conf/$COLLECTION_NAME/@workflow/connect_to_api.sh $COLLECTION_NAME
query_processor_options=-stem=2 -MBL=1000 -rmc_sensitive=1 -SM=meta -SF=[c,v,d,j,q,r,s,id,t,tags,personas,personasid,featured,count]
recommender=true
service_name=WSS Search Local
ui.modern.form.auto-completion-master.content_type=application/json
ui.modern.form.concierge-completion.content_type=application/json
workflow.configuration=-local
workflow.data_type=matrix
workflow.matrix_host=https://www.fife.gov.uk
workflow.meta_collection=wss-search-meta

collection.cfg from WSS Search Meta:

admin.undeletable=true
admin_email=steven.gardner-crm@fife.gov.uk
auto-completion.alpha=0.5
collection=$COLLECTION_NAME
collection_group=Verint Web Self Service
collection_type=meta
data_report=false
datasource=false
faceted_navigation.date.sort_mode=ddate
gather=false
group.project_id=Prod Verint WSS
indexer_options=-ifb
query_processor_options=-stem=2 -MBL=1000 -rmc_sensitive=1 -cool.4=0.80 -SM=meta -SF=[c,t,d,id,tags,tagstitles,featured,featureddate,rating,disclaimer,count,thumbnail,category,subcategory]
recommender=true
service_name=WSS Search Meta
ui.modern.form.generate-qccsv.content_type=text/csv
ui.modern.form.overlay-completion.content_type=application/json
ui.modern.form.search-overlay.content_type=application/json
ui.modern.form.top-clicks-json.content_type=application/json
ui.modern.full_facets_list=true
ui.modern.news_link=news
ui.modern.search_link=search
ui.modern.session=true

Thanks

Hi,

The first collection is the one you should look at. Neither collection will have start URLs, since one is a local collection and the other is a meta collection.
The local collection doesn’t gather (see gather=false), so it never fetches pages itself and therefore never consults robots.txt at all.
You’d need to either add an exclude pattern or use a kill file. See: Kill Partial - Funnelback Documentation - Version 15.18.0
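As a sketch of the kill-file approach (the collection id wss-search-local is taken from the pre_index_command in your collection.cfg; adjust if the conf directory is named differently): place a kill_partial.cfg in the collection’s conf directory, listing one URL fragment per line. Any indexed URL containing a listed fragment is removed from the index at the end of an update.

```
# $SEARCH_HOME/conf/wss-search-local/kill_partial.cfg
# One partial URL per line; matching documents are dropped
# from the index during the update's post-index phase.
/kb/docs/articles/privacy-notices/
```

You’ll need to run an update on the collection after adding the file for the removal to take effect.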