Filtering spam search queries

Lately the number of spam/robot submissions to our search has increased considerably. By spam submissions I mean queries such as http://likespromotion.com/, https://www.youtube.com/watch?v=..., http://www.clashofclanshackforfree.com … They don't put much load on us yet, but they already create a lot of noise in our reports and logs.

I am already using reporting-blacklist.cfg, but unfortunately it won't work in this case. I would like to exclude all queries starting with http:// or https://, and to my knowledge the blacklist doesn't support regex.

Does anyone know a way to filter these robot/spam submissions, ideally before the queries are processed, or at least at reporting time?

Thanks, Vitali

I'd start by focusing on preventing those queries from ever reaching Funnelback. HTML5 input patterns might be useful, but there's no guarantee that a spam bot would respect them.

If you're using the Modern UI, pre-process hook scripts would be worth investigating: a query can be examined and, prior to processing, compared against undesirable patterns. You may then want to set the query to empty, or generate a state similar to the initial search form.

See the example at: http://docs.funnelback.com/user_interface_hook_scripts.html#Processing%20additional%20input%20parameters
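As a rough illustration of the pattern check described above, here is a minimal sketch in Python. It is not Funnelback code: hook scripts are written in Groovy, and the function name `sanitise_query` and the choice to blank the query are assumptions made for this example. Only the regular expression is the point, and it should port to a hook script with little change.

```python
import re

# Matches queries that begin with "http://" or "https://"
# (case-insensitive), allowing for leading whitespace.
URL_QUERY = re.compile(r"^\s*https?://", re.IGNORECASE)


def sanitise_query(query: str) -> str:
    """Return an empty query for URL-like spam, otherwise the query unchanged.

    Hypothetical helper: in a real hook script the same test would be
    applied to the incoming query before it is processed.
    """
    if URL_QUERY.match(query):
        return ""
    return query
```

A query such as `http://likespromotion.com/` would be blanked, while a normal query that merely mentions "http" later in the text would pass through untouched.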

Alternatively, you may want to configure your initial forms to submit a benign URL key/value pair that is generated by user agents that support client-side scripting. The presence of this value in a form submitted to Funnelback can also be checked in a pre-process hook, with processing terminated if the value is absent.
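The check for the script-generated value could be sketched along these lines, again in Python purely for illustration. The parameter name `js` and its expected value `1` are hypothetical; any benign key/value pair added by client-side script would do, and the real test would live in the pre-process hook.

```python
from typing import Mapping


def should_process(params: Mapping[str, str],
                   token_name: str = "js",
                   expected: str = "1") -> bool:
    """Return True only if the request carries the script-generated pair.

    `token_name` and `expected` are placeholder values for this sketch;
    requests without the pair are assumed to come from clients that did
    not run the form's client-side script.
    """
    return params.get(token_name) == expected
```

Requests for which `should_process` returns False would be the ones to stop or blank before processing. Note that this also rejects genuine users who have scripting disabled, so it is a trade-off rather than a clean fix.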

IP address filtering in the reporting blacklist might also be useful - assuming these spam bots are coming from a fixed set of IP addresses.

Thanks Gordon,

Unfortunately, in our environment we don't have full control over the web pages where the search form is used. I will need to look into pre-process hooks then.

I filter some of them by IP, but the list keeps growing.

Vitali