Removing stop words (-ras=2) not having any effect

If I search for “accounting and finance” I get no results. However, if I search for “accounting finance” (with “and” removed) I get lots of results. I came across your config docs on removing stop words, so I’m trying to set that up. However, I’m not seeing any improvement for “accounting and finance”; the results are the same as before. Is there anything else I need to configure?

Here is my collection.cfg:

query_processor_options=-stem=2 -fmo=true -SM=both -SF=[a,c,d,9,Z,A,B,S,M,D,T,C,L,E,l,U,V,X,j,k,m] -MBL=1000 -SHLM=1 -ras=2

Hi Bitz,

Could I check if you are using the quotation marks (" ") in your query?

Something to note is that the removal of stop words does not kick in for queries using the phrase operator.

For example, compare the JSON result for the query query and operators (unquoted) on the Funnelback documentation website with the JSON result for the query "query and operators" (quoted) on the same collection.

Hope this helps.

Thanks,

~Gioan

Yeah, that seems to be the issue:

"query": "",
"queryAsProcessed": "[accounting \"accounting finance\" \"accounting and mathematics statistics\" \"accounting business law\" \"accounting business enterprise\"] [andersonian \"and tourism management\" \"and creative writing french\" \"and creative writing\"] [finance finance finance finance finance]",
"queryRaw": "",
"querySystemRaw": "[\"accounting\" \"accounting finance\" \"accounting and mathematics statistics\" \"accounting business law\" \"accounting business enterprise\" ] [\"and\" \"andersonian\" \"and tourism management\" \"and creative writing french\" \"and creative writing\" ] [\"finance\" \"finance\" \"finance\" \"finance\" \"finance\" ]",

That’s because I’m using a script that expands each query term into quoted suggestions, similar to this:

But when I remove this script, the -ras option works and the stop word (“and”) is removed. Then I see results:

"query": "accounting and finance",
"queryAsProcessed": "accounting finance",
"queryRaw": "accounting and finance",
"querySystemRaw": null,
"queryCleaned": "accounting and finance",

But now I’ve lost support for partial queries :(

Would it be a good idea to strip out stop words within the script above, before it fetches suggestions for the query? If so, is there a way to fetch the list of stop words for this collection from within the script, rather than hard-coding them, so that I can strip them out before the rest of the script fetches suggestions?

Otherwise, do you have any suggestions for how I can continue to allow partial queries but also remove stop words?

In the meantime, I’ve just added the following to the script, as we need results that are not limited by stop words (e.g. “accounting and finance”) AND we need to allow partial queries (e.g. “chem” → “chemistry”):

...
def terms = q.query.tokenize(" ")

// Something of a hack, as we need to remove stop words from the query.
// If there is a better way to fetch this list from the environment, please let me know.
// See: https://docs.funnelback.com/customise/advanced-options/stop-words.html
def stopWords = ["a", "a's", "able" ... "yourselves", "z", "zero"]  // list abbreviated here
// Keep only the terms that are not stop words (compared case-insensitively).
terms = terms.findAll { !stopWords.contains(it.toLowerCase()) }

...
terms.each {

Ideally, I’d rather check whether the -ras query processor option is set to 2 and, if so, fetch the stop words, rather than having them hard-coded like this.
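
A minimal Groovy sketch of the first half of that idea, checking whether -ras is set to 2. The helper name rasValue is hypothetical, and the options string is passed in directly because this thread does not show how the hook script would read query_processor_options from the collection configuration.

```groovy
// Hypothetical helper: returns the numeric value of the -ras option found in a
// query_processor_options string, or null if the option is not present.
Integer rasValue(String queryProcessorOptions) {
    // Match "-ras=<digits>" only when it stands alone as a whitespace-separated option.
    def matcher = (queryProcessorOptions ?: "") =~ /(?<!\S)-ras=(\d+)(?!\S)/
    return matcher.find() ? matcher.group(1).toInteger() : null
}

// Options string shortened from the collection.cfg posted above.
def options = "-stem=2 -fmo=true -SM=both -MBL=1000 -SHLM=1 -ras=2"
assert rasValue(options) == 2
assert rasValue("-stem=2 -fmo=true") == null
```

With a check like this, the script could skip its own stop-word stripping whenever the option is not set to 2.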

It is a bit hack-ish, I know, but it seems to work for this collection, for now. Still, open to better suggestions…
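
One possible alternative to the hard-coded list, sketched in Groovy under assumptions: the stop words are available in a plain-text file with one word per line, and the script knows the path to that file. The helper names loadStopWords and removeStopWords are hypothetical, and the temporary file below only stands in for a real stop-word list.

```groovy
// Hypothetical helper: reads one stop word per line, trimming whitespace,
// lower-casing each word, and skipping blank lines.
Set<String> loadStopWords(File file) {
    def words = file.readLines("UTF-8").collect { it.trim().toLowerCase() }
    return words.findAll { it }.toSet()
}

// Hypothetical helper: returns the terms that are not stop words,
// comparing case-insensitively and preserving the original order.
List<String> removeStopWords(List<String> terms, Set<String> stopWords) {
    return terms.findAll { !stopWords.contains(it.toLowerCase()) }
}

// Demo: a temporary file stands in for the real stop-word file (assumption).
def demoFile = File.createTempFile("stopwords", ".txt")
demoFile.deleteOnExit()
demoFile.text = "a\nand\nthe\n"

def loadedStopWords = loadStopWords(demoFile)
def queryTerms = "accounting and finance".tokenize(" ")
assert removeStopWords(queryTerms, loadedStopWords) == ["accounting", "finance"]
```

Loading the words into a Set keeps each lookup cheap, and filtering with findAll avoids reassigning the list while iterating over it.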