I am finding that when I do a search within a collection with no facets or filters, I am getting results that in no way match the search query.
E.g. I search for “cat” and, among the results, get pages that have no title, summary, content or metadata containing the word “cat”.
I’m wondering if anyone can shed light on this. Does Funnelback internally make associations between documents or add its own metadata? Is the scenario described part of core Funnelback functionality, or is it the product of a bug in a particular configuration? If it is core functionality, can it be turned off to get a more targeted result?
Did you view the source of the document? Are you sure it doesn’t contain the term cat?
From the search results view the cache copy and also verify that it does not contain the term cat.
You may also want to check the query that was actually run: change search.html to search.json in the URL and look under response → resultPacket → queryAsProcessed. Perhaps the query for cat was transformed into something else. Also check response → resultPacket → qsups.
Externally, the document could have cat assigned to it because of anchor text or recorded click logs.
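The search.json check above can be scripted. This is only a rough sketch: the helper names (`build_json_url`, `summarise_query`, `fetch_summary`) are made up for illustration, and the only things taken from this thread are the search.html to search.json swap and the response → resultPacket → queryAsProcessed / qsups paths. The exact shape of the qsups entries is not assumed here; they are returned untouched.

```python
import json
from urllib.request import urlopen


def build_json_url(html_url):
    """Swap the search.html endpoint for search.json, keeping the query string."""
    return html_url.replace("/search.html", "/search.json", 1)


def summarise_query(payload):
    """Pull out the two fields worth checking from a parsed search.json payload.

    Missing keys are tolerated so the helper also works on partial responses.
    """
    response = payload.get("response") or {}
    packet = response.get("resultPacket") or {}
    return {
        "queryAsProcessed": packet.get("queryAsProcessed"),
        "qsups": packet.get("qsups") or [],
    }


def fetch_summary(html_url):
    """Fetch the JSON version of a search URL and summarise it (needs network access)."""
    with urlopen(build_json_url(html_url)) as resp:
        return summarise_query(json.load(resp))
```

Call `fetch_summary` with the URL you copied from the browser after running the search. If `queryAsProcessed` is not simply the term you typed, or `qsups` is non-empty, the query was rewritten or expanded before matching.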
Perhaps this will help you:
-weight_only_fields=< S > Documents will not be retrieved in DAAT mode if they only match unfielded query
terms in one or more of the implicit fields listed here. For example,
specifying ‘[K,k]’ will stop the query ‘Monica Lewinski’ matching a document
solely because of click data or referring anchortext.
You may also need to set
-anniemode=0
See Padre Query Processor Options - Funnelback Documentation - Version 15.14.0
It’s most likely to be coming from anchor text (the words used on links that point to the page) as suggested by Luke. You might be able to see this by looking at the URL in content auditor and then viewing the link information (available from the introductory paragraph that displays the rank summary) and looking at the contents of the Anchor text column.
If the search index is based on XML, it’s possible that it’s getting content assigned from other records (this has happened in the past). You can get around this by adding a fake mapping for unfielded content, which forces Funnelback to index only the fields you’ve specified in the xml.cfg. To do this, add a line similar to
-,,,//fakefield
to your xml.cfg then reindex.