Configure search.json to handle partial queries? (e.g. treat “hist” as “history”)

If I query search.json with ?query=hist it returns results for that exact term rather than treating it as a partial query. Is it possible to configure Funnelback to treat this as a partial query (of “history”) and fetch results based on what it assumes the complete query to be? This would be useful because I have a text box with an onkeyup handler that fires a call when the user stops typing, so users can be lazy and type only the start of a word (e.g. “hist”) to see results for the complete word (e.g. “history”).

The way I’m currently doing this is by first sending an AJAX call to suggest.json?partial_query=hist, taking the first suggestion (e.g. “history”), and then sending a request to search.json?query=history. This works, but it makes for more complicated code and two HTTP requests instead of one. I’m hoping there is a better way to do this.
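For context, here’s a simplified sketch of the client-side flow I’m describing. The collection name and endpoint paths are from my setup, `lookup` assumes suggest.json returns a plain array of strings (which may differ by Funnelback version), and `debounce` is just a generic helper:

```javascript
// Debounce helper: only fire fn once the user has stopped typing for delayMs.
function debounce(fn, delayMs) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer);
    timer = setTimeout(() => fn.apply(this, args), delayMs);
  };
}

// Build the two request URLs (collection name is from my setup).
function suggestUrl(partial) {
  return "/s/suggest.json?collection=uos-courses-xml&partial_query=" +
    encodeURIComponent(partial);
}
function searchUrl(query) {
  return "/s/search.json?collection=uos-courses-xml&query=" +
    encodeURIComponent(query);
}

// Two-step flow: ask suggest.json for a completion, then search with it.
async function lookup(partial) {
  const suggestions = await fetch(suggestUrl(partial)).then(r => r.json());
  // Assumes suggest.json returns a plain array of strings; some versions
  // return objects, in which case a field of the first suggestion is needed.
  const completed = suggestions.length > 0 ? suggestions[0] : partial;
  return fetch(searchUrl(completed)).then(r => r.json());
}
```

The debounced handler is then wired up as something like `input.onkeyup = debounce(e => lookup(e.target.value), 300)`.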

I’ve had a look at the query processor options docs but couldn’t find anything that allows this, though I may have missed something, or there may be another approach.

Funnelback doesn’t partially match queries like your example (query=hist). You may come across references to truncation in the documentation, which does this, but only in a very old query processing mode. I would strongly recommend against using it: it is essentially deprecated, changes the way queries are processed, and is incompatible with most features that have been added to Funnelback over the last few years.

There is a workaround for this that does something very similar to what you’ve described by using suggest.json to expand your query. There is actually a worked example of how to do this in the Funnelback knowledgebase:


Hi. Thanks for the great suggestion. This looks very useful, however I’ve tried to implement it and I’m not seeing any difference. I’ve created the hook_pre_process.groovy file (using the drop-down selector so as not to mistype the filename), published the new file, and then added partial_query_enabled=true and partial_query_expansion_index=3 to the collection.cfg file. I’ve even updated the collection just in case that was required.

Below are the parameters I’m using with the request:

/s/search.json?collection=uos-courses-xml&query=hist*&profile=_default_preview

Notice the wildcard. I know that the first suggestion ought to be “history”, which should populate the results array in the JSON, but it’s empty. I’ve even tried setting partial_query_expansion_index=1 to isolate that term, but still no joy.

Anything I might have missed? Is there any way to check that the hook script even runs (e.g. log files)?

Also, in that link you provided it states, “the expanded queries can be seen by viewing the JSON or XML output and looking at the query/queryAsProcessed/queryRaw/querySystemRaw/queryCleaned values from the response packet.”, however, I’m only seeing the following:

"response": {
	"resultPacket": {
		"details": {
			"padreVersion": "FUNNELBACK_PADRE_15.12.0.4 MDPLFS (Web/Ent) [64 bit]",
			"collectionSize": ...,
			"collectionUpdated": 1547115063000
		},
		"query": "hist",
		"queryAsProcessed": "hist",
		"queryRaw": "hist*",
		"querySystemRaw": null,
		"queryCleaned": "hist",
		"collection": "uos-courses-xml",
		"resultsSummary": {
			"fullyMatching": 0,
			"collapsed": 0,
			"partiallyMatching": 0,
			"totalMatching": 0,
			"estimatedCounts": false,
			"carriedOverFtd": null,
			"totalDistinctMatchingUrls": null,
			"numRanks": 10,
			"currStart": 0,
			"currEnd": 0,
			"prevStart": null,
			"nextStart": null,
			"totalSecurityObscuredUrls": null,
			"anyUrlsPromoted": false,
			"resultDiversificationApplied": false
		},
		"spell": {
			"url": "query=list&collection=uos-courses-xml&profile=_default_preview",
			"text": "list"
		},
		"bestBets": [],
		"results": [],
		...

You’ll see something like queryAsProcessed: [hist history historical] if it’s working, assuming you get suggestions when you use suggest.json, since the hook script uses the same underlying mechanism as suggest.json.

What version of Funnelback are you running? There is a note in the code that a couple of lines will need changing if you are running a version earlier than 14.2.

I would try checking the modern UI logs for the collection. If the hook script is generating an error, the messages will be written there.

Depending on the version, the modern UI logs could be located in:

  • the collection’s main log folder (under collection-logs in the log viewer when you view the collection’s logs)
  • the global log folder (under web logs when you select system logs from the top menu in the admin interface) - it’s possible you may not have access to this depending on your hosting / user permissions.

Hi.

We’re using Funnelback 15.12.0

I can’t see anything in the modernui.public.log or modernui.admin.log files. The modernui.public.log file does log entries when, for example, I mistype the URL. There is a ton of stuff in modernui.admin.log that I’m not so familiar with, but searching it even just for “hook” doesn’t match anything. Not really sure what I should be looking for here though.

Something I noticed that seems a little strange is that the “Publish” button to the right of the hook_pre_process.groovy file remains even when I publish and confirm. I’m guessing this is normal? This is also the case for xml.cfg etc, these files also have a publish button that just doesn’t go away, but I know for sure that changes to these files are effective so maybe it’s not an issue. See screenshot - http://rtyn.biz/staticfiles/fbfiles.png

Is there anything in the Funnelback installation that has to be enabled to run hook scripts?

Anyway, here’s also a link to a screenshot of the hook_pre_process.groovy open for editing in case I’ve missed something when pasting it in there - http://rtyn.biz/staticfiles/groovy.png

Hi Martyn,

Hook scripts will run as soon as they are created and saved.

I just had a look at your screenshot and the first line of your hook script should be deleted (the line with hook_pre_process.groovy). The first line of the file should be the line starting with def.

I suspect what’s happening is that the hook script is failing to compile because of that error and it’s being caught in the system level logging which you may not be able to see.

Try deleting that line and saving the file and hopefully things may start to work.

re. the publish button - that will always be displayed for collection level configuration files as there is no way currently for Funnelback to know if the file has been published to any remote server (if you’re running in a multi-server cluster).

For profile level configuration a comparison can be made between the preview and live versions of the file.

regards,
Peter

P.S. if you need quick replies to these issues I’d suggest raising a support request via your support channel.

Hi,

I’ve deleted that line and re-published the hook script, but I’m getting the same results:

https://fback-dsp.is.strath.ac.uk/s/search.json?collection=uos-courses-xml&query=hist*&profile=_default_preview

"query": "hist",
"queryAsProcessed": "hist",
"queryRaw": "hist*",
"querySystemRaw": null,
"queryCleaned": "hist",

I’ve even tried updating the collection, but no change.

Is it possible for me to write to a log file I have access to? e.g. https://stackoverflow.com/questions/15981182/logging-in-groovy-script. Then I could maybe do something like… ?

import java.util.logging.Logger

Logger logger = Logger.getLogger("")
logger.info(q.collection.configuration.value(["partial_query_enabled"]))

By the way, I checked the documentation and it seems the first line error is there too… might wanna update that page :slight_smile:

Hi Bizt,

Would you be able to check the log files again for anything suspicious (the collection-specific one in particular)? The locations of the log files can be found below:

  • $SEARCH_HOME/web/logs/modernui.[Public/Admin].log (global)
  • $SEARCH_HOME/data/<collection>/log/modernui.[Public/Admin].log (Linux)
  • $SEARCH_HOME/web/logs/modernui.<collection>.[Public/Admin].log (Windows)

I suspect that there could be some other error which is causing the hook script to fail.

In regards to custom logging, you can write to the log files by doing the following in your hook scripts:

def logger = org.apache.logging.log4j.LogManager.getLogger("com.funnelback.MyHookScript")
...
logger.info("The query is: " + transaction.question.query);
...
logger.fatal("No results were found")

Hope this helps.

~Gioan

Hi Bizt,

I just had a look at a working version of this implementation for a client in Melbourne which was done only a few weeks back. Would you be able to compare your hook_pre_process.groovy to the following:

// imports required for access to the padre suggest service
import java.io.File;
import java.util.List;
import com.funnelback.common.config.DefaultValues;
import com.funnelback.dataapi.connector.padre.PadreConnector;
import com.funnelback.dataapi.connector.padre.suggest.Suggestion;
import com.funnelback.dataapi.connector.padre.suggest.Suggestion.ActionType;
import com.funnelback.dataapi.connector.padre.suggest.Suggestion.DisplayType;

if(transaction.question.inputParameterMap['profile'].equals("auto-completion") || 
  transaction.question.inputParameterMap['profile'].equals("auto-completion_preview") ||
  transaction.question.inputParameterMap['profile'].equals("clients") ||
  transaction.question.inputParameterMap['profile'].equals("clients_preview") ||
  transaction.question.inputParameterMap['profile'].equals("providers") ||
  transaction.question.inputParameterMap['profile'].equals("providers_preview")) {
  def q = transaction.question
  q.form = "qc"

  // convert a partial query into a set of query terms
  // maximum number of query terms to expand partial query to - read from collection.cfg partial_query_expansion_index parameter
  // eg. partial_query=com might expand to query=[commerce commercial common computing]
  def partial_query_expansion_index = 5
  if ((q.collection.configuration.value(["partial_query_expansion_index"]) != null) && (q.collection.configuration.value(["partial_query_expansion_index"]).isInteger())) {
    partial_query_expansion_index = q.collection.configuration.value(["partial_query_expansion_index"])
  }

  def profile = "_default"
  if (q.inputParameterMap["profile"] != null) {
    profile = q.inputParameterMap["profile"]
  }

  if (q.inputParameterMap["partial_query"] != null) {
    File searchHome = new File("/opt/funnelback")
    File indexStem = new File(q.collection.configuration.value(["collection_root"]) + File.separator + "live" + File.separator + "idx","index")

    // NOTE: CONSTRUCTOR HAS CHANGED post v14.2 and requires searchHome as the first param
    List<Suggestion> suggestions = new PadreConnector(searchHome,indexStem)
    .suggest(q.inputParameterMap["partial_query"])
    .suggestionCount(partial_query_expansion_index)
    .fetch();

    def expanded_query = ""

    suggestions.each {
      expanded_query += '"'+it.key+'" '
    }

    // set the number of suggestions to the value of the configured auto-completion.show    
    if (q.inputParameterMap["show"] != null ) {
      q.additionalParameters["num_ranks"] = [q.inputParameterMap["show"]]
    }

    // set the query to the expanded set of query terms ORed together 
    if (expanded_query != "") {
     q.query = "["+expanded_query+"]"
    }
  }
}

Hi. Thanks for that. The first thing I noticed from your script is that I didn’t have any import statements. I wasn’t seeing any errors in the logs from the Admin UI, but when I found the correct log file on the server - $SEARCH_HOME/data/<collection>/log/modernui.Public.log - it made life much easier. And yeah, it was telling me class names were missing. I did wonder about that, but I’ve never worked with Groovy scripts before so wasn’t sure. Might wanna update the docs page “Adding limited wildcard support to DAAT mode” with those if it helps others.

Other things I changed to get the script to work are:

// Ensure only partial_query_enabled=true passes this condition
if (q.collection.configuration.value(["partial_query_enabled"]) == "true") {

// Needs to be an int otherwise .suggestionCount(partial_query_expansion_index) will throw an exception
partial_query_expansion_index = Integer.parseInt(q.collection.configuration.value(["partial_query_expansion_index"]));

Once I made those changes it seems to work and I get results now for “hist*”. Below is the full script if that makes more sense:

// imports required for access to the padre suggest service
import java.io.File;
import java.util.List;
import com.funnelback.common.config.DefaultValues;
import com.funnelback.dataapi.connector.padre.PadreConnector;
import com.funnelback.dataapi.connector.padre.suggest.Suggestion;
import com.funnelback.dataapi.connector.padre.suggest.Suggestion.ActionType;
import com.funnelback.dataapi.connector.padre.suggest.Suggestion.DisplayType;
import com.funnelback.common.Environment;

def logger = org.apache.logging.log4j.LogManager.getLogger("com.funnelback.MyHookScript")

def q = transaction.question
if (q.collection.configuration.value(["partial_query_enabled"]) == "true") {
    
    // Convert a partial query into a set of query terms
    // Maximum number of query terms to expand partial query to - read from collection.cfg partial_query_expansion_index parameter.
    // eg. partial_query=com might expand to query=[commerce commercial common computing]
    def partial_query_expansion_index = 5
    if ((q.collection.configuration.value(["partial_query_expansion_index"]) != null) && (q.collection.configuration.value(["partial_query_expansion_index"]).isInteger())) {
      partial_query_expansion_index = Integer.parseInt(q.collection.configuration.value(["partial_query_expansion_index"]));
    }
    if (q.query != null) {
        // explode the query and expand each item that ends with a *
        def terms = q.query.tokenize(" ");

        terms.each {
            def term = it
            if (term ==~ /\w+\*$/) {
                //remove term from q.query
                terms -= term
                def termclean = term.replaceAll(~/\*$/,"")
                // Read $SEARCH_HOME
                def sH = Environment.getValidSearchHome().getCanonicalPath();
                File searchHome = new File(sH)
//              File searchHome = new File("/opt/funnelback")
                File indexStem = new File(q.collection.configuration.value(["collection_root"]) + File.separator + "live" + File.separator + "idx","index")
                // NOTE: CONSTRUCTOR HAS CHANGED post v14.2 and requires searchHome as the first param
                List<Suggestion> suggestions = new PadreConnector(searchHome,indexStem)
                  .suggest(termclean)
                  .suggestionCount(partial_query_expansion_index)
                  .fetch();
                // build the expanded query from the list of suggestions
                def expanded_query = ''
                suggestions.each {
                    expanded_query += '"'+it.key+'" '
                }
                // set the query to the expanded set of query terms ORed together
                if (expanded_query != "") {
                    if (q.rawInputParameters["s"] == null) {
                        q.rawInputParameters["s"] = ["["+expanded_query+"]"]
                    }
                    else {
                        q.rawInputParameters["s"][0] += " ["+expanded_query+"]"
                    }
                }
            }
        }
        // reconstruct query.
        q.query = terms.join(" ");
    }
}
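With the hook script working, the client side can be reduced to a single request: just append a * to the final term before calling search.json. A rough sketch of what I mean (collection name, profile, and paths are from my setup):

```javascript
// Append a wildcard to the final term so the hook script expands it server-side.
function wildcardQuery(input) {
  const terms = input.trim().split(/\s+/);
  if (terms.length === 0 || terms[0] === "") return "";
  terms[terms.length - 1] += "*";
  return terms.join(" ");
}

function searchUrl(query) {
  return "/s/search.json?collection=uos-courses-xml&query=" +
    encodeURIComponent(query) + "&profile=_default_preview";
}

// One request instead of two: the pre_process hook expands e.g. "hist*"
// into a set of suggested terms before the query runs.
function lookup(partial) {
  return fetch(searchUrl(wildcardQuery(partial))).then(r => r.json());
}
```

(Note that encodeURIComponent leaves * unescaped, so the wildcard survives in the query string.)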

Anyway, thanks for your time and help. I should have enough to work with and make changes by myself. Looks better than what we had before.

Hi Martyn,

I’m glad the problem is solved. I’m not sure what went wrong when the KB article was published but I’ve added the missing section from the head of the file and republished the article.

I’m surprised by your comment about the exception and the need for parseInt - that doesn’t happen on our server, and loose typing is one of the things Groovy normally handles - but I’ve added your changes to avoid this in future.

regards,
Peter

Hi there

Just trying to get this working from the kb article. It’s my first time using Groovy scripts, but I’m getting errors when importing the libraries. Do I need to configure anything else in the collection to be able to use these libraries, or is our server misconfigured? Error below.

Error while running 'pre_process' hook for collection 'uofsa-web-sc-photo' on a search of type 'SEARCH' groovy.lang.GroovyRuntimeException: Could not find matching constructor for: com.funnelback.dataapi.connector.padre.PadreConnector(java.io.File, java.io.File, java.lang.String)

Edit - got it working. I think the comments on the groovy script in the kb might be incorrect. We are on 15.16.0 (45801 #143) and the PadreConnector constructor only needs two params.