Add or change metadata value and use this as a facet value

I have a Web collection which parses an XML feed. The feed contains a text field from which I want to extract a year value. For example “September 2019” would extract “2019”. This would ideally be stored as a separate metadata value.

I have used hook_post_datafetch.groovy to successfully extract the value but it isn’t available to the facet.

I read that the post_datafetch runs before the facet data is generated (Hook scripts - Funnelback Documentation - Version 15.18.0). I don’t see why the facet value isn’t affected by the hook script. Please can someone explain.

I do have a couple of workarounds for this problem but I’d like to be able to manipulate metadata for faceting.

I’m using Funnelback 15.14.0 hosted in the cloud by Funnelback.

Hi,
I think you need to manipulate the data before the indexer sees it. You can do this by writing a filter to extract the year and then create a facet on that extracted year.
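As a rough illustration of the extraction step only, here is a minimal Groovy sketch. The extractYear helper is a hypothetical name, not part of Funnelback, and it assumes a year is a standalone four-digit number between 1900 and 2099. Wiring it into a real document filter, and writing the result to the document metadata, is not shown.

```groovy
// Hypothetical helper: pull the first four-digit year out of free text.
// Returns null when the text is null or contains no year.
String extractYear(String text) {
    if (text == null) {
        return null
    }
    // Match a standalone year from 1900 to 2099; the word boundaries
    // stop longer numbers such as 20195 from matching.
    def matcher = (text =~ /\b(?:19|20)\d{2}\b/)
    return matcher.find() ? matcher.group() : null
}

println extractYear("September 2019")   // prints 2019
```

The value returned here is what you would store as the separate metadata field and then facet on.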

You could try to manipulate data on the query side but that will be rather complex.

Following the documentation, I think you’re referring to custom filters and the supporting page on writing them. There is also an example string document filter using the example in the docs.

Thank you for pointing me in the right direction.

Will

Let me know if you run into any issues.

Note that if you write the year to the metadata of the document in filters you will need to tell the indexer to index that metadata. Also, for versions up to and including 15.20, you may want to set -create_phrase_metadata_terms=off in indexer_options in collection.cfg, otherwise you may end up running into a recently discovered bug.

I’d like to revive this thread. The topic is the same but the method (hook scripts) is different.

I’m now trying to update the facet values using a hook script. The correct script appears to be hook_post_datafetch.groovy.

I have successfully changed the metadata available to the page but the changes don’t apply to the facet values. The hook scripts documentation says that the script runs before output processing and therefore should precede the building of faceted navigation. Have I misunderstood?

I included this script in hook_post_datafetch.groovy:

transaction?.response?.resultPacket?.results.each() {
  if(it.metaData["category"] != null)
  {
     it.metaData["category"] = it.metaData["category"] + "|Research"
  }
}

Thanks,
Will

A bunch of facet calculations are done inside the query processor, which means that at query time you won’t really have the opportunity to apply logic. The metadata you are editing is what the query processor has already returned, so modifying it further won’t change facet counts.

Would you mind telling me what you now want to achieve, and why you want to do it here at query time? Is it because these hooks are easier to write?

Hi Luke,

My aim is to create a consistent filter for content which does not have consistent metadata and which often doesn’t have expressive URLs. For example, I want to filter documents relating to ‘Research’ where this concept is exposed in my content as more than a dozen different metadata keys and URLs. The inconsistent URL patterns prevent me from using gscopes.

The way that I thought I’d solve this problem is by appending a consistent metadata key to the query response. The documentation says:

Post-datafetch: (hook_post_datafetch.groovy) This runs immediately after the response object is populated based on the raw XML return, but before other response elements are built. This is most commonly used to modify underlying data before the faceted navigation is built.

You can see why I’ve tried to use the Post-datafetch hook script. I have also used external metadata for some keys which don’t exist in content.

I think you’re pointing out that I need to alter the index before the query runs. I’m not clear whether the script should run against the cached documents, when the index is built, or as a step before the query is processed. It would be most efficient to alter the index or cached documents.

Next on my list to read is document filtering.

Thanks,
Will

You may be able to use the metadata normaliser filter (Built-in filters: Metadata normaliser filter (MetadataNormaliser) - Funnelback Documentation - Version 15.24.0) to overcome your issue.

For facets, document filtering is going to be the place.

How inconsistent is the metadata? If it is something like:
“chem research” → research
“phys research” → research

(where those on the left are metadata mapped to metadata class foo and what is on the right is the facet you want to show)

You could construct a query based facet that displays “research” for documents that match the query:
[foo:“$++ chem research $++” foo:“$++ phys research $++”]
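If the list of variants grows, it may be easier to generate that query than to type it by hand. The sketch below only assembles the string in the same shape as the query above. The buildFacetQuery helper is a hypothetical name, it uses plain straight quotes, and it makes no attempt to escape special characters in the values.

```groovy
// Hypothetical helper: build a query of the form
// [foo:"$++ value one $++" foo:"$++ value two $++"]
// Single-quoted Groovy strings are used so that $ is not treated as interpolation.
String buildFacetQuery(String metadataClass, List<String> values) {
    def terms = values.collect { value ->
        metadataClass + ':"$++ ' + value + ' $++"'
    }
    return '[' + terms.join(' ') + ']'
}

println buildFacetQuery('foo', ['chem research', 'phys research'])
```

Each variant you add to the list becomes one more alternative inside the square brackets.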

I don’t know how inconsistent your metadata is, so it may instead be that you need to use the filter Peter mentioned or your own more complicated filter.
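If you do end up writing your own filter, the core of it is probably just a lookup from each raw value to the label you want in the facet. Here is a minimal sketch of that mapping step in Groovy. The normalise helper and the canonicalLabels map are hypothetical names, the two entries are taken from the example above, and the Funnelback filter plumbing around it is left out.

```groovy
// Hypothetical lookup from raw metadata values to the facet label to show.
def canonicalLabels = [
    'chem research': 'research',
    'phys research': 'research',
]

// Hypothetical helper: return the mapped label, matching case-insensitively
// and ignoring surrounding whitespace. Unmapped values pass through unchanged.
String normalise(Map<String, String> labels, String raw) {
    if (raw == null) {
        return null
    }
    def key = raw.trim().toLowerCase()
    return labels.containsKey(key) ? labels[key] : raw
}

println normalise(canonicalLabels, ' Chem Research ')   // prints research
```

Passing unmapped values through unchanged is just one choice; you could equally drop them if you only want the normalised labels to appear in the facet.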

It will be interesting to see what your metadata looks like and how you solve your problem.