Map metadata to data attribute

Can a metadata field map to an HTML data attribute so that the attribute's contents are searchable? For example:

The short answer is no.

The longer answer is that you could write a jsoup filter that extracts the data attribute and sets it as a metadata field.

See the jsoup filter example here:

In practice your jsoup filter will probably only be a few lines of code - something like:

package org.example

import com.funnelback.common.filter.jsoup.*

/**
 * Extracts some metadata from documents
 */

@groovy.util.logging.Log4j2
public class ExtractMetadata implements IJSoupFilter {

  @Override
  void processDocument(FilterContext context) {
    def doc = context.getDocument()
    def url = doc.baseUri()

    try {
      // Find meta tags that carry a data-attribute attribute and write each
      // value to the metadata field custom.field
      doc.select("meta[data-attribute]").each { meta ->
        def value = meta.attr("data-attribute")
        context.additionalMetadata.put("custom.field", value)
        log.debug("Added custom.field '{}' for '{}'", value, url)
      }
    } catch (e) {
      log.error("Error scraping metadata from '{}'", url, e)
    }
  }
}

You would then save this as ExtractMetadata.groovy in a folder in your collection called @groovy/org/example, and enable it for your update by adding org.example.ExtractMetadata to your filter.jsoup.classes setting.
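
As a rough sketch of that last step: assuming your collection is configured through a key=value collection.cfg file (an assumption on my part, so check how your version exposes the setting), the line might look something like this:

```
# Assumed collection.cfg entry; this enables only the example filter
filter.jsoup.classes=org.example.ExtractMetadata
```

If filter.jsoup.classes already lists other filters, you would most likely want to append the class to the existing comma-separated value rather than replace it.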

You would then need to run a full update (from advanced update) so that the filter runs over all your pages, and then add a metadata mapping that maps custom.field to a metadata class in Funnelback.

Wow, thanks for sharing this. Funnelback never ceases to amaze me.