I don’t have a lot of experience with Funnelback, so I’m unsure if what I want to do is possible.
I have a large file that contains a lot of structured data (currently in JSON, but it can be converted to CSV/XML etc.) and would like to be able to both index/search the information in this file and display the result “page” with Funnelback (i.e. rather than sending the user to a static webpage with the information, use Funnelback to display that information).
An example might help explain this better. Say this file contains information about mammals, and for each mammal entry (in this file) there’s the name of the mammal, a description, details on lifespan, habitat, diet and other information.
Ideally, I’d do a search for bear and the results would show me the name and description (from the file) for everything that matches. (Normal Funnelback behaviour.) However, when I click the result for Polar bear, Funnelback acts as the result page and shows me all the other information it has indexed from the file. If this were possible, I’d assume we’d pass new URL parameters to Funnelback to use a custom template that displays only the specified result (or something along those lines).
Is this possible? There are almost 5000 entries in this file, so it’s not feasible to create static pages for each.
If you could break out each entry into its own XML object, you could set up an XML Push collection, and Funnelback would treat each XML entry as its own ‘page’ or result. You might need two endpoints, one for the search and one for the ‘other information’, though I’m not sure I understand what you mean by that part.
For the first part, the ~5000 records could be kept as-is in JSON and indexed into separate documents using the built-in processes in Funnelback.
The first part of doing that would be having Funnelback convert the JSON to XML, and then splitting the XML into 5000 different documents. You don’t need to convert the JSON to XML yourself (and many people prefer working with JSON).
Then, you wanted to display static pages with the information from each individual record. For example, the static page could have a <h1>Polar Bear</h1> which comes from some XML (JSON) field like <mammalName>Polar Bear</mammalName>.
This could be done by using the cache copy of the document (document meaning the specific Polar Bear record) and XSLT processing.
Hey @dmikulis, I have used your guidance and have split the XML. I want to create the static pages using the XSLT processing so that each entry renders as its own HTML page. I’ve created a template.xsl in the Profile: _default (preview) section of the config files. Can you give me an example of what goes in the template.xsl? I cannot seem to find any guidance on this.
Here’s the relevant documentation section for the next portion of this that you’re interested in now.
This part would require some knowledge of XSLT processing and the syntax of the template, and admittedly, I’m not well versed in this area. For the most part, it would be using the “value-of” XSL element to template the values into an HTML output.
Going off the Polar Bear example from earlier, this would be a simple example of displaying the name as a h1 HTML element (syntax may not be 100% correct).
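To make that concrete, here is a minimal sketch of what a template.xsl along those lines could look like. It assumes each cached record is a small XML document with a root element called mammal; only mammalName comes from the example above, while mammal, description and lifespan are hypothetical names, so swap in whatever your converted JSON actually produces. Note that the select values are XPath expressions against the XML structure of the cached record, not Funnelback metadata class names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumes each cached record looks like:
     <mammal>
       <mammalName>Polar Bear</mammalName>
       <description>...</description>
       <lifespan>...</lifespan>
     </mammal>
     Only mammalName is taken from the thread; the other element names are hypothetical. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Emit HTML rather than XML -->
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- Match the document root and build the page around the single record -->
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="mammal/mammalName"/></title>
      </head>
      <body>
        <!-- value-of copies the text content of the selected element -->
        <h1><xsl:value-of select="mammal/mammalName"/></h1>
        <p><xsl:value-of select="mammal/description"/></p>
        <dl>
          <dt>Lifespan</dt>
          <dd><xsl:value-of select="mammal/lifespan"/></dd>
        </dl>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

If the root element of your records has a different name, or the fields are nested more deeply, the paths in the select attributes need to change to match. Viewing the raw XML of one record first is the easiest way to work out the right paths.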
One thing to note is that the XSLT processing will happen at the cache URL, so if users are meant to visit those pages directly from the search results you’ll want to change the URL to the cache URL.
I should also mention that this isn’t the only option. It is also possible to do the XSLT to template the XML into HTML from within your CMS/server rather than Funnelback’s. Essentially, the search result would link to a dynamic page where your CMS would fetch the XML from Funnelback’s cache URL and then template it into HTML for the user. The advantage to this method is that you can control what the final URL is that is displayed in the user’s browser (whereas doing it on the Funnelback side will always show the cache URL to the user), and for this reason we have had clients that choose this route. A disadvantage could be if your CMS/server isn’t set up to be able to do XSLT natively or if it’s not well-documented, doing it on the Funnelback side may be easier.
Hopefully this gives you enough to chew on for now, but please don’t hesitate to come back with more questions.
I feel at this point I’m just running in circles. I have completed the following as per the documentation that you supplied.
Added a template.xsl under the Profile: _default (live) section of the configuration files
Changed the simple.ftl file to use the ${s.result.cacheUrl} link from the xml record
In the XML processing controls, selected an XML attribute of the record being split to be assigned as the URL
Indexed relevant metadata fields so that they appear in the search
What is happening
The records are getting split correctly
The url is getting assigned to the record correctly
It is using the s/cache/ link for the individual records
When I click on the URL, however, I get the following error: ‘Could not access the requested cached document. This document may have been removed, or the link to this page may be broken, or your security settings may prevent access to cached documents.’
Also I cannot find any examples or anything at all on XSLT mapping, do I use the metadata name as the link, or the xml file path as the reference to the element that I want to display?
I believe what is happening is that there is one JSON file with multiple records that is getting converted to an XML file with multiple records. The document content is stored in the cache before being split during indexing. The indexer currently sees 1 XML document and splits it according to the configuration set in the “XML Processing” section of the Admin Dashboard.
To confirm this, you may be able to find the (full) XML document in the cache link of the original URL (not the individual URLs for each record).
It looks like you may have to split the XML prior to the indexer, so that it is saved in the cache as individual documents instead of 1 big document (the content is cached before it is indexed).
We have already written a filter for this purpose in our public GitHub repository, filter and README. This filter should be placed after the JSONToXML filter in the chain. If it’s split correctly, each record from the original JSON will have its own document in the cache (instead of being part of one shared document).