One of our collections consistently returns only 11 results with the FS field in the XML, for example:
<FS NAME="date" VALUE="22 May 2019"/>
The following pages still show results, but in no particular order, because those results lack a timestamp field.
Another of our collections shows this behavior as well, but it tends to have more results that do include an FS timestamp field. Our other collections don’t seem to have this issue at all.
The returned XML was meant to replicate what our Google Search Appliance had been returning, so that our existing results code could be used without much modification. It’s possible the field in question (FS) is not a standard returned field, or perhaps there is a different field we should be ordering our results by?
Does anything come to mind as a possible cause? What would be a good way for us to diagnose the issue further?
The markup you’ve shown comes from a Freemarker template in Funnelback that mimics Google-style XML. The timestamp comes from the core attribute ‘date’ in the Funnelback response packet. To work out why it is not always present, you will need to inspect how the data is being indexed:
Check the metadata mappings for that collection to see what is mapped to the ‘d’ field (which will become the date).
Check that metadata field in the raw data: is it always populated? Is it a valid date format?
The reason you sometimes do not see the field is that the date is either missing from the data or in a format that is unsupported.
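The two cases above (missing versus unsupported format) can be told apart with a quick check on the raw values. The following is a minimal sketch in Python, not part of Funnelback: the function name `classify_date` and the list `CANDIDATE_FORMATS` are hypothetical, and the formats listed are assumptions chosen for illustration, not Funnelback's actual list of supported date formats.

```python
from datetime import datetime

# Assumed formats, for illustration only. This is NOT Funnelback's
# list of supported formats; replace it with the formats you expect.
# Note that %b (month abbreviation) depends on the current locale.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%d %b %Y"]


def classify_date(raw):
    """Classify a raw metadata value as 'missing', 'parseable' or 'unparseable'."""
    if raw is None or not raw.strip():
        return "missing"
    value = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return "parseable"
        except ValueError:
            continue  # try the next candidate format
    return "unparseable"


if __name__ == "__main__":
    # Hypothetical raw values, as they might appear in the source documents.
    for raw in ["2019-10-11", "", "Fall 2019", None]:
        print(repr(raw), "->", classify_date(raw))
```

Running this over the date values from the documents that lack the FS element should show whether they are mostly empty or mostly in an unexpected format.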
I’ll see if I can learn more about the mappings for ‘d’.
When you say raw data, do you mean the XML that Funnelback returns and that our code parses to display the search results pages?
Only 10 results in this collection (for this search term, it’s a different number for different search terms) ever seem to have the FS field, the others lack it entirely … is this more indicative of a missing date or a bad date format?
I assume the correct format is what is returned in the results with the FS field? Here’s an example of 4 results, 2 of the first 10 with an FS field, and 2 of subsequent results, missing FS entirely. There seems to be a blank line in its place.
<R N="1">
<U>https://...</U>
<UE>https%3A%2F%2F...</UE>
<T>1 Spring 2020 Graduate Course Bulletinv1 ...</T>
<RK></RK>
<FS NAME="date" VALUE="11 Oct 2019" />
<S>22 Salute Ceremony (tentative). 26 MA Final Projects course begins.</S>
<LANG>en</LANG>
<HAS>
<L />
<C SZ="509k" CID="968" ENC="ISO-8859-1"/>
</HAS>
</R>
<R N="2">
<U>https://...</U>
<UE>https%3A%2F%2F...</UE>
<T>Theatre Ticket Purchases Forms</T>
<RK></RK>
<FS NAME="date" VALUE="3 Oct 2019" />
<S>Text1: PLEASE NOTE: extra tickets purchased should not be given away to faculty or staff. Unused tickets must be attached to the Advance Close Out Form upon submission to the Budget Office.</S>
<LANG>en</LANG>
<HAS>
<L />
<C SZ="310k" CID="4,638" ENC="ISO-8859-1"/>
</HAS>
</R>
<R N="11">
<U>https://...</U>
<UE>https%3A%2F%2F...</UE>
<T>Salute 2019</T>
<RK></RK>
<S>Salute is an affectionate farewell tribute from the faculty and staff of the School of the Arts to the graduating class that celebrates students’ achievements through live performances and features words of wisdom from our dean, a</S>
<LANG>en</LANG>
<HAS>
<L />
<C SZ="57k" CID="1,970" ENC="ISO-8859-1"/>
</HAS>
</R>
<R N="12">
<U>https://...</U>
<UE>https%3A%2F%2F...</UE>
<T>Salute 2017</T>
<RK></RK>
<S>Salute is an affectionate farewell tribute from the faculty and staff of the School of the Arts to the graduating class.</S>
<LANG>en</LANG>
<HAS>
<L />
<C SZ="57k" CID="3,955" ENC="ISO-8859-1"/>
</HAS>
</R>
When I say raw data, I mean the data Funnelback is ingesting, presumably from one of your data sources, to create the search index in the first place. During that process it tries to establish a viable date for each document, based on your settings within Funnelback. You’ll need to access the administration interface to find this information.
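To know which source documents to inspect, it may help to first list exactly which results come back without a date. The following is a minimal sketch in Python that splits the results of a Google-style XML response into those with and without an FS date element. It assumes the R elements are wrapped in a single root element (the `<RES>` wrapper and the example.org URLs in the sample are placeholders), and the function name `split_by_date` is hypothetical.

```python
import xml.etree.ElementTree as ET


def split_by_date(xml_text):
    """Return (dated, undated) lists of (rank, url) tuples from the results XML."""
    root = ET.fromstring(xml_text)
    dated, undated = [], []
    for result in root.iter("R"):
        rank = result.get("N")
        url = result.findtext("U", default="")
        # A result counts as dated only if it has <FS NAME="date"> with a non-empty VALUE.
        has_date = any(
            fs.get("NAME") == "date" and fs.get("VALUE")
            for fs in result.findall("FS")
        )
        (dated if has_date else undated).append((rank, url))
    return dated, undated


# Placeholder sample; the <RES> wrapper and URLs are assumptions for illustration.
SAMPLE = """<RES>
<R N="1"><U>https://example.org/a</U><FS NAME="date" VALUE="11 Oct 2019" /></R>
<R N="11"><U>https://example.org/b</U></R>
</RES>"""

if __name__ == "__main__":
    dated, undated = split_by_date(SAMPLE)
    print("with date:", dated)
    print("without date:", undated)
```

The URLs in the undated list are the documents whose date metadata is worth checking in the source data.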