We just went live at NYU.edu (powering search for our main site and many of our subdomains). We use Adobe’s AEM web CMS for our main www.nyu.edu website and several of our subdomain sites. Is anyone else using Adobe AEM and Funnelback? I’d be curious to trade notes here in the forum or offline.
I am aware of a number of people using Adobe AEM with Funnelback.
Unfortunately I know nothing of the CMS-side implementation for these; however, I can shed a little light on the Funnelback-side implementation.
Some of them just integrate in the same way as any generic website: a search box on the AEM site calls Funnelback directly, and Funnelback returns a fully formed, templated HTML response. A slight variation on this has Funnelback remotely including headers and footers (and nav) from the AEM site, pulled from a nominated page that contains HTML comments marking the start and end of each section. This sort of integration has no dependency on the CMS platform; as far as Funnelback is concerned it’s just an HTML site.
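To make the remote-include idea concrete, here is a minimal sketch of the extraction step: a nominated template page carries comment markers around each section, and the search side pulls out whatever sits between them. The marker names (`START_HEADER` etc.) and page content here are hypothetical; the actual markers are whatever you nominate in your own configuration.

```python
import re

# Hypothetical nominated template page; the marker comments are
# placeholders, not Funnelback's actual defaults.
TEMPLATE_PAGE = """
<!-- START_HEADER -->
<header>NYU global nav</header>
<!-- END_HEADER -->
<main>page body (ignored by the include)</main>
<!-- START_FOOTER -->
<footer>NYU footer</footer>
<!-- END_FOOTER -->
"""

def extract_section(html: str, name: str) -> str:
    """Return the markup between START_<name> and END_<name> comments."""
    pattern = re.compile(
        r"<!--\s*START_{0}\s*-->(.*?)<!--\s*END_{0}\s*-->".format(name),
        re.DOTALL,
    )
    match = pattern.search(html)
    return match.group(1).strip() if match else ""

header = extract_section(TEMPLATE_PAGE, "HEADER")
footer = extract_section(TEMPLATE_PAGE, "FOOTER")
```

The nice property of this pattern is that the CMS stays the single source of truth for branding: when the AEM header changes, the search pages pick it up from the nominated page without a Funnelback template change.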
I believe we also have some more closely integrated searches where AEM passes the user’s search query to Funnelback, which responds with a custom JSON packet containing the results. These are then processed and displayed by AEM. This has the advantage of being part of the website, but it does mean the search results suffer a performance hit (you get AEM’s processing and render time on top of the time Funnelback takes to return the results). It also has the disadvantage that the AEM developer needs to implement all the front-end functionality that would otherwise be available via macros in Funnelback’s FreeMarker templates, so aside from the dev work to build the initial search there is ongoing dev overhead whenever new functionality needs to be added.
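For anyone weighing up the JSON route, here is a rough sketch of the CMS-side half: a request goes to Funnelback’s JSON endpoint (typically `/s/search.json?collection=...&query=...`) and the response packet is reduced to whatever the page template needs. The sample packet below is illustrative only; the field names follow Funnelback’s documented JSON response shape, but you should verify them against your own instance and version.

```python
# Illustrative slice of the JSON packet returned by a request like:
#   GET https://<funnelback-host>/s/search.json?collection=<coll>&query=physics
# (field names assumed from Funnelback's JSON response format -- verify
# against your instance; the URLs mirror the examples in this thread).
sample_response = {
    "response": {
        "resultPacket": {
            "resultsSummary": {"totalMatching": 2},
            "results": [
                {"title": "Physics Department",
                 "liveUrl": "https://SITE.nyu.edu/physics/",
                 "summary": "..."},
                {"title": "Biology Department",
                 "liveUrl": "https://SITE.nyu.edu/biology/",
                 "summary": "..."},
            ],
        }
    }
}

def extract_results(packet: dict) -> list[dict]:
    """Reduce a Funnelback-style JSON packet to title/URL pairs for rendering."""
    results = (packet.get("response", {})
                     .get("resultPacket", {})
                     .get("results", []))
    return [{"title": r["title"], "url": r["liveUrl"]} for r in results]
```

The equivalent logic in an AEM component (Java/Sling model or HTL backing bean) is where the ongoing dev overhead mentioned above lives: every new result feature (facets, spelling suggestions, curator messages) means handling another branch of this packet yourself.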
Plevan,
Thanks for your reply. We used the Google Search Appliance at NYU across our main www.nyu.edu (my property) and several subdomains (e.g., tisch.nyu.edu, stern.nyu.edu etc.) in both ways you described:
- headed/themed by GSA;
- headless, with the AEM CMS querying the XML data and parsing it to display within the CMS-powered pages.
We replicated both methods when we moved to Funnelback this past weekend.
We will be reviewing the headless approach in the coming months, on the assumption that it may be more advantageous to move to the headed/themed approach to take advantage of Funnelback’s features in a more agile way.
We have experienced some interesting crawl results: the AEM CMS stores pages in a structure like SITE.nyu.edu/content/english/departments/physics/ but re-writes those for the public to view as SITE.nyu.edu/biology/, and there is some nuance in how Funnelback crawls those, adds them to its index, and then displays those URLs in the search results.
I don’t know how many other CMSes have a similar “internal” vs. “published” URL re-mapping, but I was certainly curious to see if other AEM customers had figured out the best way to approach this.
We are still engaged with Funnelback, so no urgency. Just curious to hear from others using AEM on what they experienced, what they learned, and what they concluded.
–Jim @ NYU
Hi Jim,
I’ve certainly seen URL variants used in other systems.
The best way to control the URLs is to get the CMS to write a canonical URL metadata field that specifies the canonical URL for the page being gathered. That way, the URL specified in the metadata will be used as the URL for the page regardless of the URL on which the page was fetched. It should also mean that any crawler indexing your site (e.g., public Google) will display that URL in its search results.
regards,
Peter