Thanks for your reply. We used the Google Search Appliance at NYU across our main www.nyu.edu (my property) and several subdomains (e.g., tisch.nyu.edu, stern.nyu.edu) in both of the ways you described:
- headed/themed by GSA;
- headless, with the AEM CMS querying the XML data and parsing it to display within the CMS-powered pages.
We replicated both methods when we moved to Funnelback this past weekend.
We will be reviewing the headless approach in the coming months, on the assumption that it may be more advantageous to move to the headed/themed approach to take advantage of the Funnelback features in a more agile way.
We have seen some interesting crawl results because the AEM CMS stores pages in a structure like SITE.nyu.edu/content/english/departments/physics/ but rewrites those for the public to view as SITE.nyu.edu/physics/. There is some nuance in how Funnelback crawls those pages, adds them to its index, and then displays those URLs in the search results.
I don't know how many other CMSs have a similar "internal" vs. "published" URL re-mapping, but I was certainly curious to see if other AEM CMS customers had figured out the best way to approach this.
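
To make the re-mapping concrete, here is a minimal Python sketch of the kind of normalization involved. It is illustrative only: the prefix rule, the function names (to_public_url, dedupe_by_public_url), and the sample URLs are assumptions for the example, not our actual AEM mapping configuration or anything provided by Funnelback.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical internal prefix; real AEM mapping rules may be more involved.
INTERNAL_PREFIX = "/content/english/departments"


def to_public_url(url: str, prefix: str = INTERNAL_PREFIX) -> str:
    """Rewrite an internal repository-style URL to its published form."""
    parts = urlsplit(url)
    path = parts.path
    # Only strip the prefix when it matches a whole path segment.
    if path == prefix or path.startswith(prefix + "/"):
        path = path[len(prefix):] or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


def dedupe_by_public_url(urls):
    """Collapse internal and published variants of the same page, keeping order."""
    seen = set()
    result = []
    for url in urls:
        public = to_public_url(url)
        if public not in seen:
            seen.add(public)
            result.append(public)
    return result


crawled = [
    "https://SITE.nyu.edu/content/english/departments/physics/",
    "https://SITE.nyu.edu/physics/",
]
print(dedupe_by_public_url(crawled))
```

If the crawler picks up both forms of a page, a step like this would collapse them to the single published URL before results are displayed. Where that step belongs (crawl time, index time, or display time) is exactly the part we are still working out.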
We are still engaged with Funnelback, so no urgency. Just curious to hear from others using AEM on what they experienced, what they learned, and what they concluded.
--Jim @ NYU