How do I remove content from Funnelback that doesn't have a URL?

Hi

 

We have noticed that a number of assets in our Funnelback collections don't have a valid URL.

 

If we put a page Under Construction, a trigger removes its URL from Funnelback. However, as these assets haven't got a URL, they can't be removed that way.

 

We need to remove them completely and then re-add them with a correct URL.

 

Can anyone help?

I moved this to the FB forum as I believe this is something that has to happen on the FB admin side.

Hi Cromers -

 

If you're working with a web collection in a single-server configuration, you'd be best starting with the instant remove / add URL tools accessible in the Funnelback Admin UI:

 

http://docs.funnelback.com/updating_collections.html#Update%20Types

 

To examine the scope of the problem, you might want to take a look through your collection's stored.log file:

 

Admin UI > Administer > Browse Log Files > Live View > stored.log

 

http://docs.funnelback.com/log_viewer.html

 

Finally, if there are patterns to the undesired URLs that are being included (e.g. */?a=*), you might want to consider using a regular expression match for exclude patterns:

 

http://docs.funnelback.com/include_and_exclude_patterns.html#Example
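
Before adding a pattern to the collection, it can help to check it against a few sample URLs. Below is a minimal sketch in Python, assuming the */?a=* example above is meant as "any URL containing /?a=". The regular expression and the example.com URLs are illustrative assumptions, and Funnelback's own regex syntax may differ from Python's, so treat this as a sanity check only:

```python
import re

# Assumed regex equivalent of the glob-style */?a=* example:
# match any URL that contains the literal text "/?a=".
EXCLUDE = re.compile(r"/\?a=")

def is_excluded(url: str) -> bool:
    """Return True if the URL contains the undesired pattern."""
    return EXCLUDE.search(url) is not None

# Hypothetical sample URLs, for illustration only.
urls = [
    "http://example.com/business/?a=123",
    "http://example.com/business/page-name",
]

excluded = [u for u in urls if is_excluded(u)]
for u in excluded:
    print("would be excluded:", u)
```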

 

You've mentioned 'invalid URLs' - if these documents are included in a Funnelback collection, they would have a URL - what is the URL of these documents as determined by Funnelback?

 

Are there canonical URLs being defined in the page or server-side redirects that Funnelback might be interpreting?

 

http://docs.funnelback.com/webcrawler.html#Redirects

Thanks for all the info.

 

When a business listing is made live, it gets pushed into the respective push-collection with the following URL - domain/business/business-listings/businesses/page-name

 

We imported a large number of business listings from our old site, but found out afterwards that there was a duplicate.

This duplicate didn't get a web path, but was still made live and pushed to the collection.

 

It appears in the results screen as domain/business/business-listings/businesses/ so when a user selects it, it returns a blank page.

 

We have since fixed the web path of this asset, but if we put the page Under Construction, it isn't deleted from the push-collection because the trigger cannot find the respective URL. As a result, the URL in the collection doesn't get updated when the asset is made live again!

Hi Cromers -

 

If you're using a Push collection (I'm assuming Funnelback v13.2), the process is slightly different.  Exclude patterns, redirects.txt and stored.log won't be of any use to you in that scenario.

 

"This duplicate didn't get a web path, but was made live and pushed to the collection."

 

 

If you know what the URL for that duplicate is within the Funnelback Push collection (irrespective of whether there's any valid content at the actual URL), you should be able to remove it via an ad-hoc update to the Push collection in the Funnelback Admin UI:

  1. Update > Start Advanced Update
  2. Push > Delete Content > URL: {Undesirable, Duplicate URL} > Delete
  3. Observe status message
  4. Browser Back Button
  5. Push > Commit > Commit Pending Changes
  6. Observe status message
  7. Reload Public UI, searching for that particular URL's title / path (it should no longer exist)
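
If more than one listing is affected, you'll need the exact URLs to enter in step 2. The sketch below is one way to pick them out of a list of indexed URLs, assuming the broken entries are the ones that stop at the listings folder with no page-name after it (as in your domain/business/business-listings/businesses/ example). The function name and the example.com URLs are hypothetical, and the URLs need to include the scheme (http:// or https://) for the path to be parsed correctly:

```python
from urllib.parse import urlsplit

# Folder path taken from the thread; everything after it should be a page-name.
LISTING_PREFIX = "/business/business-listings/businesses/"

def is_missing_page_name(url: str) -> bool:
    """Return True if the URL stops at the listings folder (no page-name)."""
    path = urlsplit(url).path
    # Normalise so that ".../businesses" and ".../businesses/" compare equal.
    return path.rstrip("/") + "/" == LISTING_PREFIX

# Hypothetical list of URLs as they appear in the push collection.
indexed_urls = [
    "http://example.com/business/business-listings/businesses/page-name",
    "http://example.com/business/business-listings/businesses/",
]

to_delete = [u for u in indexed_urls if is_missing_page_name(u)]
for u in to_delete:
    print("candidate for Push > Delete Content:", u)
```

Each URL it prints would then go through steps 1 to 7 above, followed by a re-push of the asset once its web path is correct.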