Does anyone know if there is a minimum number of characters that have to be indexed in order for the automatic duplicate system to work?
Or does it work on a percentage of similar content?
I have these pages:
http://www.pmlive.com/pmhub/healthcare_advertising/taylor_james
http://www.pmlive.com/pmhub/healthcare_digital_communications/taylor_james
These have almost everything set to noindex apart from a couple of small text areas, which are identical bar the different URL (which appears in 2 or 3 places on the page).
I have similar pages with a lot more content where the duplicate detection works fine - only one URL shows, e.g.:
http://www.pmlive.com/pmhub/healthcare_advertising/HAVAS_LYNX
http://www.pmlive.com/pmhub/healthcare_creative_and_design/HAVAS_LYNX
But on the pages with a lot less content I can't get the duplicate detection to work.
I can't find anything on this in the manuals, so I'm guessing it's one of the following:
1. There just isn't enough indexable content to trigger the duplicate removal
2. The different URLs in the indexable content mean there's a higher percentage of difference between the pages
i.e. the pages with less indexable content are 95% similar (5 out of every 100 words differ), while the pages with more content are 99% similar (1 out of every 100 words differs).
So the differing URLs affect the pages with less indexable content disproportionately. And the duplicate removal process doesn't disregard the current URL where it appears on the page (which I'd kind of assumed it would).
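To make the arithmetic in point 2 concrete, here's a rough word-overlap sketch. This is just illustration of the percentages, not the actual duplicate detection algorithm (I don't know its internals); the texts and threshold idea are made up:

```python
# Illustrative sketch: the same number of differing words (e.g. the page's
# own URL appearing in the indexed text) pushes short pages further below
# a similarity threshold than long ones. Plain positional word overlap,
# NOT the search engine's real fingerprinting method.

def similarity(a: str, b: str) -> float:
    """Fraction of word positions at which two texts agree."""
    wa, wb = a.split(), b.split()
    matches = sum(x == y for x, y in zip(wa, wb))
    return matches / max(len(wa), len(wb))

# Two short pages: 20 words each, 1 word (the URL) differs -> 95% similar
short_a = "common " * 19 + "url-a"
short_b = "common " * 19 + "url-b"

# Two long pages: 100 words each, 1 word differs -> 99% similar
long_a = "common " * 99 + "url-a"
long_b = "common " * 99 + "url-b"

print(similarity(short_a, short_b))  # 0.95
print(similarity(long_a, long_b))    # 0.99
```

If duplicate removal only kicks in above some similarity cutoff between those two values, that would explain why the longer pages collapse to one URL and the shorter ones don't.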
Any help on how this works would be greatly appreciated, as usual.
Karl
PS - Is there a way to see exactly what has been indexed for a specific URL?