The best approach here is to look at the text of the cached version of the PDF document.
When a PDF is indexed, Funnelback uses Tika to extract the text from it, and whatever text Tika returns is what gets indexed.
I suspect that the extraction is simply stripping out any formatting. It's also possible that your PDF doesn't really capture the formatting, because there is often a disconnect between what you see and what's actually in the PDF itself. For example, old PDFs have no concept of paragraphs, and even things like bolding can produce duplicated letters, depending on how the PDF was generated.
After looking at the text, you may be able to write a filter to modify the extracted text.
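As a rough illustration of the kind of clean-up such a filter might do, here is a minimal sketch in plain Python. It is not written against Funnelback's filter framework (a real filter would need to be wrapped in whatever interface your Funnelback version expects), and the function name and word list are made up for the example. It just collapses stray whitespace and drops a set of unwanted words from the extracted text.

```python
import re

# Hypothetical list of words to drop; replace these with the terms
# you actually see in the cached text of your PDF.
UNWANTED_WORDS = {"lorem", "ipsum"}


def clean_extracted_text(text, unwanted=UNWANTED_WORDS):
    """Collapse whitespace and drop unwanted words from extracted text."""
    # Extracted PDF text often has odd line breaks and runs of spaces,
    # so reduce every run of whitespace to a single space.
    text = re.sub(r"\s+", " ", text).strip()

    kept = []
    for word in text.split(" "):
        # Compare case-insensitively, ignoring punctuation at either end.
        bare = word.lower().strip(".,;:!?")
        if bare not in unwanted:
            kept.append(word)
    return " ".join(kept)
```

You would call it on the extracted text, e.g. clean_extracted_text(extracted), and index the result instead of the raw text. Note that removing a word this way also removes any punctuation attached to it.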
You could possibly also add the words that you'd like ignored to the spelling blacklist, and to the list of stopwords.