Exporting to Word: a tab or spaces replace part of some words, mainly after a ligature. How do I prevent this?

I'm starting with client files, and all they have are PDFs with live text. When I save them as Word documents, I notice that parts of some words are sometimes missing, with spaces or a tab in place of the missing text. This seems to happen mainly after a ligature (e.g., fi, fl). Has anyone encountered this, and does anyone know how to prevent it?


Susan Hoffman



Converting from PDF to Word, Excel or any other format is one of the most complex things you can try to do with a PDF file. It works very well in some cases; in others, the output has very little to do with the original file. The key to success is that the PDF file needs to be "tagged", which means that it contains structural information (such as headings, paragraphs, tables and reading order) about the content that is displayed in the file. The best way to make sure that a PDF file is tagged correctly is to create it from Word or Excel with Acrobat's PDFMaker (the Acrobat ribbon or toolbar in those applications).
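If you want a rough idea of whether a client's PDF is tagged before you try exporting it, one heuristic is to look for the markers a tagged PDF normally carries in its document catalog: a `/StructTreeRoot` entry and `/MarkInfo << /Marked true >>`. The Python sketch below (standard library only; the function name `looks_tagged` is hypothetical) searches the raw file and also any Flate-compressed streams, because newer PDFs often store the catalog inside a compressed object stream. It is not a real PDF parser: it assumes the file is unencrypted, skips streams that use other filters, and finding tags says nothing about whether they are correct.

```python
import re
import zlib

# Markers that a tagged PDF's document catalog normally contains.
_STRUCT_TREE = re.compile(rb"/StructTreeRoot\b")
_MARKED_TRUE = re.compile(rb"/Marked\s+true\b")
# Rough match for stream bodies; a real parser would use the /Length entry.
_STREAM = re.compile(rb"stream\r?\n(.*?)(?:\r?\n)?endstream", re.DOTALL)


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristically report whether a PDF appears to be tagged.

    Assumption: the file is not encrypted and its compressed streams
    use FlateDecode. Other filters are silently ignored.
    """
    chunks = [pdf_bytes]
    for match in _STREAM.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes after the zlib data
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            # Not Flate-compressed (or not compressed at all): skip it
            continue
    has_struct_tree = any(_STRUCT_TREE.search(c) for c in chunks)
    is_marked = any(_MARKED_TRUE.search(c) for c in chunks)
    return has_struct_tree and is_marked
```

You would call it as `looks_tagged(open("client.pdf", "rb").read())` (the file name is just an example). A `False` result is a hint that an export to Word is likely to be poor; a `True` result does not guarantee a good export.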

Unfortunately, there is not much you can do to improve the output without spending a lot of time (e.g., by manually tagging the file). Also, if you are using Adobe's ExportPDF service and don't have access to Acrobat, that is not even an option.

Pretty much the only thing you can do is complain to the original author of the file and tell them that they used a bad PDF generator to create it.

You could also try converting the documents to, e.g., high-resolution TIFF images, then importing these TIFF images back into Acrobat and running OCR on them. This may give you better results.

Karl Heinz Kremer
PDF Acrobatics Without a Net
PDF Software Development, Training and More...
http://www.khkonsulting.com



