Unlike other legacy electronic formats, PDF is not a word-processing format, so the information in a PDF file is not necessarily stored in the sequence in which it appears on the page. PDF is a page description language: its contents are organized to tell a rendering tool (such as Adobe Acrobat) how to draw each page.
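The gap between storage order and display order can be illustrated with a minimal sketch. It assumes text has already been pulled out of a page as positioned fragments; the `Fragment` type, the sample coordinates, and the `reading_order` helper are all hypothetical and do not come from any particular PDF library. Coordinates follow the default PDF convention of an origin at the bottom-left of the page.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """A piece of text plus the position at which it is drawn (hypothetical)."""
    x: float  # distance from the left edge of the page, in points
    y: float  # distance from the bottom edge of the page, in points
    text: str


# Hypothetical fragments, listed in the order a producer happened to write them.
stored = [
    Fragment(160, 706, "in which it is read."),
    Fragment(72, 720, "PDF describes"),
    Fragment(72, 706, "not the order"),
    Fragment(160, 720, "where text is drawn,"),
]


def reading_order(fragments: List[Fragment], line_tolerance: float = 2.0) -> List[str]:
    """Rebuild lines of text by sorting top-to-bottom, then left-to-right."""
    # Higher y means closer to the top of the page, so sort descending.
    by_height = sorted(fragments, key=lambda f: -f.y)

    lines: List[List[Fragment]] = []
    for frag in by_height:
        # Fragments whose baselines nearly match belong to the same line.
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    # Within each line, order fragments from left to right.
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x)) for line in lines]


for line in reading_order(stored):
    print(line)
```

Reading the fragments in stored order gives scrambled text, while sorting by position recovers the two lines as a reader would see them. Real pages (multiple columns, rotated text, footnotes) need considerably more care than this simple sort.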
The PDF language is less aware of a "table" than of the fact that a collection of paragraphs should be drawn in a block of columns. A number of tools export PDF to generic formats such as RTF, but most simply rely on the desktop-publishing capabilities of RTF, so the resulting content still cannot be extracted as a word-processing file. Extracting information from PDF therefore requires intelligent effort to ensure that the intellectual integrity of the content is maintained.
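As a rough illustration of the kind of effort involved, the sketch below rebuilds a small table from positioned text by inferring rows and columns from coordinates alone. The sample data, the `rebuild_table` helper, and the tolerance value are assumptions made for this example; they are not part of any export tool mentioned above.

```python
from typing import List, Tuple

# Hypothetical positioned text as (x, y, text); the page stores no "table" object.
cells: List[Tuple[float, float, str]] = [
    (72, 700, "Name"), (200, 700, "Size"),
    (72, 686, "a.pdf"), (200, 686, "12 KB"),
    (200, 672, "3 MB"), (72, 672, "b.pdf"),
]


def rebuild_table(fragments, tolerance: float = 3.0) -> List[List[str]]:
    """Infer a grid from coordinates: shared x values become columns, shared y values rows."""

    def cluster(values) -> List[float]:
        # Collapse coordinates within `tolerance` of the last anchor into one anchor.
        anchors: List[float] = []
        for v in sorted(values):
            if anchors and v - anchors[-1] <= tolerance:
                continue
            anchors.append(v)
        return anchors

    def nearest(value: float, anchors: List[float]) -> int:
        # Index of the anchor closest to this coordinate.
        return min(range(len(anchors)), key=lambda i: abs(anchors[i] - value))

    columns = cluster(x for x, _, _ in fragments)
    rows = cluster(y for _, y, _ in fragments)[::-1]  # largest y is the top row

    table = [["" for _ in columns] for _ in rows]
    for x, y, text in fragments:
        table[nearest(y, rows)][nearest(x, columns)] = text
    return table


for row in rebuild_table(cells):
    print(row)
```

The grid here is a guess based on alignment, which is why merged cells, wrapped cell text, or slightly misaligned columns can easily defeat a purely positional approach.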