Today, PDF is one of the most popular document formats, both privately and in the professional arena. Since its introduction by the US software provider Adobe Systems in 1993, the PDF format has developed into an internationally recognized industry standard.
The acronym PDF stands for Portable Document Format. The format was originally developed to facilitate the exchange of electronic documents regardless of the platform, while retaining the original layout. This means that the content of a PDF file will always look the same as on the author's computer, regardless of the device or software used to display the file.
PDF files are used in many different areas. For example, the PDF format is often used to publish various kinds of information on the Internet. Forms in PDF format can also be used to collect user information without sacrificing the advantages of this format. Furthermore, PDF is used for the creation and exchange of artwork in the digital prepress process. It is also used for the long-term electronic archiving of documents and for the creation of 3D models in the fields of engineering and architecture.
Due to the outstanding significance of the PDF format, customers often request the translation of PDF files. However, the translation of PDF files involves a number of challenges, which are addressed in this document.
Editable vs. Non-Editable PDF Files
Usually, PDF files are categorized as editable and non-editable PDF files. Editable PDF files are files in which the text exists in the form of text elements. Non-editable PDF files are scanned documents. In these documents, the individual pages consist of full-page images.
Though the content appears as text to the human eye, it actually consists of images (i.e. hard-coded pixel sequences) whose content cannot be edited directly. The content first needs to be converted to an editable state with the help of OCR (Optical Character Recognition) software. Unfortunately, the results are often not satisfactory. Of course, the recognized text can be corrected manually, but this often means a lot of additional work, especially if the documents contain hand-written passages.
PDF Files and Translation
When translating PDF files, there is an important aspect to remember: the source-language content of a PDF file cannot simply be replaced with the translation in the target language. Even in the case of editable PDF files, editing is possible only to a very limited extent, if at all. After all, the PDF format is designed to keep the layout true to the original, not to make the content easy to change. To translate a PDF file, it must therefore first be converted into an editable file. Usually, PDF files are converted into Microsoft Word files.
The Across Translator Edition and some other translation management systems automatically convert PDF files into editable files. However, the conversion result may be disappointing, especially with respect to how faithfully the converted document reproduces the layout of the original.
Even converted documents that look good at first glance often pose problems. For example, a frequent conversion problem is that a sentence extending over several lines is split by hard line breaks. This inevitably leads to problems when translating the converted file, as each fragment ends up in a segment of its own. Other frequently encountered conversion problems include the following: tab stops are replaced by multiple spaces; in justified paragraphs, the gaps between the words are filled with multiple spaces; and documents with multiple columns are not converted correctly.
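To make these conversion artifacts more tangible, the following Python sketch repairs two of them in plain text: runs of multiple spaces and sentences cut by hard line breaks. It is only a heuristic and not part of Across; the function names and the rule "a line that does not end with sentence punctuation continues on the next line" are assumptions made for illustration. Headings, list items, and hyphenated words would need separate handling, and tab stops that were replaced by spaces cannot be restored this way.

```python
import re

# Assumption: a line ending in one of these characters is a complete unit.
SENTENCE_END = (".", "!", "?", ":", ";")


def collapse_spaces(text: str) -> str:
    """Replace runs of two or more spaces with one space and trim the ends."""
    return re.sub(r" {2,}", " ", text).strip()


def join_broken_lines(lines: list[str]) -> list[str]:
    """Merge consecutive lines that appear to belong to the same sentence.

    An empty line is treated as a paragraph boundary and never merged across.
    """
    merged: list[str] = []
    start_new = True  # True at the beginning and after every empty line
    for raw in lines:
        line = collapse_spaces(raw)
        if not line:
            start_new = True
            continue
        if merged and not start_new and not merged[-1].endswith(SENTENCE_END):
            # The previous line stopped mid-sentence: append this one to it.
            merged[-1] = merged[-1] + " " + line
        else:
            merged.append(line)
        start_new = False
    return merged


converted = [
    "The  file is converted",
    "into a Word   document.",
    "",
    "Next paragraph.",
]
print(join_broken_lines(converted))
# ['The file is converted into a Word document.', 'Next paragraph.']
```

In practice, a result like this should still be reviewed by a person, because a heading followed directly by body text would be merged with it.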
Ask for the Original File
Important: For the above reasons, you should always try to get the original file from which the PDF was generated. Virtually all PDF files are generated from another file format, e.g. Word, Excel, or InDesign. Normally, the original file will be easy to handle.
Tip: The application from which a PDF file was created is often specified in the metadata of the PDF file. Knowing this might make it easier for the customer to locate the original file. To view this information, open the PDF file in a PDF viewer (e.g. Adobe Reader) and open the document properties. In Adobe Reader, the properties can be accessed via File → Properties, and the information can be found in the "Application" entry.
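If many PDF files need to be checked, the same information can often be read programmatically. The "Application" entry in Adobe Reader generally corresponds to the /Creator entry of the PDF document information dictionary. The following Python sketch looks for that entry in the raw bytes of a file using only the standard library. The function name find_creator is hypothetical, and the approach is deliberately simple: it assumes that the entry is stored uncompressed and unencrypted as a literal string, and it will return None for files that store their metadata in compressed object streams, as hexadecimal strings, or only in XMP. A dedicated PDF library would be more reliable.

```python
import re
from typing import Optional

# Matches "/Creator (literal string)". Escaped characters such as "\)" and one
# level of balanced, unescaped parentheses inside the string are accepted.
CREATOR_PATTERN = re.compile(
    rb"/Creator\s*\(((?:\\.|[^\\()]|\((?:\\.|[^\\()])*\))*)\)",
    re.DOTALL,
)


def find_creator(pdf_bytes: bytes) -> Optional[str]:
    """Return the /Creator value found in the raw PDF bytes, or None."""
    match = CREATOR_PATTERN.search(pdf_bytes)
    if match is None:
        return None
    raw = match.group(1)
    # Undo the escapes for parentheses and backslashes only; other escape
    # sequences (octal codes, line continuations) are left untouched.
    raw = re.sub(rb"\\([()\\])", rb"\1", raw)
    # Assumption: the string uses a single-byte encoding, not UTF-16.
    return raw.decode("latin-1")


# Usage with a file on disk (the file name is a placeholder):
# with open("document.pdf", "rb") as handle:
#     print(find_creator(handle.read()))
```

A value such as "Microsoft Word" or "Adobe InDesign" indicates which type of source file to ask the customer for.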
Some preparatory steps before translating a PDF file:
Editable or Non-Editable PDF File
The translator should first open the document to be translated in a PDF viewer and check whether it is an editable or non-editable PDF file. If the text in the PDF file can be selected, the file is an editable file. If this is not possible, the text from the PDF file must first be made editable with the help of OCR software.
Converting PDF Files
The Across Translator Edition has an integrated conversion function for PDF files: After the project creation, editable PDF files are automatically converted into Word files (DOCX format). Thus, PDF files are always translated in the form of Word files. Following the translation, they are also checked out as Word files.
Tip: In the document settings templates for PDF (under Tools → System Settings → Document Settings → PDF), you can determine how PDF files are to be converted. Test the various settings in order to find out which ones yield the best conversion result. For this purpose, set up a project and check in the PDF file. Subsequently, open the Word file in the translation editor crossDesk and review the content. Additionally, it is advisable to generate a preview of the source file (via Tools → Preview → Source Preview) and review the file content and layout. If necessary, adjust the settings of the document settings template for PDF, check in the PDF file again, and review the converted Word file once more.
Tip: Instead of using the automatic conversion function of the Across Translator Edition, the PDF file can of course also be converted with a special program.
Revising the Converted Word File
If the converted Word file contains content or layout errors, it is advisable to correct them before the translation. To get the converted Word file, you can, for example, save the source file preview (see "Converting PDF Files" above) in Word. Then make the necessary changes in Word (remove superfluous line breaks, multiple spaces, etc.) and check in the revised Word file to the Across Translator Edition.
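For longer documents, it can help to locate the affected paragraphs before revising them by hand. The following Python sketch reads the paragraphs of a DOCX file with the standard library and lists those that contain multiple spaces or that seem to end mid-sentence. It is an illustration only, not an Across feature: the function names and the rule "a paragraph ending in a lowercase letter or a comma was probably cut by a hard line break" are assumptions, and the sketch only reports findings without changing the file.

```python
import re
import xml.etree.ElementTree as ET
import zipfile

# XML namespace of the main WordprocessingML elements (w:p, w:t, ...).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_texts(docx_file) -> list[str]:
    """Return the text of every paragraph in the main document part.

    docx_file may be a path or a binary file-like object. Text inside text
    boxes is nested in other paragraphs and may therefore appear twice.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    texts = []
    for paragraph in root.iter(W + "p"):
        texts.append("".join(t.text or "" for t in paragraph.iter(W + "t")))
    return texts


def suspicious_paragraphs(texts: list[str]) -> list[tuple[int, str]]:
    """Return (paragraph index, reason) pairs for likely conversion errors."""
    findings = []
    for index, text in enumerate(texts):
        stripped = text.strip()
        if not stripped:
            continue
        if re.search(r" {2,}", stripped):
            findings.append((index, "multiple spaces"))
        if stripped[-1].islower() or stripped[-1] == ",":
            findings.append((index, "possibly cut by a hard line break"))
    return findings


# Usage (the file name is a placeholder):
# for index, reason in suspicious_paragraphs(paragraph_texts("converted.docx")):
#     print(index, reason)
```

Headings and table cells often end without punctuation, so some findings will be false positives that can simply be ignored.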
When translating PDF files, there is usually no need to observe any PDF specifics if the PDF has been fully and correctly converted and revised (see above). Normally, the translator merely needs to translate text content.
Tip: If a sentence was wrongly split into two segments during the conversion of the PDF file (and the error was not corrected during the revision of the converted file), the error can usually be corrected directly during the translation by joining the two segments. To do so, select the two segments in the translation by way of multiple selection, e.g. by keeping the Ctrl key pressed and consecutively clicking the respective segments with the mouse. Subsequently, join the two segments by clicking "Join Selected Paragraphs" in the context menu.
Final Document Check
Upon completion of the translation and check-out of the target document from Across, the translated file should be opened in Word for a final review and any manual adjustments. Is everything displayed correctly? Do any text fields need to be resized in order for all content to be displayed correctly?
Conversion into PDF
As mentioned previously, PDF files need to be converted into Word documents for the translation. Thus, the finished translation will be checked out from Across in the form of a Word document. If the customer explicitly wants the translation back in PDF format, the Word file needs to be converted into a PDF file. To do so, open the Word file and press F12 in order to open the "Save As" dialog. In the dialog, select the file type "PDF" in order to save the Word file as a PDF file.
Attach the Original PDF to the Project
By means of an option in the document settings templates for PDF, the original PDF file can be attached to the respective project. In this way, the original PDF file can always be accessed while working with the converted Word file.
To do so, click "PDF Settings" in the document settings templates for PDF (under Tools → System Settings → Document Settings → PDF) and activate the option "Attach original PDF file". The PDF file will then be attached to the project and can be accessed in the translation editor crossDesk via the "Attachments" tab of crossView.