XML

Translate with Across

Introduction

XML is a standardized markup language that has become very widely used. It is often used for the management of structured data, i.e. data with a predefined data structure.

XML is used in many application areas, e.g. for the layout-independent description of documents (e.g. in content management systems and editorial systems) or as the format of choice for exchanging data between different applications. For instance, the translation industry makes extensive use of the TMX, TBX, and XLIFF formats, all of which are XML-based. Of course there are also many other XML-based markup languages. For example, consider XHTML, an XML-based variant of HTML, or MathML, a markup language for the display of mathematical formulas. The Office formats DOCX (Word), XLSX (Excel), and PPTX (PowerPoint), which were introduced in 2007, and the InDesign exchange format IDML are also based on XML.

Due to its great significance and popularity, XML naturally also plays a key role in the field of translation.

As XML is a text-based format, XML files can be edited with a conventional text editor. Often, however, XML files are created and edited with the help of special XML editors that prevent the creation of faulty files. Many editorial systems and authoring tools also make use of the XML format. Some applications generate XML files for data exchange purposes.

By the way: XML files may have the file extension .xml, but this is not required. XML-based markup languages usually have a different file extension. For example, this is the case with SVG, an XML-based image format for the presentation of 2D vector graphics, as well as the above-mentioned exchange formats TMX, TBX, and XLIFF from the field of translation. (Tip: As the Across Translator Edition cannot know all XML-based markup languages and applications that use custom file extensions, XML-based files of a format that is not yet supported can be registered in the Across Translator Edition; see "Tips and Tricks" below.)

XML — a Brief Introduction

The acronym "XML" stands for "eXtensible Markup Language". XML is a markup language, i.e. a language that describes the structure of files with the help of markup elements referred to as tags. More precisely, XML is a meta language: it is used to define (new) markup languages or document types, hence the word "extensible" in its name. For this purpose, XML provides a number of rules that must be observed when defining a new markup language.

A language usually has a structure that is based on grammar rules. This is also true of markup languages. In XML and the markup languages based on it, there are two ways to define a grammar that determines the structure as well as the elements and attributes of XML files: by means of a document type definition (DTD) or by means of an XML schema definition (XSD). DTD is the older of the two variants and uses its own syntax to describe the structure. The newer XSD describes the structure in the form of an XML document, i.e. using the rules and "vocabulary" of XML.
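The difference between the two notations can be illustrated with a minimal sketch. The element names note, to, and body are hypothetical and serve only as an illustration. Because an XSD is itself an XML document, a generic XML parser (here Python's standard xml.etree.ElementTree module) can read it, whereas a DTD written in its own syntax cannot be parsed as XML:

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar for a tiny "note" document, written as a DTD.
DTD_TEXT = """<!ELEMENT note (to, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>"""

# The same hypothetical grammar written as an XSD, i.e. as an XML document.
XSD_TEXT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

XS_ELEMENT = "{http://www.w3.org/2001/XMLSchema}element"


def declared_elements(xsd_text):
    """List the element names declared in an XSD, in document order."""
    root = ET.fromstring(xsd_text)
    return [elem.get("name") for elem in root.iter(XS_ELEMENT)]


def parses_as_xml(text):
    """Return True if the text can be parsed as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(declared_elements(XSD_TEXT))  # element names found in the XSD
print(parses_as_xml(DTD_TEXT))      # the DTD syntax is not XML
```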

In connection with XML files, two concepts play a fundamental role: well-formedness and validity. An XML file is "well-formed" if it complies with the syntax rules of XML. To be "valid", an XML file needs to be well-formed and must either contain an internal grammar (in the form of a DTD or XSD) or reference an external grammar, and it must comply with the rules of that grammar.
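Well-formedness can be checked with any XML parser. The following sketch uses Python's standard xml.etree.ElementTree module; the function name is_well_formed and the sample strings are assumptions made for illustration. Checking validity additionally requires a validating parser, which is not shown here:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text can be parsed as XML, otherwise False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Correctly nested tags: well-formed.
print(is_well_formed("<p>Press <b>Start</b>.</p>"))
# Overlapping tags (</p> closes before </b>): not well-formed.
print(is_well_formed("<p>Press <b>Start</p></b>"))
```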

First Steps

The Across Translator Edition offers three different modes for translating XML files: Visual XML, Tagged XML, and Tagged XML v2. In the "Visual XML" mode, the XML elements and attributes in the crossDesk translation editor are not displayed as tags, but with the help of various styles. By contrast, the "Tagged XML" mode displays the XML elements and attributes in the form of tags. The same is true of the "Tagged XML v2" mode. Unlike "Tagged XML", however, this mode displays translatable attribute values in separate segments. By default, the Across Translator Edition uses the "Tagged XML v2" mode for the translation of XML files. Therefore, the following sections will focus on the translation of XML using this mode.

Important: Document settings templates are especially important when translating XML files. Unlike HTML, the tags in XML are not predefined, but can be defined freely on the basis of certain rules. For the translation of XML files, it is therefore important to determine how the Across Translator Edition is to treat the XML elements when importing the respective XML file. It is advisable to create and customize a new document settings template for the translation project. To do so, go to >>Tools >>System Settings... >>Document Settings >>Tagged XML v2, click New to create a new settings template, and enter a name. Subsequently, the settings template must be filled with the respective XML elements and attributes. The easiest way to do this is to read out the XML file to be translated (or the underlying DTD/XSD) by clicking Load.... In this way, the current settings template will automatically be filled with the available XML elements and attributes.
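To get an overview of a file before filling the settings template, the elements and attributes it contains can also be listed outside of Across. The following is only a sketch of such an inventory and not a description of what the Load... function does internally; the function name element_inventory and the sample document are assumptions:

```python
import xml.etree.ElementTree as ET


def element_inventory(xml_text):
    """Map each element name to the sorted attribute names seen on it."""
    root = ET.fromstring(xml_text)
    inventory = {}
    for elem in root.iter():
        # Iterating over elem.attrib yields the attribute names.
        inventory.setdefault(elem.tag, set()).update(elem.attrib)
    return {tag: sorted(attrs) for tag, attrs in inventory.items()}


# Hypothetical sample document.
SAMPLE = (
    '<manual lang="en">'
    '<step id="1">Open the <ui>File</ui> menu.</step>'
    '<step id="2" optional="true">Click Save.</step>'
    "</manual>"
)

for tag, attrs in element_inventory(SAMPLE).items():
    print(tag, attrs)
```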

Before setting up a project, it is necessary to check what content the XML file contains and how this content is to be handled during translation:

Consideration of the DTD/XSD

XML files are validated against the underlying DTD/XSD. In Across, this validation takes place via the QM criterion "Tagged XML validity" (see below). If the XML file to be translated contains a reference to an external DTD or XSD, the DTD/XSD will automatically be loaded during check-in to Across, provided that it can be accessed. This also applies if the XML file itself contains the DTD/XSD. (Tip: If an external DTD/XSD cannot be accessed, the DTD/XSD can be loaded in the settings template of Tagged XML v2. The respective DTD/XSD will be used as master DTD/XSD, i.e. it will be used for validation even if an XML file that is checked in to Across references another external DTD/XSD or contains another internal DTD/XSD.)

Definition of internal XML elements

When translating XML files, a distinction is made between external and internal XML elements. External elements are located outside the segment to be translated and are therefore hidden automatically in the crossDesk translation editor. By contrast, internal elements (also referred to as "inline elements") are located inside a segment to be translated. For example, they might cause part of the segment (e.g. a word) to be highlighted in a special way. Prior to the translation, it should therefore be checked whether the XML files contain such internal XML elements. If this is the case, they must be defined as such in the Across Translator Edition in order to ensure a smooth translation process. (Tip: By default, the settings of the Across Translator Edition provide for interpretation of all new or unknown XML elements as external elements. Therefore, only the internal XML elements need to be defined as such. To do so, select the respective element in the settings template of Tagged XML v2 under >>Tools >>System Settings... >>Document Settings >>Tagged XML v2, click Edit…, and set the element type to "Internal".)
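Candidates for internal elements can be found by looking for elements that occur next to running text (mixed content). The following sketch is a simple heuristic under that assumption; it is not how Across classifies elements, and the function name and sample document are hypothetical:

```python
import xml.etree.ElementTree as ET


def inline_candidates(xml_text):
    """Return names of elements whose parent also contains text directly."""
    root = ET.fromstring(xml_text)
    candidates = set()
    for parent in root.iter():
        children = list(parent)
        if not children:
            continue
        # Mixed content: non-whitespace text before the first child
        # or after any child inside the same parent.
        has_text = bool((parent.text or "").strip()) or any(
            (child.tail or "").strip() for child in children
        )
        if has_text:
            candidates.update(child.tag for child in children)
    return candidates


# Hypothetical sample: <b> sits inside a sentence, <p> does not.
SAMPLE = "<doc><p>Press the <b>Start</b> button.</p><p>Done.</p></doc>"
print(sorted(inline_candidates(SAMPLE)))
```

Elements reported by such a check are only candidates; whether they should be set to "Internal" still has to be decided for the respective document type.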

Adjustment of attribute values

In some cases, it may be necessary to adjust the values of certain attributes within the scope of the translation. Therefore, the customer should be asked prior to the translation whether this is the case with the XML files to be translated. (Tip: By default, the settings of the Across Translator Edition do not provide for localization of attribute values. To change this, the respective option must be modified in the settings of the settings template of Tagged XML v2. In the respective settings template, select the element whose attribute values are to be modifiable and click Edit…. Then go to the "Attributes" tab, select the respective attribute, and use the drop-down list in the "Mode" column to define it as "Translatable".)
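As preparation for this discussion with the customer, the attribute values that actually occur in a file can be collected. The following sketch lists the distinct values per element/attribute pair so that values containing natural-language text become visible; the function name and the sample document are assumptions:

```python
import xml.etree.ElementTree as ET


def attribute_values(xml_text):
    """Map (element name, attribute name) to the sorted distinct values."""
    root = ET.fromstring(xml_text)
    values = {}
    for elem in root.iter():
        for name, value in elem.attrib.items():
            values.setdefault((elem.tag, name), set()).add(value)
    return {key: sorted(vals) for key, vals in values.items()}


# Hypothetical sample: "alt" holds readable text, "src" holds file names.
SAMPLE = (
    "<doc>"
    '<img src="a.png" alt="A red button"/>'
    '<img src="b.png" alt="Settings icon"/>'
    "</doc>"
)

for (tag, attr), vals in attribute_values(SAMPLE).items():
    print(f"{tag}/@{attr}: {vals}")
```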

Embedded markup code

Sometimes, the markup code of an XML file might contain embedded markup code of another markup language, e.g. HTML. The inserted markup code may be embedded as a CDATA section (<![CDATA[ ... ]]>) or by masking the code with the help of character entities (e.g. &lt; for <). Prior to the translation, it must therefore be clarified (with the customer, if necessary) whether the XML file to be translated contains any embedded code. (Tip: To determine that an XML element may contain embedded markup code, select the respective element in the settings template of Tagged XML v2 and click Edit.... Then activate the option "May contain embedded markup code" and select the embedding type.)
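Both embedding types result in the same text content once the file is parsed, which makes embedded code fairly easy to detect. The following sketch searches the direct text of each element for tag-like patterns; the regular expression is a rough heuristic, and the function name and sample document are assumptions:

```python
import re
import xml.etree.ElementTree as ET

# Rough pattern for something that looks like a start or end tag.
TAG_LIKE = re.compile(r"</?[A-Za-z][^<>]*>")


def elements_with_embedded_markup(xml_text):
    """Return names of elements whose direct text contains tag-like strings."""
    root = ET.fromstring(xml_text)
    return sorted(
        {elem.tag for elem in root.iter() if elem.text and TAG_LIKE.search(elem.text)}
    )


# Hypothetical sample: <a> uses CDATA, <c> uses entities, <d> is plain text.
SAMPLE = (
    "<doc>"
    "<a><![CDATA[<b>Hi</b>]]></a>"
    "<c>&lt;i&gt;Yo&lt;/i&gt;</c>"
    "<d>plain 1 &lt; 2</d>"
    "</doc>"
)

print(elements_with_embedded_markup(SAMPLE))
```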

Length restrictions

In certain cases, it may be necessary for an element not to exceed a certain number of characters, e.g. to ensure that the respective content will be displayed in its entirety. It must be clarified with the customer in advance whether there are any length restrictions for certain elements. (Tip: To determine a length restriction for an element, select the respective element in the settings template of Tagged XML v2 and click Edit.... Then activate the option "Maximum length" and enter the maximum number of characters.)
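A length restriction can also be checked outside of Across, e.g. on a translated file. The following sketch simply counts the characters of the text content of each restricted element; how Across counts characters is not specified here, so the counting rule, the function name, and the sample limits are assumptions:

```python
import xml.etree.ElementTree as ET


def over_length(xml_text, limits):
    """Return (element name, actual length, limit) for each violation."""
    root = ET.fromstring(xml_text)
    violations = []
    for elem in root.iter():
        limit = limits.get(elem.tag)
        if limit is None:
            continue
        # Count the text of the element including text in child elements.
        length = len("".join(elem.itertext()))
        if length > limit:
            violations.append((elem.tag, length, limit))
    return violations


# Hypothetical sample: labels must not exceed 10 characters.
SAMPLE = "<ui><label>OK</label><label>Einstellungen speichern</label></ui>"
print(over_length(SAMPLE, {"label": 10}))
```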

Translating XML Files with the Across Translator Edition

Apart from translating the "normal" text, the translation of XML files involves a number of XML-specific aspects that need to be taken into consideration. These include:

Using tags in the translation

When translating XML files, the inline tags contained in the source document must also be inserted in the translation. The tags must be used at the right position and in the correct order (e.g. start tag first, then end tag). The easiest way to do this is to move the cursor to the position at which the tag is to be inserted and then to double-click the respective tag in the source segment. Alternatively, this can be done with keyboard shortcuts: For example, press Ctrl+Shift+1 to insert the first tag from the current source segment in the translation, Ctrl+Shift+2 for the second tag, and so on. Press Ctrl+Shift+0 to insert the current tag in the translation.

Inserting additional inline tags

It might be necessary to insert additional inline tags apart from those contained in a source segment, e.g. to mark up another word in the translation. To do so, open the context menu by clicking the right mouse button at the position where the additional tag is to be inserted in the translation and select the command Insert Inline Tag.... Subsequently, select the respective tag from the list, add any required attribute values, and confirm with OK. The respective tag pair will be inserted in the translation at the current cursor position.

Inserting character entities

Character entities can be used to mask special characters that have a special meaning or function in the XML syntax, e.g. &lt; instead of <. The masking prevents the special character (in this case the "less than" symbol) from being interpreted as the beginning of a tag. To insert a character entity in the translation, open the context menu by clicking the right mouse button at the respective position in the translation and select the command Insert Character Entity.... Subsequently, select an entity from the list and confirm with OK. The respective character entity will be inserted in the translation at the current cursor position.
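The effect of masking can be reproduced with the escape and unescape functions from Python's standard xml.sax.saxutils module. The sample text is an assumption; the sketch only illustrates the principle and is unrelated to the Across dialog:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

original = "if a < b & b > c"

# escape() masks &, < and > with the predefined character entities.
masked = escape(original)
print(masked)

# A parser turns the entities back into the original characters.
parsed_text = ET.fromstring(f"<code>{masked}</code>").text
print(parsed_text)

# unescape() reverses the masking without parsing a document.
print(unescape(masked) == original)
```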

Translation of attribute values

If attribute values are to be translated (see above), the respective attribute value will be presented in a separate segment during translation. In this way, the attribute value can be translated or adapted to the target language like normal text.


Tip: The Across Translator Edition features special quality management criteria for the translation of XML. These criteria help to avoid errors with respect to the well-formedness and validity of the translated XML files and with respect to the correct use of tags in the translation:

Placeables usage

Checks whether the number of tags is identical in the source and target segments.

Placeables order

Checks whether the tags that occur in the source segments have been used in the right order and number in the target segment.

Tagged XML well-formedness

Checks whether the current target segment is well-formed.

Tagged XML validity

Checks whether the translated XML file is valid. The validity check is always conducted for the entire XML file. The check will stop at the first error found. After this error is corrected, the validity check can be repeated in order to find any other errors. (Tip: The validity check must be started manually, e.g. via the context menu of the QM criterion.)
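The idea behind the two placeables criteria can be sketched as a comparison of the tags found in the source and target segments. This is a simplified illustration, not the implementation used by Across; the regular expression, the function names, and the sample segments are assumptions:

```python
import re
from collections import Counter

# Simplified pattern: anything between "<" and ">" counts as a tag.
TAG = re.compile(r"<[^<>]+>")


def tags(segment):
    """Return the tags of a segment in their order of occurrence."""
    return TAG.findall(segment)


def same_tag_counts(source, target):
    """Check that every tag occurs equally often in source and target."""
    return Counter(tags(source)) == Counter(tags(target))


def same_tag_order(source, target):
    """Check that the tags occur in the same order and number."""
    return tags(source) == tags(target)


source = "Press <b>Start</b> now."
good = "Drücken Sie jetzt <b>Start</b>."
swapped = "Drücken Sie jetzt </b>Start<b>."
missing = "Drücken Sie jetzt Start."

print(same_tag_counts(source, good), same_tag_order(source, good))
print(same_tag_counts(source, swapped), same_tag_order(source, swapped))
print(same_tag_counts(source, missing), same_tag_order(source, missing))
```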

Tips and Tricks

Registering an XML document format that is not yet supported

An XML-based document whose file extension is not yet supported by default can still be translated with the Across Translator Edition. For this purpose, the new file extension merely needs to be registered. To do so, go to >>Tools >>System Settings... >>Document Settings >>Document Associations, select the "Tagged XML v2" mode from the drop-down list "Document type", and click Add.... Then specify the file extension of the new file format and a name/description and click OK. (Normally, no other settings need to be configured.) Henceforth, the new document format will be supported by the Across Translator Edition, enabling files of this format to be checked in and translated.

Checking the settings of the document settings template

By means of the integrated preview function, you can check the current settings of a document settings template in a well-structured form. To do so, open the preview dialog via >>Configure... >>Preview..., click Browse... to select an XML file to be translated, and then click OK to generate the preview. The source code of the XML file will be highlighted in various colors to show e.g. which parts of the XML file are to be translatable and which ones are to be hidden or locked.

Conditional XML

It might be necessary to translate the content of an XML element in certain cases only, depending on another parent or subordinate element or depending on the attribute value of an element. In such cases, the document settings template of Tagged XML v2 can be configured accordingly.
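The kinds of conditions described above can be illustrated with path expressions as supported by Python's standard xml.etree.ElementTree module. The sketch does not show the Across configuration itself; the element names, the translate attribute, and the sample document are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
SAMPLE = """<doc>
  <section><title>Intro</title><note translate="no">internal</note></section>
  <appendix><title>A-1</title><note translate="yes">Visible</note></appendix>
</doc>"""

root = ET.fromstring(SAMPLE)

# Condition on an attribute value: only notes marked translate="yes".
by_attribute = [e.text for e in root.findall(".//note[@translate='yes']")]

# Condition on the parent element: only titles directly below a section.
by_parent = [e.text for e in root.findall(".//section/title")]

print(by_attribute)
print(by_parent)
```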

Advanced settings of Tagged XML

More detailed settings for processing XML files (e.g. settings for character entities or for handling invalid or undefined elements) can be configured in the advanced settings. You can find the advanced settings in the respective document settings template of Tagged XML v2 under >>Configure... >>Advanced....

The "acrossXMLMerger" tool

If multiple XML files need to be translated, the "acrossXMLMerger" tool from the Toolbox of the Across Translator Edition can be used. With the help of this tool, multiple XML files can be merged into a single XML file. Subsequently, this file can be checked in and translated in a resource-friendly way. Finally, the translated XML file can be split into the original individual files. (The Toolbox can be found in the subfolder Toolbox of the installation directory of the Across Translator Edition, e.g. C:\Program Files (x86)\Across\Toolbox.)
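The general principle of merging and splitting can be sketched as follows. This is not the file format or the logic of the acrossXMLMerger tool; the wrapper elements merged and file, the name attribute, and both function names are assumptions made purely for illustration:

```python
import xml.etree.ElementTree as ET


def merge(documents):
    """Wrap several XML documents (name -> XML text) in one container."""
    container = ET.Element("merged")  # hypothetical wrapper element
    for name, xml_text in documents.items():
        wrapper = ET.SubElement(container, "file", {"name": name})
        wrapper.append(ET.fromstring(xml_text))
    return ET.tostring(container, encoding="unicode")


def split(merged_text):
    """Restore the individual documents from the merged container."""
    container = ET.fromstring(merged_text)
    return {
        wrapper.get("name"): ET.tostring(wrapper[0], encoding="unicode")
        for wrapper in container.findall("file")
    }


# Hypothetical sample documents.
documents = {
    "a.xml": "<doc><p>One</p></doc>",
    "b.xml": "<doc><p>Two</p></doc>",
}

merged = merge(documents)
print(merged)
print(split(merged) == documents)
```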