HTML is THE language of the World Wide Web: Each of the approximately 1.7 billion websites worldwide with their countless web pages ultimately consists of HTML. Therefore, HTML is an extremely important document format. As far as translations are concerned, HTML continues to play a key role in this age of globalization. However, the use of HTML is not limited to web pages. Other resources such as online software documentation also often consist of HTML.
In the past, web pages were usually created with HTML editors (e.g. Adobe Dreamweaver). Nowadays, web pages are often generated using content management systems (CMS). Many of these systems are web-based, so they are accessed via a conventional web browser, which is also where the web pages are created and edited. Some widely used systems are WordPress, Joomla!, Drupal, and TYPO3. Apart from these common systems, CMS solutions custom-developed by enterprises and agencies are also used.
When talking about web pages, the translation process is often referred to as localization. Localization is usually defined as the technical, linguistic, and cultural adaptation of a product to a regional market. Thus, localization comprises more than mere translation.
By the way: In terms of content and technology, there is no difference between files with the extension .html and those with .htm. The extension .htm is a relic from an age in which file names were restricted to eight characters and file extensions to three characters.
The acronym "HTML" stands for "Hypertext Markup Language". HTML is a markup language, i.e. a language that "structures" the content of a web page with the help of markup elements referred to as tags.
An HTML file comprises the content of a web page (normally text) and some tags. A tag consists of two angle brackets (< and >) with the name of the tag in between, e.g. <strong>. Tags usually occur in pairs, namely in the form of a start tag (e.g. <strong>) and an end tag (e.g. </strong>). The combination of the start tag, end tag, and tag content (i.e. the information between these two tags) forms a so-called HTML element.
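The structure described above (start tag, tag content, end tag) can be made visible with a few lines of Python. The following is only a minimal sketch using the standard library module html.parser; the class name ElementLogger, the function parse_events, and the sample string are illustrative assumptions and have nothing to do with how Across processes HTML internally.

```python
from html.parser import HTMLParser


class ElementLogger(HTMLParser):
    """Record start tags, text, and end tags in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_data(self, data):
        # Ignore whitespace-only text between tags.
        if data.strip():
            self.events.append(("text", data))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def parse_events(html):
    """Return the list of (kind, value) events found in an HTML string."""
    parser = ElementLogger()
    parser.feed(html)
    parser.close()
    return parser.events


# Illustrative sample: a <strong> element nested inside a <p> element.
sample = "<p>HTML is <strong>important</strong>.</p>"
events = parse_events(sample)
for kind, value in events:
    print(kind, repr(value))
```

Each element shows up as a "start" event, followed by its content, followed by the matching "end" event; nested elements such as strong appear between the start and end events of their parent.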
Besides normal text and tags, HTML files may contain more specific content, e.g. scripts.
Important: The form in which web pages are supplied for localization also depends on the software used to create them (see above). For example, it must be considered whether the utilized CMS has export and import functions that enable the exchange of web content. The web content to be localized may not be exported as HTML, but in a different format such as XML or XLIFF.
Important: For the localization of an entire website, all relevant files must be on hand. Besides the web pages to be localized, this may also include images, charts, and multimedia files, if these are to be localized as well.
Tip: It is advisable to create and customize a new document settings template for the localization project. To do so, go to >>Tools >>System Settings... >>Document Settings >>Tagged HTML, click New to create a new settings template, and enter a name. By default, the new template is prefilled and preconfigured with the most common HTML elements and attributes. You may then want to load the HTML file to be localized by clicking Load .... In this way, any HTML elements and attributes not yet contained in the template will be added to it.
Before setting up a project, it must be checked what specific content the HTML files have and how it is to be handled during the localization:
Localization of hyperlinks
For hyperlinks, it must be clarified whether they are to be localized (e.g. www.my-across.net/en/ in English to www.my-across.net in German).
Tip: By default, the settings of the Across Translator Edition do not provide for localization of hyperlinks. To change this, open the relevant Tagged HTML settings template under >>Tools >>System Settings... >>Document Settings >>Tagged HTML. In the template, select the element "A" for hyperlinks (the easiest way to find it is to click the column header "Name" in order to sort the list of elements alphabetically) and click Edit…. Then go to the "Attributes" tab, select the "href" attribute (which defines the target of the hyperlink), and use the drop-down list in the "Mode" column to set it to "Translatable". Finally, confirm the change with OK.
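To clarify with the customer whether hyperlinks are to be localized, it can help to list the link targets of a page first. The following sketch collects the "href" values of all "a" elements using Python's standard html.parser module. The names LinkCollector and collect_links, the https:// scheme, and the sample markup are assumptions made for illustration only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute value of every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def collect_links(html):
    """Return all hyperlink targets in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Illustrative sample page with one language-specific link and one anchor.
page = (
    '<p>See the <a href="https://www.my-across.net/en/">English site</a> '
    'or <a href="#top">go up</a>.</p>'
)
links = collect_links(page)
print(links)
```

Targets that contain a language code (such as /en/) are typical candidates for localization, whereas page-internal anchors usually are not.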
Localization of meta tags
The content of meta tags plays a special role in the field of search engine optimization (SEO).
Additional information on the head, body, and meta tags
Meta tags are part of the head of an HTML file. Unlike the content of the HTML body, which comprises the visible part of a web page, the head content is not displayed on the web page. The head contains information about the web page, e.g. its encoding, its authors, or a summary of its content. The meta tag can be used several times with different attributes. In the field of SEO, for example, the meta tag with the name "description" plays a key role, as the text in its "content" attribute is often displayed in the hit lists of the search engines. (Nowadays, keywords entered via the meta tag with the name "keywords" can be skipped, as most search engines ignore this information anyway.) With respect to the meta tags, it must be checked whether the HTML files contain any SEO-relevant meta tags. You might need to ask your customer whether these meta tags need to be localized. (Tip: By default, the settings of the Across Translator Edition do not provide for localization of the content of meta tags. To change this, select the "META" element in the respective document settings template and set the mode of the "content" attribute to "Translatable", as described above for the "href" attribute of hyperlinks.)
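To check whether an HTML file contains SEO-relevant meta tags, the name/content pairs in the head can be listed before the project is set up. The sketch below does this with Python's standard html.parser module; MetaCollector, collect_meta, and the sample page are hypothetical names and data chosen for illustration.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Meta tags without a name (e.g. charset declarations) are skipped.
        if name and content is not None:
            self.meta[name.lower()] = content


def collect_meta(html):
    """Return a dict mapping meta names to their content values."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.meta


# Illustrative sample page.
sample_page = """<html><head>
<meta charset="utf-8">
<meta name="description" content="Translation software for teams">
<meta name="keywords" content="translation, localization">
<title>Example</title>
</head><body><p>Hello</p></body></html>"""

meta = collect_meta(sample_page)
print(meta.get("description"))
```

If the result contains a "description" entry, its text is a candidate for localization and worth raising with the customer.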
In certain cases, it may be necessary for an element not to exceed a certain number of characters, e.g. to ensure that the respective content will be displayed in its entirety (e.g. title tag for the title of a web page). The customer should be asked in advance whether there are any length restrictions for certain elements. (Tip: To define a length restriction for an element, select the respective element in the settings template of Tagged HTML and click Edit.... Then activate the option "Maximum length" and enter the maximum number of characters.)
Apart from translating the "normal" text, the localization of HTML files involves a number of HTML-specific aspects that need to be taken into consideration. These include:
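A length restriction of this kind can also be verified outside the translation tool, e.g. on the finished target file. The following sketch extracts the text of the title element and compares it with a maximum length. The limit values used here are purely hypothetical; the actual limit must come from the customer. TitleExtractor and title_too_long are illustrative names.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text content of the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_too_long(html, max_length):
    """Return (title, True) if the title exceeds max_length characters."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    return title, len(title) > max_length


# Illustrative sample; the limits 60 and 10 are assumptions for the demo.
html_page = "<html><head><title>Translation software for teams</title></head></html>"
print(title_too_long(html_page, 60))
print(title_too_long(html_page, 10))
```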
Using tags in the translation
When localizing HTML files, the HTML tags contained in the source document must also be inserted in the translation. The tags must be used at the right position and in the correct order (e.g. start tag first, then end tag). The easiest way to do this is to move the cursor to the position at which the tag is to be inserted and then to double-click the respective tag in the source segment. Alternatively, this can be done with keyboard shortcuts: For example, press Ctrl+Shift+1 to insert the first tag from the current source segment in the translation, Ctrl+Shift+2 for the second tag, and so on. Press Ctrl+Shift+0 to insert the current tag in the translation. (Tip: The quality management criteria "Placeables usage" and "Placeables order" help to avoid errors when inserting tags in the translation.)
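The idea behind checking tag usage and tag order can be illustrated with a small script. This is only a rough sketch of the principle, not the implementation of the Across quality management criteria: it extracts the tag names from a source and a target segment with a regular expression and compares them. The function names and sample segments are assumptions.

```python
import re
from collections import Counter

# Matches start and end tags and captures the name, including a leading
# slash for end tags (e.g. "strong" or "/strong"). Attributes are ignored.
TAG_PATTERN = re.compile(r"<(/?[A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def tag_names(segment):
    """Return the tag names of a segment in order of appearance."""
    return [name.lower() for name in TAG_PATTERN.findall(segment)]


def check_tags(source, target):
    """Compare which tags are used and in which order they appear."""
    source_tags = tag_names(source)
    target_tags = tag_names(target)
    return {
        "usage_ok": Counter(source_tags) == Counter(target_tags),
        "order_ok": source_tags == target_tags,
    }


# Illustrative segments.
source_segment = "Click <strong>here</strong> now."
good_target = "Klicken Sie jetzt <strong>hier</strong>."
swapped_target = "Klicken Sie jetzt </strong>hier<strong>."
missing_target = "Klicken Sie jetzt <strong>hier."

print(check_tags(source_segment, good_target))
print(check_tags(source_segment, swapped_target))
print(check_tags(source_segment, missing_target))
```

Because only tag names are compared, a localized attribute value (such as a changed "href") does not trigger a warning. A differing order is not always an error, since word order may legitimately change between languages, so such a result is best treated as a hint for manual review.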
Inserting additional tags
It might be necessary to insert additional tags apart from those contained in a source segment, e.g. to highlight another word in the translation or to insert an additional hyperlink. To do so, open the context menu by clicking the right mouse button at the position where the additional tag is to be inserted in the translation and select the command Insert Inline Tag.... Subsequently, select the respective tag from the list, add any required attribute values, and confirm with OK. The respective tag pair will be inserted in the translation at the current cursor position.
Localization of hyperlinks
If the hyperlinks need to be localized (see above), the attribute value of the "href" attribute can be adapted during the translation. Simply insert the target-language URL in the translation instead of the source-language URL.
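When many hyperlinks follow the same pattern, it can be useful to agree on a mapping of source-language to target-language URL prefixes with the customer and to use it as a reference while translating. The sketch below applies such a mapping to a single URL. The mapping, the https:// scheme, and the page name products.html are hypothetical; only the two base addresses are taken from the example above.

```python
def localize_url(url, prefix_map):
    """Replace the longest matching source prefix with its target prefix.

    URLs that match no prefix are returned unchanged.
    """
    for source_prefix in sorted(prefix_map, key=len, reverse=True):
        if url.startswith(source_prefix):
            return prefix_map[source_prefix] + url[len(source_prefix):]
    return url


# Hypothetical mapping: English pages under /en/, German pages at the root.
PREFIX_MAP = {
    "https://www.my-across.net/en/": "https://www.my-across.net/",
}

print(localize_url("https://www.my-across.net/en/products.html", PREFIX_MAP))
print(localize_url("https://www.example.org/", PREFIX_MAP))
```

A pure prefix replacement only works if the page names are identical in both languages; if the target site uses translated page names, each URL has to be checked individually.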
Final review of the localized web pages in the browser
After the localization work is finished and the HTML files have been checked out from the Across Translator Edition, the localized web pages should be opened in a web browser for a final review, especially to ensure correct and complete display of the content and operability of hyperlinks. (Note: Instead of the images, the localized web pages often merely display empty frames in the size of the images. This is because images are often embedded by means of relative links, which the browser resolves relative to the location of the HTML file. If the image files are not available at the corresponding relative path, e.g. because only the HTML files were supplied, the images cannot be displayed without taking further steps.)
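To find out in advance which images are likely to appear as empty frames, the image references of a page can be checked for relative links. The following sketch lists all "src" values of "img" elements that have neither a scheme nor a host name. ImageCollector, relative_images, and the sample markup are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageCollector(HTMLParser):
    """Collect the src attribute value of every <img> element."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def relative_images(html):
    """Return image sources that are given without scheme and host."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    result = []
    for src in collector.sources:
        parts = urlparse(src)
        if not parts.scheme and not parts.netloc:
            result.append(src)
    return result


# Illustrative sample with two relative links and one absolute link.
review_page = (
    '<img src="images/logo.png">'
    '<img src="https://example.com/banner.png">'
    '<img src="../charts/sales.png">'
)
print(relative_images(review_page))
```

The files listed this way need to be present at the same relative location as on the original site for the images to be displayed during the review.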
Checking the settings of the document settings template
By means of the integrated preview function, you can check the current settings of a document settings template in a well-structured form. To do so, open the preview dialog via >>Configure... >>Preview..., click Browse... to select an HTML file to be localized, and then click OK to generate the preview. The source code of the HTML file will be highlighted in various colors to show e.g. which parts of the HTML file are to be translatable and which ones are to be hidden or locked.
It might be necessary to localize the content of an HTML element in certain cases only, depending on another parent or subordinate element or depending on the attribute value of an element. In such cases, the document settings template of Tagged HTML can be configured accordingly.
Advanced settings of Tagged HTML
More detailed settings for processing HTML files (e.g. settings for character entities or for handling invalid or undefined elements) can be configured in the advanced settings. You can find them in the respective document settings template of Tagged HTML under >>Configure... >>Advanced....