Common terms used in translation

Getting to know the jargon of translation is an essential part of building the partnership with the supply chain and fully comprehending what you are buying. Here follows a brief guide to commonly used terms in translation and what they mean.

Process of converting information into an appropriate format for the target language and culture.
TM applications employ fuzzy matching algorithm(s) to retrieve similar target language strings, flagging differences. The flexibility and robustness of the matching algorithm largely determine the performance of the system.
Alignment is the task of defining translation correspondences between source and target texts. Alignment is a process that allows text in a range of software packages to be converted semi-automatically into a Translation Memory format for re-use. There should be feedback from alignment to segmentation and a good alignment algorithm should be able to correct initial segmentation.
Alignment tool
Application that automatically pairs versions of the same text in the source and target languages in a table. Also called bi-text tool.
Situation in which the intended meaning of a phrase is unclear and must be verified - usually with the source text author - in order for translation to proceed.
Antonyms are opposite words that stand in an inherently incompatible binary relationship, e.g. the pairs male : female, long : short, up : down, and precede : follow.
Arabic numerals
Set of ten numerals (0,1,2,3,4,5,6,7,8,9) that comprise the most commonly used symbolic representation of numbers throughout the world.
Artificial intelligence
Branch of computer science devoted to creating intelligent machines that produced the first efforts toward machine translation.
A property defined and applied to Translation Memory units/segments to help sequence retrieval. Attributes are also those fields that define and qualify term bases.
Automatic retrieval
TMs are searched and displayed automatically as a translator moves through a document. (Server based).
Automatic substitution
Exact matches come up in translating new versions of a document. During automatic substitution, the translator does not check the translation against the original, so if there are any mistakes in the previous translation, they will carry over.
Automatic translation
Machine-based translation process not subject to input by a human translator.
Back translation
Process of translating a previously translated text back into its source language.
Script that normally reads from right to left but contains some exceptions in which other characters, like numerals, read from left to right. Hebrew and Arabic are examples of bidirectional languages.
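As a rough illustration of the mixed directions described above, the sketch below uses Python's standard `unicodedata` module to look up the Unicode bidirectional class of each character. The helper name `bidi_classes` and the sample string are assumptions made for this example only.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Two Hebrew letters (alef, bet), a space, and two European digits.
sample = "\u05d0\u05d1 12"
classes = bidi_classes(sample)

# "R" marks strongly right-to-left characters, "EN" marks European numbers,
# and "WS" marks whitespace. Arabic letters report "AL".
print(classes)
```

The digits keep their left-to-right class even though the surrounding script runs right to left, which is the exception the definition refers to.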
CAT (tools)
Computer-assisted translation (tools) - The process by which a human translator uses computer software to facilitate translation.
Common Sense Advisory
Market research agency providing data to operationalize, benchmark, optimize, and innovate industry best practices in translation, localization and associated industries.
Character set
Collection of symbols or characters that correspond to textual information in a language or language group.
In linguistics, cognates are words that have a common etymological origin. An example of cognates within the same language would be English shirt and skirt.
The activities required to check, process and output to one or multiple target formats in a single source publishing environment (e.g. Robohelp).
Collaborative translation
Emerging approach to translation in which companies use the elements of crowdsourcing in a controlled environment for working on large corporate projects in short periods of time.
Procedure of linking multiple files or messages together as a single document, often to facilitate processes such as search and replacement, term list extraction, collocation finding, and repetition rate establishment.
This feature allows translators to select one or more words in the source segment and the system retrieves segment pairs that match the search criteria. This feature is helpful for finding translations of terms and idioms in the absence of a terminology database.
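A minimal sketch of such a search is shown below, assuming the translation memory is simply a list of (source, target) pairs. The function name `concordance` and the sample segments are hypothetical; real tools typically index the memory and handle inflected forms.

```python
import re

def concordance(tm, query):
    """Return the segment pairs whose source text contains every query word."""
    wanted = re.findall(r"\w+", query.lower())
    hits = []
    for source, target in tm:
        source_words = re.findall(r"\w+", source.lower())
        if all(word in source_words for word in wanted):
            hits.append((source, target))
    return hits

# Hypothetical English-German translation memory.
tm = [
    ("Switch off the printer.", "Schalten Sie den Drucker aus."),
    ("The printer is out of paper.", "Der Drucker hat kein Papier mehr."),
    ("Open the file menu.", "Öffnen Sie das Menü Datei."),
]

results = concordance(tm, "printer")
for source, target in results:
    print(source, "|", target)
```

Reading the returned pairs side by side lets the translator see how a term was rendered before, even when no terminology database exists.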
Measure of how often a term or phrase is rendered the same way into the target language.
Information outside of the actual text that is essential for complete comprehension.
Controlled vocabulary
Standardized terms and phrases that constitute a system's vocabulary.
Controlled language
Language in which grammar, vocabulary and syntax are restricted in order to reduce ambiguity and complexity and to make the source language easier to understand by native and non-native speakers and easier to translate with machine and human translation.
Country code
Abbreviation of two or three characters to signify a country or dependent area. ISO 3166 specifies country codes, such as "AL" for Albania and "CZ" for the Czech Republic. There are also country codes for telephone numbers, such as +1 for the U.S. and Can
CMS (Content Management System) - Tool that stores, organizes, maintains, and retrieves data.
The practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people and especially from the online community rather than from traditional employees or suppliers.
Abbreviation for community, crowdsourced, and collaborative translation.
Cultural adaptation
Adjustment of a translation to conform with the target culture.
Cultural assessment
Examination of an individual's or group's cultural preferences through comparative analyses.
Culturally-sensitive translation
Translation that takes into account cultural differences.
DBE - Abbreviation for double-byte enabled.
Desktop publishing
Applications like FrameMaker, PageMaker, and QuarkXPress that are used to prepare documentation for publication.
Variety of a language spoken by members of a particular locale and characterized by a unique vocabulary, grammar and pronunciation.
XML-based architecture for authoring, producing and delivering technical information.
The Darwin Information Typing Architecture is a proposed OASIS standard. It provides a comprehensive architecture for the authoring, production and delivery of technical documentation. DITA was originally developed within IBM and then donated to OASIS.
The essence of DITA is the concept of topic-based publication construction and development, which allows the modular reuse of specific sections. Each section is authored independently, and then each publication is constructed from the section modules. This means that individual sections need only be authored and translated once, and may be reused many times over in different publications.
DNT - Abbreviation for do not translate. Lists of such phrases and words include brand names and trademarks.
Area of knowledge that is communicated within a text, translation, or corpus.
Document type definition. Description of how content should be structured, providing rules for tags and characteristics, to enable programs to more easily process and store the document. Commonly abbreviated DTD.
DTP: desk top publishing
Use of specific software to combine and rearrange text and images and to create digital files. Before the 1980s, all printing and publishing was done manually and could take hours. Paul Brainerd, founder of Aldus Corporation, which produced PageMaker, coined the term desktop publishing after printing a hard copy of a document from a desktop terminal. Some of the most common uses of DTP are brochures, newspapers, newsletters, technical documentation, and web pages. Although closely related, DTP should not be confused with graphic design, which is the creative process of developing the concepts, ideas, and arrangements for visually communicating a specific message.
Double-byte enabled
Quality of an application or program that supports double-byte languages. Commonly abbreviated DBE.
Double-byte language
Language - such as Chinese, Korean, and Japanese - that requires two bytes (16 bits) to represent each character precisely.
Recording or replacement of voices commonly used in motion pictures and videos for which the recorded voices do not belong to the original actors or speakers and are in a different language.
Dynamic content
Changeable, non-fixed data that is produced in response to user requests and retrieved from a database.
Eastern Arabic numerals
Set of symbols used to represent numbers in combination with the Arabic alphabet in various countries, including Afghanistan, Egypt, Iran, Pakistan, Sudan, and parts of India. Also called Arabic Eastern Numerals.
Editing - Second level of review in the traditional TEP process.
Encoding scheme
System that assigns a numeric value to each character, in order to convert the character set to an automated form for transmitting and maintaining information.
Exact match
Exact matches (during Translation memory analysis) appear when the match between the current source segment and the stored one has been a character by character match. When translating a sentence, an exact match means the same sentence has been translated before. Exact matches are also called 100% matches.
Extended characters
Characters that exceed the ASCII character range of seven bits, such as characters with diacritical marks or non-Roman characters.
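The relationship between character sets, encoding schemes, and extended characters can be checked in a few lines. The sketch below is illustrative only: the helper names `is_extended` and `byte_length` are made up for this example, and Latin-1 and GBK are simply two encodings chosen to show a single-byte and a double-byte case.

```python
def is_extended(ch):
    """True if the character lies outside the 7-bit ASCII range (0-127)."""
    return ord(ch) > 127

def byte_length(ch, encoding):
    """Number of bytes the given encoding scheme uses for one character."""
    return len(ch.encode(encoding))

for ch, encoding in [("A", "latin-1"), ("é", "latin-1"), ("A", "gbk"), ("中", "gbk")]:
    print(repr(ch), encoding, is_extended(ch), byte_length(ch, encoding))
```

In this sample, the accented letter is an extended character that still fits in one byte under Latin-1, while the Chinese character needs two bytes under GBK.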
eXtensible markup language (XML)
Metadata language used to describe other markup languages. Commonly abbreviated XML.
False friends
False friends are pairs of words or phrases in two languages or dialects (or letters in two alphabets) that look or sound similar, but differ in meaning.
FIGS - Abbreviation for French, Italian, German and Spanish.
Functional testing
Reviewing software applications and programs to ensure that the localization process does not change the software or impair its functions or on-screen content display.
Fuzzy match
Indication that words or sentences are partially - but not exactly - matched to previous translations.
When the match (during Translation Memory analysis) has not been exact, it is a fuzzy match. Some systems assign percentages to these kinds of matches, in which case a fuzzy match is greater than 0% and less than 100%. Those figures are not comparable across systems unless the method of scoring is specified.
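To make the percentage idea concrete, here is a minimal sketch that scores two segments with `difflib.SequenceMatcher` from the Python standard library. This is only one possible scoring method and is not the algorithm of any particular TM system; the function names and sample sentences are assumptions for illustration.

```python
from difflib import SequenceMatcher

def match_score(new_segment, stored_segment):
    """Character-level similarity of two segments as a percentage (0-100)."""
    return SequenceMatcher(None, new_segment, stored_segment).ratio() * 100

def classify(score):
    """Label a score using the 0% / 100% boundaries described above."""
    if score == 100:
        return "exact"
    if score > 0:
        return "fuzzy"
    return "no match"

stored = "Press the red button."
for candidate in ["Press the red button.", "Press the green button.", "xyz"]:
    score = match_score(candidate, "abc" if candidate == "xyz" else stored)
    print(f"{candidate!r}: {score:.0f}% ({classify(score)})")
```

Because a different scoring method would give different percentages for the same pair of segments, figures like these are only comparable between systems that state how they score.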
Fuzzy logic
Process that creates near matches in text to translation memory terms when exact matches cannot be found.
GILT - Acronym for globalization, internationalization, localization, and translation.
GIM - Abbreviation for global information management.
Gist translation
Use of human or machine translation to create a rough translation of the source text that allows the reader to understand the essence of the text.
Globalization (G11N)
Globalisation (or globalization) describes the process by which regional economies, societies, and cultures have become integrated through a global network of political ideas through communication, transportation, and trade. Globalization is referred to as a cycle, rather than a single process. G11N - Abbreviation for globalization, with the number 11 representing the number of characters between the G and N.
Combination of the words 'global' and 'local,' used to describe products or services that are intended for international markets and have been customized for different languages, countries, and cultures.
A glossary, also known as an idioticon, vocabulary, or clavis, is an alphabetical list of terms in a particular domain of knowledge with the definitions for those terms.
Traditionally, a glossary appears at the end of a book and includes terms within that book which are either newly introduced, uncommon or specialized.
A bilingual glossary is a list of terms in one language which are defined in a second language or glossed by synonyms (or at least near-synonyms) in another language.
In a general sense, a glossary contains explanations of concepts relevant to a certain field of study or action. In this sense, the term is related to the notion of ontology. Automatic methods have been also provided that transform a glossary into an ontology or a computational lexicon.
GILT Metrics. GILT stands for Globalization, Internationalization, Localization, and Translation. The GILT Metrics standard comprises three parts: GMX-V for volume metrics, GMX-C for complexity metrics and GMX-Q for quality metrics. The proposed GILT Metrics standard is tasked with quantifying the workload and quality requirements for any given GILT task.
Global information management Metrics eXchange (GMX) is a family of standards of globalization- and localization-related metrics. The three components of GMX are:
Volume (V). Global Information Management Metrics Volume addresses the issue of quantifying the workload for a given localization or translation task, something often handled using word counts. Word counts, however, do not convey the true range of possible statistics that can be used to assess the cost of localizing a document. GMX-V provides a standard and more precise definition of the statistics necessary to assess the quantity of text (and costs) associated with language-related globalization tasks.
Complexity (C) (proposed). GMX-C will provide a standard metric for the assessment of textual complexity with regard to globalization tasks. This format has not yet been defined.
Quality (Q) (proposed). GMX-Q will provide a standard format for the specification of quality requirements for globalization tasks, thus allowing quality expectations to be specified in contracts and other agreements and verified. This format has not yet been defined.
A homonym is one of a group of words that share the same spelling and the same pronunciation but have different meanings.
In Context Exact (ICE) match or Guaranteed Match
An ICE match is an exact match that occurs in exactly the same context, that is, the same location in a paragraph. Context is often defined by the surrounding sentences and attributes such as document file name, date, and permissions.
In-country review
Evaluation of a translated text by an individual who resides within the country where the target text will be used.
Internationalization (I18N)
Internationalization comprises the planning and preparation stages for a product that is built by design to support global markets. This process removes all cultural assumptions, and any country- or language-specific content is stored so that it can be easily adapted. If this content is not separated during this phase, it must be fixed during localization, adding time and expense to the project. In extreme cases, products that were not internationalized may not be localizable. I18N - 18 stands for the number of letters between the first i and last n in internationalization, a usage coined at DEC in the 1970s or 80s.
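The counting rule behind numeronyms such as I18N can be written as a short function. The sketch below is an illustration; the function name `numeronym` is an assumption, and it applies the same rule to the other abbreviations of this kind used in the industry (G11N, L10N).

```python
def numeronym(word):
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word.upper()  # nothing to count between the first and last letter
    return f"{word[0]}{len(word) - 2}{word[-1]}".upper()

for term in ["globalization", "internationalization", "localization"]:
    print(term, "->", numeronym(term))
```

The number is always the word length minus two, because the first and last letters are written out.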
Process of rendering oral spoken or signed communication from one language to another, or the output that results from this process.
System of signed, spoken, or written communication.
Language tags and codes
Language codes are closely related to the localizing process, as they indicate the locales involved in the translation and adaptation of the product. There are multiple language tag systems available for language codification. For example, the International Organization for Standardization (ISO) specifies both two- and three-letter codes to represent languages in standards ISO 639-1 and ISO 639-2, respectively.
Language combination
Group of active and passive languages used by an interpreter/translator.
Language kit
Add-on feature that permits a keyboard to produce character sets for a given language.
Language pair
Languages in which a translator or interpreter/translator can provide services.
Language Services Provider (LSP)
An organization or business that supplies language services, such as translation, localization, or interpretation. Commonly abbreviated LSP.
  1. Practice of reusing previously translated terms and phrases in new translations
  2. Rank which evaluates how much of the previously translated text can be reused
Linguistic parsing
Base form reduction is used to prepare lists of words and a text for automatic retrieval of terms from a term bank. Syntactic parsing, on the other hand, may be used to extract multi-word terms or phraseology from a source text. Parsing is thus used to normalise word order variation of phraseology, that is, to determine which words can form a phrase.
Literal translation
Translation that closely follows the phrasing, order and sentence construction of the source text.
Localization Industry Standards Association
A metric for the evaluation of translation quality developed by the Localization Industry Standards Association.
Localization (L10N)
Process of adapting or modifying a product, service, or website for a given language, culture or region. Language localization (from the English term locale, a place where something happens or is set) is the second phase of a larger process of product translation and cultural adaptation (for specific countries, regions, or groups) to account for differences in distinct markets, a process known as internationalization and localization. Language localization is not merely a translation activity, because it involves a comprehensive study of the target culture in order to correctly adapt the product to local needs. Localization is sometimes referred to by the numeronym L10N (as in: L, followed by ten more letters, and then N). Localization refers to the actual adaptation of the product for a specific market. The localization phase involves, among other things, the four kinds of issue that LISA describes as linguistic, physical, business and cultural, and technical.

The localization process is most generally related to the cultural adaptation and translation of software, video games, and websites, and less frequently to any written translation (which may also involve cultural adaptation processes). Localization can be done for regions or countries where people speak different languages, or where the same language is spoken: for instance, different dialects of Spanish, with different idioms, are spoken in Spain than are spoken in Latin America; likewise, word choices and idioms vary among countries where English is the official language (e.g., in the United States, the United Kingdom, and the Philippines).
Localization engineering
Software engineering carried out to support localization. Activities include internationalization, bug fixing, functionality testing, dialog box resizing, help compilation, and other software-related activities. Most LSPs charge for these services by the
Localization tool
Application that assists with the translation and adaptation required for localization.
Machine Translation (also known as automated translation)
Translation carried out exclusively by a machine. Commonly abbreviated MT.
Machine translation plus translation memory
A workflow and technology process in which terms not found in translation memory are automatically sent to the machine translation software for translation.
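The fallback described above can be sketched as a lookup followed by a call to an MT engine. Everything here is hypothetical: the translation memory is a plain dictionary of segments, and `fake_mt` is a stand-in for whatever machine translation software is actually used.

```python
def translate_segment(segment, tm, machine_translate):
    """Return (translation, origin): reuse the TM if possible, else call MT."""
    if segment in tm:
        return tm[segment], "TM"
    return machine_translate(segment), "MT"

def fake_mt(segment):
    """Placeholder for a real MT engine; it only marks the text as MT output."""
    return f"[MT] {segment}"

# Hypothetical English-French translation memory.
tm = {"Save the file.": "Enregistrez le fichier."}

for segment in ["Save the file.", "Close the window."]:
    print(translate_segment(segment, tm, fake_mt))
```

Returning the origin alongside the translation makes it easy to route MT output to a human reviewer while leaving TM matches untouched.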
Markup language
Artificial language that uses annotations to indicate how text should be formatted.
Indication that words or sentences are matched - either partially or fully - to previous translations.
Meaning-for-meaning translation
Translation for which the words used in both languages may not be exact equivalents, but the meaning is the same.
One of the ten most important languages on the web, including Chinese, Dutch, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.
Information that describes data.
Smallest unit of meaning in a language.
Mother tongue
Native and first learned language of an individual.
MT - Abbreviation for machine translation.
Multi-byte character set
Character set in which the number of bytes per character varies. Abbreviated MBCS.
Multi-byte language
Language that requires the use of a multi-byte character set.
Process by which the linguistic and cultural diversity among a group of people increases.
Multi-language vendor (MLV)
Language service provider that offers services in multiple language pairs. Abbreviated MLV.
Multilingual workflow
Automation of business processes related to the development of multilingual products by managing multilingual content, usually through a translation management system, machine translation, and translation memory.
Process of expanding an organization's presence into multiple nations. Commonly abbreviated M18N.
The SDL Trados terminology tool. Latest version SDL MultiTerm 2009 and SDL MultiTerm Server 2009.
Native language
First language that a human learns naturally, usually since childhood.
Networking (TM Server)
Networking during translation makes it possible for a group of translators to translate a text together efficiently, because the translations entered by one translator are available to the others. Moreover, if translation memories are shared before the final translation, there is a chance that mistakes made by one translator will be corrected by other team members.
Neutral Spanish (also Universal Spanish)
Spanish that is mutually intelligible by speakers from various parts of the Spanish-speaking world and is not immediately identifiable with any single regional variety of the language. No standards exist for defining neutral Spanish.
Next-wave language
One of the languages of growing importance on the web.
OLIF - Abbreviation for open lexicon interchange format.
Description of the relationships between concepts, objects, and other entities within a given field.
Plain English
Method of writing English that employs a clear and simple style, usually for the purpose of improving readability. Among its features are using only active verbs (no passive voices) and making sure that each word has only one meaning.
PM - Abbreviation for project manager.
PPW - Abbreviation for price per word.
Process by which one or more humans review, edit, and improve the quality of machine translation output.
Project manager
Individual who carries out management and coordination tasks for a given translation project. Commonly abbreviated PM.
Process by which a text is edited prior to translation in order to clarify ambiguous terms and increase translatability.
Phase of translation process in which documents are prepared for conversion into another language. Usually includes an automated analysis against translation memories so that previously translated text is inserted in a file, therefore avoiding rework and associated costs.
Project setup
Translation preprocessing steps include tasks such as glossary and style guide preparation, project planning, file preparation, content familiarization, and training.
Practice of checking a translated text to identify and correct spelling, grammar, syntax, coherency, and integrity errors. It is usually, though not necessarily, carried out by a second linguist or translator; proofreading can also be done by editors with no second language.
Pseudo localization
Process of faking translation of software or web applications before starting to localize the product for real. It is used to verify that the user interface is capable of containing the translated strings (length) and to discover possible internationalization issues.
Procedure that simulates how a translated document will look after translation and how much extra DTP or other work will be required before actual translation is done. This can help in setting the appropriate timelines of projects.
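A minimal sketch of the "faked translation" idea is shown below. It assumes a simple scheme in which vowels are replaced by accented vowels, the string is padded by a fixed percentage to imitate text expansion, and brackets mark the string boundaries so that truncation is visible. The function name, the 30 percent default, and the padding character are all illustrative assumptions.

```python
# Map plain vowels to accented ones so the text stays readable but
# exercises non-ASCII handling.
ACCENTED = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")

def pseudo_localize(text, expansion_percent=30):
    """Return a fake 'translation' that is longer and uses accented letters."""
    padding = len(text) * expansion_percent // 100
    return "[" + text.translate(ACCENTED) + "~" * padding + "]"

for label in ["Save file", "Cancel"]:
    print(pseudo_localize(label))
```

If the closing bracket is cut off in a dialog box, the control is too small for a longer translation; if the accented letters display incorrectly, there is likely an encoding issue to resolve before real localization starts.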
QA - Abbreviation for quality assurance.
QC - Abbreviation for quality control.
QI - Abbreviation for quality improvement.
Quality assurance
Process designed to ensure translation quality, in which specific processes are followed with the purpose of minimizing errors.
Quality control
Process designed to ensure translation quality, in which the target text is reviewed with the purpose of catching errors.
Quality improvement
Process designed to ensure translation quality, in which the overall goal is to enhance performance.
RBMT - Abbreviation for rules-based machine translation.
Measure of formality of language dependent upon the tone, terminology, and grammar implemented.
Sentence or phrase that is repeated in the source text, often referred to in a Translation Memory analysis.
Rich Media Content
Synonym for interactive multimedia.
A broad range of interactive digital media that exhibit dynamic motion, taking advantage of enhanced sensory features such as video, audio and animation.
ROI - Return on investment, a performance measure that is used to evaluate the efficiency of an investment.
Roman numerals
System of numerals that evolved from the system used in classical Rome, often used for purposes such as numbering pages in introductions or prefaces.
SAE J2450
A metric for the evaluation of translation quality, originally developed for the automotive sector. The metric comprises error categorization and severity.
SDK - Abbreviation for software development kit.
Sentence or phrase that is separated from the rest of a text based on language construction rules such as punctuation.
Its purpose is to choose the most useful translation units. Segmentation is like a type of parsing. It is done monolingually using superficial parsing and alignment is based on segmentation. If the translators correct the segmentations manually, later versions of the document will not find matches against the TM based on the corrected segmentation because the program will repeat its own errors. Translators usually proceed sentence by sentence, although the translation of one sentence may depend on the translation of the surrounding ones.
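As a simple illustration of punctuation-based segmentation, the sketch below splits text after a full stop, exclamation mark, or question mark that is followed by whitespace. The function name `segment` and the rule itself are assumptions made for this example; it is far cruder than the language-specific rules real tools apply and will, for instance, split wrongly after abbreviations such as "e.g.".

```python
import re

# Split at whitespace that follows sentence-ending punctuation.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")

def segment(text):
    """Split text into sentence-like segments using a punctuation rule."""
    return [part for part in SENTENCE_BREAK.split(text.strip()) if part]

segments = segment("Open the file. Click Save! Are you done?")
for number, seg in enumerate(segments, start=1):
    print(number, seg)
```

Because the rule is applied automatically and identically each time, the same text always yields the same segments, which is what allows later versions of a document to match against the TM.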
Simplified Chinese
Contemporary written Chinese language used in mainland China and Singapore.
SimShip - Simultaneous shipment
Abbreviation for simultaneous shipment.
Single-byte character set
Character set in which a single 8-bit byte represents a character.
Single sourcing (single source publishing)
Single sourcing or single source publishing - Process of producing a document in one format and automatically translating or publishing it into multiple formats.
SMT - Abbreviation for statistical machine translation.
Software development kit
Documentation and source code that facilitate the process of developing programs that interface with a given product. Commonly abbreviated SDK.
Software engineering
Process of translating and adapting computer software from one language and culture into another. Also referred to as localization engineering.
Source code
Code that is compiled to develop a program.
Source count
Number of words in a text to be translated.
Source file
File that contains the source document in its original form, as opposed to a generated file, and is required for localization processes.
Source language
Original language of the text that is to be translated.
Source text
Text to be translated.
Source text analysis
Analysis of the source text prior to translation that provides a better idea of the difficulty of the translation.
Segmentation Rules eXchange (SRX) is intended to enhance the TMX standard so that translation memory data that is exchanged between applications can be used more effectively. The ability to specify the segmentation rules that were used in the previous translation may increase the leveraging that can be achieved.
Segmentation Rules eXchange (SRX) is the vendor-neutral standard for describing how translation and other language-processing tools segment text for processing. It allows Translation Memory (TM) and other linguistic tools to describe the language-specific processes by which text is broken into segments (usually sentences or paragraphs) for further processing. It was developed when it was realized that TMX leverage was sometimes lower than expected because different tools segmented text in different ways, preventing a direct correlation between results between the tools. SRX version 2.0 was officially accepted as an OSCAR standard in April 2008.
Standard line
Measure of the usual number of keystrokes per line in a certain text, which varies per country, and consists on average of 50 to 60 characters; commonly used for translation projects that are priced on a per line basis.
Statistical machine translation
Second-generation solutions that take a probability-based approach to translation through computational analysis of data, treating data as character strings, determining patterns, and leveraging regularities. Commonly abbreviated SMT.
Style guide
Document that describes the correct grammar, punctuation, spelling, style and numeric formats to ensure consistency and quality in a translated text.
Style sheet
Document or template that describes the structure and format of a document, with instructions regarding fonts, page size, spacing, margins, paragraph styles and tag mark-ups to ensure consistency and quality in a translated text.
Subtitles (also captioning)
Subtitles are textual versions of the dialog in films and television programs, usually displayed at the bottom of the screen. They can be either a written form of the original language or a translation.
Synonyms are different words with almost identical or similar meanings, e.g. student and pupil.
Study of structure and elements that form grammatical sentences.
Marking content in a document with information about its content.
Target audience
Group of people who receive the information rendered by the interpreter in the target language.
Target language
Language into which the text is translated.
TBX - Abbreviation for term base eXchange.
XML standard for exchanging terminological data.
Technical translation
Translation of technical texts, such as user or maintenance manuals, catalogues and data sheets.
Translation - Edit - Proofread Process.
Word, phrase, symbol or formula that describes or designates a particular concept.
Term extraction (also term harvesting)
Selecting terms in a text and placing them in a terminology database for analysis at a later time.
Collection of terms
Terminology analysis
Process carried out prior to translation in order to analyze the vocabulary within a text and its meaning within the given context, often for the purpose of creating specialized dictionaries within specific fields.
Terminology database
Electronic repository of terms and associated data.
Term extraction
It can have as input a previous dictionary. Moreover, when extracting unknown terms, it can use parsing based on text statistics. These are used to estimate the amount of work involved in a translation job. This is very useful for planning and scheduling the work. Translation statistics usually count the words and estimate the amount of repetition in the text.
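The kind of translation statistics mentioned above (word counts and an estimate of repetition) can be sketched as follows. The input is assumed to be a list of already segmented source sentences; the function name and the definition of repetition rate used here (repeated segments divided by all segments) are illustrative choices, not an industry standard.

```python
from collections import Counter

def translation_statistics(segments):
    """Count words and estimate how much of the text repeats itself."""
    word_count = sum(len(segment.split()) for segment in segments)
    occurrences = Counter(segments)
    # Every occurrence after the first one counts as a repetition.
    repeated = sum(count - 1 for count in occurrences.values())
    rate = repeated / len(segments) if segments else 0.0
    return {
        "segments": len(segments),
        "words": word_count,
        "repeated_segments": repeated,
        "repetition_rate": rate,
    }

stats = translation_statistics(
    ["Click OK.", "Save the file.", "Click OK.", "Click OK."]
)
print(stats)
```

A high repetition rate suggests that much of the job can be filled from earlier translations, which feeds directly into planning and scheduling.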
A termbase is a database containing terminology and related information. Most termbases are multilingual and contain terminology data in a range of different languages.
Termbase Definition and the Structure of Entries
All termbase entries are structured in the following way:
  • Entry level - contains system fields, and any descriptive fields that apply to the entry as a whole.
  • Index level - contains index fields with terms as content, and any descriptive fields that apply to all terms in a given language.
  • Term level - contains any descriptive fields that apply to a given term.
The termbase definition for a given termbase specifies the number and type of fields that a termbase entry may contain and the entry structure that entries must conform to. The entry structure specifies:
  • The number and type of fields that may exist at each level in the entry.
  • The hierarchical structure of fields within each level, that is, whether fields are nested or not.
  • Whether fields are mandatory or multiple at a given level of the entry.
NOTE: MultiTerm supports unlimited nesting of descriptive fields.
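The three entry levels described above can be pictured as nested records. The sketch below is only an illustration: the class names `Entry`, `Index` and `Term`, the field names, and the sample data are hypothetical, and this is not MultiTerm's actual storage format or API.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """Term level: one term plus descriptive fields that apply to that term only."""
    text: str
    descriptive_fields: dict = field(default_factory=dict)


@dataclass
class Index:
    """Index level: the terms of one language plus language-wide descriptive fields."""
    language: str
    terms: list = field(default_factory=list)
    descriptive_fields: dict = field(default_factory=dict)


@dataclass
class Entry:
    """Entry level: a system field (entry number) plus entry-wide descriptive fields."""
    entry_number: int
    indexes: list = field(default_factory=list)
    descriptive_fields: dict = field(default_factory=dict)


# Hypothetical sample entry with one term per language.
entry = Entry(
    entry_number=1,
    descriptive_fields={"Subject": "Hardware"},
    indexes=[
        Index("English", [Term("hard disk", {"Part of speech": "noun"})]),
        Index("German", [Term("Festplatte", {"Part of speech": "noun"})]),
    ],
)

german_terms = [t.text for i in entry.indexes if i.language == "German" for t in i.terms]
```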
Termbase fields
The different types of field are as follows:
  • Index fields - contain the terms for each entry. Each index corresponds to one of the termbase languages.
  • Descriptive fields - contain descriptive information about the entry or language as a whole, or about the individual terms. Each descriptive field has a defined data type. Types of data include text, picklist, number, date, Boolean and multimedia file.
  • Entry class field - specifies the entry class to which the entry belongs.
  • System fields - created and maintained by the system, these fields are used to store tracking information for the entry as a whole or for individual fields. System fields in MultiTerm include the Entry number field and the set of four history fields. The Entry Number field is automatically assigned to each entry at entry level; for more information about history fields, see below.
  • History fields - MultiTerm uses a set of four history fields: Created on, Created by, Modified on and Modified by. History fields are automatically assigned to each entry at entry level and to each index at index level. For all other fields in the termbase, history fields are optional and must be commissioned in the Termbase Wizard. Once assigned, history fields are created and maintained by the system.
Term Link
Term Link (formerly TBX Link) is an XML namespace-based notation that enables specific identified terms within an XML document to be linked to an external XML termbase, including those in TermBase eXchange (TBX) format. The purpose of the Term Link specification is to provide a rigorous notation for linking embedded terms in an XML document to their entries in an external termbase.

Why Use Term Link?
Term Link provides the best method for linking XML documents to terminological resources. It enables users to:
  • Ensure that localizers have access to terminology resources referenced in files.
  • Provide users with access to terminological data stored remotely.
History

Term Link is not yet an official standard, and its contents and format may change prior to official adoption.
Text memory
"Text memory" is the basis of the proposed Lisa OSCAR xml:tm standard. Text memory comprises author memory and translation memory.
TermBase eXchange. This LISA standard, which was revised and republished as ISO 30042, allows for the interchange of terminology data including detailed lexical information. The framework for TBX is provided by three ISO standards: ISO 12620, ISO 12200 and ISO 16642. ISO 12620 provides an inventory of well-defined "data categories" with standardized names that function as data element types or as predefined values. ISO 12200 (also known as MARTIF) provides the basis for the core structure of TBX. ISO 16642 (also known as Terminological Markup Framework) includes a structural metamodel for Terminology Markup Languages in general.
Terminology is the study of terms and their use. Terms are words and compound words that are used in specific contexts. They should not be confused with "terms" in colloquial usage, which is a shortened form of "technical terms" (or "terms of art") defined within a discipline or speciality field. The discipline of Terminology studies, among other things, how such terms of art come to be and their interrelationships within a culture.

Terminology therefore denotes a more formal discipline which systematically studies the labelling or designating of concepts particular to one or more subject fields or domains of human activity, through research and analysis of terms in context, for the purpose of documenting and promoting correct usage. This study can be limited to one language or can cover more than one language at the same time (multilingual terminology, bilingual terminology, and so forth) or may focus on studies of terms across fields.

Terminology is not connected to information retrieval in any way but is focused on the meaning and conveyance of concepts. Terms (i.e. index terms) used in an information retrieval context are not the same as terms used in the context of terminology, as they are not always technical terms of art.
Terminology Management
Quality translation relies on the correct use of specialized terms. Correct terminology improves reader understanding and reduces the time and costs associated with translation. Special terminology management systems store terms and their translations, so that terms can be translated consistently. Full-featured systems go beyond simple term lookup, however, and contain information about terms, such as part of speech, alternate terms and synonyms, product line information, and usage notes. They are generally integrated with translation memory systems and word processors to improve translator productivity.
Textual parsing
It is very important to recognize punctuation in order to distinguish, for example, the end of a sentence from an abbreviation. Mark-up is thus a kind of pre-editing. Usually, the materials which have been processed through translators' aid programs contain mark-up, as the translation stage is embedded in a multilingual document production line. Other special text elements may be set off by mark-up. There are special elements which do not need to be translated, such as proper names and codes, while others may need to be converted to native format.
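The distinction between a sentence end and an abbreviation can be sketched as follows. This is a minimal illustration, assuming a small hand-written abbreviation list; the names `ABBREVIATIONS` and `split_sentences` are hypothetical, and real segmentation tools use far richer, language-specific rules.

```python
import re

# Hypothetical, deliberately short abbreviation list; a real tool would use a
# much larger, language-specific one.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "dr.", "mr.", "mrs.", "no."}


def split_sentences(text):
    """Split text at '.', '!' or '?' followed by whitespace, unless the token
    ending at that point is a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.end()].strip()
        last_token = candidate.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # the full stop belongs to an abbreviation, not a sentence end
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


result = split_sentences(
    "Dr. Smith reviewed the file. It was approved, e.g. by the client. Done!"
)
```

A known weakness of this approach is that an abbreviation which genuinely ends a sentence (for example a final "etc.") is not treated as a boundary.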
Term extraction tools
Tools for automatically extracting terms from text to create a termbase. Tools include SDL MultiTerm Extract 2009.
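As a rough idea of how statistical term extraction can work, the sketch below counts one-word and two-word sequences and keeps those that recur. It is an assumption-laden illustration, not how SDL MultiTerm Extract or any other product works: the stop word list, the function name `candidate_terms`, and the frequency threshold are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, very small stop word list for English.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "on", "with"}


def candidate_terms(text, min_count=2, max_len=2):
    """Return (term, count) pairs for word sequences of up to max_len words
    that occur at least min_count times and do not start or end with a stop word."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, count) for term, count in counts.most_common() if count >= min_count]


sample = (
    "The translation memory stores segments. "
    "A translation memory is searched automatically. "
    "The termbase stores terms."
)
candidates = candidate_terms(sample)
```

Because punctuation is discarded, word pairs can span sentence boundaries; a terminologist would still review the candidate list before anything is added to a termbase.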
TermBase eXchange
XML standard for exchanging terminological data. Commonly abbreviated TBX.
Terminology management
Use of computer software to manage translation resources, create terminology databases for translation projects, and improve productivity and consistency.
Terminology management tool
Computer application that facilitates terminology management.
Terminology manager
Software application that facilitates the process of translation by interacting with a terminology database.
Terminology software
Data processing tool that allows one to create, edit and consult text or electronic dictionaries.
Text expansion
Process that often occurs during translation in which the total number of characters in the target text exceeds that of the source text.
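Text expansion is often expressed as a percentage of the source length. A minimal sketch of that arithmetic follows; the function name `expansion_rate` and the sample strings are hypothetical.

```python
def expansion_rate(source_text, target_text):
    """Return the relative change in character count from source to target.

    A positive value means the target is longer (expansion); a negative
    value means it is shorter (contraction).
    """
    if not source_text:
        raise ValueError("source_text must not be empty")
    return (len(target_text) - len(source_text)) / len(source_text)


# "Save" has 4 characters and "Speichern" has 9, so the rate is 1.25 (125 %).
rate = expansion_rate("Save", "Speichern")
```

Short user interface strings tend to show the largest relative expansion, which is why layouts are usually designed with spare room.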
Text extraction
Process in which the text from a source file is placed into a word processing file for use by a linguist.
Text style
Characteristics of terminology, style and sentence formation within a given text.
Abbreviation for translation memory eXchange. Translation Memory eXchange (TMX) is a standard that enables the interchange of translation memories between translation suppliers. TMX has been adopted by the translation community as the best way of importing and exporting translation memories. The current version is 1.4b - it allows for the recreation of the original source and target documents from the TMX data. An updated version, 2.0, is under development.
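To make the interchange idea concrete, the sketch below reads translation units from a small TMX 1.4-style document using only the Python standard library. The sample data and the function name `read_tmx` are hypothetical, and inline formatting elements inside segments are simply flattened to text.

```python
import xml.etree.ElementTree as ET

# Attribute name that ElementTree uses for xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample data, not taken from a real translation memory.
TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Ouvrez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(tmx_text):
    """Return one dict per <tu>, mapping language code to segment text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext()) if seg is not None else ""
        units.append(unit)
    return units


units = read_tmx(TMX_SAMPLE)
```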
Traditional Chinese
Original Chinese ideographic character set used in Taiwan, Hong Kong, Macau and some Chinese communities who have not adopted the simplified characters used in the People's Republic of China.
Process by which new content is developed or adapted for a given target audience instead of merely translating existing material. It may include copywriting, image selection, font changes, and other transformations that tailor the message to the recipient.
Process of converting oral utterances into written form.
Degree to which a text can be rendered into another language.
Most common set of steps used for linguistic quality assurance in translation production processes. Commonly abbreviated TEP.
Process of rendering written communication from one language into another, or the output that results from this process. Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. The word translation derives from the Latin translatio (which itself comes from trans- and fero, together meaning to carry across or to bring across).
Translation capacity
Average number of characters, words, lines, or pages that a professional translator can translate within a given time frame, such as a day, week, or month.
Translation kit (also Localization kit)
A set of files and instructions given to an LSP by a client. The purpose of a translation kit is to provide LSPs with expectations: the subject matter and target audience, files and format to be translated, delivery expectations, special considerations and instructions.
Translation management
The management of the translation workflow, often including management of the content assets as well.
Translation management system (also TMS)
Program that manages translation and localization cycles, coordinates projects with source content management, and centralizes translation databases, glossaries, and additional information relevant to the translation process. Commonly abbreviated TMS.
Translation memory
Translated text segments that are stored in a database. A translation memory system scans a source text and tries to match strings (a sentence or part thereof) against a database of paired source and target language strings, with the aim of reusing previously translated material. The translation memory, or TM, itself is a database that stores so-called segments, which can be sentences or sentence-like units (headings, titles or elements in a list), that have been previously translated. By storing the phrases and paragraphs that have already been translated, it aids human translators. The translation memory stores each source text segment and its corresponding translation in language pairs called "translation units".
Some software programs that use translation memories are known as translation memory managers (TMM).
Translation memories are typically used in conjunction with a dedicated computer-assisted translation (CAT) tool, word processing program, terminology management system, multilingual dictionary, or even raw machine translation output.
A translation memory consists of text segments in a source language and their translations into one or more target languages. These segments can be blocks, paragraphs, sentences, or phrases. Individual words are handled by terminology bases and are not within the domain of TM.
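The matching of a new source segment against stored translation units can be sketched as below. This is a simplified illustration: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, which is an assumption and not the matching algorithm of any commercial TM system. The sample memory, the function name `lookup`, and the 0.75 threshold are likewise hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical translation units: source segment -> target segment.
TRANSLATION_MEMORY = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "Close the window.": "Fermez la fenêtre.",
}


def lookup(source_segment, memory, threshold=0.75):
    """Return (score, stored_source, stored_target) for the best match at or
    above the threshold, or None when nothing is similar enough."""
    best = None
    for stored_source, stored_target in memory.items():
        score = SequenceMatcher(None, source_segment, stored_source).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, stored_source, stored_target)
    return best


exact = lookup("Close the window.", TRANSLATION_MEMORY)         # identical segment
fuzzy = lookup("Click the Cancel button.", TRANSLATION_MEMORY)  # similar segment
miss = lookup("Restart the computer.", TRANSLATION_MEMORY)      # nothing similar
```

A score of 1.0 corresponds to an exact match; anything between the threshold and 1.0 is a fuzzy match that the translator would be expected to review and adapt.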
Translation memory eXchange (also TMX)
Standard for converting translation memories from one format to another. Commonly abbreviated TMX.
Translation memory plus machine translation
A workflow and technology process in which terms not found in translation memory are automatically sent to the machine translation software for translation, with the results fed back into the translation memory. Commonly abbreviated TMT.
Translation memory system
Computer-aided translation tool that offers translation suggestions from translation memory.
Translation portal
Web-based service that enables translation agencies, freelance translators and customers to contact one another and exchange services.
Translation unit (also TU)
Segment of text treated as a single unit of meaning.
Process of converting words from a source text or audio file into a written text that facilitates pronunciation of the words.
Translation Memory, see Translation Memory.
SDL Trados is a leading Translation Memory Editor used in translation. The latest versions are SDL Trados Studio 2009 and SDL Trados TM Server.
Character set that is capable of encoding the characters of the world's major language scripts. It was originally designed as a 16-bit character set and has since been extended beyond 16 bits.
Unicode standard
Industry encoding standard that allows computers to represent and manipulate text in most of the world's writing systems.
Updating TM
A TM is updated with a new translation when it has been accepted by the translator. As always in updating a database, there is the question of what to do with the previous contents of the database. A TM can be modified by changing or deleting entries in the TM. Some systems allow translators to save multiple translations of the same source segment.
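The two update policies mentioned above (replace the previous contents, or keep multiple translations of the same source segment) can be sketched with a plain dictionary. The function name `update_tm` and the `allow_multiple` flag are hypothetical and do not correspond to any particular product's settings.

```python
def update_tm(memory, source, target, allow_multiple=True):
    """Store an accepted translation in `memory` (dict: source -> list of targets).

    With allow_multiple=True the new target is added alongside existing ones;
    otherwise it replaces whatever was stored for that source segment.
    """
    existing = memory.setdefault(source, [])
    if target in existing:
        return memory  # nothing new to store
    if allow_multiple:
        existing.append(target)
    else:
        memory[source] = [target]
    return memory


tm = {}
update_tm(tm, "Cancel", "Annuler")
update_tm(tm, "Cancel", "Abandonner")                   # kept as a second translation
update_tm(tm, "OK", "OK")
update_tm(tm, "OK", "D'accord", allow_multiple=False)   # overwrites the earlier one
```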
UTF-16, UTF-32, UTF-8
UTF-16 - Abbreviation for 16-bit Unicode transformation format. UTF-32 - Abbreviation for 32-bit Unicode transformation format. UTF-8 - Abbreviation for 8-bit Unicode transformation format.
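The bit count in each name refers to the size of one code unit, not of one character: UTF-8 and UTF-16 use one or more units per character, whereas UTF-32 always uses exactly one. The short sketch below shows the resulting byte counts for a few characters; the helper name `encoded_sizes` and the choice of sample characters are only for illustration.

```python
def encoded_sizes(char):
    """Return the number of bytes `char` occupies in UTF-8, UTF-16 and UTF-32."""
    # The "-le" codecs fix the byte order so that no byte order mark is added.
    return (
        len(char.encode("utf-8")),
        len(char.encode("utf-16-le")),
        len(char.encode("utf-32-le")),
    )


samples = {
    "A": encoded_sizes("A"),                   # (1, 2, 4)
    "e-acute": encoded_sizes("\u00e9"),        # (2, 2, 4)
    "euro sign": encoded_sizes("\u20ac"),      # (3, 2, 4)
    "emoji": encoded_sizes("\U0001F600"),      # (4, 4, 4)
}
```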
Universal Terminology eXchange (UTX) format is a standard specifically designed to be used for user dictionaries of machine translation, but it can be used for general, human-readable glossaries. The purpose of UTX is to accelerate dictionary sharing and reuse by its extremely simple and practical specification.
Technique in which a disembodied voice narrates a film, documentary, or other visual media.
Word count
Total number of words in a text, typically used to price translation projects.
Word delimiter
Character, such as a space or carriage return, that marks a distinction between words in a text.
Workflow management
Computer or web-based applications used to direct translation and localization work processes.
XML Localisation Interchange File Format. It is intended to provide a single interchange file format that can be understood by any localization provider. XLIFF is the preferred way of exchanging data in XML format in the translation industry.
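As an illustration of what such an interchange file carries, the sketch below reads source and target text from a small XLIFF document with the Python standard library. The entry above names no version, so the use of the XLIFF 1.2 namespace here is an assumption; the sample content and the function name `read_xliff` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: XLIFF 1.2, whose elements live in this namespace.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample file with one translated and one untranslated unit.
XLIFF_SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="hello.txt" source-language="en" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Hello world</source>
        <target>Bonjour le monde</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Goodbye</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_xliff(xliff_text):
    """Return (id, source, target) tuples; target is None when not yet translated."""
    root = ET.fromstring(xliff_text)
    pairs = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        pairs.append((
            unit.get("id"),
            "".join(source.itertext()),
            "".join(target.itertext()) if target is not None else None,
        ))
    return pairs


pairs = read_xliff(XLIFF_SAMPLE)
```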
Abbreviation for eXtensible markup language. Metadata language used to describe other markup languages. Commonly abbreviated XML.
XML Text Memory (xml:tm)
xml:tm (XML-based Text Memory) is the vendor-neutral open XML standard for embedding text memory directly within an XML document using XML namespace syntax. xml:tm leverages the namespace syntax of XML to embed text memory information within the XML document itself.

At the core of xml:tm is the concept of "text memory". Text memory comprises two components:
  1. Author Memory. The XML document is segmented and a full history of all segments and revisions is maintained in the XML document itself.
  2. Translation Memory. When an xml:tm namespace document is ready for translation the namespace itself specifies the text that is to be translated. The tm namespace can be used to create an XLIFF-format document for translation.
xml:tm allows for much more focused and better defined translation memory matching than is possible using standard TM technology. In particular, it includes the following:
  • Exact Matching. Author memory provides exact details of any changes to a document. Where text units have not been changed for a previously translated document xml:tm provides the basis for declaring an "Exact match" with the previously translated target language document.
  • In-Document Leveraged Matching. xml:tm can also be used to find in-document leveraged matches.
  • Database Leveraged Matching. When an xml:tm document is translated the translation process provides perfectly aligned source and target language text units. These can be used to create traditional translation memories.
  • In-Document Fuzzy Matching. The text units contained in the leveraged memory database can also be used to provide fuzzy matches of similar previously translated text from within the same document.
  • Fuzzy Matching. The text units contained in the leveraged memory database can also be used to provide fuzzy matches of similar previously translated text.
  • Non-Translatable Text. Text units that are made up solely of numeric, alphanumeric, punctuation or measurement items can be identified during authoring and flagged as non-translatable, thus reducing the translation count metrics.
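The non-translatable check in the last item can be approximated with a simple token test. The sketch below is an assumption, not the rule defined by xml:tm: it treats a text unit as non-translatable when every whitespace-separated token has no letters, contains a digit, or is one of a few unit symbols. The `UNITS` list and the function name `is_non_translatable` are hypothetical.

```python
import string

# Hypothetical, incomplete list of measurement unit symbols.
UNITS = {"mm", "cm", "m", "kg", "g", "V", "W", "Hz"}


def is_non_translatable(text_unit):
    """True when no token in the text unit looks like an ordinary word."""
    for token in text_unit.split():
        has_letter = any(ch.isalpha() for ch in token)
        has_digit = any(ch.isdigit() for ch in token)
        is_unit = token.strip(string.punctuation) in UNITS
        if has_letter and not has_digit and not is_unit:
            return False  # an ordinary word: the unit needs translation
    return True


checks = {
    "3.5 kg": is_non_translatable("3.5 kg"),
    "AB-1234": is_non_translatable("AB-1234"),
    "2009-10-31, 12:45": is_non_translatable("2009-10-31, 12:45"),
    "Press 1 to continue.": is_non_translatable("Press 1 to continue."),
}
```

Units flagged in this way could be excluded from the word count, although formats such as dates and decimal separators may still need locale conversion.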