Technology Overview
2.0 Components and Processes
2.1 Elements of a Controlled Language and Controlled Language Tools
2.1.1 Terminology Checkers
2.1.2 Grammar Checkers
2.1.3 Style Checkers
2.1.4 Generation
2.2 Challenges Facing Controlled Languages
2.3 Related Technologies
2.1 Elements of a Controlled Language and Controlled Language Tools
The elements of a controlled language are the same as those of any
other language: words, rules, punctuation. A controlled language
prescribes these in a formalized, limited way. Controlled languages
are often composed simply of a controlled terminology and a grammar.
Controlled language tools used for document creation and/or revision are
often discussed in the same context as controlled languages, since they are
the primary applications related to controlled languages. Tools are
used to analyze text, performing pattern recognition and string analysis
tasks to determine if the text conforms to the terminological and syntactic
rules of the controlled language. These tools may examine a multitude
of language characteristics, including basic syntax and morphology.
More advanced tools perform stylistic analysis and revision. Tool
sets may also include a generation component which provides suggestions
for approved alternate expressions.
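The conformance checking described above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical approved vocabulary (`APPROVED_TERMS`) and uses crude suffix stripping as a stand-in for real morphological analysis; the function names are invented for illustration and do not come from any particular tool.

```python
import re

# Hypothetical approved vocabulary; a real term base would be far larger.
APPROVED_TERMS = {"open", "close", "valve", "the", "before", "you", "remove", "panel"}

# Crude stand-in for morphological analysis: strip a few common suffixes.
SUFFIXES = ("ing", "ed", "s")


def base_forms(word):
    """Yield the word itself plus candidate base forms with a suffix stripped."""
    yield word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            stem = word[: -len(suffix)]
            yield stem
            yield stem + "e"  # e.g. "closing" -> "clos" -> "close"


def unapproved_words(text, approved=APPROVED_TERMS):
    """Return the words in text that match no approved term, in order."""
    flagged = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if not any(form in approved for form in base_forms(word)):
            flagged.append(word)
    return flagged
```

Calling `unapproved_words("Close the valves before you detach the panel.")` flags only `detach`, because `valves` reduces to the approved term `valve`. A production checker would replace the suffix list with proper morphological analysis and handle multi-word terms.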
2.1.1 Terminology Checkers
Terminology checking tools check a text against a terminology base, which is built through corpus analysis and terminology development and definition. Dictionaries against which text is checked may be general and/or domain specific, and may be specific to one organization, or even proprietary in nature. Terminological checks may be made for acronyms (such as IEEE or ACL), phrasal words, inflections, conjugations (or some form of stemming, such as -ed, -ing, -s), forms (continous vs. continuous), and cases. Terminology checking may also include some semantic analysis to disambiguate terms that may be confused with one another, as in the Boeing Simplified English Checker [Wojcik00].
2.1.2 Grammar Checkers
Grammar checking consists of text parsing and pattern matching against a set of pre-defined grammar rules. One example is the Boeing Simplified English Checker [Wojcik96], which examines sentence length, paragraph length, noun cluster length (groups of nouns like "input output channel" and "household paint remover"), missing articles, and "unapproved" verbal auxiliaries and participles. Subject-verb agreement is another item which can be checked. Different systems employ a range of different approaches to parsing.
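Two of the grammar checks mentioned above, sentence length and noun cluster length, can be sketched as follows. This is an illustration only: the limits `MAX_SENTENCE_WORDS` and `MAX_NOUN_CLUSTER` are assumed values, the small `NOUNS` set is a hypothetical substitute for a real parser or part-of-speech tagger, and nothing here reflects how the Boeing checker is actually implemented.

```python
import re

MAX_SENTENCE_WORDS = 20  # assumed limit, for illustration only
MAX_NOUN_CLUSTER = 3     # assumed limit, for illustration only

# Hypothetical noun lexicon standing in for a real part-of-speech tagger.
NOUNS = {"input", "output", "channel", "household", "paint", "remover",
         "test", "procedure"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def longest_noun_run(words):
    """Return the length of the longest run of consecutive known nouns."""
    longest = current = 0
    for word in words:
        current = current + 1 if word in NOUNS else 0
        longest = max(longest, current)
    return longest


def check_grammar(text):
    """Return (sentence_index, message) pairs for each rule violation."""
    issues = []
    for index, sentence in enumerate(split_sentences(text)):
        words = re.findall(r"[a-z]+", sentence.lower())
        if len(words) > MAX_SENTENCE_WORDS:
            issues.append((index, "sentence too long"))
        if longest_noun_run(words) > MAX_NOUN_CLUSTER:
            issues.append((index, "noun cluster too long"))
    return issues
```

For the input `"Run the input output channel test procedure. Remove the paint."`, the first sentence is flagged for a five-noun cluster and the second passes. Checks such as missing articles or subject-verb agreement need genuine syntactic analysis and are not attempted here.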
2.1.3 Style Checkers
Style checkers check and/or revise document types, formats, and layouts. Stylistic conventions that can be checked include date and currency formats, table formats, and spelling variants.
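As a rough illustration of two of these conventions, the sketch below flags dates that are not in an assumed house format and spelling variants that differ from a preferred form. The choice of ISO-style dates as the house style, the `PREFERRED_SPELLING` table, and the function name are all assumptions made for the example.

```python
import re

# Assumed house style: dates written as YYYY-MM-DD. Slash-separated
# dates such as 11/20/2000 are treated as violations.
NON_ISO_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

# Hypothetical table mapping spelling variants to the preferred form.
PREFERRED_SPELLING = {"colour": "color", "centre": "center"}


def check_style(text):
    """Return a list of style issues: date formats first, then spellings."""
    issues = []
    for match in NON_ISO_DATE.finditer(text):
        issues.append(f"date format: {match.group(0)}")
    for word in re.findall(r"[A-Za-z]+", text):
        preferred = PREFERRED_SPELLING.get(word.lower())
        if preferred:
            issues.append(f"spelling variant: {word} -> {preferred}")
    return issues
```

With these assumptions, `check_style("Paint colour was changed on 11/20/2000.")` reports one date issue and one spelling issue. Currency and table formats could be handled with further patterns in the same manner.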
2.1.4 Generation
Generation refers to the composition of an alternative expression to substitute for one that does not conform to the controlled language's specifications. A sentence or phrase already exists (the author wrote it), but the controlled language's checkers have flagged it as unacceptable for one of a variety of reasons. Thus, in most cases, what is required is a modification of the original source expression. In some cases, however, an entirely new sentence or expression must be generated. An example of this is outlined in the section on Generation.
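The simpler of the two cases, modifying the original expression, can be sketched as a word-level substitution. The mapping `APPROVED_ALTERNATIVES` and the function `suggest_revision` are hypothetical; the sketch does not attempt the harder case of generating an entirely new sentence.

```python
import re

# Hypothetical mapping from unapproved words to approved alternatives.
APPROVED_ALTERNATIVES = {"utilize": "use", "commence": "start", "terminate": "stop"}


def suggest_revision(sentence, alternatives=APPROVED_ALTERNATIVES):
    """Return (revised_sentence, substitutions) for a flagged sentence."""
    substitutions = []

    def replace(match):
        word = match.group(0)
        approved = alternatives.get(word.lower())
        if approved is None:
            return word  # word is not in the mapping; leave it unchanged
        substitutions.append((word, approved))
        # Keep a leading capital so sentence-initial words stay capitalized.
        return approved.capitalize() if word[0].isupper() else approved

    revised = re.sub(r"[A-Za-z]+", replace, sentence)
    return revised, substitutions
```

For `"Commence the test and utilize the gauge."` this proposes `"Start the test and use the gauge."` and records both substitutions, so a tool could present them to the author as suggestions instead of applying them silently.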
Figure 1: General Architecture of the Boeing Enhanced Controlled Language Checker [Wojcik00]
2.2 Challenges Facing Controlled Languages
The issues and challenges for controlled languages are related to the
validation, testing, and refinement of controlled language components and
tools. Real-world user validation of the tools is highly important.
One critical question is the extent to which these tools are achieving
their objectives, and how that achievement can be measured. Methods,
tools, and metrics with which to objectively measure the performance of
controlled language tools are lacking and need to be developed. A future
direction for controlled languages is the development of theories that
will help enhance functionality in the realm of language sensitivity,
context sensitivity, domain sensitivity, and sense [Dale95, Holmback96, Fouvry96].
2.3 Related Technologies
Controlled languages draw on a wide range of methods and technologies that
are also used in other language processing tasks. Terminology development and definition
are used in machine translation, dialogue systems, voice recognition, extraction
and routing systems, translation memory systems, and computer-assisted
translation systems. Parsing, syntactic analysis, disambiguation,
and generation techniques are also used in machine translation, extraction/routing
systems, voice recognition, and dialogue systems.
Richard Altwarg
Macquarie University Graduate Program in Speech and Language Processing
SLP803 An Introduction to Language Technology
This site last updated November 20, 2000.
Comments and corrections welcome: raltwarg@earthlink.com