Controlled Languages: An Introduction

Technology Overview

2.0 Components and Processes
    2.1 Elements of a Controlled Language and Controlled Language Tools
        2.1.1 Terminology Checkers
        2.1.2 Grammar Checkers
        2.1.3 Style Checkers
        2.1.4 Generation
    2.2 Challenges Facing Controlled Languages
    2.3 Related Technologies

2.1 Elements of a Controlled Language and Controlled Language Tools
The elements of a controlled language are the same as those of any other language: words, rules, and punctuation.  A controlled language prescribes these in a formalized, limited way.  Controlled languages are often composed simply of a controlled terminology and a grammar.  Controlled language tools used for document creation and/or revision are often discussed in the same context as controlled languages, since they are the primary applications related to controlled languages.  These tools analyze text, performing pattern recognition and string analysis to determine whether the text conforms to the terminological and syntactic rules of the controlled language.  They may examine a multitude of language characteristics, including basic syntax and morphology.  More advanced tools perform stylistic analysis and revision.  Tool sets may also include a generation component, which suggests approved alternative expressions.
 

2.1.1 Terminology Checkers
Terminology checking tools check a text against a terminology base, which is built through corpus analysis and term development and definition.  Dictionaries against which text is checked may be general and/or domain specific, and may be specific to one organization, or even proprietary in nature.  Terminological checks may be made for acronyms (such as IEEE or ACL), phrasal words, inflections, conjugations (or some form of stemming, such as -ed, -ing, -s), spelling forms (continous vs. continuous), and cases.  This analysis may also include some semantic analysis to disambiguate terms that may be confused, such as that found in the Boeing Simplified English Checker [Wojcik00].
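
The kind of check described above can be illustrated with a minimal sketch.  It assumes a tiny, hypothetical terminology base (APPROVED_TERMS) and a deliberately naive suffix-stripping stemmer; neither is taken from any actual checker, and a real terminology base would be far larger and built from corpus analysis.

```python
import re

# Hypothetical approved terminology base (an assumption for illustration only).
APPROVED_TERMS = {"the", "valve", "open", "close", "continuous", "check", "pressure", "ieee"}
# Suffixes removed by a deliberately naive stemmer (-ing, -ed, -s).
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Return candidate base forms: the word itself plus naive suffix-stripped forms."""
    candidates = [word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            base = word[: -len(suffix)]
            candidates.append(base)
            candidates.append(base + "e")  # e.g. "closed" -> "clos" -> "close"
    return candidates


def unapproved_terms(text):
    """List the words (lowercased, in text order) that have no approved base form."""
    flagged = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        if not any(candidate in APPROVED_TERMS for candidate in stem(lower)):
            flagged.append(lower)
    return flagged


print(unapproved_terms("Check the continous pressure. The valves closed."))
```

Here the misspelled form "continous" is flagged, while inflected forms such as "valves" and "closed" are accepted because their base forms are in the terminology base.  The sketch does no semantic disambiguation.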

2.1.2 Grammar Checkers
Grammar checking consists of text parsing and pattern matching against a set of pre-defined grammar rules.  One example is the Boeing Simplified English Checker [Wojcik96], which examines sentence length, paragraph length, noun cluster length (groups of nouns like "input output channel" and "household paint remover"), missing articles, and "unapproved" verbal auxiliaries and participles.  Subject-verb agreement is another item which can be checked.  Different systems employ a range of different approaches to parsing.
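
Two of the checks listed above, sentence length and noun cluster length, can be sketched as follows.  This is not how the Boeing checker works (it relies on parsing); the numeric limits, the NOUNS lexicon standing in for a part-of-speech tagger, and the function names are all assumptions made for illustration.

```python
import re

MAX_SENTENCE_WORDS = 20  # assumed limit, not taken from the source
MAX_NOUN_CLUSTER = 3     # assumed limit, not taken from the source
# Hypothetical noun lexicon standing in for a real part-of-speech tagger.
NOUNS = {"input", "output", "channel", "household", "paint", "remover",
         "valve", "test", "connector"}


def longest_noun_cluster(words):
    """Return the length of the longest run of consecutive words found in NOUNS."""
    longest = current = 0
    for word in words:
        current = current + 1 if word.lower() in NOUNS else 0
        longest = max(longest, current)
    return longest


def check_sentences(text):
    """Return (sentence, problems) pairs for the sentences that break a rule."""
    report = []
    # Split after sentence-final punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z]+", sentence)
        problems = []
        if len(words) > MAX_SENTENCE_WORDS:
            problems.append("sentence too long")
        if longest_noun_cluster(words) > MAX_NOUN_CLUSTER:
            problems.append("noun cluster too long")
        if problems:
            report.append((sentence, problems))
    return report


print(check_sentences("Clean the input output channel test connector. Open the valve."))
```

Only the first sentence is reported, because it contains a run of five nouns.  Checks such as missing articles or subject-verb agreement need real syntactic analysis and are outside the scope of this sketch.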

2.1.3 Style Checkers
Style checkers check and/or revise document types, formats, and layouts.  Stylistic conventions which can be checked include date and currency formats, table formats, and spelling variants.
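
As a rough illustration of two of these conventions, the sketch below flags numeric dates and dispreferred spelling variants.  The house style it enforces (dates written out as in "20 November 2000", and the entries in SPELLING_VARIANTS) is an assumption chosen for the example, not a convention stated in any controlled language specification.

```python
import re

# Assumed house style: dates are written out, so numeric forms like 11/20/2000 are flagged.
NUMERIC_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
# Assumed spelling preferences: variant to avoid -> preferred form.
SPELLING_VARIANTS = {"colour": "color", "centre": "center"}


def style_issues(text):
    """Return (issue type, detail) pairs for each style violation found."""
    issues = []
    for match in NUMERIC_DATE.finditer(text):
        issues.append(("date format", match.group()))
    for word in re.findall(r"[A-Za-z]+", text):
        preferred = SPELLING_VARIANTS.get(word.lower())
        if preferred:
            issues.append(("spelling variant", f"{word} -> {preferred}"))
    return issues


print(style_issues("Updated 11/20/2000 at the service centre."))
```

Currency and table formats could be handled with further patterns in the same way; layout checks generally need access to the document structure rather than the plain text.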

2.1.4 Generation
Generation refers to the composition of an alternative expression which can substitute for an expression that does not conform to the controlled language's specifications.  A sentence or phrase already exists (the author wrote it), but the controlled language's checkers have flagged it as unacceptable for one of a variety of reasons.  Thus, in most cases, what is required is a modification of the original source expression.  In some cases, however, an entirely new sentence or expression must be generated.  An example of this is outlined in the section on Generation.
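
The common case, modifying the original expression, can be sketched as a simple substitution of approved alternatives for flagged expressions.  The APPROVED_ALTERNATIVES mapping and the function name are hypothetical, and real generation components work from a syntactic analysis rather than string replacement, so this is only a minimal illustration of the idea.

```python
import re

# Hypothetical mapping from unapproved expressions to approved alternatives.
APPROVED_ALTERNATIVES = {
    "utilize": "use",
    "prior to": "before",
    "in order to": "to",
}


def suggest_revision(sentence):
    """Return the sentence with each unapproved expression replaced by its alternative."""
    revised = sentence
    # Handle longer expressions first so multi-word phrases are replaced as a unit.
    for phrase in sorted(APPROVED_ALTERNATIVES, key=len, reverse=True):
        replacement = APPROVED_ALTERNATIVES[phrase]
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        # Keep an initial capital when the flagged expression started with one.
        revised = pattern.sub(
            lambda m, r=replacement: r.capitalize() if m.group()[0].isupper() else r,
            revised,
        )
    return revised


print(suggest_revision("Utilize the gauge prior to the test."))
```

A sentence that cannot be repaired by local substitution, for example one that is too long or structurally ambiguous, is the case in which an entirely new expression has to be generated.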



Figure 1: General Architecture of the Boeing Enhanced Controlled Language Checker [Wojcik00]

2.2 Challenges Facing Controlled Languages
The main challenges for controlled languages concern the validation, testing, and refinement of controlled language components and tools.  Real-world user validation of the tools is highly important.  One critical question is the extent to which these tools achieve their objectives, and how that achievement can be measured.  Methods, tools, and metrics for objectively measuring the performance of controlled language tools are lacking and need to be developed.  Future directions for controlled languages include the development of theories that will help enhance functionality in the areas of language sensitivity, context sensitivity, domain sensitivity, and sense [Dale95, Holmback96, Fouvry96].

2.3 Related Technologies
Controlled languages draw on a wide range of methods and technologies that are also used in other language processing tasks.  Terminology development and definition are used in machine translation, dialogue systems, voice recognition, extraction and routing systems, translation memory systems, and computer-assisted translation systems.  Parsing, syntactic analysis, disambiguation, and generation techniques are also used in machine translation, extraction and routing systems, voice recognition, and dialogue systems.


Richard Altwarg
Macquarie University Graduate Program in Speech and Language Processing
SLP803 An Introduction to Language Technology

This site last updated November 20, 2000.
Comments and corrections welcome: raltwarg@earthlink.com