Controlled Languages: An Introduction

The Role of Linguistic Knowledge in Controlled Languages

8.0 Language Generation and Controlled Languages
8.0 Language Generation
8.1 Authoring Tool Interaction Interface
8.3 Transfer Approach
8.4 Translation Memory Approach
8.5 Linguistic Realization

8.0 Language Generation
Language generation is the composition of an alternative expression to substitute for one that does not conform to the controlled language's specifications.

In most cases, generation in a controlled language system is unusual because it is not truly 'generation' but 're-generation'.  A sentence or phrase already exists (the author wrote it), but the controlled language's checkers have flagged it as unacceptable for one of a variety of reasons.

8.1 Authoring Tool Interaction Interface
A primary issue in generation is to understand what level of prompting an author desires from the system.  Should it only advise the author of the non-conformance?  Should it identify the part of the sentence which does not conform?  Should it suggest an alternative for only the non-conforming part(s)?  Or should it provide an entirely new expression?  An example of the choices given to an author by the Eurocastle controlled language tool is shown below:

Figure 6: Correction Dialog in Eurocastle Controlled Language Checker [Clemencin96]

"One of the areas for improvement is the building of (fully or partially) automatic rewriting systems to convert text into controlled language." [Bustamante00]  According to Bustamante [Bustamante00], the majority of controlled language generation systems currently fall into the category of transfer type systems, similar to those used in machine translation applications.  Examples are the SECC, LANT, and the MLAP Spanish and Greek grammar checkers.

8.3 Transfer Approach
If a sentence is being 're-generated' because of disallowed word use, e.g., use of a word in a disallowed sense, lexical lookup can easily identify an alternative and perform a simple word substitution. For example:

Original Sentence:
The metal part is punched by the machine.  (disallow 'punched' as manufacturing term)
Generated Sentence:
The metal part is stamped by the machine. (simple substitution of verb, in same tense)
This is a very simple substitution, easily performed on the basis of a transfer from the original sentence to the generated sentence.
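
The substitution above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of any system cited here: the SUBSTITUTIONS lexicon and the substitute_disallowed function are hypothetical names, and the lexicon is keyed on inflected forms so that the tense carries over without any morphological analysis. It also replaces every occurrence of a listed word, so it does not decide whether the word is actually being used in the disallowed sense.

```python
import re

# Hypothetical lexicon: disallowed surface form -> approved surface form.
# Keying on inflected forms keeps the tense of the original verb.
SUBSTITUTIONS = {
    "punch": "stamp",
    "punches": "stamps",
    "punched": "stamped",
    "punching": "stamping",
}


def substitute_disallowed(sentence, lexicon=SUBSTITUTIONS):
    """Replace each disallowed word with its approved alternative."""

    def swap(match):
        word = match.group(0)
        replacement = lexicon.get(word.lower())
        if replacement is None:
            return word  # word is allowed, leave it untouched
        # Preserve a leading capital, e.g. at the start of a sentence.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, sentence)


print(substitute_disallowed("The metal part is punched by the machine."))
# prints: The metal part is stamped by the machine.
```

A sense-aware version would need a parse or a domain tag for each word before the lookup, which is where the lexicon and grammar discussed next come in.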

Bernth [Bernth98] categorizes the types of changes addressed by the EasyEnglish system as "Problems of Ambiguity" and "Grammatical Problems".  For either type of problem, the system may refer both to a lexicon and a grammar to identify and then regenerate substitute expressions.  For example:

"A note is forwarded to the user requesting the correct information"=>
"A note that requests the correct information is forwarded to the user." OR
"A note is forwarded to the user that requests the correct information"
In this example, the problem is one of ambiguity.  The system identifies the ambiguous phrase and offers semantically and syntactically restructured substitutes.  Both lexical and syntactic information are used, and a 'transfer' from the 'source' expression to the 'target' expression is performed.  A noteworthy point for usability is that the system makes the suggestions and asks the user to choose between the two options.

8.4 Translation Memory Approach
One approach to generation is to learn from the document itself, pairing sentences in the existing document in a manner like that used for translation memories [Clemencin96].  In this approach, the non-conforming components of an expression are compared to those in a previously modified expression, and modified similarly.  The translation memory is the database which stores the original and modified versions of the previously processed expressions.

This type of approach has been used in the post-editing of machine translated texts produced by the European Community's Translation Services [Allen00].  The changes made by human editors in the post-editing of machine translated texts are saved to a database.  New texts then refer to this database, which functions as a knowledge base, and changes made by human editors to previous documents are automatically made to the new texts.  In this way, the various shortcomings of machine translation programs in their domain and sense definitions are overcome with a tool specifically designed to convert a 'generic' machine translated text to a domain-specific translated text.

As an example, my non-native English speaking friend might write, 'Take salad with a clamp'.  A human editor would change this to 'Take salad with the tongs'.  The fact that the human editor made this change is held in memory, and the word 'tongs' will be selected (or suggested) as the match for the source word in future, whenever my friend writes documents that mention clamping.  This can work well in the single domain of food services, but if my friend begins writing about auto repair, the memory must be re-written to reflect a preference for 'clamp' over 'tongs'.
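
A minimal Python sketch of such a memory follows. The EditMemory class and its record and suggest methods are hypothetical names chosen for illustration. As a design assumption that goes beyond the description above, the sketch keys each stored correction by domain, so that an auto repair document does not inherit the food service preference and the memory does not need to be re-written.

```python
import re
from collections import defaultdict


class EditMemory:
    """Stores word-level corrections made by a human editor, per domain."""

    def __init__(self):
        # domain -> {original word (lower case): preferred word}
        self._memory = defaultdict(dict)

    def record(self, domain, original_word, corrected_word):
        """Remember that the editor replaced original_word in this domain."""
        self._memory[domain][original_word.lower()] = corrected_word

    def suggest(self, domain, sentence):
        """Apply the corrections stored for this domain to a new sentence."""
        corrections = self._memory.get(domain, {})
        return re.sub(
            r"[A-Za-z]+",
            lambda m: corrections.get(m.group(0).lower(), m.group(0)),
            sentence,
        )


memory = EditMemory()
memory.record("food service", "clamp", "tongs")

print(memory.suggest("food service", "Lift the bread with the clamp."))
# prints: Lift the bread with the tongs.
print(memory.suggest("auto repair", "Tighten the hose with the clamp."))
# prints: Tighten the hose with the clamp.
```

The sketch stores single-word corrections only. Changes that span several words, such as the editor's switch from 'a clamp' to 'the tongs', would need phrase-level alignment between the original and edited sentences.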

An important note in this context is that "Fuzzy matching has allowed the TM (translation memory) industry to deal with the problem of data sparseness by providing some types of sub-sentence matches even when whole sentence matches cannot be found" [Allen00].  In other words, one of the challenges of creating a successful translation memory type re-generation system is obtaining sufficient data.  As with any other trained computational system, it requires adequate training data.
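
Fuzzy matching can be illustrated with the SequenceMatcher class from Python's standard difflib module. This is a sketch under stated assumptions rather than a description of how commercial translation memory tools score matches: the MEMORY contents, the best_fuzzy_match function, and the 0.7 threshold are all hypothetical, and the similarity measure is a simple character-level ratio.

```python
from difflib import SequenceMatcher

# Hypothetical memory of (original, corrected) sentence pairs.
MEMORY = [
    ("Take salad with a clamp.", "Take salad with the tongs."),
    ("Close the valve before you start.", "Close the valve before starting."),
]


def best_fuzzy_match(sentence, memory=MEMORY, threshold=0.7):
    """Return (score, original, corrected) for the closest stored sentence.

    Returns None when no stored sentence reaches the similarity threshold.
    """
    best = None
    for original, corrected in memory:
        score = SequenceMatcher(None, sentence.lower(), original.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, original, corrected)
    if best is not None and best[0] >= threshold:
        return best
    return None


match = best_fuzzy_match("Take the salad with a clamp.")
if match is not None:
    score, original, corrected = match
    print(f"{score:.2f}: {original!r} was previously corrected to {corrected!r}")
```

The new sentence is not in the memory word for word, but it is close enough to the first stored pair for the earlier correction to be offered as a suggestion. A sentence with no close neighbour, such as 'Replace the oil filter.', returns None, which is the data sparseness problem in miniature.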

8.5 Linguistic Realization
Linguistic realization is the process of generating a wholly new, grammatically correct sentence from the information available to the realizer.  This is very different from the simpler transfer, memory, or lexical lookup methods described above.

A realizer will be responsible for generating a correct verb.  It may start from either a conceptual representation or a specific verb, but must then choose the correct verb form, sentence type (imperative, declarative, question), and tense.  It must then perform similarly for all the other main nouns, adjectives, and adverbs in the sentence.  It must create agreement between all these parts of speech, and will also be required to 'enforce' all the various standard rules of English grammar.
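
The verb-related choices described above can be made concrete with a toy realizer in Python. Everything in this sketch is an assumption made for illustration: the VERB_FORMS mini-lexicon, the realize function, and its parameters are hypothetical, it covers only third-person subjects, regular verbs, and two sentence types, and it ignores the agreement of nouns, adjectives, and adverbs that a real realizer must also handle.

```python
# Hypothetical mini-lexicon of verb forms; a real realizer would rely on a
# full morphological lexicon or on morphological rules.
VERB_FORMS = {
    "stamp": {"base": "stamp", "third_singular": "stamps", "past": "stamped"},
    "remove": {"base": "remove", "third_singular": "removes", "past": "removed"},
}


def realize(verb, obj, subject=None, plural_subject=False,
            tense="present", sentence_type="declarative"):
    """Build a simple sentence, choosing the verb form that agrees with it."""
    forms = VERB_FORMS[verb]
    if sentence_type == "imperative":
        # Imperatives use the base form and have no overt subject.
        words = [forms["base"], obj]
    elif sentence_type == "declarative":
        if subject is None:
            raise ValueError("a declarative sentence needs a subject")
        if tense == "past":
            verb_form = forms["past"]
        elif plural_subject:
            verb_form = forms["base"]
        else:
            verb_form = forms["third_singular"]
        words = [subject, verb_form, obj]
    else:
        raise ValueError(f"unsupported sentence type: {sentence_type}")
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


print(realize("stamp", "the metal part", subject="the machine"))
# prints: The machine stamps the metal part.
print(realize("stamp", "the metal part", sentence_type="imperative"))
# prints: Stamp the metal part.
```

Even this toy version has to branch on sentence type, tense, and number before it can emit a single verb, which gives some sense of why full realization is so much harder than the transfer and memory methods.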

A linguistic realizer can be seen as an 'inverse parser', whose job is to operate in the opposite direction from that of a parser. In this view, a bi-directional grammar is possible, allowing for symmetrical conversion either from language to concept or from concept to language.  However, there are critical differences between the inputs, outputs, and processes which take place in these two opposite directions [Reiter97].

Generation like this has been undertaken in highly structured interactive systems like weather reporting and rail ticket purchasing, but most controlled language domains are far too large and complex for this to work given current knowledge and technical constraints.


Richard Altwarg
Macquarie University Graduate Program in Speech and Language Processing
SLP803 An Introduction to Language Technology

This site last updated November 20, 2000.
Comments and corrections welcome: raltwarg@earthlink.com