The Role of Linguistic Knowledge in Controlled Languages
8.0 Language Generation and Controlled Languages
8.0 Language Generation
8.1 Authoring Tool Interaction Interface
8.3 Transfer Approach
8.4 Translation Memory Approach
8.5 Linguistic Realization
8.0 Language Generation
Language generation refers to the composition of an alternative
expression which can substitute for an expression which does not conform
to the controlled language's specifications.
In most cases, generation in a controlled language system is unusual because it is not truly 'generation' but 're-generation'. A sentence or phrase already exists (the author wrote it), but the controlled language's checkers have flagged it as unacceptable for one of a variety of reasons.
8.1 Authoring Tool Interaction Interface
A primary issue in generation is to understand what level of prompting
an author desires from the system. Should it only advise the author
of the non-conformance? Should it identify the part of the sentence
which does not conform? Suggest an alternative for only the non-conforming
part(s)? Or provide an entirely new expression? An example of the
choices given to an author by the Eurocastle controlled language tool is
shown below:
Figure 6: Correction Dialog in Eurocastle Controlled Language Checker [Clemencin96]
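The levels of prompting asked about in this section can be sketched as an explicit setting in the authoring tool. This is only an illustrative sketch: the names PromptLevel and feedback are hypothetical and are not taken from Eurocastle or any other tool cited here.

```python
from enum import Enum


class PromptLevel(Enum):
    """Hypothetical levels of prompting, from least to most intrusive."""
    ADVISE = 1        # only report that the sentence does not conform
    LOCATE = 2        # point at the non-conforming part
    SUGGEST_PART = 3  # propose a replacement for that part only
    REWRITE = 4       # offer an entirely new expression


def feedback(sentence, bad_word, replacement, level):
    """Return the message (or rewritten sentence) for the chosen level."""
    if level is PromptLevel.ADVISE:
        return "This sentence does not conform to the controlled language."
    if level is PromptLevel.LOCATE:
        return f"Non-conforming term: '{bad_word}'."
    if level is PromptLevel.SUGGEST_PART:
        return f"Replace '{bad_word}' with '{replacement}'."
    # REWRITE: hand back the whole sentence with the substitution applied.
    return sentence.replace(bad_word, replacement)


sentence = "The metal part is punched by the machine."
for level in PromptLevel:
    print(level.name, "->", feedback(sentence, "punched", "stamped", level))
```

Making the level an explicit parameter lets the same checker serve authors who want only a warning and authors who want a finished rewrite.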
"One of the areas for improvement is the building of (fully or partially)
automatic rewriting systems to convert text into controlled language."
[Bustamante00]
According to Bustamante
[Bustamante00], the majority
of controlled language generation systems currently fall into the category
of transfer type systems, similar to those used in machine translation
applications. Examples are the SECC, LANT, and the MLAP Spanish and
Greek grammar checkers.
8.3 Transfer Approach
If a sentence is being 're-generated' because of disallowed word
use, e.g., use of a word in a disallowed sense, lexical lookup can easily
identify an alternative and perform a simple word substitution. For example:
Original Sentence:
The metal part is punched by the machine. (disallow 'punched' as manufacturing term)
Generated Sentence:
The metal part is stamped by the machine. (simple substitution of verb, in same tense)
This is a very simple substitution, easily performed on the basis of a transfer from the original sentence to the generated sentence.
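A lexical-lookup substitution of this kind might be sketched as follows. The APPROVED table and the substitute function are hypothetical names invented for illustration. The table is keyed on surface word forms, which is a simplifying assumption: a real checker would need to key on word senses and handle inflection through morphology rather than by listing every form.

```python
import re

# Hypothetical lexicon: disallowed word form -> approved replacement.
# Each inflected form is listed so that the tense of the original is kept.
APPROVED = {
    "punch": "stamp",
    "punched": "stamped",
    "punches": "stamps",
    "punching": "stamping",
}


def substitute(sentence, lexicon=APPROVED):
    """Replace every disallowed word with its approved counterpart."""
    def swap(match):
        word = match.group(0)
        replacement = lexicon.get(word.lower())
        if replacement is None:
            return word  # word is allowed; leave it unchanged
        # Preserve the capitalisation of the original word.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, sentence)


print(substitute("The metal part is punched by the machine."))
# prints: The metal part is stamped by the machine.
```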
Bernth [Bernth98] categorizes the types of changes addressed by the EasyEnglish system as "Problems of Ambiguity" and "Grammatical Problems". For either type of problem, the system may refer to both a lexicon and a grammar to identify the problem and then regenerate substitute expressions. For example:
"A note is forwarded to the user requesting the correct information" =>
"A note that requests the correct information is forwarded to the user." OR
"A note is forwarded to the user that requests the correct information"
In this example, the problem is one of ambiguity: it is unclear whether the note or the user is requesting the information. The problem is solved by identifying the ambiguous phrase and providing semantically and syntactically restructured substitutes, one for each reading. Both lexical and syntactic information are used, and a 'transfer' from the 'source' expression to the 'target' expression is performed. A noteworthy point with respect to usability is that the system makes the suggestions and asks the user to choose between the two options.
8.4 Translation Memory Approach
The translation memory approach has been used in the post-editing of machine-translated texts produced by the European Community's Translation Services [Allen00]. In this approach, the changes made by human editors while post-editing machine-translated texts are saved to a database. New texts are then checked against this database, which functions as a knowledge base, and changes made by human editors to previous documents are applied automatically to the new texts. In this way, the shortcomings of machine translation programs in their domain and sense definitions are overcome with a tool specifically designed to convert a 'generic' machine-translated text into a domain-specific one.
As an example, my non-native English speaking friend might write, 'Take salad with a clamp'. A human editor would change this to 'Take salad with the tongs'. The fact that the human editor made this change is held in memory, and the word 'tongs' will be selected (or suggested) as the match for the source word in future, whenever my friend writes about clamping. This can work well within the single domain of food services, but if my friend begins writing about auto repair, the memory must be rewritten to reflect a preference for 'clamp' over 'tongs'.
An important note in this context is that "Fuzzy matching has allowed the TM (translation memory) industry to deal with the problem of data sparseness by providing some types of sub-sentence matches even when whole sentence matches cannot be found" [Allen00]. In other words, one of the challenges of creating a successful translation-memory type of re-generation system is obtaining sufficient data. As with any other data-driven system, it requires adequate training data.
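A memory of editor corrections with fuzzy lookup might be sketched as below. TranslationMemory and the 0.75 threshold are illustrative assumptions. The similarity measure is the character-level ratio from Python's standard difflib module, swapped in here for simplicity; it matches whole sentences only and is not the sub-sentence matching that [Allen00] describes.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Stores (original, corrected) pairs and suggests corrections for new text."""

    def __init__(self, threshold=0.75):
        self.threshold = threshold  # minimum similarity to accept a match
        self.entries = []           # list of (original, corrected) pairs

    def record(self, original, corrected):
        """Remember a change made by a human editor."""
        self.entries.append((original, corrected))

    def suggest(self, text):
        """Return the stored correction for the most similar original, or None."""
        best_score, best_correction = 0.0, None
        for original, corrected in self.entries:
            score = SequenceMatcher(None, text.lower(), original.lower()).ratio()
            if score > best_score:
                best_score, best_correction = score, corrected
        return best_correction if best_score >= self.threshold else None


memory = TranslationMemory()
memory.record("Take salad with a clamp", "Take salad with the tongs")

print(memory.suggest("Take the salad with a clamp"))             # close match
print(memory.suggest("Replace the hose clamp on the radiator"))  # no match: None
```

With a single stored pair the memory is of little use, which is the data sparseness problem noted above: the suggestions are only as good as the body of recorded corrections.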
8.5 Linguistic Realization
Linguistic realization is the process of generating a wholly new, grammatically
correct sentence from the information available to the system. This is quite
different from the simpler transfer, memory, or lexical lookup methods
described above.
A realizer is responsible for generating a correct verb. It may start from either a conceptual representation or a specific verb, but must then choose the correct verb form, sentence type (imperative, declarative, or question), and tense. It must then do the same for all the other main nouns, adjectives, and adverbs in the sentence. It must create agreement between all these parts of speech, and must also 'enforce' the standard rules of English grammar.
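The choices a realizer has to make can be illustrated with a deliberately tiny sketch. Clause, realize, and the small IRREGULAR_PAST table are hypothetical. The sketch covers only simple present and past tense, subject-verb agreement, and three sentence types, and it ignores many rules of English (consonant doubling, the verb 'be', negation, passives), so it should be read as an illustration of the task rather than as a usable realizer.

```python
from dataclasses import dataclass

# Hypothetical, very incomplete table of irregular past-tense forms.
IRREGULAR_PAST = {"take": "took", "write": "wrote", "make": "made"}


def third_singular(lemma):
    """Present-tense form for a third-person singular subject."""
    if lemma.endswith(("s", "sh", "ch", "x", "z", "o")):
        return lemma + "es"
    if len(lemma) > 1 and lemma.endswith("y") and lemma[-2] not in "aeiou":
        return lemma[:-1] + "ies"
    return lemma + "s"


def past(lemma):
    """Simple past form (regular rules plus the small irregular table)."""
    if lemma in IRREGULAR_PAST:
        return IRREGULAR_PAST[lemma]
    if lemma.endswith("e"):
        return lemma + "d"
    if len(lemma) > 1 and lemma.endswith("y") and lemma[-2] not in "aeiou":
        return lemma[:-1] + "ied"
    return lemma + "ed"


@dataclass
class Clause:
    subject: str               # noun phrase, e.g. "the machine"
    plural: bool               # number of the subject, needed for agreement
    verb: str                  # verb lemma, e.g. "stamp"
    obj: str                   # object noun phrase
    tense: str = "present"     # "present" or "past"
    mood: str = "declarative"  # "declarative", "imperative", or "question"


def realize(clause):
    """Turn the abstract clause into an English sentence."""
    if clause.mood == "imperative":
        return f"{clause.verb.capitalize()} {clause.obj}."
    if clause.mood == "question":
        if clause.tense == "past":
            auxiliary = "Did"
        else:
            auxiliary = "Do" if clause.plural else "Does"
        return f"{auxiliary} {clause.subject} {clause.verb} {clause.obj}?"
    if clause.tense == "past":
        verb = past(clause.verb)
    else:
        verb = clause.verb if clause.plural else third_singular(clause.verb)
    subject = clause.subject[0].upper() + clause.subject[1:]
    return f"{subject} {verb} {clause.obj}."


print(realize(Clause("the machine", False, "stamp", "the metal part")))
print(realize(Clause("the machine", False, "stamp", "the metal part", mood="question")))
print(realize(Clause("the machines", True, "stamp", "the metal part", tense="past")))
```

Even this small fragment needs separate rules for inflection, agreement, auxiliaries, and word order, which gives some sense of why full realization is much harder than substitution.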
A linguistic realizer can be seen as an 'inverse parser': its job is to operate in the opposite direction from that of a parser. On this view a bi-directional grammar is possible, allowing for symmetrical conversion either from language to concept or from concept to language. However, there are critical differences between the inputs, outputs, and processes which take place in these two opposite directions [Reiter97].
Generation like this has been undertaken in highly structured interactive systems like weather reporting and rail ticket purchasing, but most controlled language domains are far too large and complex for this to work given current knowledge and technical constraints.
Richard Altwarg
Macquarie University Graduate Program in Speech and Language Processing
SLP803 An Introduction to Language Technology
This site last updated November 20, 2000.
Comments and corrections welcome: raltwarg@earthlink.com