Controlled Languages: An Introduction

The Role of Linguistic Knowledge in Controlled Languages

7.0 Grammars and Grammar Tools for Controlled Language
    7.1 Grammar Checkers and Grammar Rules for Controlled Languages
    7.2 Grammar Checking Approaches
        7.2.1 Boeing Simplified English Checker
        7.2.2 Caterpillar Functional English-KANT Checker
        7.2.3 Multilint Pattern Matching Approach
    7.3 Grammars vs. Pattern Matching

7.1 Grammar Checkers and Grammar Rules for Controlled Languages
One major component of a controlled language tool is a grammar checker.  A controlled language defines which grammatical constructions are acceptable within it.  Common grammar rules intended to make text clearer call for simplified sentences, short sentences, and minimal embedding.  Here are the writing rules of the PACE controlled language:

        1. Keep sentences short.
        2. Omit redundant words.
        3. Order the parts of the sentence logically.
        4. Do not change constructions in mid-sentence.
        5. Take care with the logic of ‘and’ and ‘or’.
        6. Avoid elliptical constructions.
        7. Do not omit conjunctions or relatives.
        8. Adhere to the PACE dictionary.
        9. Avoid strings of nouns.
        10. Do not use ‘ing’ unless the word appears thus in the PACE dictionary.

These rules must be given more specific definitions in the grammar.  For example, the maximum number of words in a sentence, the maximum length of noun compounds, and the size and number of embedded phrases must be specified before the rules above can be applied.  The rules are then applied based on the checker’s analysis of the text.
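As a rough illustration of what "more specific definitions" means, the sketch below attaches numeric limits to PACE rules 1 and 9 and applies them to a sentence that is assumed to be already tagged.  The class `RuleLimits`, the function `check_sentence`, the tag set, and the limit values (20 words, 3 nouns) are hypothetical choices made for this example, not values taken from PACE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleLimits:
    """Numeric limits that turn PACE-style rules into checkable tests.

    The default values are illustrative assumptions, not PACE's own figures.
    """
    max_sentence_words: int = 20  # rule 1: "Keep sentences short."
    max_noun_string: int = 3      # rule 9: "Avoid strings of nouns."


def check_sentence(tagged, limits=RuleLimits()):
    """Return critiques for one sentence given as (word, tag) pairs.

    Tags are assumed to come from an upstream tagger; "N" marks a noun.
    """
    critiques = []
    if len(tagged) > limits.max_sentence_words:
        critiques.append(
            f"sentence has {len(tagged)} words (limit {limits.max_sentence_words})")
    run = 0  # length of the current run of consecutive nouns
    for word, tag in tagged:
        run = run + 1 if tag == "N" else 0
        if run == limits.max_noun_string + 1:  # report each over-long run once
            critiques.append(
                f"noun string exceeds {limits.max_noun_string} words at '{word}'")
    return critiques


sentence = [("Check", "V"), ("the", "D"), ("engine", "N"), ("oil", "N"),
            ("pressure", "N"), ("sensor", "N"), ("cable", "N")]
print(check_sentence(sentence))
print(check_sentence(sentence, RuleLimits(max_sentence_words=5)))
```

Keeping the limits in a separate object lets the same checking code serve controlled languages that choose different thresholds, as the second call shows.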

Note also that most of the PACE rules concern style rather than correctness.  The most important purpose of a grammar checker, however, is to analyze the syntax of a text for correctness and to identify the constructions that are incorrect.

Grammar checkers can be based on heuristics that perform relatively simple pattern matching.  This approach is relatively robust and easy to implement, and it is most effective for the ‘simpler’ rules, such as sentence length.  However, it is also more likely to generate incorrect critiques and to miss constructions that do not conform to the rules of the controlled language.
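A minimal sketch of the heuristic style of checking, assuming plain regular expressions and a tiny hypothetical list of approved '-ing' words (`APPROVED_ING`), covers PACE rules 1 and 10 without any parsing.  All names and the word list are illustrative assumptions.

```python
import re

# Hypothetical dictionary entries that legitimately end in "ing".
APPROVED_ING = {"during", "housing", "bearing"}

ING_WORD = re.compile(r"\b([A-Za-z]+ing)\b")
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def heuristic_check(text, max_words=20):
    """Apply two surface-pattern heuristics; no parsing is done."""
    critiques = []
    for sentence in SENTENCE_BREAK.split(text.strip()):
        n_words = len(sentence.split())
        if n_words > max_words:  # rule 1: sentence length
            critiques.append(f"sentence too long ({n_words} words)")
        for match in ING_WORD.finditer(sentence):  # rule 10: '-ing' words
            word = match.group(1).lower()
            if word not in APPROVED_ING:
                critiques.append(f"'-ing' word not in dictionary: {word}")
    return critiques


text = "Remove the bearing before cleaning. Attach the string to the ring."
for critique in heuristic_check(text):
    print(critique)
```

Only 'cleaning' is a genuine '-ing' form here; 'string' and 'ring' are flagged merely because they end in the letters 'ing', which is the kind of incorrect critique a surface pattern tends to produce.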

A full computational grammar provides greater reliability than a set of pattern matching heuristics, but it can be derailed by unforeseen input.  The Boeing Simplified English Checker uses a grammar formalism based on Generalized Phrase Structure Grammar, and it resolves structural ambiguity using statistical methods.  Other checkers use other kinds of formal grammars [Clemencin96].
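To make the contrast concrete, the following sketch uses a toy context-free grammar in Chomsky normal form and the CKY recognition algorithm.  It is deliberately much simpler than the GPSG-based grammar described above and does no statistical disambiguation; the rules, the lexicon, and the function name `cky_check` are assumptions made for illustration.  It shows both sides of the trade-off: sentences are accepted or rejected on the basis of their full structure, but a single word outside the lexicon leaves the grammar with no analysis at all.

```python
# Toy grammar in Chomsky normal form (an illustrative assumption; this is
# not the GPSG-based grammar used by the Boeing checker).
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP", "S"},  # S -> V NP covers imperatives like "Remove the valve"
    ("Det", "N"): {"NP"},
}
LEXICON = {
    "the": {"Det"},
    "valve": {"N"},
    "pump": {"N"},
    "remove": {"V"},
    "follows": {"V"},
}


def cky_check(sentence):
    """Return 'accepted', 'rejected', or 'unknown word: <w>' for a sentence."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    if n == 0:
        return "rejected"
    # chart[i][j] holds the categories that can span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        if word not in LEXICON:
            # The grammar has no analysis for input it was not built for.
            return f"unknown word: {word}"
        chart[i][i + 1] = set(LEXICON[word])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # every split point of words[i:j]
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        chart[i][j] |= BINARY_RULES.get((left, right), set())
    return "accepted" if "S" in chart[0][n] else "rejected"


for s in ["Remove the valve.", "The valve follows the pump.",
          "The remove valve.", "Remove the gasket."]:
    print(s, "->", cky_check(s))
```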

Figure 5: Representation of Grammar Rules for Conversions in Cogentex French Controlled Language Checker: Support Verb Handling and Passive-to-Active Rules [Nasr98]

7.2 Grammar Checking Approaches
One example of the difference between a pattern matching system and a full grammar is that a syntax-based pattern matching system will accept words used in an unapproved sense, provided they appear as an approved part of speech.  For example, if the word 'follow' is restricted to the meaning 'to come after', the Boeing Simplified English Checker (BSEC) will accept the phrases 'follow the path home' and 'follow the instructions', even though 'follow' has a different sense from 'to come after' in both phrases.  Because a shallow analysis finds 'follow' in an accepted syntactic position, the same as in 'a nap follows lunch', the system fails to flag these unapproved senses.  A deeper analysis is required [Wojcik96].
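The weakness can be sketched with a hypothetical approved-word table that records an intended sense for each word but checks only its part of speech.  The lemmas and tags are assumed to come from an upstream tagger, and `APPROVED`, `PHRASES`, and `shallow_check` are illustrative names, not part of BSEC.

```python
# Hypothetical approved-word table: (approved part of speech, intended sense).
APPROVED = {
    "follow": ("verb", "to come after"),
    "a": ("det", None), "the": ("det", None),
    "nap": ("noun", None), "lunch": ("noun", None),
    "path": ("noun", None), "home": ("adverb", None),
    "instruction": ("noun", None),
}


def shallow_check(tagged):
    """Return (lemma, pos) pairs not approved in that part of speech.

    The recorded sense is never consulted, which is the weakness.
    """
    return [(lemma, pos) for lemma, pos in tagged
            if lemma not in APPROVED or APPROVED[lemma][0] != pos]


PHRASES = {
    "a nap follows lunch": [("a", "det"), ("nap", "noun"),
                            ("follow", "verb"), ("lunch", "noun")],
    "follow the path home": [("follow", "verb"), ("the", "det"),
                             ("path", "noun"), ("home", "adverb")],
    "follow the instructions": [("follow", "verb"), ("the", "det"),
                                ("instruction", "noun")],
}
for text, tagged in PHRASES.items():
    print(text, "->", shallow_check(tagged) or "no errors")
```

All three phrases pass, because only the part of speech is compared; a use of 'follow' as a noun would be caught, but an unapproved verb sense never is.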

7.2.1 Boeing Simplified English Checker
The Enhanced Grammar, Style, and Content Checker (EGSC), the improved version of the tool described above, uses several strategies to perform deeper analysis:
-- word sense declarations for words in the system
-- semantic hierarchies and categorization of word senses
-- word sense 'thesaurus' indicating the language standard with which each word sense is associated
-- domain-specific semantic selection restrictions and noun compounding information
-- domain-specific word sense frequencies
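Three of the strategies listed above (word sense declarations, semantic categorization, and selection restrictions) can be combined in a small sketch.  Here the sense of 'follow' is chosen by the semantic category of its direct object, and an unapproved sense is reported in a format modeled loosely on the EGSC example that follows.  The categories, senses, and names (`NOUN_CATEGORY`, `Sense`, `check_verb_sense`) are hypothetical; the real EGSC lexicon is far richer and also uses domain-specific sense frequencies, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical semantic categories for a few nouns (a tiny hierarchy).
NOUN_CATEGORY = {
    "lunch": "event", "inspection": "event",
    "instruction": "directive", "procedure": "directive",
    "path": "route",
}


@dataclass(frozen=True)
class Sense:
    gloss: str
    object_category: str             # selection restriction on the direct object
    approved: bool                   # is this sense allowed in the controlled language?
    use_instead: Optional[str] = None


# Hypothetical sense declarations for the verb "follow".
SENSES = {
    "follow": [
        Sense("to come after", "event", approved=True),
        Sense("to comply with", "directive", approved=False, use_instead="obey"),
        Sense("to move along", "route", approved=False),
    ],
}


def check_verb_sense(verb, obj):
    """Return a critique string, or None if the verb-object pair is acceptable."""
    if verb not in SENSES:
        return None  # no sense declarations, so nothing to check
    category = NOUN_CATEGORY.get(obj)
    for sense in SENSES[verb]:
        if sense.object_category == category:
            if sense.approved:
                return None
            message = f"Verb errors: {verb}"
            if sense.use_instead:
                message += f"    Use: {sense.use_instead}"
            return message
    return f"Verb errors: {verb} (sense could not be determined)"


print(check_verb_sense("follow", "lunch"))        # approved sense
print(check_verb_sense("follow", "instruction"))  # unapproved, alternative known
print(check_verb_sense("follow", "path"))         # unapproved, no alternative
```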

In addition, EGSC focuses on parsing full sentences (rather than sub-sentential chunks), a strategy also employed by other checkers that use deeper grammatical analysis [Clemencin96].  With these features, the enhanced lexicon works together with the grammar to analyze sentence-level text more deeply, achieving greater accuracy and effectiveness, as shown here:

'You must follow all the instructions for special parts.'
Verb errors: follow    Use: obey    [Wojcik96]

Many controlled language tool developers have remarked that contextual analysis of paragraphs, sections, and even entire documents could provide information that helps to further disambiguate difficult terms and phrases [Huijsen98, Wojcik96, Clemencin96].

7.2.2 Caterpillar Functional English-KANT Checker
The well-known Caterpillar Functional English relies on KANT for analysis.  "KANT uses explicit source language lexicons, grammars and domain semantics to produce an inter lingua representation (IR) for each sentence.  Each IR is a semantic frame containing features and semantic roles, which may be filled by other IR frames" [Nyberg96].  KANT clearly employs a combination of strategies, including a full grammar.  The use of an interlingua also indicates that a machine translation approach is being applied to controlled language checking.
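The recursive structure described in the quotation, a frame whose roles may be filled by other frames, can be sketched as a small data type.  The class name `IRFrame`, its fields, the feature and role labels, and the rendering format below are assumptions for illustration only and do not reproduce KANT's actual IR notation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IRFrame:
    """Hypothetical interlingua frame: a head concept, features, and roles.

    Role fillers are themselves frames, so frames nest recursively.
    """
    head: str
    features: Dict[str, str] = field(default_factory=dict)
    roles: Dict[str, "IRFrame"] = field(default_factory=dict)

    def render(self) -> str:
        """Render the frame as a nested, parenthesized expression."""
        parts = [self.head]
        parts += [f"({name} {value})" for name, value in self.features.items()]
        parts += [f"({role} {filler.render()})" for role, filler in self.roles.items()]
        return "(" + " ".join(parts) + ")"

    def depth(self) -> int:
        """Number of frame levels, counting this one."""
        return 1 + max((filler.depth() for filler in self.roles.values()), default=0)


# "Remove the valve from the pump." as a hypothetical IR.
ir = IRFrame(
    "remove",
    {"mood": "imperative"},
    {
        "theme": IRFrame("valve", {"number": "singular"}),
        "source": IRFrame("pump", {"number": "singular"}),
    },
)
print(ir.render())
print(ir.depth())
```

Because each role filler is itself an `IRFrame`, deeper nesting (for example, a relative clause modifying an object) needs no special handling.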

7.2.3 Multilint Pattern Matching Approach
Schmidt-Wigger [Schmidt-Wigger 98] compares systems that use a full grammar with the MultiLint system, which analyzes input text using a "flat pattern matching approach".  The pattern matching approach was considered "more practical" because it avoids the problems of conflicting grammar rules, inadequate parsing and rule coverage, and structural variation among sublanguages.  MultiLint's flat pattern approach was compared with the full grammar approaches of SECC and BSEC:
 
System                           Recall   Precision
MULTILINT grammar component      57%      81%
MULTILINT style component        65%      92%
SECC [Adriaens94]                87%      93%
BSEC [Wojcik90]                  89%      79%

Recall: number of retrieved errors / number of existing errors
Precision: number of retrieved errors / number of all retrieved cases
[Schmidt-Wigger 98]
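The two measures defined under the table can be computed directly.  The counts in this sketch are hypothetical and are not taken from [Schmidt-Wigger 98]; they only illustrate how a more conservative checker trades recall for precision.

```python
def recall(retrieved_errors: int, existing_errors: int) -> float:
    """Retrieved errors divided by the number of errors actually in the text."""
    return retrieved_errors / existing_errors


def precision(retrieved_errors: int, retrieved_cases: int) -> float:
    """Retrieved errors divided by everything the checker flagged."""
    return retrieved_errors / retrieved_cases


# Hypothetical test text containing 200 real errors.
# Checker A flags 150 passages, 120 of which are real errors.
print(f"A: recall {recall(120, 200):.0%}, precision {precision(120, 150):.0%}")
# Checker B flags only 65 passages it is confident about, 60 of them real errors.
print(f"B: recall {recall(60, 200):.0%}, precision {precision(60, 65):.0%}")
```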

7.3 Grammars vs. Pattern Matching
It must be pointed out that the MultiLint findings cited above were quoted to show that MultiLint focuses on precision rather than recall, and Schmidt-Wigger clearly states that, for a number of reasons, the systems listed above are not directly comparable.  However, the data do indicate, and the author confirms, that "precision of a grammar checker can never reach that of a style checker…(because)…style checking has to cope with the sub language of the corpus, while a rule set for grammar checking has to cope with the sub language of the corpus AND with the erroneous structures it wants to check."  In other words, without a full grammar, a system is less effective at identifying, analyzing and correcting grammatical errors.


Richard Altwarg
Macquarie University Graduate Program in Speech and Language Processing
SLP803 An Introduction to Language Technology

This site last updated November 20, 2000.
Comments and corrections welcome: raltwarg@earthlink.com