The Role of Linguistic Knowledge in Controlled Languages
7.0 Grammars and Grammar Tools for Controlled Language
7.1 Grammar Checkers and Grammar Rules for Controlled Languages
7.2 Grammar Checking Approaches
7.2.1 Boeing Simplified English Checker
7.2.2 Caterpillar Functional English-KANT Checker
7.2.3 MultiLint Pattern Matching Approach
7.3 Grammars vs. Pattern Matching
7.1 Grammar Checkers and Grammar Rules for Controlled Languages
One major component of a controlled language tool is a grammar checker.
A controlled language defines which grammatical constructions are
acceptable. Common grammar rules intended to improve text call for
simplified sentences, short sentences, and minimal embedding.
Here are the writing rules of the PACE controlled language:
1. Keep sentences short.
2. Omit redundant words.
3. Order the parts of the sentence logically.
4. Do not change constructions in mid-sentence.
5. Take care with the logic of ‘and’ and ‘or’.
6. Avoid elliptical constructions.
7. Do not omit conjunctions or relatives.
8. Adhere to the PACE dictionary.
9. Avoid strings of nouns.
10. Do not use ‘ing’ unless the word appears thus in the PACE dictionary.
More specific definitions of these rules must be written into the grammar. For example, the number of words allowed in a sentence, the length of noun compounds, and the size and number of embedded phrases must be enumerated for the above rules to function. The rules are then applied based on a checker’s analysis of the text.
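As a rough illustration of what enumerating such limits can look like, the sketch below stores two thresholds in a small configuration object and applies them to rules 1 and 9. The names `RuleLimits`, `check_sentence`, the threshold values, and the `NOUN` tag are assumptions made for this example; the actual PACE limits are not given here, and the part-of-speech tags are assumed to come from some separate tagger.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleLimits:
    # Hypothetical thresholds, chosen only for illustration.
    max_words_per_sentence: int = 20
    max_noun_string_length: int = 3


def count_words(sentence):
    """Count word-like tokens (letters, digits, apostrophes, hyphens)."""
    return len(re.findall(r"[A-Za-z0-9][A-Za-z0-9'-]*", sentence))


def longest_noun_run(tagged_words):
    """Length of the longest run of consecutive tokens tagged 'NOUN'."""
    longest = current = 0
    for _word, tag in tagged_words:
        current = current + 1 if tag == "NOUN" else 0
        longest = max(longest, current)
    return longest


def check_sentence(sentence, tagged_words, limits=RuleLimits()):
    """Return critiques for rule 1 (length) and rule 9 (noun strings)."""
    critiques = []
    if count_words(sentence) > limits.max_words_per_sentence:
        critiques.append("sentence too long")
    if longest_noun_run(tagged_words) > limits.max_noun_string_length:
        critiques.append("noun string too long")
    return critiques


# Pre-tagged input; the tags here are supplied by hand for the example.
tagged = [("engine", "NOUN"), ("oil", "NOUN"), ("pressure", "NOUN"),
          ("sensor", "NOUN"), ("failed", "VERB")]
print(check_sentence("Engine oil pressure sensor failed.", tagged))
```

With these assumed limits, the five-word sentence passes the length rule but its four-noun string exceeds the noun-string limit.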
Note that most of the PACE rules govern style rather than correctness. The primary purpose of a grammar checker, however, is to analyze the syntax of a text and identify constructions that are incorrect.
Grammar checkers can be based on heuristics that perform relatively simple pattern matching. This approach is robust and easy to implement, and it is most effective for the ‘simpler’ rules, such as sentence length. It is also more likely than a full grammar to generate incorrect critiques and to miss constructions that do not conform to the rules of the controlled language.
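A minimal sketch of such a heuristic, applied to PACE rule 10, is given below. It is not taken from any of the checkers discussed here: the function name `flag_ing_words` and the one-word approved set (a stand-in for the PACE dictionary) are assumptions for illustration only.

```python
import re

# Any alphabetic word ending in "ing"; no part-of-speech information is used.
ING_PATTERN = re.compile(r"\b[A-Za-z]+ing\b")


def flag_ing_words(sentence, approved_words):
    """Return words ending in 'ing' that are not in the approved set."""
    return [word for word in ING_PATTERN.findall(sentence)
            if word.lower() not in approved_words]


approved = {"wiring"}  # hypothetical dictionary entry
print(flag_ing_words("Check the wiring during testing.", approved))
# prints ['during', 'testing']
```

The pattern is easy to write and never fails on unusual input, but it flags the preposition 'during' along with 'testing', which is the kind of incorrect critique a purely surface-level check tends to produce.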
A full computational grammar is more reliable than a set of heuristic rules, but it may be derailed by unforeseen input. The Boeing Simplified English Checker uses a grammar formalism based on Generalized Phrase Structure Grammar and handles structural ambiguity with statistical methods. Other checkers use other kinds of formal grammars [Clemencin96].
Figure 5: Representation of Grammar Rules for Conversions in Cogentex French Controlled Language Checker: Support Verb Handling and Passive-to-Active Rules [Nasr98]
7.2 Grammar Checking Approaches
One example of the difference between shallow and deep analysis is that
a syntax-based system will accept words and terms used in an unapproved
sense, provided they are used as an approved part of speech. For example,
if the word 'follow' is confined to the meaning 'to come after', the
Boeing Simplified English Checker (BSEC) will accept the phrases 'follow
the path home' and 'follow the instructions', even though the sense of
'follow' in both of these examples differs from 'to come after'. Because
a shallow analysis of these phrases finds 'follow' in an accepted
syntactic position, the same as in 'a nap follows lunch', the system does
not flag them as unapproved senses, as it should. A deeper analysis is
required [Wojcik96].
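The limitation can be made concrete with a small sketch. It does not reproduce BSEC; the lexicon structure, the function name `shallow_accepts`, and the sense labels attached to each phrase are assumptions for illustration. The check looks only at the word and its part of speech, so the sense label never influences the outcome.

```python
# Hypothetical lexicon entry: 'follow' is approved as a verb, with the
# intended (but unenforced) meaning 'to come after'.
APPROVED_WORD_POS = {("follow", "VERB")}


def shallow_accepts(lemma, pos):
    """Accept a word if its (lemma, part of speech) pair is approved."""
    return (lemma, pos) in APPROVED_WORD_POS


# (phrase, lemma, part of speech, sense label added by hand)
examples = [
    ("a nap follows lunch", "follow", "VERB", "come after"),
    ("follow the path home", "follow", "VERB", "go along"),
    ("follow the instructions", "follow", "VERB", "obey"),
]

for phrase, lemma, pos, sense in examples:
    verdict = "accepted" if shallow_accepts(lemma, pos) else "flagged"
    print(f"{phrase!r} (sense: {sense}) -> {verdict}")
```

All three phrases are accepted, although only the first uses the approved sense.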
7.2.1 Boeing Simplified English Checker
The Enhanced Grammar, Style, and Content Checker (EGSC), the improved
version of the tool described above, uses several strategies to perform
deeper analysis:
-- word sense declarations for words in the system
-- semantic hierarchies and categorization of word senses
-- word sense 'thesaurus' indicating the language standard with which
each word sense is associated
-- domain specific semantic selection restrictions and noun compounding
information
-- domain specific word sense frequencies
In addition, it focuses on parsing full sentences (rather than sub-sentential chunks), a strategy also employed by other checkers using deeper grammatical analysis [Clemencin96]. Using these features, the enhanced lexicon works together with the grammar to perform a deeper analysis of sentence-level texts, and achieves greater accuracy and effectiveness, as shown here:
'You must follow all the instructions for special parts.'
Verb errors: follow Use: obey [Wojcik96]
Many controlled language tool developers have remarked that contextual analysis of paragraphs, sections, and even entire documents may provide information which will aid in further disambiguation of difficult terms and phrases [Huijsen98, Wojcik96, Clemencin96].
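The kind of sense-level check that yields the 'follow' / 'obey' critique in this section might be sketched as follows. This is not the EGSC implementation: the `Sense` record, the noun classes, and the two sense entries are hypothetical, and the selection restriction is reduced to a single lookup on the semantic class of the verb's object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sense:
    name: str
    object_classes: frozenset       # semantic classes the object may have
    approved: bool                  # is this sense allowed by the standard?
    replacement: Optional[str] = None


# Hypothetical semantic classes for a few nouns.
NOUN_CLASSES = {"instructions": "directive", "lunch": "event"}

# Hypothetical word sense declarations with selection restrictions.
VERB_SENSES = {
    "follow": [
        Sense("come_after", frozenset({"event"}), approved=True),
        Sense("obey", frozenset({"directive"}), approved=False,
              replacement="obey"),
    ]
}


def critique_verb(verb, object_noun):
    """Pick the sense whose restriction fits the object; critique it if unapproved."""
    noun_class = NOUN_CLASSES.get(object_noun)
    for sense in VERB_SENSES.get(verb, []):
        if noun_class in sense.object_classes:
            if sense.approved:
                return None
            return f"Verb errors: {verb} Use: {sense.replacement}"
    return None  # no matching sense: this sketch stays silent


print(critique_verb("follow", "instructions"))
print(critique_verb("follow", "lunch"))
```

Under these assumed entries, the first call returns the critique and the second returns `None`, because an event as object selects the approved sense.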
7.2.2 Caterpillar Functional English-KANT Checker
The well-known Caterpillar Functional English relies on KANT for analysis.
"KANT uses explicit source language lexicons, grammars and domain semantics
to produce an interlingua representation (IR) for each sentence.
Each IR is a semantic frame containing features and semantic roles, which
may be filled by other IR frames" [Nyberg96].
KANT clearly employs a combination of strategies, including a full grammar.
The use of an interlingua also indicates that a machine translation
approach is being applied to controlled language checking.
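The recursive structure described in the quotation, a frame whose semantic roles may themselves be filled by frames, can be pictured with a small data structure. The sketch below is an assumption-laden illustration, not KANT's actual representation: the class name `Frame`, its field names, and the feature and role labels are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    concept: str
    features: dict = field(default_factory=dict)  # feature name -> value
    roles: dict = field(default_factory=dict)     # role name -> Frame


def nesting_depth(frame):
    """Depth of frame nesting: 1 for a frame with no filled roles."""
    return 1 + max((nesting_depth(f) for f in frame.roles.values()), default=0)


# Hypothetical frame for the sentence "Clean the filter."
sentence_ir = Frame(
    concept="clean",
    features={"mood": "imperative"},
    roles={"theme": Frame(concept="filter", features={"number": "singular"})},
)

print(nesting_depth(sentence_ir))  # prints 2
```

Because roles hold frames rather than strings, an analysis that builds such a structure has committed to who does what to what, which is more than a surface pattern match records.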
7.2.3 MultiLint Pattern Matching Approach
Schmidt-Wigger [Schmidt-Wigger98] describes
a comparison between systems using a full grammar and the MultiLint system,
which analyzes input text based on a "flat pattern matching approach".
The pattern matching approach was believed to be "more practical"
because it avoids the issues of conflicting grammar rules, inadequate parsing
and rule coverage, and structural variations among sublanguages.
MultiLint's flat pattern approach was compared to the full grammar approaches
of SECC and BSEC:
| System | Recall | Precision |
| MULTILINT Grammar Component | 57% | 81% |
| MULTILINT Style Component | 65% | 92% |
| SECC [Adriaens94] | 87% | 93% |
| BSEC [Wojcik90] | 89% | 79% |
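For readers unfamiliar with the two measures in the table, the sketch below shows how recall and precision are conventionally computed for a checker. The counts are hypothetical and are not taken from any of the studies cited; the function names are likewise chosen only for this example.

```python
def precision(true_positives, false_positives):
    """Share of the checker's critiques that point at real errors."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Share of the real errors in the text that the checker found."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts for one test corpus:
correct_critiques = 45   # critiques that flag a real error
wrong_critiques = 5      # critiques that flag acceptable text
missed_errors = 30       # real errors the checker did not flag

print(f"precision = {precision(correct_critiques, wrong_critiques):.0%}")
print(f"recall    = {recall(correct_critiques, missed_errors):.0%}")
```

A checker tuned like this one rarely bothers the writer with a wrong critique (high precision) but leaves many real errors unflagged (lower recall).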
7.3 Grammars vs. Pattern Matching
It must be pointed out that the MultiLint findings cited above were
quoted to demonstrate that the MultiLint system's focus is on precision
rather than recall, and Schmidt-Wigger clearly states that the systems
listed above are not directly comparable, for a number of reasons.
However, the data do indicate, and the author confirms, that "precision
of a grammar checker can never reach that of a style checker…(because)…style
checking has to cope with the sublanguage of the corpus, while a rule
set for grammar checking has to cope with the sublanguage of the corpus
AND with the erroneous structures it wants to check." In other words,
the lack of a grammar reduces the system's effectiveness in identifying,
analyzing and correcting grammatical errors.
Richard Altwarg
Macquarie University Graduate Program in Speech and Language Processing
SLP803 An Introduction to Language Technology
This site last updated November 20, 2000.
Comments and corrections welcome: raltwarg@earthlink.com