8. Information Science as a Bridging Science
Approaches to sense disambiguation with respect to automatic indexing and machine translation
5. The semantic relations approach: towards a semantic interlingua
Heinz-Dirk Luckhardt
The vision of an interlingua in which all natural language utterances
in any language could be represented has for a long time been a dream of computational
linguistics (cf. e.g. Melcuk 1960). It
may be roughly described by a triangle model:
Natural language utterances are parsed and represented in expressions
of the interlingua which is the same for all languages. A generation
module generates target language utterances from these interlingua
expressions. The advantage is obvious: for every language involved
there would only have to be a parser and a generator irrespective
of the language to be translated to or from. For a discussion of the
interlingua idea in comparison with the transfer theory see
Luckhardt 1987, 9f. et passim.
This model has never been realized. Instead, many MT systems operate
according to the transfer model.
The parser analyses source language text and represents it in
an appropriate way, still with close relation to the source language,
especially with the source language words in it. In the transfer
phase this representation is transformed into a target language
representation with target language words in it, i.e. at least
the source language words are translated into target language
ones. In the generation phase, target language text is generated
from this representation.
An interlingua has been partially realized in the SUSY MT system,
namely for the translation of prepositions. Certain kinds of prepositions
(see below) are transformed into a language-independent form which
is not touched in transfer. This approach seems plausible also
for multilingual automatic indexing, as it promises an extension
to information retrieval, insofar as it would allow a cross-lingual
specification of database searches.
The Ambiguity of Prepositions
Prepositions are highly ambiguous with respect to syntax and semantics, cf. <13> – <20>.
<13> He sent us a message from the USA . => Er schickte uns eine Nachricht aus den USA .
<14> He sent us a message from the moon . => Er schickte uns eine Nachricht vom Mond .
<15> He sent us to London . => Er schickte uns nach London .
<16> He sent us to the river . => Er schickte uns an den Fluß .
<17> He sent us to the house . => Er schickte uns zu dem Haus .
<18> It depends on your evaluation . => Das hängt von ihrer Einschätzung ab .
<19> ...the article demanded by the noun ... => ...der vom Nomen geforderte Artikel ...
<20> I substitute X for Y . => Ich ersetze Y durch X .
Following the principles of valency grammar, we can distinguish 3 cases:
- Non-valency-bound modifying constituents: <13> – <14>
- Valency-bound constituents where the meaning of the preposition is determined by the noun: <15> – <17>
- Valency-bound constituents where the meaning of the preposition is determined by the verb: <18> and <20>
The meaning of the preposition is determined by the noun of the prepositional phrase (<13> to <17>) or by the predicate (<18> and <20>).
Disambiguation by properties of the noun
The prepositional phrase in
He sent us a message from the USA . => Er schickte uns eine Nachricht aus den USA.
can be described by means of the 2 structures
below. In the following, I shall use the one on the right,
as it is more convenient to operate with a small number of nodes.
The difference between an NP and a PP is marked by different values
of the NPTYP property (NPTYP = NORMAL and NPTYP = PREP).
Above, I have claimed that the meaning of
the prepositions in <13> and <14> depends on the noun:
<13> and <14> differ only with respect to the noun
of the PP. This has nothing to do with the semantic properties
of the noun, for both designate a place. For SUSY, classifications
were set up for every language where all nouns behaving in the
same manner with respect to certain prepositions were put into
the same class. Thus, the occurence of a preposition with a noun
of a specific class designates a specific meaning that can be
represented in an interlingua, i.e. in a notation valid
for all languages. „USA“ and „MOND“ belong
to different classes, as the same semantic relation (a point in
space) is expressed by two different prepositions („aus“
and „von“).
Tentative list of Interlingua expressions:
Type | interlingua expression | meaning | example |
local | LOC QUO | direction | nach England |
local | LOC UBI | point in space | in England |
local | LOC UNDE | source in space | aus England |
local | LOC QUA | path in space | durch England |
temporal | TEMP QUO | limit in time | bis Ostern |
temporal | TEMP UBI | point in time | an Ostern |
temporal | TEMP UNDE | source in time | seit Ostern |
temporal | TEMP QUA | time period | den Winter über |
event | LOCTEMP QUO | direction | ins Konzert |
event | LOCTEMP UBI | point in event | bei der Taufe |
domain | DOMAIN UBI | abstract place | in der Physik |
domain | DOMAIN UNDE | abstract source | aus der Physik |
circumstance | ACCO | accompaniment | mit dem Freund |
circumstance | INSTRUMENT | instrument | mit dem Hammer |
….. |
Tentative classification:
Class No. | Type of relation | specification | example |
01 | local | in-LOCUBI, auf-LOCQUO | Haus, Stadion, Schreibtisch |
02 | local | in-LOCUBI, in-LOCQUO | Schweiz, Elsaß, Stadt, USA |
03 | local | in-LOCUBI, nach-LOCQUO | Saarbrücken, Amerika |
04 | local | auf-LOCUBI, nach-LOCQUO | Sylt, Feuerland |
05 | local | auf-LOCUBI, auf-LOCQUO | Markt, Mond, Insel |
20 | temporal | an-TEMPUBI, bis-TEMPQUO | Ostern, Montag |
22 | temporal | bei-TEMPUBI | Tag, Sonnenaufgang |
26 | temporal | an-TEMPUBI, bis zu-TEMPQUO | Abend |
30 | event | bei-LOCTEMPUBI | Taufe, Abitur |
31 | event | auf-LOCTEMPUBI | Kirmes, Kundgebung |
40 | circumstance | mit-CIRCUMSTANCE | Unterstützung, Hilfe |
41 | circumstance | auf-CIRCUMSTANCE | Rat, Empfehlung |
42 | circumstance | in-CIRCUMSTANCE | Übereinstimmung, Not |
In parsing, the classification may be used to represent a PP by a semantic relation:
nach London (class 03) > "direction": LOC QUO > London
in die Schweiz (class 02) > "direction": LOC QUO > Schweiz
zu dem Haus (class 01) > "direction": LOC QUO > Haus
When I say ‚parsing‘ I mean parsing for MT and for information retrieval. In MT, the interlingua expression is used for generating a target language preposition:
LOC QUO London => nach London
LOC QUO Schweiz => in die Schweiz
LOC QUO Haus => zu (dem) Haus
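The two directions described above (analysis of a PP into an interlingua expression, and generation of a preposition from it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries `NOUN_CLASS` and `RULES` and the functions `parse_pp` and `generate_pp` are hypothetical names, not part of SUSY. The class numbers are taken from the tentative classification, and the grammatical case field is an added assumption, used to separate "in der Schweiz" (UBI) from "in die Schweiz" (QUO).

```python
# Illustrative noun lexicon: noun -> class number. The class numbers follow
# the tentative classification (02, 03, 04); the dictionary is hypothetical.
NOUN_CLASS = {
    "Schweiz": "02",
    "USA": "02",
    "London": "03",
    "Saarbrücken": "03",
    "Sylt": "04",
}

# Illustrative rules: (noun class, preposition, case) -> interlingua expression.
# The case field is an assumption added for this sketch.
RULES = {
    ("02", "in", "dat"): "LOC UBI",
    ("02", "in", "acc"): "LOC QUO",
    ("03", "in", "dat"): "LOC UBI",
    ("03", "nach", "dat"): "LOC QUO",
    ("04", "auf", "dat"): "LOC UBI",
    ("04", "nach", "dat"): "LOC QUO",
}


def parse_pp(prep, case, noun):
    """Analysis: map a PP to (interlingua expression, noun)."""
    relation = RULES[(NOUN_CLASS[noun], prep, case)]
    return relation, noun


def generate_pp(relation, noun):
    """Generation: choose preposition and case for a relation and a noun."""
    noun_class = NOUN_CLASS[noun]
    for (cls, prep, case), rel in RULES.items():
        if cls == noun_class and rel == relation:
            return prep, case
    raise KeyError(f"no rule for {relation} with class {noun_class}")


print(parse_pp("nach", "dat", "London"))  # ('LOC QUO', 'London')
print(generate_pp("LOC QUO", "Schweiz"))  # ('in', 'acc')
```

For translation between two languages, the generation step would consult the rule table and noun classes of the target language; the sketch uses a single German table for both directions to keep it short.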
In information retrieval, this approach may be used for parsing texts for indexing and for parsing user input. During indexing, the PPs in the input texts are represented by interlingua expressions which are stored with the descriptors in the database. During retrieval, the user input (in any language) is transformed into interlingua expressions and then compared to the database.
Example:
Text sentence: ‚The export from the USA to Switzerland is decreasing.‘
Descriptors: LOCQUO Switzerland, export, LOCUNDE USA
German search request: Export, in die Schweiz, aus den USA
transformed into: Export, LOCQUO Schweiz, LOCUNDE USA
translated into: export, LOCQUO Switzerland, LOCUNDE USA
(This matches the descriptors assigned to the text!)
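The retrieval example can be traced with a small Python sketch. It is a simplification under stated assumptions: the lookup table `DE_PP` stands in for a real German parser, `DE_EN` is a hypothetical German-to-English term lexicon, and the function names are invented for illustration. Matching is modelled as a plain subset test on descriptor sets.

```python
# Stand-in for the German parser: PP string -> (interlingua expression, noun).
# A real system would derive this from noun classes and rules.
DE_PP = {
    "in die Schweiz": ("LOCQUO", "Schweiz"),
    "aus den USA": ("LOCUNDE", "USA"),
}

# Hypothetical German-to-English term lexicon.
DE_EN = {"Export": "export", "Schweiz": "Switzerland", "USA": "USA"}


def request_to_descriptors(terms):
    """Transform a German search request into English interlingua descriptors."""
    descriptors = set()
    for term in terms:
        if term in DE_PP:
            relation, noun = DE_PP[term]
            descriptors.add(f"{relation} {DE_EN[noun]}")
        else:
            descriptors.add(DE_EN[term])
    return descriptors


def matches(request, document):
    """A document matches if it carries every descriptor of the request."""
    return request <= document


# Descriptors assigned to the text sentence during indexing.
text_descriptors = {"LOCQUO Switzerland", "export", "LOCUNDE USA"}

request = request_to_descriptors(["Export", "in die Schweiz", "aus den USA"])
print(matches(request, text_descriptors))  # True
```

A request for exports in the opposite direction would yield the descriptor "LOCQUO USA" and would therefore not match this text, which is the gain over indexing the bare nouns.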
For every language, a lexicon would have to be built up classifying all simple nouns, and a set of rules would have to be written that transforms the prepositions into interlingua expressions depending on the classes of the nouns. The lexicon would have to contain only simple nouns, no compounds (except for lexicalized ones where the meaning of the compound is not the sum of the meanings of its components), as the class of the core noun will also be the class of the compound. So, the task seems manageable. In SUSY, this work has been done for a couple of thousand nouns and it worked out quite well.
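The lookup strategy for compounds can be sketched as follows. The sketch assumes that a compound has already been segmented into its components (as in SUSY lemma names such as /MITARBEITER/ZAHL). The dictionaries and the function `noun_class` are hypothetical; the class numbers for the simple nouns come from the tentative classification, while the entry for the lexicalized compound is an invented placeholder.

```python
# Simple nouns with their classes (taken from the tentative classification).
SIMPLE_NOUNS = {"Stadion": "01", "Stadt": "02", "Insel": "05"}

# Lexicalized compounds get their own entry. The class value here is an
# invented placeholder, not a class from the original classification.
LEXICALIZED = {"Augenblick": "PLACEHOLDER"}


def noun_class(segments):
    """Return the class of a (possibly compound) noun given its segments."""
    compound = segments[0] + "".join(s.lower() for s in segments[1:])
    if compound in LEXICALIZED:
        return LEXICALIZED[compound]
    # Otherwise the core noun (the last segment) determines the class.
    return SIMPLE_NOUNS[segments[-1]]


print(noun_class(["Fußball", "Stadion"]))  # 01
print(noun_class(["Halb", "Insel"]))       # 05
print(noun_class(["Augen", "Blick"]))      # PLACEHOLDER
```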
Disambiguation depending on the predicate
<18> It depends on your evaluation . => Das hängt von ihrer Einschätzung ab .
<19> ...the article demanded by the noun ... => ...der vom Nomen geforderte Artikel ...
<20> I substitute X for Y . => Ich ersetze Y durch X .
In <18> to <20> the PPs are valency-bound,
i.e. they are strongly linked to the predicate. In the EUROTRA
projects funded by the European Community this relationship was
represented by a semantic relation (semantic role, thematic role; cf.
Arnold et al. 1985, Freigang et al. 1981, Blatt et al. 1984),
as in <26> for „X depends on Y“.
This approach had a number of advantages and
drawbacks that I shall not discuss here. Moreover, it can only
be used for MT, but not for automatic indexing, as I shall show
very briefly in the following.
There is no use in representing ‚for water‘ by an interlingua expression, as it cannot be understood without reference to the verb. Such representations only make sense in complete predicate/argument representations, but not if the representations of the constituents are separated from each other, as in the above case where they were used as descriptors in a database. A more advanced system might operate with larger units like
substitute milk
substitute for water
substitute milk for water
applying something like Fillmore’s semantic roles to make it usable in a multilingual environment. This must be left open here.
Example for an interlingua expression in SUSY
TEXTWORTFORM         WKL   LEMMANAME             STW
-------------------------------------------------------------
Die                  ARTB  D- (ARTB)             FWK
Anpassung            SUB   ANPASSUNG             SUB
der                  ARTB  D- (ARTB)             FWK
Mitarbeiterzahl      SUB   /MITARBEITER/ZAHL     SUB
an                   PRP   AN (AKK)              FWK
den                  ARTB  D- (ARTB)             FWK
betriebsnotwendigen  ADJ   /BETRIEB*S/NOTWENDIG  ADJ
Bedarf               SUB   BEDARF                SUB
macht                FIV   MACHEN                VRB
auch                 ADV   AUCH                  FWK
in                   PRP   TEMP QUA              FWK
den                  ARTB  D- (ARTB)             FWK
kommenden            ADJ   KOMMEN                VRB
Jahren               SUB   JAHR                  SUB
einen                ARTU  EIN (ARTU)            FWK
zusaetzlichen        ADJ   ZUSAETZLICH           ADJ
Stellenabbau         SUB   /STELLE*N/ABBAU       SUB
erforderlich         ADV   ERFORDERLICH          ADJ
*                    *
‚In den kommenden Jahren‘ is represented by ‚TEMP QUA kommen Jahr‘, meaning an unspecific point or period in time during the coming years. The PP ‚an den betriebsnotwendigen Bedarf‘ is not represented in the same way, as its meaning depends on ‚Anpassung‘ (Anpassung an … => adaptation to) and cannot be understood on its own. Moreover, it is quite safe to proceed this way, as with ‚Anpassung‘ the preposition ‚an‘ always translates as ‚to‘. This is a clear case for lexicalization.
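The division of labour between lexicalization and the interlingua can be illustrated with a minimal Python sketch. It assumes that prepositions bound to a governing word are looked up first, and that only free PPs receive an interlingua expression. The table `GOVERNED` and the function `represent_preposition` are hypothetical names, and the returned tags are invented for this illustration.

```python
# Lexicalized entries: (governing lemma, source preposition) -> target preposition.
GOVERNED = {
    ("Anpassung", "an"): "to",   # Anpassung an ... => adaptation to ...
    ("abhängen", "von"): "on",   # abhängen von ... => depend on ...
}


def represent_preposition(prep, governor=None, relation=None):
    """Decide how a preposition is carried through to the target language."""
    if (governor, prep) in GOVERNED:
        # The governing word fixes the translation: no interlingua expression.
        return ("LEXICAL", GOVERNED[(governor, prep)])
    if relation is not None:
        # Free PP: pass the interlingua expression on unchanged.
        return ("INTERLINGUA", relation)
    return ("UNRESOLVED", prep)


print(represent_preposition("an", governor="Anpassung"))  # ('LEXICAL', 'to')
print(represent_preposition("in", relation="TEMP QUA"))   # ('INTERLINGUA', 'TEMP QUA')
```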