This website has been archived since the end of the Information Science degree programme
in June 2014 and is no longer updated.
For technical questions: Sascha Beck - s AT saschabeck DOT ch

8. Information science as a bridging science

Approaches to sense disambiguation with respect to automatic indexing and machine translation

5. The semantic relations approach: towards a semantic interlingua

Heinz-Dirk Luckhardt

The vision of an interlingua in which all natural language utterances in any language could be represented has for a long time been a dream of computational linguistics (cf. e.g. Melcuk 1960). It may be roughly described by a triangle model:
[Figure: triangle model of the interlingua approach]
Natural language utterances are parsed and represented in expressions of the interlingua, which is the same for all languages. A generation module generates target language utterances from these interlingua expressions. The advantage is obvious: for every language involved, only one parser and one generator would be needed, irrespective of the language to be translated to or from. For a discussion of the interlingua idea in comparison with the transfer theory see Luckhardt 1987, 9f. et passim.
This model has never been realized. Instead, many MT systems operate according to the transfer model.
[Figure: transfer model]
The parser analyses source language text and represents it in an appropriate way, still closely related to the source language and, in particular, still containing the source language words. In the transfer phase this representation is transformed into a target language representation containing target language words, i.e. at least the source language words are translated into target language ones. In the generation phase, target language text is generated from this representation.
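The difference in effort between the two models can be made concrete with a little arithmetic. The following is a minimal sketch; the assumption that the transfer model needs one transfer component per ordered language pair is not stated in the text above, and the function names are made up for this illustration.

```python
def interlingua_modules(n_languages: int) -> int:
    """Interlingua model: one parser plus one generator per language."""
    return 2 * n_languages


def transfer_modules(n_languages: int) -> int:
    """Transfer model: parsers and generators as above, plus one transfer
    component per ordered language pair (assumption of this sketch)."""
    return 2 * n_languages + n_languages * (n_languages - 1)


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, interlingua_modules(n), transfer_modules(n))
```

Under these assumptions the number of modules grows linearly with the number of languages in the interlingua model and quadratically in the transfer model.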

An interlingua has been partly realized in the SUSY MT system, namely for the translation of prepositions. Certain kinds of prepositions (see below) are transformed into a language-independent form which is not touched in transfer. This approach also seems plausible for multilingual automatic indexing, as it promises an extension to information retrieval insofar as it would allow a cross-lingual specification of database searches.


The Ambiguity of Prepositions

Prepositions are highly ambiguous with respect to syntax and semantics, cf. <13> – <20>.

<13> He sent us a message from the USA . =>
     Er schickte uns eine Nachricht aus den USA .

<14> He sent us a message from the moon . =>
     Er schickte uns eine Nachricht vom Mond .

<15> He sent us to London . =>
     Er schickte uns nach London .

<16> He sent us to the river . =>
     Er schickte uns an den Fluß .

<17> He sent us to the house . =>
     Er schickte uns zu dem Haus .

<18> It depends on your evaluation . =>
     Das hängt von Ihrer Einschätzung ab .

<19> ...the article demanded by the noun ... =>
     ...der vom Nomen geforderte Artikel ...

<20> I substitute X for Y . =>
     Ich ersetze Y durch X .



Following the principles of valency grammar, we can distinguish 3 cases:

  • Non-valency-bound modifying constituents: <13> – <14>
  • Valency-bound constituents where the meaning of the preposition is determined by the noun: <15> – <17>
  • Valency-bound constituents where the meaning of the preposition is determined by the verb: <18> and <20>

The meaning of the preposition is determined by the noun of the prepositional phrase (<13> to <17>) or by the predicate (<18> and <20>).


Disambiguation by properties of the noun

The prepositional phrase in

He sent us a message from the USA .     => 
Er schickte uns eine Nachricht aus den USA. 

can be described by means of the two structures below. In the following, I shall use the one on the right, as it is more convenient to operate with a small number of nodes. The difference between an NP and a PP is marked by different values of the NPTYP property (NPTYP = NORMAL and NPTYP = PREP).

[Figure: the two structures of the prepositional phrase]
Above, I claimed that the meaning of the prepositions in <13> and <14> depends on the noun: <13> and <14> differ only with respect to the noun of the PP. This has nothing to do with the semantic properties of the noun, for both designate a place. For SUSY, classifications were set up for every language in which all nouns behaving in the same manner with respect to certain prepositions were put into the same class. Thus, the occurrence of a preposition with a noun of a specific class designates a specific meaning that can be represented in an interlingua, i.e. in a notation valid for all languages. „USA“ and „MOND“ belong to different classes, as the same semantic relation (a source in space) is expressed by two different prepositions („aus“ and „von“).


Tentative list of interlingua expressions:

Type          Interlingua expression  Meaning            Example
-------------------------------------------------------------------------
local         LOC QUO                 direction          nach England
local         LOC UBI                 point in space     in England
local         LOC UNDE                source in space    aus England
local         LOC QUA                 path in space      durch England
temporal      TEMP QUO                limit in time      bis Ostern
temporal      TEMP UBI                point in time      an Ostern
temporal      TEMP UNDE               source in time     seit Ostern
temporal      TEMP QUA                time period        den Winter über
event         LOCTEMP QUO             direction          ins Konzert
event         LOCTEMP UBI             point in event     bei der Taufe
domain        DOMAIN UBI              abstract place     in der Physik
domain        DOMAIN UNDE             abstract source    aus der Physik
circumstance  ACCO                    accompaniment      mit dem Freund
circumstance  INSTRUMENT              instrument         mit dem Hammer
…..


Tentative classification:

Class No.  Type of relation  Specification                 Example
--------------------------------------------------------------------------------------
01         local             in-LOCUBI, auf-LOCQUO         Haus, Stadion, Schreibtisch
02         local             in-LOCUBI, in-LOCQUO          Schweiz, Elsaß, Stadt, USA
03         local             in-LOCUBI, nach-LOCQUO        Saarbrücken, Amerika
04         local             auf-LOCUBI, nach-LOCQUO       Sylt, Feuerland
05         local             auf-LOCUBI, auf-LOCQUO        Markt, Mond, Insel
20         temporal          an-TEMPUBI, bis-TEMPQUO       Ostern, Montag
22         temporal          bei-TEMPUBI                   Tag, Sonnenaufgang
26         temporal          an-TEMPUBI, bis zu-TEMPQUO    Abend
30         event             bei-LOCTEMPUBI                Taufe, Abitur
31         event             auf-LOCTEMPUBI                Kirmes, Kundgebung
40         circumstance      mit-CIRCUMSTANCE              Unterstützung, Hilfe
41         circumstance      auf-CIRCUMSTANCE              Rat, Empfehlung
42         circumstance      in-CIRCUMSTANCE               Übereinstimmung, Not



In parsing, the classification may be used to represent a PP by a semantic relation:

nach London (class 03)    >                              > London 
in die Schweiz (class 02)    >   "direction":   LOC QUO  > Schweiz 
zu dem Haus (class 01)    >                              > Haus 
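The mapping from a PP to a semantic relation might be sketched as a lookup on the preposition and the class of the noun. This is only an illustration, not the SUSY implementation: the names `NOUN_CLASS`, `RULES` and `parse_pp` are hypothetical, only classes 02 to 04 of the tentative classification are covered, and the use of grammatical case to separate in-LOCUBI from in-LOCQUO in class 02 is an assumption of this sketch.

```python
from typing import Optional, Tuple

# Noun classes from the tentative classification (London: class 03, as in the text).
NOUN_CLASS = {
    "Schweiz": "02", "USA": "02",
    "London": "03", "Saarbrücken": "03", "Amerika": "03",
    "Sylt": "04", "Feuerland": "04",
}

# (preposition, noun class) -> {grammatical case: relation}.
# The key None stands for "any case". Separating LOC UBI from LOC QUO
# by case in class 02 is an assumption, not taken from the classification.
RULES = {
    ("in", "02"): {"dat": "LOC UBI", "acc": "LOC QUO"},
    ("in", "03"): {None: "LOC UBI"},
    ("nach", "03"): {None: "LOC QUO"},
    ("auf", "04"): {None: "LOC UBI"},
    ("nach", "04"): {None: "LOC QUO"},
}


def parse_pp(preposition: str, noun: str,
             case: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return (relation, noun), or None if no rule applies unambiguously."""
    noun_class = NOUN_CLASS.get(noun)
    by_case = RULES.get((preposition, noun_class), {})
    # Prefer a case-specific rule, fall back to the "any case" rule.
    relation = by_case.get(case, by_case.get(None))
    if relation is None:
        return None
    return relation, noun


if __name__ == "__main__":
    print(parse_pp("nach", "London"))        # ('LOC QUO', 'London')
    print(parse_pp("in", "Schweiz", "acc"))  # ('LOC QUO', 'Schweiz')
```

Returning `None` for unknown nouns or ambiguous cases leaves the decision to a later processing step instead of guessing a relation.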
 

When I say ‚parsing‘ I mean parsing for MT and for information retrieval. In MT, the interlingua expression is used for generating a target language preposition:

LOC QUO London => nach London 
LOC QUO Schweiz => in die Schweiz 
LOC QUO Haus => zu (dem) Haus 
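Generation runs the same classification in the opposite direction: the relation and the class of the target language noun select the preposition. The sketch below is hypothetical (`PREPOSITION` and `generate_preposition` are made-up names); it covers only classes 02 to 05 of the tentative classification and leaves out articles and case inflection.

```python
from typing import Optional

# German noun classes from the tentative classification (London: class 03).
NOUN_CLASS = {
    "Schweiz": "02", "USA": "02",
    "London": "03", "Saarbrücken": "03", "Amerika": "03",
    "Sylt": "04", "Feuerland": "04",
    "Markt": "05", "Mond": "05", "Insel": "05",
}

# (relation, noun class) -> German preposition, read off the classification table.
PREPOSITION = {
    ("LOC UBI", "02"): "in",  ("LOC QUO", "02"): "in",
    ("LOC UBI", "03"): "in",  ("LOC QUO", "03"): "nach",
    ("LOC UBI", "04"): "auf", ("LOC QUO", "04"): "nach",
    ("LOC UBI", "05"): "auf", ("LOC QUO", "05"): "auf",
}


def generate_preposition(relation: str, noun: str) -> Optional[str]:
    """Pick the German preposition for a relation and a target noun.

    Returns None if the noun or the combination is not covered.
    """
    noun_class = NOUN_CLASS.get(noun)
    return PREPOSITION.get((relation, noun_class))


if __name__ == "__main__":
    for noun in ("London", "Schweiz", "Mond"):
        print("LOC QUO", noun, "=>", generate_preposition("LOC QUO", noun), noun)
```

A full generator would also have to supply the article and the case required by the preposition, as in "in die Schweiz".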

In information retrieval, this approach may be used for parsing texts for indexing and for parsing user input. During indexing, the PPs in the input texts are represented by interlingua expressions which are stored with the descriptors in the database. During retrieval, the user input (in any language) is transformed into interlingua expressions and then compared to the database.

Example:

Text sentence: ‚The export from the USA to Switzerland is decreasing.‘

Descriptors:           LOCQUO Switzerland, export, LOCUNDE USA 
German search request: Export, in die Schweiz, aus den USA 
transformed into:      Export, LOCQUO Schweiz, LOCUNDE USA 
translated into:       export, LOCQUO Switzerland, LOCUNDE USA 
 
                       (This matches the descriptors assigned to the text!) 
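The matching step of this example can be sketched as follows. Everything here is a toy illustration under stated assumptions: the query is given in pre-analysed form, the rule aus-LOCUNDE for class 02 and the small lexicon `DE_TO_EN` are assumed for the example, and all names (`INDEX`, `to_descriptor`, `search`) are hypothetical.

```python
# German noun classes; only class 02 is needed for this example.
NOUN_CLASS_DE = {"Schweiz": "02", "USA": "02"}

# (preposition, case, noun class) -> relation (assumed rules for this example).
RULES_DE = {
    ("in", "acc", "02"): "LOCQUO",
    ("aus", "dat", "02"): "LOCUNDE",
}

# Toy bilingual lexicon German -> English.
DE_TO_EN = {"Export": "export", "Schweiz": "Switzerland", "USA": "USA"}

# Descriptors stored for the text sentence; None marks a plain descriptor.
INDEX = {
    "doc1": {("LOCQUO", "Switzerland"), (None, "export"), ("LOCUNDE", "USA")},
}


def to_descriptor(item):
    """Map a bare term (str) or a pre-analysed PP (preposition, case, noun)
    to a language-independent descriptor (relation, English term)."""
    if isinstance(item, str):
        return (None, DE_TO_EN[item])
    preposition, case, noun = item
    relation = RULES_DE[(preposition, case, NOUN_CLASS_DE[noun])]
    return (relation, DE_TO_EN[noun])


def search(query):
    """Return the documents whose descriptors contain all query descriptors."""
    wanted = {to_descriptor(item) for item in query}
    return [doc for doc, descriptors in INDEX.items() if wanted <= descriptors]


if __name__ == "__main__":
    # "Export, in die Schweiz, aus den USA"
    print(search(["Export", ("in", "acc", "Schweiz"), ("aus", "dat", "USA")]))
    # "Export, aus der Schweiz, in die USA" (reversed direction)
    print(search(["Export", ("aus", "dat", "Schweiz"), ("in", "acc", "USA")]))
```

The second query contains the same words but the opposite direction of export; because the relations are part of the descriptors, it does not match the text.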

For every language, a lexicon would have to be built up classifying all simple nouns, and a set of rules would have to be written that transforms the prepositions into interlingua expressions depending on the classes of the nouns. The lexicon would have to contain only simple nouns, no compounds (except for lexicalized ones where the meaning of the compound is not the sum of the meanings of its components), as the class of the core noun will also be the class of the compound. So the task seems manageable. In SUSY, this work has been done for a couple of thousand nouns, and it worked out quite well.
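The treatment of compounds might look like the sketch below: lexicalized compounds are looked up as a whole, all other compounds inherit the class of their last component. The segmentation into components is assumed to be supplied by the morphological analysis; the compounds "Fußballstadion" and "Ostermontag" and all names in the code are hypothetical illustrations, while the class numbers are taken from the tentative classification.

```python
from typing import Optional, Sequence

# Classes of simple nouns, taken from the tentative classification.
SIMPLE_NOUN_CLASS = {"Stadion": "01", "Stadt": "02", "Montag": "20"}

# Lexicalized compounds are classified as a whole ("Feuerland": class 04).
LEXICALIZED_COMPOUND_CLASS = {"Feuerland": "04"}


def noun_class(surface: str, components: Sequence[str]) -> Optional[str]:
    """Return the class of a noun, given its surface form and its components.

    A simple noun is passed with a single component.
    """
    if surface in LEXICALIZED_COMPOUND_CLASS:
        return LEXICALIZED_COMPOUND_CLASS[surface]
    # The core noun is the last component; its class is the class of the compound.
    return SIMPLE_NOUN_CLASS.get(components[-1])


if __name__ == "__main__":
    print(noun_class("Fußballstadion", ["Fußball", "Stadion"]))
    print(noun_class("Ostermontag", ["Oster", "Montag"]))
    print(noun_class("Feuerland", ["Feuer", "Land"]))
```

This keeps the lexicon small, since only simple nouns and the comparatively few lexicalized compounds need an entry.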


Disambiguation depending on the predicate

<18> It depends on your evaluation . =>
     Das hängt von Ihrer Einschätzung ab .

<19> ...the article demanded by the noun ... =>
     ...der vom Nomen geforderte Artikel ...

<20> I substitute X for Y . =>
     Ich ersetze Y durch X .

In <18> to <20> the PPs are valency-bound, i.e. they are strongly linked to the predicate. In the EUROTRA projects funded by the European Community this relationship was represented by a semantic relation (semantic role, thematic role; cf. Arnold et al. 1985, Freigang et al. 1981, Blatt et al. 1984), as in <26> for „X depends on Y“.

[Figure <26>: semantic relation representation of „X depends on Y“]
This approach had a number of advantages and drawbacks that I shall not discuss here. Moreover, it can only be used for MT, but not for automatic indexing, as I shall show very briefly in the following.

He substituted milk for water.

There is no use in representing ‚for water‘ by an interlingua expression, as it cannot be understood without reference to the verb. Such representations only make sense in complete predicate/argument representations, but not if the representations of the constituents are separated from each other, as in the above case where they were used as descriptors in a database. A more advanced system might operate with larger units like

substitute milk
substitute for water
substitute milk for water

applying something like Fillmore’s semantic roles to make it usable in a multilingual environment. This must be left open here.
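How such larger units could be derived from a predicate and its arguments may be sketched as follows. The function name `descriptor_units` is hypothetical, and the sketch only combines surface strings; it does not assign semantic roles.

```python
from itertools import combinations
from typing import List, Sequence


def descriptor_units(predicate: str, arguments: Sequence[str]) -> List[str]:
    """Combine the predicate with every non-empty subset of its arguments,
    keeping the arguments in their original order."""
    units = []
    for size in range(1, len(arguments) + 1):
        for subset in combinations(arguments, size):
            units.append(" ".join((predicate, *subset)))
    return units


if __name__ == "__main__":
    for unit in descriptor_units("substitute", ["milk", "for water"]):
        print(unit)
```

For the sentence above this yields exactly the three units listed, so that ‚for water‘ is never stored without its predicate.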


Example for an interlingua expression in SUSY

glossary of acronyms used

TEXTWORTFORM         WKL   LEMMANAME             STW 
------------------------------------------------------------- 
 
Die                  ARTB  D- (ARTB)             FWK 
Anpassung            SUB   ANPASSUNG             SUB 
der                  ARTB  D- (ARTB)             FWK 
Mitarbeiterzahl      SUB   /MITARBEITER/ZAHL     SUB 
an                   PRP   AN (AKK)              FWK 
den                  ARTB  D- (ARTB)             FWK 
betriebsnotwendigen  ADJ   /BETRIEB*S/NOTWENDIG  ADJ 
Bedarf               SUB   BEDARF                SUB 
macht                FIV   MACHEN                VRB 
auch                 ADV   AUCH                  FWK 
in                   PRP   TEMP QUA              FWK 
den                  ARTB  D- (ARTB)             FWK 
kommenden            ADJ   KOMMEN                VRB 
Jahren               SUB   JAHR                  SUB 
einen                ARTU  EIN (ARTU)            FWK 
zusaetzlichen        ADJ   ZUSAETZLICH           ADJ 
Stellenabbau         SUB   /STELLE*N/ABBAU       SUB 
erforderlich         ADV   ERFORDERLICH          ADJ 
*                          * 
 

‚In den kommenden Jahren‘ is represented by ‚TEMP QUA kommen Jahr‘, meaning an unspecific point or period in time during the coming years. The PP ‚an den betriebsnotwendigen Bedarf‘ is not represented in the same way, as its meaning depends on ‚Anpassung‘ (= Anpassung an … => adaptation to) and cannot be understood on its own. Moreover, it is quite safe to proceed this way, as with ‚Anpassung‘ the preposition ‚an‘ always translates as ‚to‘. This is a clear case for lexicalization.
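The distinction between the two treatments can be sketched as a simple lookup: a preposition governed by a lexicalized word is translated directly, every other PP is left to the interlingua rules. The table `LEXICALIZED` and the function `handle_pp` are hypothetical names; the two entries are taken from the examples in the text (‚Anpassung an‘ => ‚adaptation to‘, ‚abhängen von‘ => ‚depend on‘).

```python
from typing import Optional, Tuple

# (governing word, German preposition) -> English preposition.
LEXICALIZED = {
    ("Anpassung", "an"): "to",
    ("abhängen", "von"): "on",
}


def handle_pp(preposition: str,
              governor: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Decide how a PP is treated.

    Returns ("lexicalized", target preposition) if the governor fixes the
    translation, otherwise ("interlingua", None), which means the PP is
    handed over to the class-based interlingua rules.
    """
    target = LEXICALIZED.get((governor, preposition))
    if target is not None:
        return ("lexicalized", target)
    return ("interlingua", None)


if __name__ == "__main__":
    print(handle_pp("an", governor="Anpassung"))  # ('lexicalized', 'to')
    print(handle_pp("in"))                        # ('interlingua', None)
```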