8. Informationswissenschaft als Brückenwissenschaft (Information Science as a Bridging Discipline)

Approaches to sense disambiguation with respect to automatic indexing and machine translation

5. The semantic relations approach: towards a semantic interlingua

Heinz-Dirk Luckhardt

The vision of an interlingua in which all natural language utterances in any language could be represented has long been a dream of computational linguistics (cf. e.g. Melcuk 1960). It can roughly be described by a triangle model:

Natural language utterances are parsed and represented as expressions of an interlingua that is the same for all languages. A generation module then produces target language utterances from these interlingua expressions. The advantage is obvious: for every language involved, only one parser and one generator would be needed, irrespective of the language being translated to or from. For a discussion of the interlingua idea in comparison with the transfer theory, see Luckhardt 1987, 9f. et passim.
This model has never been realized. Instead, many MT systems operate according to the transfer model.

The parser analyses the source language text and represents it in a suitable form that still stays close to the source language; in particular, it still contains the source language words. In the transfer phase, this representation is transformed into a target language representation containing target language words, i.e. at the very least the source language words are translated into target language words. In the generation phase, target language text is generated from this representation.
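The advantage claimed for the interlingua model grows with the number of languages. The sketch below counts components for n languages, under the usual assumption (not stated in the text above) that a transfer system needs one transfer module per ordered pair of source and target language, while both models need one parser and one generator per language. The function name `module_counts` is purely illustrative.

```python
def module_counts(n: int) -> dict:
    """Components needed to translate between all ordered pairs of n languages."""
    # Interlingua model: one parser and one generator per language, nothing else.
    interlingua = {"parsers": n, "generators": n, "transfer": 0}
    # Transfer model: the same parsers and generators, plus one transfer
    # module for every ordered (source, target) pair of distinct languages.
    transfer = {"parsers": n, "generators": n, "transfer": n * (n - 1)}
    return {"interlingua": interlingua, "transfer": transfer}


for n in (2, 5, 9):
    counts = module_counts(n)
    print(n, sum(counts["interlingua"].values()), sum(counts["transfer"].values()))
```

For 2, 5 and 9 languages the loop prints `2 4 6`, `5 10 30` and `9 18 90`: under this assumption the interlingua model grows linearly with the number of languages, the transfer model quadratically.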

An interlingua has been partly realized in the SUSY MT system, namely for the translation of prepositions. Certain kinds of prepositions (see below) are transformed into a language-independent form which is left untouched in transfer. This approach also seems plausible for multilingual automatic indexing, as it promises an extension to information retrieval: it would allow database searches to be specified across languages.


The Ambiguity of Prepositions

Prepositions are highly ambiguous with respect to syntax and semantics, cf. <13> – <20>.

<13> He sent us a message from the USA . =>
     Er schickte uns eine Nachricht aus den USA .

<14> He sent us a message from the moon . =>
     Er schickte uns eine Nachricht vom Mond .

<15> He sent us to London . =>
     Er schickte uns nach London .

<16> He sent us to the river . =>
     Er schickte uns an den Fluß .

<17> He sent us to the house . =>
     Er schickte uns zu dem Haus .

<18> It depends on your evaluation . =>
     Das hängt von Ihrer Einschätzung ab .

<19> ...the article demanded by the noun ... =>
     ...der vom Nomen geforderte Artikel ...

<20> I substitute X for Y . =>
     Ich ersetze Y durch X .


Following the principles of valency grammar, we can distinguish three cases:

Non-valency-bound modifying constituents: <13> – <14>

Valency-bound constituents where the meaning of the preposition is determined by the noun: <15> – <17>

Valency-bound constituents where the meaning of the preposition is determined by the verb: <18> – <20>

Thus the meaning of the preposition is determined either by the noun of the prepositional phrase (<13> to <17>) or by the predicate (<18> to <20>).


Disambiguation by properties of the noun

The prepositional phrase in

He sent us a message from the USA . =>
     Er schickte uns eine Nachricht aus den USA .

can be described by means of the two structures below. In the following, I shall use the one on the right, as it is more convenient to work with a small number of nodes. The difference between an NP and a PP is marked by different values of the NPTYP property (NPTYP = NORMAL and NPTYP = PREP).

Above, I have claimed that the meaning of the prepositions in <13> and <14> depends on the noun: <13> and <14> differ only in the noun of the PP. The difference cannot be explained by the semantic properties of the nouns, since both designate a place. For SUSY, a classification was therefore set up for every language in which all nouns that behave in the same manner with respect to certain prepositions were put into the same class. Thus, the occurrence of a preposition with a noun of a specific class designates a specific meaning that can be represented in an interlingua, i.e. in a notation valid for all languages. „USA“ and „MOND“ belong to different classes, as the same semantic relation (a source in space) is expressed by two different prepositions („aus“ and „von“).


Tentative list of Interlingua expressions:

Type          Interlingua expression  Meaning          Example
local         LOC QUO                 direction        nach England
local         LOC UBI                 point in space   in England
local         LOC UNDE                source in space  aus England
local         LOC QUA                 path in space    durch England
temporal      TEMP QUO                limit in time    bis Ostern
temporal      TEMP UBI                point in time    an Ostern
temporal      TEMP UNDE               source in time   seit Ostern
temporal      TEMP QUA                time period      den Winter über
event         LOCTEMP QUO             direction        ins Konzert
event         LOCTEMP UBI             point in event   bei der Taufe
domain        DOMAIN UBI              abstract place   in der Physik
domain        DOMAIN UNDE             abstract source  aus der Physik
circumstance  ACCO                    accompaniment    mit dem Freund
circumstance  INSTRUMENT              instrument       mit dem Hammer
…..
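One way to read this list is as a small two-part data structure: a dimension (LOC, TEMP, LOCTEMP, DOMAIN, or a circumstance relation such as ACCO) combined with an aspect named after the Latin question words quo (whither), ubi (where), unde (whence) and qua (which way). The following sketch encodes that reading; the class and constant names are hypothetical and not taken from SUSY.

```python
from dataclasses import dataclass
from typing import Optional

# Dimensions and aspects as they appear in the tentative list.
DIMENSIONS = {"LOC", "TEMP", "LOCTEMP", "DOMAIN", "ACCO", "INSTRUMENT"}
# Latin question words: quo = whither, ubi = where, unde = whence, qua = which way.
ASPECTS = {"QUO", "UBI", "UNDE", "QUA"}


@dataclass(frozen=True)
class Relation:
    """A language-independent semantic relation such as LOC QUO."""
    dimension: str
    aspect: Optional[str] = None  # circumstance relations carry no aspect

    def __post_init__(self) -> None:
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension}")
        if self.aspect is not None and self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect}")

    def __str__(self) -> str:
        return self.dimension if self.aspect is None else f"{self.dimension} {self.aspect}"


print(Relation("LOC", "QUO"))   # LOC QUO
print(Relation("INSTRUMENT"))   # INSTRUMENT
```

Making the relation an immutable value means it can serve directly as a dictionary key or a stored descriptor component, which is how such labels are used in indexing.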


Tentative classification:

Class No.  Type of relation  Specification               Examples
01         local             in-LOCUBI, zu-LOCQUO        Haus, Stadion, Schreibtisch
02         local             in-LOCUBI, in-LOCQUO        Schweiz, Elsaß, Stadt, USA
03         local             in-LOCUBI, nach-LOCQUO      Saarbrücken, Amerika
04         local             auf-LOCUBI, nach-LOCQUO     Sylt, Feuerland
05         local             auf-LOCUBI, auf-LOCQUO      Markt, Mond, Insel
20         temporal          an-TEMPUBI, bis-TEMPQUO     Ostern, Montag
22         temporal          bei-TEMPUBI                 Tag, Sonnenaufgang
26         temporal          an-TEMPUBI, bis zu-TEMPQUO  Abend
30         event             bei-LOCTEMPUBI              Taufe, Abitur
31         event             auf-LOCTEMPUBI              Kirmes, Kundgebung
40         circumstance      mit-CIRCUMSTANCE            Unterstützung, Hilfe
41         circumstance      auf-CIRCUMSTANCE            Rat, Empfehlung
42         circumstance      in-CIRCUMSTANCE             Übereinstimmung, Not
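Read as data, the classification turns preposition disambiguation into a table lookup: given a noun's class, the preposition selects one or more interlingua relations. The sketch below encodes a few rows of the table (classes 02 to 05, 20 and 26) together with a hypothetical mini-lexicon; the names `CLASS_SPECS`, `NOUN_CLASS` and `candidate_relations` are illustrative only.

```python
from collections import defaultdict

# Part of the tentative classification: class -> (German preposition, relation).
CLASS_SPECS = {
    "02": [("in", "LOC UBI"), ("in", "LOC QUO")],
    "03": [("in", "LOC UBI"), ("nach", "LOC QUO")],
    "04": [("auf", "LOC UBI"), ("nach", "LOC QUO")],
    "05": [("auf", "LOC UBI"), ("auf", "LOC QUO")],
    "20": [("an", "TEMP UBI"), ("bis", "TEMP QUO")],
    "26": [("an", "TEMP UBI"), ("bis zu", "TEMP QUO")],
}

# Hypothetical lexicon entries: noun -> class number.
NOUN_CLASS = {
    "Schweiz": "02", "USA": "02", "Saarbrücken": "03", "Amerika": "03",
    "Sylt": "04", "Markt": "05", "Mond": "05", "Ostern": "20", "Abend": "26",
}

# Invert the classification once: (preposition, class) -> candidate relations.
RULES = defaultdict(list)
for cls, specs in CLASS_SPECS.items():
    for prep, relation in specs:
        RULES[(prep, cls)].append(relation)


def candidate_relations(prep: str, noun: str) -> list:
    """Relations a preposition can express with this noun; [] if none is known."""
    cls = NOUN_CLASS.get(noun)
    if cls is None:
        return []  # unclassified noun: the preposition stays uninterpreted
    return list(RULES.get((prep, cls), []))


print(candidate_relations("nach", "Amerika"))  # ['LOC QUO']
print(candidate_relations("nach", "Schweiz"))  # []
print(candidate_relations("in", "Schweiz"))    # ['LOC UBI', 'LOC QUO']
```

The last call shows a limit of the bare lookup: for class 02, ‚in‘ is listed for both LOC UBI and LOC QUO, so another cue is needed. In German this is the case of the noun phrase (dative ‚in der Schweiz‘ for location, accusative ‚in die Schweiz‘ for direction); the table itself does not record it.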


In parsing, the classification may be used to represent a PP by a semantic relation:

nach London    (class 03)  =>  LOC QUO London
in die Schweiz (class 02)  =>  LOC QUO Schweiz
zu dem Haus    (class 01)  =>  LOC QUO Haus

(LOC QUO = "direction")
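A minimal parsing sketch for these three PPs might look as follows. It assumes a rule table keyed by preposition, noun class and, for two-way prepositions such as ‚in‘, the grammatical case (German dative for location, accusative for direction). The case key and all names are assumptions for illustration, not SUSY's actual rule format.

```python
from typing import Optional

# Hypothetical rules: (preposition, noun class, case) -> relation.
# Case None means the preposition alone decides; for the two-way preposition
# "in", dative marks location (UBI) and accusative marks direction (QUO).
RULES = {
    ("nach", "03", None): "LOC QUO",
    ("zu", "01", None): "LOC QUO",
    ("in", "01", "DAT"): "LOC UBI",
    ("in", "02", "DAT"): "LOC UBI",
    ("in", "02", "ACC"): "LOC QUO",
}
NOUN_CLASS = {"London": "03", "Schweiz": "02", "Haus": "01"}


def parse_pp(prep: str, noun: str, case: Optional[str] = None) -> Optional[str]:
    """Map a German PP to an interlingua expression like 'LOC QUO London'."""
    cls = NOUN_CLASS.get(noun)
    relation = RULES.get((prep, cls, None)) or RULES.get((prep, cls, case))
    return f"{relation} {noun}" if relation else None


print(parse_pp("nach", "London"))          # LOC QUO London
print(parse_pp("in", "Schweiz", "ACC"))    # LOC QUO Schweiz
print(parse_pp("zu", "Haus"))              # LOC QUO Haus
print(parse_pp("in", "Schweiz"))           # None (case needed)
```

Three different German prepositions thus collapse into one relation, and only the noun is kept alongside it.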

When I say ‚parsing‘, I mean parsing for MT as well as for information retrieval. In MT, the interlingua expression is used to generate a target language preposition:

LOC QUO London => nach London 
LOC QUO Schweiz => in die Schweiz 
LOC QUO Haus => zu (dem) Haus
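Generation runs the lookup in the opposite direction: from the relation and the class of the noun to a German preposition. The sketch assumes a hypothetical table `GENERATE` and handles only the preposition; articles and case endings (‚in die Schweiz‘, ‚zu dem Haus‘) would need a separate morphological step that is not modelled here.

```python
# Hypothetical generation table for German: (relation, noun class) -> preposition.
GENERATE = {
    ("LOC QUO", "01"): "zu",
    ("LOC QUO", "02"): "in",
    ("LOC QUO", "03"): "nach",
    ("LOC QUO", "04"): "nach",
    ("LOC QUO", "05"): "auf",
    ("LOC UBI", "04"): "auf",
}
NOUN_CLASS = {"Haus": "01", "Schweiz": "02", "London": "03", "Sylt": "04", "Mond": "05"}


def generate_preposition(expression: str) -> str:
    """'LOC QUO London' -> 'nach London' (article and case are not generated)."""
    *relation_parts, noun = expression.split()  # assumes a one-word noun at the end
    relation = " ".join(relation_parts)
    return f"{GENERATE[(relation, NOUN_CLASS[noun])]} {noun}"


for expr in ("LOC QUO London", "LOC QUO Schweiz", "LOC QUO Haus"):
    print(expr, "=>", generate_preposition(expr))
```

The same relation yields ‚nach‘, ‚in‘ or ‚zu‘ depending only on the class of the noun, which is the point of the classification.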

In information retrieval, this approach may be used for parsing both the texts to be indexed and the user input. During indexing, the PPs in the input texts are represented by interlingua expressions, which are stored with the descriptors in the database. During retrieval, the user input (in any language) is transformed into interlingua expressions, which are then matched against the database.

Example:

Text sentence: ‚The export from the USA to Switzerland is decreasing.‘

Descriptors:           LOCQUO Switzerland, export, LOCUNDE USA 
German search request: Export, in die Schweiz, aus den USA 
transformed into:      Export, LOCQUO Schweiz, LOCUNDE USA 
translated into:       export, LOCQUO Switzerland, LOCUNDE USA 
 
                       (This matches the descriptors assigned to the text!)
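The retrieval example can be sketched end to end: each query term is mapped to an interlingua expression, only the noun is translated, and the result is compared with the descriptors stored for the text. The rule table, the bilingual dictionary `DE_EN` and the function names are hypothetical; for brevity, the choice of LOCQUO for ‚in‘ with class 02 is hard-wired rather than derived from the accusative article.

```python
# Hypothetical resources for the export example.
PP_RULES = {("in", "02"): "LOCQUO", ("aus", "02"): "LOCUNDE"}  # "in" + 02 fixed to direction
NOUN_CLASS = {"Schweiz": "02", "USA": "02"}
DE_EN = {"Export": "export", "Schweiz": "Switzerland", "USA": "USA"}

# Descriptors assigned to 'The export from the USA to Switzerland is decreasing.'
TEXT_DESCRIPTORS = {"LOCQUO Switzerland", "export", "LOCUNDE USA"}


def to_interlingua(term: str) -> str:
    """'in die Schweiz' -> 'LOCQUO Schweiz'; a bare noun is returned unchanged."""
    words = term.split()
    if len(words) == 1:
        return term
    prep, noun = words[0], words[-1]  # skip the article in between
    return f"{PP_RULES[(prep, NOUN_CLASS[noun])]} {noun}"


def translate(expression: str) -> str:
    """Translate only the noun; the relation label is language-independent."""
    *relation, noun = expression.split()
    return " ".join(relation + [DE_EN[noun]])


def matches(request: list, descriptors: set) -> bool:
    """True if every term of the request is among the text's descriptors."""
    return {translate(to_interlingua(t)) for t in request} <= descriptors


print(matches(["Export", "in die Schweiz", "aus den USA"], TEXT_DESCRIPTORS))  # True
print(matches(["Export", "aus der Schweiz"], TEXT_DESCRIPTORS))               # False
```

The second request, ‚aus der Schweiz‘, yields LOCUNDE Switzerland and therefore does not match: the relation label preserves the difference between export from and export to a country, which matching on the nouns alone would lose.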

For every language, a lexicon would have to be built classifying all simple nouns, and a set of rules would have to be written transforming the prepositions into interlingua expressions depending on the classes of the nouns. The lexicon would only have to contain simple nouns, not compounds (except for lexicalized ones whose meaning is not the sum of the meanings of their components), as the class of the head noun of a compound will also be the class of the compound. So the task seems manageable. In SUSY, this work was done for a couple of thousand nouns, with quite good results.
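Because German compounds are right-headed, the class of an unlisted compound can be taken from its final component. A naive sketch, assuming a hypothetical lexicon of simple nouns and a longest-suffix match as a stand-in for real compound analysis:

```python
from typing import Optional

# Hypothetical lexicon of simple nouns (lower case) -> class number.
SIMPLE_NOUNS = {"haus": "01", "stadion": "01", "stadt": "02", "markt": "05", "abend": "26"}
# Lexicalized compounds would be listed here explicitly and take precedence.
LEXICALIZED: dict = {}


def noun_class(noun: str) -> Optional[str]:
    """Class of a noun: direct lookup, else the class of its longest known final part."""
    key = noun.lower()
    if key in LEXICALIZED:
        return LEXICALIZED[key]
    if key in SIMPLE_NOUNS:
        return SIMPLE_NOUNS[key]
    # German compounds are right-headed, so try ever shorter final substrings.
    for start in range(1, len(key)):
        if key[start:] in SIMPLE_NOUNS:
            return SIMPLE_NOUNS[key[start:]]
    return None


print(noun_class("Fußballstadion"))  # 01
print(noun_class("Wochenmarkt"))     # 05
print(noun_class("Sonntagabend"))    # 26
```

Suffix matching can misfire on words that merely end in a listed noun. The SUSY output later in this section marks compound boundaries in the lemma (e.g. /MITARBEITER/ZAHL), so a real system would rather take the head from its morphological analysis.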


Disambiguation depending on the predicate

<18> It depends on your evaluation . =>
     Das hängt von Ihrer Einschätzung ab .

<19> ...the article demanded by the noun ... =>
     ...der vom Nomen geforderte Artikel ...

<20> I substitute X for Y . =>
     Ich ersetze Y durch X .

In <18> to <20>, the PPs are valency-bound, i.e. they are strongly linked to the predicate. In the EUROTRA projects funded by the European Community, this relationship was represented by a semantic relation (semantic role, thematic role; cf. Arnold et al. 1985, Freigang et al. 1981, Blatt et al. 1984), as in <26> for „X depends on Y“.

This approach has a number of advantages and drawbacks which I shall not discuss here. Moreover, it can only be used for MT, not for automatic indexing, as I shall briefly show in the following.

He substituted milk for water.

There is no point in representing ‚for water‘ by an interlingua expression, as it cannot be understood without reference to the verb. Such representations only make sense within complete predicate/argument representations, not when the constituents are represented separately from each other, as in the case above where they are used as descriptors in a database. A more advanced system might operate with larger units such as

substitute milk
substitute for water
substitute milk for water

applying something like Fillmore’s semantic roles to make it usable in a multilingual environment. This must be left open here.


Example of an interlingua expression in SUSY


TEXTWORTFORM         WKL   LEMMANAME             STW 
------------------------------------------------------------- 
 
Die                  ARTB  D- (ARTB)             FWK 
Anpassung            SUB   ANPASSUNG             SUB 
der                  ARTB  D- (ARTB)             FWK 
Mitarbeiterzahl      SUB   /MITARBEITER/ZAHL     SUB 
an                   PRP   AN (AKK)              FWK 
den                  ARTB  D- (ARTB)             FWK 
betriebsnotwendigen  ADJ   /BETRIEB*S/NOTWENDIG  ADJ 
Bedarf               SUB   BEDARF                SUB 
macht                FIV   MACHEN                VRB 
auch                 ADV   AUCH                  FWK 
in                   PRP   TEMP QUA              FWK 
den                  ARTB  D- (ARTB)             FWK 
kommenden            ADJ   KOMMEN                VRB 
Jahren               SUB   JAHR                  SUB 
einen                ARTU  EIN (ARTU)            FWK 
zusaetzlichen        ADJ   ZUSAETZLICH           ADJ 
Stellenabbau         SUB   /STELLE*N/ABBAU       SUB 
erforderlich         ADV   ERFORDERLICH          ADJ 
*                          *

In this analysis of the sentence (roughly: 'Adjusting the number of employees to operational requirements makes additional job cuts necessary in the coming years as well'), ‚in den kommenden Jahren‘ is represented by ‚TEMP QUA kommen Jahr‘, meaning an unspecific point or period in time during the coming years. The PP ‚an den betriebsnotwendigen Bedarf‘ is not represented in the same way, as its meaning depends on ‚Anpassung‘ (= Anpassung an … => adaptation to) and cannot be understood on its own. Moreover, it is quite safe to proceed this way, as with ‚Anpassung‘ the preposition ‚an‘ always translates as ‚to‘. This is a clear case for lexicalization.
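The division of labour described here can be sketched as a lookup order: a valency lexicon keyed by the governing word is consulted first, so that a bound PP keeps its lexical preposition (as ‚AN (AKK)‘ does in the SUSY output above), and only PPs not bound in this way are mapped to an interlingua relation. The entries and names below are hypothetical; the free-PP rule is keyed by noun lemma instead of noun class for brevity.

```python
# Hypothetical valency lexicon: (governing lemma, preposition) -> English preposition.
VALENCY = {("ANPASSUNG", "AN"): "to"}
# Hypothetical rule for free PPs, keyed by noun lemma instead of noun class.
FREE_PP_RULES = {("IN", "JAHR"): "TEMP QUA"}


def analyse_pp(governor: str, prep: str, noun: str) -> tuple:
    """Check the valency lexicon first; only unbound PPs get an interlingua relation."""
    if (governor, prep) in VALENCY:
        return ("lexical", VALENCY[(governor, prep)])
    relation = FREE_PP_RULES.get((prep, noun))
    if relation is not None:
        return ("interlingua", relation)
    return ("unanalysed", prep)


print(analyse_pp("ANPASSUNG", "AN", "BEDARF"))  # ('lexical', 'to')
print(analyse_pp("MACHEN", "IN", "JAHR"))       # ('interlingua', 'TEMP QUA')
```

Checking the valency entry first reflects the argument above: with ‚Anpassung‘ the translation of ‚an‘ is fixed by the governing noun, so no interlingua relation is needed for it.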


Universität des Saarlandes - Fachrichtung Informationswissenschaft
