Multiword expression processing: A survey

Mathieu Constant, Gülşen Eryiğit, Johanna Monti, Lonneke Van Der Plas, Carlos Ramisch, Michael Rosner, Amalia Todirascu

Research output: Contribution to journal / Article / peer-review

169 Citations (Scopus)

Abstract

Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic and pervasive across different languages. The structure of linguistic processing that depends on the clear distinction between words and phrases has to be re-thought to accommodate MWEs. The issue of MWE handling is crucial for NLP applications, where it raises a number of challenges. The emergence of solutions in the absence of guiding principles motivates this survey, whose aim is not only to provide a focused review of MWE processing, but also to clarify the nature of interactions between MWE processing and downstream applications. We propose a conceptual framework within which challenges and research contributions can be positioned. It offers a shared understanding of what is meant by “MWE processing,” distinguishing the subtasks of MWE discovery and identification. It also elucidates the interactions between MWE processing and two use cases: parsing and machine translation. Many of the approaches in the literature can be differentiated according to how MWE processing is timed with respect to underlying use cases. We discuss how such orchestration choices affect the scope of MWE-aware systems. For each of the two MWE processing subtasks and for each of the two use cases, we conclude with open issues and research perspectives.
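The two subtasks named in the abstract operate at different levels: discovery works on MWE *types* (finding candidate expressions in a corpus, e.g. to build a lexicon), while identification works on *tokens* (marking where MWEs occur in running text). The sketch below is only an illustration of that contrast, not the survey's method: it assumes pointwise mutual information (PMI) over contiguous bigrams as the discovery score and greedy longest-match lexicon lookup with BIO tags for identification. The function names `discover_bigrams` and `identify` are hypothetical.

```python
import math
from collections import Counter


def discover_bigrams(tokens, min_count=2):
    """Discovery (type level): rank contiguous bigrams by PMI.

    Illustrative choice only: PMI = log2(P(w1 w2) / (P(w1) * P(w2))).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_xy = count / (n - 1)  # n - 1 bigram positions in the corpus
        p_x = unigrams[w1] / n
        p_y = unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring candidates first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def identify(tokens, lexicon):
    """Identification (token level): greedy longest match against a lexicon.

    `lexicon` is a set of token tuples; returns one BIO tag per token.
    """
    max_len = max((len(entry) for entry in lexicon), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest possible match first, down to length 2
        for length in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + length]) in lexicon:
                tags[i] = "B"
                for j in range(i + 1, i + length):
                    tags[j] = "I"
                i += length
                break
        else:
            i += 1  # no MWE starts here; move to the next token
    return tags


if __name__ == "__main__":
    print(discover_bigrams("new york is big and new york is old".split()))
    print(identify("he kicked the bucket".split(), {("kicked", "the", "bucket")}))
    # second line prints ['O', 'B', 'I', 'I']
```

Both functions only handle contiguous sequences; many MWEs are discontinuous or inflected (e.g. "kicked the proverbial bucket"), which is one reason simple lookup and n-gram scoring fall short in practice.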

Original language: English
Pages (from-to): 837-892
Number of pages: 56
Journal: Computational Linguistics
Volume: 43
Issue number: 4
DOIs
Publication status: Published, 1 Dec 2017

Bibliographical note

Publisher Copyright:
© 2017 Association for Computational Linguistics.

Funding

This work has been supported by the PARSEME project (COST Action IC1207). Special thanks to Federico Sangati, who not only provided valuable suggestions, references, and feedback, but also contributed important initial written material to the foundations of this survey. We would like to thank all members of the PARSEME working group 3 who contributed suggestions and references: Jan Genci, Tunga Güngör, Tomas Krilavicius, Justina Mandravickaite, Michael Oakes, Yannick Parmentier, Carla Parra Escartín, Gerold Schneider, Inguna Skadiņa, Dan Tufis, Éric Villemonte de la Clergerie, Veronika Vincze, and Eric Wehrli. This work has been partially funded by the French National Research Agency (ANR) through the PARSEME-FR project (ANR-14-CERA-0001) and by a TUBITAK 1001 grant (no: 112E276).

Funders and funder numbers:
Agence Nationale de la Recherche: ANR-14-CERA-0001
Türkiye Bilimsel ve Teknolojik Araştırma Kurumu: 112E276, 1001
