Word Sense Disambiguation and Information Retrieval. Mark Sanderson, Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, United Kingdom. Automatic as opposed to manual, and information as opposed to data or fact. Note that in his book van Rijsbergen betrays his preference for distance.
While WSD, in general, has a number of important applications in various fields of artificial intelligence information retrieval, text processing, machine ... Previous work attempts word sense disambiguation, the process of assigning a sense to a word in a specific context, by creating algorithms under a supervised or unsupervised ... For this reason, we propose in this paper a semi-supervised method for word sense disambiguation (WSD) for the scientific literature domain. This article begins by discussing the origins of the problem in the earliest machine translation systems. An application of word sense disambiguation to information ... On the importance of word sense disambiguation for information retrieval. Foundations of Statistical Natural Language Processing. New evaluation methods for word sense disambiguation. Retrieval, word sense disambiguation, WordNet, OWA operator. Word sense disambiguation is the task of finding the correct sense of polysemous words and automatically assigning that sense to them in a particu... Word Sense Disambiguation, Roberto Navigli and Paola Velardi. Abstract: Word sense disambiguation (WSD) is traditionally considered an AI-hard problem.
Introduction. Information retrieval [1] is the process of retrieving relevant documents from a document database when a user enters a query into a search engine. This chapter describes the main approaches to the problem, methods for evaluating performance, and potential applications. Word sense disambiguation for information retrieval. A word sense disambiguation algorithm for information ... Word sense disambiguation for cross-language information ... Many verbal languages have many ambiguous words. Facing Current Challenges: thesis report by David Martinez Iraola, carried out under the supervision of Eneko Agirre Bengoa and submitted to obtain the degree of Doctor in Computer Science at the University of the Basque Country (Euskal Herriko Unibertsitatea), Donostia, October 2004. Challenges and practical approaches with word sense ... Information retrieval database with WordNet word sense disambiguation. Most approaches to word sense disambiguation or to ... Early attempts to solve the WSD problem suffered from a lack of coverage.
Word sense disambiguation and information retrieval. In Proceedings of the 17th International ACM SIGIR, pp. 49-57, Dublin, IE, 1994. Introduction. In all the major languages around the world, there are many words that denote different meanings in different contexts. Word sense disambiguation [15] is a technique for finding the exact sense of an ambiguous word in a particular context. Word sense disambiguation (WSD) is a subfield of computational linguistics, also referred to as natural language processing (NLP), in which computer systems are designed to identify the correct meaning or sense of a word in a given context. Ontology-based word sense disambiguation for scientific ...
In Proceedings of RANLP'05, Borovets, pages 525-531, 2005. Word sense disambiguation (WSD) is the process of identifying the meanings of words in context. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 387-391, Uppsala, Sweden. The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications. The task we address is the disambiguation of scientific terms and acronyms used in scientific abstracts. Work on word sense disambiguation continued throughout the next two decades in the framework of AI-based natural language understanding research, as well as in the fields of content analysis, stylistic and literary analysis, and information retrieval.
Retrieving with good sense. In Information Retrieval, vol. ... This is the first book to cover the entire topic of word sense disambiguation (WSD), including ... Ambiguity is a common phenomenon in text, especially in the biomedical domain. It has often been thought that word sense ambiguity is a cause of poor performance in information retrieval (IR) systems.
This research work deals with natural language processing (NLP) and the extraction of essential information in an explicit form. This is particularly due to the Senseval evaluation exercises, which created standard data sets for the task. Introduction. Languages have several kinds of ambiguity, in which many words can be understood in different ways depending on context [1]. Chapter 1: Introduction, by Eneko Agirre and Philip Edmonds. Overall, the author concludes that keyword-in-context (KWIC) collocations still offer a commonsense solution to accurate word disambiguation. It has been observed that indexing using disambiguated mean... We have developed a word sense disambiguation algorithm, following Cheng and Wilensky (1997), to disambiguate among WordNet synsets.
The word sense disambiguation process consists of assigning to each given word in a context one definition or meaning (a predefined sense or not) that is distinguishable. Word sense disambiguation in information retrieval. Article in Intelligent Information Management 102. Word sense disambiguation in information retrieval revisited. Before choosing the word sense disambiguation algorithm to be used in the indices, I ran a simple benchmark of several disambiguation algorithms using the Perl Benchmark module. The main approaches to tackling the problem were dictionary-based, connectionist, and statistical strategies. The difficulty of this problem stems from the subtlety of word sense differences and the need for some level of understanding. This is the companion website for the following book. The second chapter describes some earlier approaches to word sense disambiguation and ...
Keywords: information retrieval, natural language processing, ambiguous word, sense score, word sense disambiguation. For instance, it is frequently the case that a gene, a protein encoded by the ... Our approach is based on the use of both contextual information from ... Natural language processing has a set of phases that progress from lexical text analysis to pragmatic analysis, in which the author's intentions are shown. Unfortunately, the word "information" can be very misleading. The belief is that if ambiguous words can be correctly disambiguated, IR performance will increase. While interpreting the specific meaning of acronyms and abbreviations within a sentence is often easy for a human reader, this process is nontrivial for a machine [10, 11].
This algorithm is to be used in a cross-language information retrieval system, CINDOR, which indexes queries and documents in a language-neutral concept representation based on WordNet synsets. The ambiguity problem appears in all of these tasks. Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space. Word sense ambiguity is recognized as having a detrimental effect on the precision of information retrieval systems in general and web search. This manual investigation involved the study of thousands of these query ... Word sense disambiguation book: bibliography of WSD. Word sense disambiguation, Yarowsky algorithm, information retrieval, natural language processing, Quran. 1.
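The vector-space idea mentioned above (terms in documents on one side, a reference collection of meanings on the other) can be illustrated with a minimal sketch. This is not the method of any system quoted here: the bag-of-words vectors, the cosine measure, the function names, and the two reference descriptions for "bank" are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical reference collection: one short description per meaning.
REFERENCE = {
    "river_bank": "sloping land beside a river or stream",
    "financial_bank": "institution that accepts deposits and lends money",
}

def vectorize(text):
    """Build a bag-of-words term-count vector from whitespace tokens."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def closest_meaning(context, reference=REFERENCE):
    """Return the meaning whose description vector is nearest to the context."""
    ctx = vectorize(context)
    return max(reference, key=lambda m: cosine(ctx, vectorize(reference[m])))

if __name__ == "__main__":
    print(closest_meaning("the boat drifted along the river"))
    print(closest_meaning("he deposits money every month"))
```

A practical system would likely weight terms (for example with TF-IDF) and use much richer reference text; raw counts are used here only to keep the sketch short.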
As for further research, the author's results may be pertinent to bilingual information retrieval systems, with queries constructed in the user's native language. This chapter discusses the basic concepts of word sense disambiguation (WSD) and the approaches to solving this problem. One of the major applications of word sense disambiguation (WSD) is information retrieval (IR). However, it can be used for various other natural language processing (NLP) applications, such as machine translation, information retrieval, sentiment analysis, and textual entailment. Graph-based word sense disambiguation in the Telugu language. Word sense disambiguation [2] (WSD) is the solution to the problem. In natural language processing, word sense disambiguation (WSD) is an open challenge; solving it improves the performance of applications such as machine translation and information retrieval systems. For example, the word "back" in "back home" and "my back" has ...
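To make the "back" example concrete, here is a minimal sketch of a dictionary-based approach in the spirit of the simplified Lesk algorithm: pick the sense whose gloss shares the most words with the surrounding context. The sense labels, glosses, and stopword list are hypothetical and written for this illustration; they are not taken from WordNet or from any of the works quoted here.

```python
import re

# Hypothetical toy sense inventory; glosses are illustrative only.
SENSES = {
    "back": {
        "direction": "to or toward a former place such as home; in return",
        "body_part": "the rear part of the human body from the shoulders to the hips",
    }
}

# Small hand-picked stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "to", "of", "or", "in", "my", "i", "as", "from", "such"}

def tokens(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(word, context, senses=SENSES):
    """Return the sense label whose gloss overlaps most with the context words."""
    context_words = tokens(context) - {word}
    best_label, best_overlap = None, -1
    for label, gloss in senses[word].items():
        overlap = len(context_words & tokens(gloss))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_label, best_overlap = label, overlap
    return best_label

if __name__ == "__main__":
    print(simplified_lesk("back", "I went back home"))
    print(simplified_lesk("back", "My back hurts from the shoulders down"))
```

With glosses this short the overlap counts are tiny, which hints at the coverage problem that early dictionary-based attempts ran into.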
The solution to this problem impacts other computer-related writing tasks, such as discourse, improving the relevance of search engines, and anaphora resolution. Word sense disambiguation is the process of removing and resolving the ambiguity between words. Word sense disambiguation in biomedical applications. Natural language processing, word sense disambiguation. 1. Proceedings of the LREC 2002 Workshop on Creating and Using Semantics for Information Retrieval and Filtering, Third International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, Spain, June. A breakthrough in this field would have a significant impact on many relevant web-based applications, such as web information retrieval, improved access to web services, and information extraction. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. DR systems may work as combine harvesters, which bring back useful material from the vast fields of raw material. The most common information management strategies are document retrieval (DR) and information filtering. Information retrieval is a wide, often loosely defined term, but in these pages I shall be concerned only with automatic information retrieval systems. It was developed mainly for word sense disambiguation in Indian languages.