An intelligent use of stemmer and morphology analysis for Arabic information retrieval

Alnaied, Ali, Elbendak, Mosa and Bulbul, Abdullah (2020) An intelligent use of stemmer and morphology analysis for Arabic information retrieval. Egyptian Informatics Journal, 21 (4). pp. 209-217. ISSN 1110-8665

1-s2.0-S1110866519303469-main.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives 4.0.

Official URL: https://doi.org/10.1016/j.eij.2020.02.004

Abstract

Arabic information retrieval has gained significant attention owing to the increasing use of Arabic text on the web and social media networks. This paper presents a new approach to Arabic stemming, called Arabic Morphology Information Retrieval (AMIR), which generates or extracts stems by applying a set of rules on the relationships among Arabic letters to find the root or stem of each word; these roots and stems are used as indexing terms for text search in Arabic retrieval systems. To demonstrate the usefulness of the proposed algorithm, we highlight the benefits of the proposed rules for different Arabic information retrieval systems. Finally, we evaluate the AMIR system by comparing its mean average precision with that of LUCENE, FARASA, and a counterpart system with no stemmer. The results show that AMIR achieves a mean average precision of 0.34%, while LUCENE, FARASA, and the no-stemmer system give 0.27%, 0.28%, and 0.21%, respectively. This demonstrates that AMIR improves Arabic stemming and retrieval performance and is robust to any type of stem.
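
The abstract compares systems by mean average precision (MAP). As a minimal sketch of how that metric is commonly computed, and not the evaluation code used in the paper, the Python below averages the per-query average precision over a set of queries. The function names, the document identifiers, and the toy rankings are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query.

    ranked   -- document ids in the order the system returned them
                (assumed to contain no duplicates)
    relevant -- ids of the documents judged relevant for the query
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant hits.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(runs.get(query, []), relevant)
        for query, relevant in qrels.items()
    )
    return total / len(qrels)


# Hypothetical toy data: two queries with made-up document ids.
toy_runs = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
toy_qrels = {"q1": {"d1", "d3"}, "q2": {"d5"}}
toy_map = mean_average_precision(toy_runs, toy_qrels)
```

On this toy data, query q1 scores (1/1 + 2/3) / 2 and query q2 scores 1/2, so the mean is 2/3. Under this definition MAP is a value between 0 and 1, and a stemmer affects it only indirectly, by changing which documents match a query's indexing terms and in what order they are ranked.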

Item Type: Article
Additional Information: Funding Information: We would like to thank the anonymous reviewers for their valuable comments, which have helped us to improve this paper. This work is partially supported by the National Natural Science Foundation of China under Grant No. 60775028, the Major Projects of Technology Bureau of Dalian No. 2007A14GXD42, and IT Industry Development of Jilin Province.
Uncontrolled Keywords: Arabic morphological analysis, Arabic stemmer, Information retrieval systems, Natural language processing
Subjects: G400 Computer Science
G500 Information Systems
G900 Others in Mathematical and Computing Sciences
Department: Faculties > Engineering and Environment > Computer and Information Sciences
Depositing User: Rachel Branson
Date Deposited: 25 May 2022 13:36
Last Modified: 25 May 2022 13:45
URI: http://nrl.northumbria.ac.uk/id/eprint/49193
