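EVALUATING THE EFFECTIVENESS OF TEXT SUMMARIZATION ALGORITHMS BASED ON ROUGE METRICS
Journal
Sun'iy intellektni pedagogik ta'limga tadbiq etishning ustivor yo'nalishlari (Priority Directions for Applying Artificial Intelligence in Pedagogical Education)
Publication
Sun'iy intellektni pedagogik ta'limga tadbiq etishning ustivor yo'nalishlari (Priority Directions for Applying Artificial Intelligence in Pedagogical Education)
Abstract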
This article analyzes the effectiveness of several text summarization algorithms. The study compares traditional methods (MFMMR, Lead, TextRank, LexRank, SumBasic, Gensim) with an advanced transformer-based model (BART). Each model's summaries are evaluated with the ROUGE metrics (ROUGE-1, ROUGE-2, ROUGE-L). In the experiment, summaries were generated for a collection of documents, each consisting of several Uzbek-language sentences, and the models' scores were compared graphically. The results show that the transformer-based BART model achieves high scores and outperforms the other methods on the ROUGE metrics. However, the lightweight and fast traditional algorithms can also produce effective summaries in some cases. The study demonstrates the potential of applying advanced approaches and evaluation methods to text summarization in the Uzbek language.
Keywords
NLP
text summarization
ROUGE
algorithm evaluation
mathematical model
BERT
MFMMR
TextRank
LexRank
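
The ROUGE-1, ROUGE-2 and ROUGE-L scores named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the whitespace tokenizer, the F1 form of each score, the helper names and the toy English sentences are all assumptions made for illustration. ROUGE-N is computed from the n-grams shared by a reference summary and a candidate summary, and ROUGE-L from their longest common subsequence. The hypothetical `lead_summary` helper shows the Lead baseline as it is commonly defined (keep the first k sentences), which may differ from the variant used in the study.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, ref_total, cand_total):
    """Harmonic mean of recall (vs. reference) and precision (vs. candidate)."""
    if overlap == 0 or ref_total == 0 or cand_total == 0:
        return 0.0
    recall = overlap / ref_total
    precision = overlap / cand_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, candidate, n):
    """ROUGE-N as an F1 score over clipped n-gram overlap."""
    ref = ngrams(tokenize(reference), n)
    cand = ngrams(tokenize(candidate), n)
    overlap = sum((ref & cand).values())  # Counter & keeps the minimum counts
    return f1(overlap, sum(ref.values()), sum(cand.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Sentence-level ROUGE-L as an F1 score over the LCS length."""
    ref, cand = tokenize(reference), tokenize(candidate)
    return f1(lcs_length(ref, cand), len(ref), len(cand))


def lead_summary(sentences, k=1):
    """Lead baseline (assumed definition): keep the first k sentences."""
    return " ".join(sentences[:k])


if __name__ == "__main__":
    # Toy strings invented for this sketch, not data from the study.
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print("ROUGE-1:", round(rouge_n(reference, candidate, 1), 3))
    print("ROUGE-2:", round(rouge_n(reference, candidate, 2), 3))
    print("ROUGE-L:", round(rouge_l(reference, candidate), 3))
```

For the toy pair above, five of six unigrams and three of five bigrams overlap, and the longest common subsequence has five tokens, so the sketch gives ROUGE-1 = 0.833, ROUGE-2 = 0.6 and ROUGE-L = 0.833. Published ROUGE implementations add options such as stemming and summary-level LCS, so their scores can differ from this simplified version.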