EVALUATING THE EFFECTIVENESS OF TEXT SUMMARIZATION ALGORITHMS USING ROUGE METRICS

Muhamediyeva D.T.; Mamatov A.A.

doi:10.5281/zenodo.20214964

25.04.2026

Journal

Sun'iy intellektni pedagogik ta'limga tadbiq etishning ustivor yo'nalishlari (Priority Directions for Applying Artificial Intelligence in Pedagogical Education)

Pages

498-504

Abstract

This article analyzes the effectiveness of several text summarization algorithms. The study compares traditional methods (MFMMR, Lead, TextRank, LexRank, SumBasic, Gensim) with an advanced transformer-based model (BART). Each model's summaries are evaluated with the ROUGE metrics ROUGE-1, ROUGE-2, and ROUGE-L. In the experiment, summaries were generated for a collection of documents, each consisting of several sentences in Uzbek. The models' performance was then compared graphically. The results show that the transformer-based BART model achieves high accuracy and leads on the ROUGE metrics. However, the lightweight and fast traditional algorithms can also produce effective summaries in some cases. The study shows that advanced summarization approaches and evaluation methods can be applied to Uzbek-language text.
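To make the three ROUGE scores named in the abstract concrete, the following is a minimal Python sketch of ROUGE-1, ROUGE-2, and ROUGE-L as F1 scores. It is an illustration only, not the authors' evaluation code: the abstract does not say which ROUGE implementation or tokenizer was used, so the lowercased whitespace tokenization, the function names (`rouge_n`, `rouge_l`), and the toy English sentences are all assumptions made here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    # Assumption: lowercased whitespace tokenization (not stated in the source).
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1: based on the longest common subsequence of tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


# Toy example (hypothetical sentences, not data from the study).
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(round(rouge_n(candidate, reference, 1), 3))  # 0.833
print(round(rouge_n(candidate, reference, 2), 3))  # 0.6
print(round(rouge_l(candidate, reference), 3))     # 0.833
```

In the toy example, changing one word removes one unigram match but two bigram matches, which is why ROUGE-2 drops more sharply than ROUGE-1 or ROUGE-L. For a morphologically rich language such as Uzbek, the choice of tokenizer and any stemming step would likely affect all three scores.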

Keywords

NLP; text summarization; ROUGE; algorithm evaluation; mathematical model; BERT; MFMMR; TextRank; LexRank
