HSE University Develops Tool for Assessing Text Complexity in Low-Resource Languages

An installation at the National Library of the Republic of Tatarstan celebrating the history of Tatar writing, featuring symbols from various alphabets

© Wikimedia Commons

Researchers at the HSE Centre for Language and Brain have developed a tool for assessing text complexity in low-resource languages. The first version supports several of Russia’s minority languages, including Adyghe, Bashkir, Buryat, Tatar, Ossetian, and Udmurt. This is the first tool of its kind designed specifically for these languages, taking into account their unique morphological and lexical features.

According to the Institute of Linguistics of the Russian Academy of Sciences, 155 languages are spoken in Russia. Some of them are used by relatively small communities—for example, around 80,000 people speak Adyghe, while 250,000 to 350,000 people speak Buryat, Ossetian, and Udmurt. Other languages, such as Bashkir and Tatar, have more than one million native speakers. All of these languages hold official status in various republics of Russia, making it essential not only to preserve them but also to create conditions for their development, including opportunities for learning and use in education and science. 

In 2025, a Presidential Decree approving the Fundamentals of the State Language Policy of the Russian Federation was adopted. It affirms linguistic diversity and outlines a strategy for the development and practical use of the languages spoken by the peoples of Russia. One way to advance these goals is to create digital tools that make working with low-resource languages easier and more accessible.

A team of scientists at the HSE Centre for Language and Brain has developed an online text complexity calculator for quick and easy assessment of text difficulty in several minority languages, taking into account their linguistic features. The calculator is based on Textometr, a tool created by Antonina Laposhina and Maria Lebedeva for evaluating the complexity of Russian-language texts.

The calculator developed by psycholinguists at HSE University evaluates texts across several parameters: word length and frequency based on data from language corpora; the percentage of vocabulary covered by the frequency list (that is, the share of words in the text that appear among the 5,000 most frequent words in the respective language); and the distribution of parts of speech within the text. In addition, the calculator considers factors such as lexical density and diversity, as well as the text's narrativity and descriptiveness.
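To make a few of these parameters concrete, here is a minimal Python sketch of how frequency-list coverage, lexical diversity, and mean word length might be computed. It is an illustration only, not the calculator's actual code: the function names, the naive tokenizer, and the toy frequency list are all assumptions, and lexical diversity is approximated here by a simple type-token ratio.

```python
import re


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase runs of Unicode letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def frequency_coverage(tokens, frequent_words):
    """Share of tokens found in a frequency list.

    `frequent_words` stands in for a list such as the 5,000 most
    frequent words of a language; here it is a hypothetical set.
    """
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in frequent_words)
    return covered / len(tokens)


def type_token_ratio(tokens):
    """Simple lexical diversity proxy: unique words divided by all words."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mean_word_length(tokens):
    """Average word length in characters."""
    if not tokens:
        return 0.0
    return sum(len(token) for token in tokens) / len(tokens)


# Toy example with a made-up two-word frequency list.
tokens = tokenize("The cat saw the dog")
toy_frequency_list = {"the", "cat"}
print(frequency_coverage(tokens, toy_frequency_list))
print(type_token_ratio(tokens))
print(mean_word_length(tokens))
```

A real tool would rely on corpus-derived frequency lists and language-specific tokenization and morphological analysis, which this sketch deliberately leaves out.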

The key innovation is the use of the Flesch Reading Ease formula, adapted separately for each language, making it possible to assess text complexity and readability more accurately. 

The Flesch score is based on the number of words, sentences, and syllables, but the original coefficients were developed for English and do not work well for structurally different languages—such as the polysynthetic Adyghe language, in which the average word is much longer. In a 2025 study, Uliana Petrunina and Nina Zdorova recalculated the formula’s coefficients specifically for Adyghe, which significantly improved the accuracy of the readability assessment.
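For readers who want to see the arithmetic, the sketch below implements the Flesch Reading Ease score in Python with its coefficients exposed as parameters. The default values are the standard English coefficients (206.835, 1.015, and 84.6). The recalculated Adyghe coefficients are not given in this article, so none are shown; the function name and parameter names are illustrative assumptions.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables,
                        base=206.835, sentence_weight=1.015,
                        syllable_weight=84.6):
    """Flesch Reading Ease with configurable coefficients.

    Defaults are the original English coefficients. A language-specific
    adaptation would pass its own values for `base`, `sentence_weight`,
    and `syllable_weight` (hypothetical parameter names).
    """
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return (base
            - sentence_weight * words_per_sentence
            - syllable_weight * syllables_per_word)


# Same sentence structure, different average word length:
# 1.5 syllables per word versus 3 syllables per word.
shorter_words = flesch_reading_ease(n_words=100, n_sentences=5, n_syllables=150)
longer_words = flesch_reading_ease(n_words=100, n_sentences=5, n_syllables=300)
print(shorter_words, longer_words)
```

With the English defaults, doubling the syllables per word pushes the score below zero, which illustrates why the original coefficients tend to rate texts in languages with long words as extremely hard regardless of their actual difficulty.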

'The parameters of our calculator are adapted to the structural features of each of the six low-resource languages of Russia, using text corpora as well as frequency and morphological analyses. We also adapted the classic Flesch Reading Ease score. As a result, the algorithm can be easily reconfigured for other low-resource languages, regardless of their typological characteristics,' explains Uliana Petrunina, Research Fellow at the HSE Centre for Language and Brain and one of the developers of the tool.

The tool will help create comparable stimulus materials for linguistic experiments and provide teachers with a resource for selecting high-quality educational materials by difficulty level. This solution represents an important contribution to the preservation and development of Russia’s minority languages and to supporting the country’s linguistic diversity. 

'Our tool allows researchers and teachers to select materials based on their linguistic complexity, which is particularly important for research and education in languages with limited resources,' says Nina Zdorova, one of the creators of the tool.

Future versions are expected to include additional low-resource languages that are underrepresented in linguistics, both in Russia and beyond.
