

New Clustering Method Simplifies Analysis of Large Data Sets

Researchers from HSE University and the Institute of Control Sciences of the Russian Academy of Sciences have proposed a new method of data analysis: tunnel clustering. It rapidly identifies groups of similar objects and requires fewer computational resources than traditional methods. Depending on the data configuration, the algorithm can run dozens of times faster than its counterparts. The study was published in the journal Doklady Rossijskoj Akademii Nauk. Matematika, Informatika, Processy Upravlenia.

Each year, the volume of information requiring processing continues to grow. Data comes from a variety of sources: scientific research, financial reports, medical examinations, and many others. Clustering methods—which group data based on similar characteristics—are used to detect patterns and organise information within such large datasets. These groupings are known as clusters.

One of the most widely used clustering methods is the k-means algorithm. It divides data into a predetermined number of clusters, initially selecting their centres (centroids). However, this method has a limitation: the number of clusters must be known beforehand, which is not always possible when dealing with complex data. Scientists from HSE University and the V.A. Trapeznikov Institute of Control Sciences have proposed a new approach to simplify this process—tunnel clustering. Unlike the k-means method, this algorithm does not require the number of clusters to be set in advance; it determines the necessary number itself by analysing the data structure.
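To make the limitation of k-means concrete, here is a minimal sketch of the classic algorithm (Lloyd's iteration) in plain Python. It is a generic textbook version, not code from the study; the function names `k_means` and `squared_distance` are illustrative. The key point is that `k` is a required input: the algorithm cannot decide on its own how many clusters the data contains.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Point], k: int, iterations: int = 100,
            seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: the number of clusters k must be fixed by the caller."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(points, k)  # initial centres drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centre
        if new_centroids == centroids:  # converged: assignments will no longer change
            break
        centroids = new_centroids
    return centroids, labels
```

Calling `k_means([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)` separates the two obvious groups, but only because the caller already knew that 2 was the right number; with an unknown data structure, that choice has to be guessed or tuned.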

‘The algorithm forms “tunnels” in the data—regions in multidimensional space where objects with similar characteristics group together,’ explained Fuad Aleskerov, Head of the Department of Mathematics at the HSE Faculty of Economic Sciences. ‘Users can choose from three modes of operation: with fixed cluster boundaries, with adaptive boundaries that adjust to the data structure, or a combined approach. This makes the method flexible and suitable for various types of tasks.’
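The article does not describe the internals of tunnel clustering, so the following is only a hypothetical illustration of the general idea that the number of groups can emerge from the data rather than being set in advance. It loosely mirrors the "fixed boundaries" mode: each feature's range is cut into intervals of a fixed width (the `width` parameter and the `grid_groups` function are assumptions for this sketch), and objects that fall into the same interval on every feature form one group. It is a simple grid-cell grouping, not the published algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def grid_groups(points: Sequence[Sequence[float]],
                width: float) -> Dict[Tuple[int, ...], List[int]]:
    """Group objects whose every feature falls in the same fixed-width interval.

    Hypothetical illustration only: the number of groups is not supplied in
    advance; it equals the number of distinct interval combinations that occur.
    """
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for index, point in enumerate(points):
        # Interval index along each feature: 0 for [0, width), 1 for [width, 2*width), ...
        cell = tuple(int(value // width) for value in point)
        groups[cell].append(index)
    return dict(groups)
```

For example, `grid_groups([(0.2, 0.1), (0.4, 0.3), (5.1, 5.2), (5.3, 5.0)], width=1.0)` returns `{(0, 0): [0, 1], (5, 5): [2, 3]}`: two groups are found without being requested. An adaptive mode, as described in the quote, would instead let the interval boundaries shift to follow the data, which this fixed-width sketch does not attempt.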

The method was tested on a synthetic (artificially generated) dataset of 100,000 objects, as well as on real-world tasks in public administration and the banking sector.

Visualisation of the original data and the results of tunnel clustering in a four-dimensional parallel coordinates system.
© Aleskerov, F.T., Myachin, A.L. & Yakuba, V.I. Tunnel Clustering Method. Dokl. Math. 110, 474–479 (2024)

The main advantage of the new method is its speed. Unlike classical algorithms that demand significant computational resources, tunnel clustering can, depending on the data configuration, perform the analysis dozens of times faster.

In addition, the researchers introduced the concept of the ‘transition degree’—a parameter indicating how many characteristics of an object must change for it to be classified into a different cluster. This helps assess the clarity of cluster boundaries and identify objects situated at the intersection of different groups.
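One way to read the transition degree, assuming (for this sketch only) that each cluster is described by a lower and upper bound on every characteristic, is as a count of the characteristics on which an object falls outside another cluster's bounds. The names `transition_degree`, `min_transition_degree` and the interval-per-feature cluster description are hypothetical; the paper's exact definition may differ.

```python
from typing import Dict, List, Sequence, Tuple

Box = List[Tuple[float, float]]  # assumed cluster description: one (low, high) interval per feature


def transition_degree(point: Sequence[float], box: Box) -> int:
    """Count the features whose value lies outside the target cluster's interval.

    Under this hypothetical reading, changing exactly these features would be
    needed to move the object into that cluster. Assumes len(point) == len(box).
    """
    return sum(1 for value, (low, high) in zip(point, box) if not low <= value <= high)


def min_transition_degree(point: Sequence[float], own: str, boxes: Dict[str, Box]) -> int:
    """Smallest transition degree from the object to any cluster other than its own.

    Assumes at least one other cluster exists.
    """
    return min(transition_degree(point, box) for name, box in boxes.items() if name != own)
```

With clusters `A = [(0, 1), (0, 1), (0, 1)]`, `B = [(0, 1), (0, 1), (2, 3)]` and `C = [(5, 6), (5, 6), (5, 6)]`, the object `(0.5, 0.5, 0.5)` in cluster A has transition degree 1 to B and 3 to C. A minimum of 1 flags it as sitting close to the A/B boundary, while objects whose minimum is large lie deep inside clearly separated clusters.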

‘People are generating more and more data, and the pace is only accelerating. According to the latest Digital 2025: Global Overview Report, as of early 2025, there were 5.56 billion internet users—nearly 68% of the global population. Adults spend an average of 6 hours and 38 minutes online each day, communicating, working, watching videos, and consuming content,’ said Alexey Myachin, Senior Research Fellow at the HSE International Centre for Decision Choice and Analysis. ‘Companies that ignore data analysis are losing vast sums of money.’

The authors continue to refine the algorithm, including conducting research into dimensionality reduction, which will help further decrease the time required to identify patterns in data.

The study was carried out with partial support from the Russian Science Foundation.
