Journal of Applied Economic Research
ISSN 2712-7435
Clustering of Russian Manufacturing Companies by Indicators of Their Financial Condition Using Machine Learning Technologies
Lev A. Bulanov 1, Alexei V. Kalina 1,2, Vadim V. Krivorotov 1
1 Ural Federal University named after the First President of Russia B.N. Yeltsin, Yekaterinburg, Russia
2 Institute of Economics, Ural Branch of the Russian Academy of Sciences, Yekaterinburg, Russia
Abstract
Clustering research objects and combining them into similar groups based on a set of characteristics is an important stage in solving many problems of socio-economic development, especially those related to assessing the state of a socio-economic system and to modeling and forecasting indicators of its future development. The purpose of this study is a relative assessment of the financial condition of large Russian manufacturing companies based on data from their accounting (financial) statements using clustering methods, which belong to unsupervised machine learning. The results of such an assessment are subsequently intended to be used to build a model for assessing the financial condition of companies based on one of the supervised machine learning algorithms. The paper proposes key indicators of the financial condition of companies on the basis of which clustering is performed; these indicators were identified through an analysis of modern methods and approaches to studying and assessing the competitiveness and competitive position of companies. The clustering based on the proposed set of indicators used financial reporting data of 2,249 Russian manufacturing companies for 2023. Companies with a turnover of more than 2 billion rubles and a staff of more than 251 people were considered large. K-Means++, hierarchical clustering, and DBSCAN were used as clustering algorithms. To obtain the best result, special data preprocessing was carried out and the necessary hyperparameters of the clustering algorithms were selected. The quality of the final clustering was assessed using the Davies–Bouldin index (DBI) and the Calinski–Harabasz index (CHI). The results showed that the manufacturing companies under consideration can be combined into a relatively small number of clusters by financial condition (usually no more than 3), which opens up wide opportunities for building models of the financial condition of companies. Among the three clustering methods, K-Means++ proved to be the best algorithm by a small margin; its resulting centroids can be interpreted as average profiles of companies with poor, normal, and good financial condition. The quality of the final clustering can be assessed as good.
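To illustrate the kind of pipeline the abstract describes, the minimal sketch below compares K-Means++, Ward hierarchical clustering, and DBSCAN on a table of company-level financial indicators and scores each result with the Davies–Bouldin and Calinski–Harabasz indices, using standard scikit-learn implementations. This is not the authors' code: the file name, indicator column names, and the eps/min_samples values for DBSCAN are illustrative assumptions; n_clusters=3 simply reflects the abstract's finding that about three clusters emerge.

# Illustrative sketch only; data file, column names and DBSCAN hyperparameters are assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.metrics import davies_bouldin_score, calinski_harabasz_score

# Hypothetical table: one row per company, columns = financial-condition indicators.
df = pd.read_csv("companies_2023.csv")
feature_cols = ["current_ratio", "autonomy_ratio", "return_on_assets",
                "return_on_sales", "asset_turnover"]            # assumed indicator set
X = StandardScaler().fit_transform(df[feature_cols])            # standardize before clustering

models = {
    "K-Means++": KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42),
    "Hierarchical (Ward)": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "DBSCAN": DBSCAN(eps=0.8, min_samples=10),                  # eps/min_samples require tuning
}

for name, model in models.items():
    labels = model.fit_predict(X)
    mask = labels != -1                                         # exclude DBSCAN noise points
    if len(set(labels[mask])) < 2:
        print(f"{name}: fewer than 2 clusters found, scores undefined")
        continue
    dbi = davies_bouldin_score(X[mask], labels[mask])           # lower is better
    chi = calinski_harabasz_score(X[mask], labels[mask])        # higher is better
    print(f"{name}: DBI={dbi:.3f}, CHI={chi:.1f}")

In such a comparison, the algorithm with the lowest DBI and highest CHI on the same preprocessed data would be preferred; the abstract reports that K-Means++ won by a small margin under this kind of evaluation.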
Keywords
financial analysis; machine learning; financial condition indicators; large company; clustering of companies; K-Means++; hierarchical clustering; DBSCAN
JEL classification
D22, G30, C45
About Authors
Lev Alexeevich Bulanov
Postgraduate Student, Department of Economic Safety of Industrial Complexes, Ural Federal University named after the First President of Russia B.N. Yeltsin, Yekaterinburg, Russia (620002, Yekaterinburg, Mira street, 19); ORCID: https://orcid.org/0009-0001-0242-0127; e-mail: levbulanov2013@yandex.ru
Alexei Vladimirovich Kalina
Candidate of Technical Sciences, Associate Professor, Department of Economic Safety of Industrial Complexes, Ural Federal University named after the First President of Russia B.N. Yeltsin, Yekaterinburg, Russia (620002, Yekaterinburg, Mira street, 19); Senior Researcher, Center of Economic Security, Institute of Economics, Ural Branch of the Russian Academy of Sciences, Yekaterinburg, Russia (620014, Yekaterinburg, Moskovskaya street, 29); ORCID: https://orcid.org/0000-0003-0376-2505; e-mail: alexkalina74@mail.ru
Vadim Vasilyevich Krivorotov
Doctor of Economics, Professor, Head of the Department of Economic Safety of Industrial Complexes, Ural Federal University named after the First President of Russia B.N. Yeltsin, Yekaterinburg, Russia (620002, Yekaterinburg, Mira street, 19); ORCID: https://orcid.org/0000-0002-7066-0325; e-mail: v_krivorotov@mail.ru
For citation
Bulanov, L.A., Kalina, A.V., Krivorotov, V.V. (2025). Clustering of Russian Manufacturing Companies by Indicators of Their Financial Condition Using Machine Learning Technologies. Journal of Applied Economic Research, Vol. 24, No. 2, 584–621. https://doi.org/10.15826/vestnik.2025.24.2.020
Article info
Received March 21, 2025; Revised April 7, 2025; Accepted April 12, 2025.
DOI: https://doi.org/10.15826/vestnik.2025.24.2.020