Journal of Applied Economic Research
ISSN 2712-7435
Clustering of Russian Manufacturing Companies by Indicators of Their Financial Condition Using Machine Learning Technologies
Lev A. Bulanov 1, Alexei V. Kalina 1,2, Vadim V. Krivorotov 1
1 Ural Federal University named after the First President of Russia B.N. Yeltsin, Yekaterinburg, Russia
2 Institute of Economics, Ural Branch of the Russian Academy of Sciences, Yekaterinburg, Russia
Abstract
Clustering research objects and combining them into similar groups based on a set of characteristics is an important stage in solving many problems of socio-economic development, especially those related to assessing the state of a socio-economic system and to modeling and forecasting indicators of its future development. The purpose of this study is a relative assessment of the financial condition of large Russian manufacturing companies based on data from their accounting (financial) statements using clustering methods, which belong to unsupervised machine learning. The results of such an assessment are subsequently intended to be used to build a model for assessing the financial condition of companies based on one of the supervised machine learning algorithms. The paper proposes key indicators of the financial condition of companies on the basis of which clustering is performed; these indicators were identified through an analysis of modern methods and approaches to studying and assessing the competitiveness and competitive position of companies. The clustering based on the proposed set of indicators used financial reporting data of 2,249 Russian manufacturing companies for 2023. Companies with a turnover of more than 2 billion rubles and a staff of more than 251 people were considered large. K-Means++, hierarchical clustering, and DBSCAN were used as clustering algorithms. To obtain the best result, special data preprocessing was carried out and the necessary hyperparameters of the clustering algorithms were selected. The quality of the final clustering was assessed using the Davies–Bouldin index (DBI) and the Calinski–Harabasz index (CHI). The results showed that the manufacturing companies under consideration can be combined into a relatively small number of clusters by financial condition (usually no more than 3), which opens up wide opportunities for building models of the financial condition of companies. Among the three clustering methods, K-Means++ proved to be the best algorithm by a small margin; its resulting centroids can be interpreted as average profiles of companies with poor, normal, and good financial condition. The quality of the final clustering can be assessed as good.
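To illustrate the kind of pipeline the abstract describes, the minimal sketch below compares K-Means++, Ward hierarchical clustering, and DBSCAN on a table of company-level financial indicators and scores each result with the Davies–Bouldin and Calinski–Harabasz indices, using standard scikit-learn implementations. This is not the authors' code: the file name, indicator column names, and the eps/min_samples values for DBSCAN are illustrative assumptions; n_clusters=3 simply reflects the abstract's finding that about three clusters emerge.

# Illustrative sketch only; data file, column names and DBSCAN hyperparameters are assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.metrics import davies_bouldin_score, calinski_harabasz_score

# Hypothetical table: one row per company, columns = financial-condition indicators.
df = pd.read_csv("companies_2023.csv")
feature_cols = ["current_ratio", "autonomy_ratio", "return_on_assets",
                "return_on_sales", "asset_turnover"]            # assumed indicator set
X = StandardScaler().fit_transform(df[feature_cols])            # standardize before clustering

models = {
    "K-Means++": KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42),
    "Hierarchical (Ward)": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "DBSCAN": DBSCAN(eps=0.8, min_samples=10),                  # eps/min_samples require tuning
}

for name, model in models.items():
    labels = model.fit_predict(X)
    mask = labels != -1                                         # exclude DBSCAN noise points
    if len(set(labels[mask])) < 2:
        print(f"{name}: fewer than 2 clusters found, scores undefined")
        continue
    dbi = davies_bouldin_score(X[mask], labels[mask])           # lower is better
    chi = calinski_harabasz_score(X[mask], labels[mask])        # higher is better
    print(f"{name}: DBI={dbi:.3f}, CHI={chi:.1f}")

In such a comparison, the algorithm with the lowest DBI and highest CHI on the same preprocessed data would be preferred; the abstract reports that K-Means++ won by a small margin under this kind of evaluation.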
Keywords
financial analysis; machine learning; financial condition indicators; large company; clustering of companies; K-Means++; hierarchical clustering; DBSCAN
JEL classification
D22, G30, C45
About Authors
Lev Alexeevich Bulanov
Postgraduate Student, Department of Economic Safety of Industrial Complexes, Ural Federal University named after the First President of Russia B.N. Yeltsin, Yekaterinburg, Russia (620002, Yekaterinburg, Mira street, 19); ORCID: https://orcid.org/0009-0001-0242-0127; e-mail: levbulanov2013@yandex.ru
Alexei Vladimirovich Kalina
Candidate of Technical Sciences, Associate Professor, Department of Economic Safety of Industrial Complexes, Ural Federal University named after the First President of Russia B.N. Yeltsin, Yekaterinburg, Russia (620002, Yekaterinburg, Mira street, 19); Senior Researcher, Center of Economic Security, Institute of Economics, Ural Branch of the Russian Academy of Sciences, Yekaterinburg, Russia (620014, Yekaterinburg, Moskovskaya street, 29); ORCID: https://orcid.org/0000-0003-0376-2505; e-mail: alexkalina74@mail.ru
Vadim Vasilyevich Krivorotov
Doctor of Economics, Professor, Head of the Department of Economic Safety of Industrial Complexes, Ural Federal University named after the First President of Russia B.N. Yeltsin, Yekaterinburg, Russia (620002, Yekaterinburg, Mira street, 19); ORCID: https://orcid.org/0000-0002-7066-0325; e-mail: v_krivorotov@mail.ru
For citation
Bulanov, L.A., Kalina, A.V., Krivorotov, V.V. (2025). Clustering of Russian Manufacturing Companies by Indicators of Their Financial Condition Using Machine Learning Technologies. Journal of Applied Economic Research, Vol. 24, No. 2, 584–621. https://doi.org/10.15826/vestnik.2025.24.2.020
Article info
Received March 21, 2025; Revised April 7, 2025; Accepted April 12, 2025.
DOI: https://doi.org/10.15826/vestnik.2025.24.2.020