January 2024

Explainable Machine Learning Techniques to Predict Muscle Injuries in Professional Soccer Players through Biomechanical Analysis

Authors: Mailyn Calderon-Diaz 1, 2, 3, Rony Silvestre Aguirre 4, Juan P. Vasconez 1, Roberto Yanez 4, Matias Roby 4, Marvin Querales 5, Rodrigo Salas 2, 3, 6


  1. Faculty of Engineering, Universidad Andres Bello, Santiago 7550196, Chile
  2. Ph.D. Program in Health Sciences and Engineering, Universidad de Valparaiso, Valparaiso 2362735, Chile
  3. Millennium Institute for Intelligent Healthcare Engineering (iHealth), Valparaiso 2362735, Chile
  4. Laboratorio de Biomecanica, Centro de Innovacion Clinica MEDS, Santiago 7691236, Chile
  5. School of Medical Technology, Universidad de Valparaiso, Valparaiso 2362735, Chile
  6. School of Biomedical Engineering, Universidad de Valparaiso, Valparaiso 2362735, Chile

Journal: Sensors - January 2024, Volume 24, Issue 1, Article no. 119 (DOI: 10.3390/s24010119)

Sports and intense competition carry a significant risk of injury because of their demanding physical and psychological requirements. Hamstring strain injuries (HSIs) are the most prevalent type of injury among professional soccer players and the leading cause of missed days in the sport. These injuries stem from a combination of factors, which makes it challenging to pinpoint the most crucial risk factors and their interactions, let alone to find effective prevention strategies.

Recently, the potential of tools provided by artificial intelligence (AI) has gained growing recognition. However, current studies primarily concentrate on enhancing the performance of complex machine learning (ML) models and often overlook their explanatory capabilities. Consequently, medical teams have difficulty interpreting these models and are hesitant to trust them fully.

In light of this, there is an increasing need for advanced injury detection and prediction models that can help doctors diagnose or detect injuries earlier and with greater accuracy. Accordingly, this study aims to identify the biomarkers of muscle injuries in professional soccer players through biomechanical analysis, employing several ML algorithms such as decision tree (DT) methods, discriminant methods, logistic regression, naive Bayes, support vector machine (SVM), K-nearest neighbor (KNN), ensemble methods, boosted and bagged trees, artificial neural networks (ANNs), and XGBoost. XGBoost is additionally used to obtain the most important features.
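To illustrate what ranking features by importance means in practice, the following is a minimal sketch under stated assumptions and not the authors' pipeline. XGBoost derives importances from its tree splits; the sketch instead uses permutation importance, a model-agnostic alternative that measures how much accuracy drops when one feature column is shuffled. The data, the rule-based `toy_model`, and every function name are hypothetical and chosen only for illustration.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving the other columns untouched.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:]
                        for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data (not from the study): feature 0 determines the label,
# feature 1 is unrelated noise.
rows = [[0.1, 0.9], [0.2, 0.3], [0.3, 0.7], [0.4, 0.1], [0.45, 0.5],
        [0.6, 0.8], [0.7, 0.2], [0.8, 0.6], [0.9, 0.4], [0.95, 0.0]]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def toy_model(row):
    """Hypothetical stand-in classifier: thresholds feature 0 only."""
    return 1 if row[0] > 0.5 else 0


importances = permutation_importance(toy_model, rows, labels)
for j, value in enumerate(importances):
    print(f"feature {j}: mean accuracy drop {value:.3f}")
```

Because the toy model ignores feature 1, shuffling that column leaves accuracy unchanged, whereas shuffling feature 0 degrades it. A feature ranking of this kind is what lets a medical team see which measurements a model actually relies on.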

The findings highlight that the variables that most effectively differentiate the groups, and that could serve as reliable predictors for injury prevention, are the maximum muscle strength of the hamstrings and the stiffness of the same muscle. Among the 35 techniques employed, XGBoost achieved a precision of up to 78%. This indicates that, by considering scientific evidence, suggestions based on various data sources, and expert opinions, it is possible to attain good precision, which enhances the reliability of the results for doctors and trainers. Furthermore, the obtained results align strongly with the existing literature, although further soccer-specific studies are necessary to draw a definitive conclusion.
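The abstract does not define how precision is computed. Assuming it refers to the standard classification metric, the fraction of predicted positives that are truly positive, TP / (TP + FP), a minimal sketch looks as follows. The labels in the example are invented for illustration and are not the study's data.

```python
def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred)
             if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred)
             if p == positive and t != positive)
    if tp + fp == 0:
        # No positive predictions were made; return 0.0 by convention here.
        return 0.0
    return tp / (tp + fp)


# Invented example: 1 = injured, 0 = not injured.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0]
print(f"precision = {precision(y_true, y_pred):.2f}")
```

Under this definition, a precision of 78% would mean that roughly 78 of every 100 players flagged by the model belong to the flagged group.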


Figure 2. Biomechanical test procedure.


Keywords: machine learning explainability, sport medicine, hamstring injuries, soccer player, XGBoost

A notable disparity exists between academic research outcomes and their implementation in medical practice. Medical professionals hesitate to rely on decisions generated by opaque black-box models that lack comprehensive and easily understandable explanations. Consequently, ML techniques used in clinical settings typically avoid complex models in favor of simpler and more interpretable ones, albeit at the expense of precision or intricacy. In this context, applying the XGBoost technique instills confidence in the outcomes and offers a more interpretable perspective from a medical standpoint. The results from this technique indicate that favorable precision can be achieved by incorporating scientific evidence, suggestions based on diverse data sources, and expert opinions, thereby enhancing the trustworthiness of the results for doctors and trainers.

Moreover, the obtained results align strongly with the existing literature, although additional soccer-specific studies remain necessary to establish a definitive statement. The prediction of hamstring injuries in soccer using machine learning techniques is a constantly evolving area of research. Future work would ideally consider additional variables such as anthropometric measurements, training levels, nutritional conditions, physiological variables, biometric data, medical images, on-field performance data, and even genetic variables. In terms of ML analytics, the integration of machine learning models with fuzzy logic can be investigated to create hybrid systems. These systems can leverage the capabilities of machine learning and the interpretability of fuzzy logic to improve accuracy and model understanding. Developing systems that use fuzzy logic to translate the rules extracted by machine learning models into a language understandable by sports professionals would facilitate both the interpretation of model decisions and preventive action.
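As a rough illustration of how fuzzy logic could turn a numeric value into language a sports professional can read, the sketch below maps a normalized feature value (0 to 1) onto the linguistic labels "low", "medium", and "high" using triangular membership functions. This is only a sketch under stated assumptions: the breakpoints, the feature name, and all function names are hypothetical, and it is not a system described by the authors.

```python
def triangular(x, left, peak, right):
    """Degree of membership (0.0 to 1.0) in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets over a feature normalized to the range [0, 1].
FUZZY_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def memberships(x):
    """Membership degree of x in each linguistic label."""
    return {label: triangular(x, *params)
            for label, params in FUZZY_SETS.items()}


def describe(feature_name, x):
    """Render the strongest label as a short human-readable statement."""
    degrees = memberships(x)
    label = max(degrees, key=degrees.get)
    return f"{feature_name} is {label} (membership {degrees[label]:.2f})"


# The feature name and value are illustrative only.
print(describe("hamstring stiffness", 0.8))
```

A threshold learned by a tree-based model could then be reported with such labels instead of raw numbers, which is the kind of translation the hybrid approach aims for.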