| Name | Description | Size | Format |
|---|---|---|---|
| | | 1.59 MB | Adobe PDF |
Advisor(s)
Abstract(s)
The escalating demand and complexity of monitoring services handled by Network Operations Centers (NOCs) have led Mobile Network Operators (MNOs) to prioritize automated solutions for network fault detection and diagnosis. Consequently, various Machine Learning (ML)-based Root Cause Analysis (RCA) systems have been developed; however, their lack of explainability poses a challenge due to the predominantly black-box nature of ML models. This paper addresses this issue by presenting a supervised clustering methodology capable of integrating both glass-box and black-box models, the latter complemented by post-hoc explainability techniques. Black-box models excel in predictive capability but require post-hoc techniques for explainability, whereas glass-box models prioritize transparent decision-making, fostering a clearer understanding of the model’s behavior. This work delineates a methodology for performing RCA of faults in the User Downlink (DL) Average Throughput Key Performance Indicator (KPI), while comparing the performance of black-box models (Light Gradient-Boosting Machine (LightGBM) and Extreme Gradient Boosting (XGBoost)) with glass-box models (Logistic Regression (LR) and Explainable Boosting Machine (EBM)). Results revealed that the LightGBM black-box algorithm coupled with the SHapley Additive exPlanations (SHAP) method demonstrated superior performance in fault detection and diagnosis, without compromising the overall explainability. Consequently, it was possible to identify faults related to radio conditions, low network usage in specific user groups, low network capacity, and mobility issues. The paper concludes with practical mitigation strategies for each identified fault cluster.
Description
Keywords
 Mobile networks   Root cause analysis   Machine learning   Explainable AI   SHAP 
Pedagogical Context
Citation
Cilínio, M., Pereira, M., Duarte, D., Mata, L., & Vieira, P. (2024). Unraveling the root causes of faults in mobile communications: A comparative analysis of different model explainability techniques. AEU-International Journal of Electronics and Communications, 181, 1-8. https://doi.org/10.1016/j.aeue.2024.155339
Publisher
Elsevier
