| Name | Description | Size | Format |
|---|---|---|---|
| | | 28.53 MB | Adobe PDF |
Abstract(s)
Artificial Intelligence (AI) is an area that has been gaining prominence recently, particularly the sub-area of Machine Learning (ML). In this sub-area, black-box learning algorithms predominate, i.e. models whose internal parameters are not directly observable by the operator. Neural networks are a typical example, since they have so many parameters that human interpretation of them becomes impossible. Beyond this high dimensionality, the very structure of such models cannot be interpreted, since it is difficult to give meaning to the parameters. One context in which the accuracy of algorithms is crucial is the medical domain, where an algorithm’s decision impacts an individual’s health. In this context, it is even dangerous not to establish causal links between the input data and a model’s response. Without access to the model’s “reasoning”, its decisions cannot be challenged and would have to be accepted or discarded dogmatically, since no justification is available. eXplainable Artificial Intelligence (XAI) emerged to mitigate this problem: it aims to extract explanations for the decisions made by models. Two prominent methods in this area are Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). Both techniques rely only on the model’s input and output data, without accessing its internal components. When the data is in tabular format, the explanations they extract indicate which features were most relevant to a given model response. In this work, several explainability techniques are explored by applying them to synthetic datasets generated under controlled conditions and to datasets from the medical domain, and by analyzing the results obtained. Two real datasets were analyzed. First, a tabular dataset called Diagnosis AlzheimeR WIth haNdwriting (DARWIN), which aims to detect Alzheimer’s disease through handwriting tasks, was analyzed; on this dataset, the Random Forest (RF) and Explainable Boosting Machine (EBM) classifiers achieved an average accuracy of 91%. Second, a dataset of magnetic resonance images was analyzed, with the goal of distinguishing three types of tumor; here, the Convolutional Neural Network (CNN) ResNet50 stood out, achieving an average accuracy of 90%.
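The model-agnostic workflow summarized above (train a black-box classifier on tabular data, then rank feature relevance using only its inputs and outputs) can be sketched in a few lines. The sketch below is illustrative only: it assumes the `shap` and `scikit-learn` Python libraries, and the synthetic dataset, hyperparameters, and feature ranking are hypothetical stand-ins, not the thesis's actual DARWIN experiments.

```python
# Illustrative sketch only: not the thesis's code or data.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Controlled synthetic tabular dataset: only the first 3 features are informative.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Black-box classifier (a Random Forest, one of the models used on DARWIN).
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Model-agnostic explanation: SHAP is given only the prediction function and
# background data, never the model's internal parameters.
explainer = shap.Explainer(model.predict_proba, X_train)
shap_values = explainer(X_test[:50])

# Mean absolute SHAP value per feature gives a global feature-relevance ranking.
importance = np.abs(shap_values.values[..., 1]).mean(axis=0)
print(np.argsort(importance)[::-1])  # the informative features should rank first
```

Because the explainer sees only `predict_proba` and the data, the same procedure would apply unchanged to any other tabular classifier, which is the sense in which these techniques are model-agnostic.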
Description
Dissertation presented to obtain the Master's degree in Informatics and Multimedia Engineering (Engenharia Informática e Multimédia)
Keywords
Artificial intelligence; Explainability; Explainable artificial intelligence; Interpretability; Machine learning; Medical domain; Medicine
Citation
MOREIRA, Alexandre Vilela Reis de Melo – Explicabilidade de modelos de classificação no domínio médico. Lisboa: Instituto Superior de Engenharia de Lisboa, 2024. Master's dissertation.