| Name | Description | Size | Format |
|---|---|---|---|
|  |  | 4.16 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
Artificial Intelligence (AI) and Data Science are two of the most debated subjects in Health today. AI has been considered an essential part of problem-solving in this field, and even a fourth paradigm of Science as a whole, alongside the Theoretical, Experimental, and Computational components. In this work, Machine Learning algorithms are tested on the multiclass classification of three datasets of Magnetic Resonance images of brain tumors. The main objective of testing with different datasets is to verify the influence that the data have on the accuracy of each algorithm. Different algorithms were tested to determine which one (and with which parameter settings, where applicable) performs best, and different splits of the data were tested to assess how much the division affects the resulting accuracy. Three datasets (image sets) with 4 brain tumor categories were used: glioma, meningioma, pituitary gland tumor, and no tumor. The first dataset is composed of 2870 different images, the second of 7020 images, and the third is a subset of the second, with 2870 images and the same distribution by category as the first. The algorithms used were Decision Tree (with default parameters); Random Forest (varying the number of estimators); Linear Discriminant Analysis (which has no parameters to adjust); Support Vector Machine with the 4 available kernels (Linear, Polynomial, RBF, and Sigmoid), varying the regularization parameter; and k-Nearest Neighbors, varying the number of neighbors. In each case, three different splits between training and test data were also tested: 70%, 80%, and 90% of the data used for training. Accuracy was used as the performance measure, and statistical significance tests were performed between results.
The algorithm with the best results was Random Forest, which reached an accuracy of 96.425%, followed by the RBF kernel of the Support Vector Machine, with 93.615%. The results are comparable to other published works that use similar methodologies.
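The evaluation protocol described in the abstract (several train/test splits crossed with a varied hyperparameter, scored by accuracy) can be sketched as follows. This is only an illustrative sketch, not the dissertation's code: the record does not state which tools were used, so the example relies on the Python standard library alone, covers only k-Nearest Neighbors, and replaces the MRI image features with hypothetical synthetic 2-D points. All function names, the values of k, and the data generator are assumptions made for illustration.

```python
import random
from collections import Counter

# The four categories named in the abstract.
CLASSES = ["glioma", "meningioma", "pituitary", "no_tumor"]


def make_synthetic_data(n_per_class=60, seed=0):
    """Hypothetical stand-in for image features: one 2-D Gaussian blob per class."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
    data = []
    for label, (cx, cy) in zip(CLASSES, centers):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return data


def train_test_split(data, train_fraction, seed=0):
    """Shuffle a copy of the data and cut it into train and test parts."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    def squared_distance(item):
        (x, y), _ = item
        return (x - point[0]) ** 2 + (y - point[1]) ** 2

    nearest = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def run_grid(data, fractions=(0.7, 0.8, 0.9), ks=(1, 3, 5)):
    """Accuracy for every (training fraction, number of neighbors) pair."""
    results = {}
    for fraction in fractions:
        train, test = train_test_split(data, fraction)
        for k in ks:
            results[(fraction, k)] = accuracy(train, test, k)
    return results


if __name__ == "__main__":
    for (fraction, k), acc in sorted(run_grid(make_synthetic_data()).items()):
        print(f"train={fraction:.0%} k={k} accuracy={acc:.3f}")
```

The same loop structure would extend to the other algorithms by swapping the classifier and the parameter being varied (for example, the number of estimators or the regularization parameter).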
Description
Master's degree in Medical Physics Technologies (Mestrado em Tecnologias de Física Médica)
Keywords
Machine learning; Classification; Brain tumor; Medical physics
Pedagogical Context
Citation
Garzillo MJ. Classificação de tumores cerebrais com algoritmos de machine learning [dissertation]. Lisboa: Escola Superior de Tecnologia da Saúde de Lisboa/Instituto Politécnico de Lisboa; 2022.
Publisher
Instituto Politécnico de Lisboa, Escola Superior de Tecnologia da Saúde de Lisboa
