Name | Description | Size | Format
---|---|---|---
 | | 1.33 MB | Adobe PDF
Authors
Advisor(s)
Abstract(s)
The software industry plays an essential role in the modern world in almost every domain. Vulnerabilities are prevalent in software systems and can have a negative impact on computer security. Although tools exist to detect vulnerable code, their accuracy and effectiveness remain a challenging research question. To define mechanisms that identify vulnerabilities, many existing solutions require hard work from experts. The constant increase in the number of disclosed vulnerabilities has become an important concern in the software industry and in the field of cybersecurity, which implies that current approaches to vulnerability detection require further improvement. This has motivated researchers in the software engineering and cybersecurity communities to apply machine learning to recognize patterns and characteristics of vulnerable code. Following this line of research, this work presents a machine-learning-based vulnerability detection system that uses static-code analysis to extract dependencies in the code and build the dataset from them. The dataset was collected from the National Vulnerability Database (NVD) and SAMATE. It contains Java source code, with null pointer dereference and command injection vulnerabilities as the targets selected for the case study. A control flow graph (CFG) was used together with static-code analysis techniques for feature extraction. Experimental results demonstrate that our tool can achieve significantly fewer false negatives (with a reasonable number of false positives) compared with other approaches. In addition, we applied the tool to real software products and were able to identify vulnerabilities, despite the number of false positives.
The software industry plays an essential role in the modern world in almost all fields. Vulnerabilities are prevalent in software systems and can have a negative impact on computer security. Although there are tools to detect vulnerable code, their accuracy and efficacy remain a challenging research question. To define features that identify vulnerabilities, many existing solutions require hard work from human experts. The constantly increasing number of disclosed security vulnerabilities has become an important concern in the software industry and in the field of cybersecurity, implying that current approaches to vulnerability detection demand further improvement. This has motivated researchers in the software engineering and cybersecurity communities to apply machine learning to recognize patterns and characteristics of vulnerable code. Following this line of research, this work presents a machine-learning-based vulnerability detection system that uses static-code analysis to extract dependencies in the code and build data features from them. The dataset was collected from the National Vulnerability Database (NVD) and from test cases of the NIST SAMATE project. It contains Java source code, the selected target programming language, with null pointer dereference and command injection vulnerabilities as the selected weaknesses. The data samples were generated from the source code of the vulnerable files by using a control flow graph (CFG) to extract features. Data-flow analysis techniques were also used for feature extraction. Experimental results demonstrate that our tool can achieve significantly fewer false negatives (with a reasonable number of false positives) compared to other approaches. We further applied the tool to real software products and were able to identify vulnerabilities, despite the number of false positives.
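As a rough illustration of the kind of data-flow analysis the abstract and keywords mention (reaching definitions over a control flow graph), the sketch below computes which variable definitions reach each node of a tiny hand-built CFG and derives one boolean feature from the result. It is not the dissertation's implementation: the function name `reaching_definitions`, the toy CFG, and the "null definition reaches a dereference" feature are all assumptions made for this example.

```python
from collections import defaultdict


def reaching_definitions(nodes, succs):
    """Iterative reaching-definitions analysis on a toy CFG (illustrative only).

    nodes: dict mapping node id -> name of the variable defined there, or None.
    succs: dict mapping node id -> list of successor node ids.
    Returns a dict mapping node id -> set of (variable, defining node) pairs
    that reach the entry of that node.
    """
    # Build predecessor lists from the successor lists.
    preds = defaultdict(list)
    for node, targets in succs.items():
        for target in targets:
            preds[target].append(node)

    in_sets = {n: set() for n in nodes}
    out_sets = {n: set() for n in nodes}

    changed = True
    while changed:  # repeat until a fixed point is reached
        changed = False
        for n in nodes:
            # IN[n] is the union of OUT[p] over all predecessors p.
            new_in = set()
            for p in preds[n]:
                new_in |= out_sets[p]
            var = nodes[n]
            if var is None:
                new_out = set(new_in)
            else:
                # A new definition kills earlier definitions of the same variable.
                new_out = {(v, d) for (v, d) in new_in if v != var} | {(var, n)}
            if new_in != in_sets[n] or new_out != out_sets[n]:
                in_sets[n], out_sets[n] = new_in, new_out
                changed = True
    return in_sets


# Hypothetical Java fragment modelled by the CFG below:
#   1: String s = null;
#   2: if (cond)
#   3:     s = "x";
#   4: s.length();
nodes = {1: "s", 2: None, 3: "s", 4: None}
succs = {1: [2], 2: [3, 4], 3: [4], 4: []}
null_defs = {1}  # nodes that assign null (assumed to be known from parsing)

reaching = reaching_definitions(nodes, succs)

# Example feature: does a null definition of `s` reach the dereference at node 4?
null_reaches_deref = any(v == "s" and d in null_defs for (v, d) in reaching[4])
```

In a pipeline of the kind the abstract describes, features like `null_reaches_deref` would be computed per code sample and passed to a classifier. Which features the dissertation actually uses is not stated on this page.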
Description
Dissertation submitted for the degree of Master in Informatics and Computer Engineering
Keywords
Static-code analysis; Machine learning; Control flow graph; Vulnerability detection; Feature extraction; Data-flow analysis; Reaching definitions
Citation
Conté, Sana - Software weaknesses detection using static-code analysis and machine learning techniques. Lisboa: Instituto Superior de Engenharia de Lisboa, 2023. Master's dissertation