| Name | Description | Size | Format |
| --- | --- | --- | --- |
| | | 13.07 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
The growing complexity and sophistication of threats in cyberspace have driven the search for innovative and efficient solutions in the field of cybersecurity. In this context, research was conducted to assess the feasibility of using Large Language Models (LLMs) to automate the generation of code and configurations for cybersecurity. The research focused on cyberdefence mechanisms and cybersecurity education applications, with particular emphasis on solutions for generating honeypots, malware and Capture The Flag (CTF) exercises. Seven models were evaluated, including GPT-4, Gemini Pro and Claude 3 Opus. The evaluation methodology rested on two mechanisms. The first is a new benchmark, Cybersecurity Language Understanding (CSLU), based on Massive Multitask Language Understanding (MMLU) and consisting of multiple-choice questions across several knowledge domains; the prompts were designed to assess each model's state of knowledge on the aforementioned topics. The second mechanism assessed the consistency, creativity and adaptability of the models in generating artefacts. The results showed notably strong performance on the malware topic, with four of the models achieving the maximum score, whereas performance on the CTF task showed greater variation. Overall, GPT-4, Gemini Pro and Claude 3 Opus produced consistently superior results among the models studied. As a second step, a web-based tool was developed to provide a proof of concept of the preceding studies. This tool, relying on the best-performing LLMs studied, allows the user to automatically create and launch security services, such as the aforementioned honeypots or CTF exercises. Taken together, these findings suggest that applying LLMs to cybersecurity activities can be highly advantageous.
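The thesis code is not included on this page, so the following is a minimal, hypothetical sketch of how an MMLU-style multiple-choice evaluation such as the CSLU benchmark described above could be scored per topic. The sample question, the topic labels and the `query_model` stub are illustrative assumptions, not material from the thesis.

```python
# Hypothetical sketch of an MMLU-style multiple-choice evaluation loop.
# The sample question and the query_model() stub are illustrative only;
# they are not taken from the CSLU benchmark itself.

QUESTIONS = [
    {
        "topic": "honeypots",
        "question": "Which component of a low-interaction honeypot emulates network services?",
        "choices": {"A": "A packet sniffer", "B": "A service emulator",
                    "C": "A log rotator", "D": "A password manager"},
        "answer": "B",
    },
    # ... one entry per benchmark question, grouped by topic ...
]


def query_model(prompt: str) -> str:
    """Stand-in for a call to the LLM under evaluation.

    Replace this with a real API call; returning a fixed letter keeps the
    sketch runnable end to end.
    """
    return "A"


def evaluate(questions: list[dict]) -> dict[str, float]:
    """Return per-topic accuracy for one model on the multiple-choice set."""
    correct: dict[str, int] = {}
    total: dict[str, int] = {}
    for q in questions:
        options = "\n".join(f"{letter}) {text}" for letter, text in q["choices"].items())
        prompt = (f"{q['question']}\n{options}\n"
                  "Answer with the letter of the single best option.")
        reply = query_model(prompt).strip().upper()[:1]
        total[q["topic"]] = total.get(q["topic"], 0) + 1
        if reply == q["answer"]:
            correct[q["topic"]] = correct.get(q["topic"], 0) + 1
    return {topic: correct.get(topic, 0) / n for topic, n in total.items()}


if __name__ == "__main__":
    print(evaluate(QUESTIONS))  # e.g. {'honeypots': 0.0} with the fixed stub
```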
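The proof-of-concept web tool is likewise not shown here. As a loose illustration of the flow the abstract describes, the sketch below asks an LLM for a Docker Compose definition of a honeypot service and launches it with the Docker CLI; `generate_compose`, the file layout and the example image are assumptions for illustration, and only the `docker compose up -d` invocation is a standard command.

```python
# Loosely illustrative sketch: generate a honeypot service definition with an
# LLM and launch it with Docker Compose. generate_compose() is a stub; in the
# web tool described above, a real model call would fill this role.
import pathlib
import subprocess


def generate_compose(service_description: str) -> str:
    """Stand-in for an LLM call that returns a docker-compose YAML document."""
    # A real implementation would prompt one of the evaluated models with
    # service_description and return its answer.
    return (
        "services:\n"
        "  ssh-honeypot:\n"
        "    image: cowrie/cowrie\n"   # widely used SSH/Telnet honeypot image
        "    ports:\n"
        "      - \"2222:2222\"\n"
    )


def launch(service_description: str, workdir: str = "generated") -> None:
    """Write the generated definition to disk and start it in the background."""
    path = pathlib.Path(workdir)
    path.mkdir(exist_ok=True)
    compose_file = path / "docker-compose.yml"
    compose_file.write_text(generate_compose(service_description))
    subprocess.run(["docker", "compose", "-f", str(compose_file), "up", "-d"],
                   check=True)


if __name__ == "__main__":
    launch("an SSH honeypot that logs login attempts")
```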
Description
Keywords
LLM; Cybersecurity; CTF exercises; Honeypots; Malware detection; Large Language Models; AI in cybersecurity; CTF challenges