| Name | Description | Size | Format |
|---|---|---|---|
| | | 5.44 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
In recent years, digital assistants have become increasingly popular as a means of interaction between users and computer systems. However, most existing solutions are proprietary and tightly integrated into closed ecosystems, which limits flexibility and transparency. This thesis proposes the development of a Digital Assistant (DA) for the Windows desktop environment that is modular, extensible, and based on open-access technologies. The system integrates automatic speech recognition (ASR), speech synthesis (TTS), natural language processing (NLP), and an interactive graphical interface. The modular architecture allows features to be replaced or expanded without compromising the system core. A large language model (LLM) is used to interpret natural language commands, ensuring flexibility in understanding instructions. Features such as the execution of local commands and integration with external services (Google Calendar and Gmail) were implemented. All commands were evaluated with LLMs of different sizes. The results showed that performance is directly tied to the model used: smaller models exhibited occasional failures, while larger ones ensured high precision and consistency. In all cases, average response times remained low, on the order of tenths of a second. To assess usability, a questionnaire based on the SUS metric was administered to 15 users, with very positive results (average score of 90.33 out of 100). Participants carried out the tasks with ease and suggested relevant improvements. The solution confirms the feasibility of a modular, expandable, open-source DA for the desktop. The work provides a solid foundation for future developments, allowing the integration of new modules and the adoption of different LLMs, and represents a relevant step in the development of more open and adaptable digital assistants.
Abstract
In recent years, digital assistants have become increasingly popular as a means of interaction between users and computer systems. However, most existing solutions are proprietary and tightly integrated into closed ecosystems, limiting both flexibility and transparency. This thesis proposes the development of a Personal Digital Assistant (PDA) for the Windows desktop environment, designed to be modular, extensible, and based on open technologies. The developed system integrates Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Natural Language Processing (NLP), and an interactive graphical interface. Its modular architecture allows for the replacement or expansion of features without compromising the system core. A large language model (LLM) is used to interpret natural language commands, providing flexibility in understanding user instructions. Features such as execution of local commands and integration with external services (Google Calendar and Gmail) were implemented. All commands were evaluated using LLMs of different sizes. The results showed that system performance is closely tied to the model used: smaller models exhibited occasional errors, while larger models provided high precision and consistency. In all scenarios, the assistant maintained low average response times, typically under one second. To assess usability, a SUS-based questionnaire was conducted with 15 participants, yielding highly positive results (average score of 90.33 out of 100). Participants successfully completed the proposed tasks and provided relevant suggestions for future improvements. The proposed solution confirms the feasibility of a modular, extensible, and open-source PDA for desktop environments. The work lays a solid foundation for future evolution, enabling the integration of new modules and adoption of different LLMs according to the application context, marking a relevant step towards more open and adaptable digital assistants.
Description
Keywords
Large language models; Natural language processing; Personal digital assistant; Speech recognition; Speech synthesis; Text to speech; User interaction
