Master's in Informatics
Permanent URI for this collection
Level: Academic Master's
Start year:
Current CAPES rating:
Normative act:
Selection frequency:
Area(s) of concentration:
Course URL:
Recent Submissions
- Análise de arquiteturas baseadas em transformers na transcrição de fala e descrição de áudio de fundo simultâneos em cenários sonoros mistos [Analysis of transformer-based architectures for simultaneous speech transcription and background audio description in mixed sound scenarios] (Universidade Federal do Espírito Santo, 2025-03-26) Silva, João Vitor Roriz da; Boldt, Francisco de Assis; https://orcid.org/0000-0001-6919-5377; http://lattes.cnpq.br/0385991152092556; Badue, Claudine Santos; https://orcid.org/0000-0003-1810-8581; http://lattes.cnpq.br/1359531672303446; http://lattes.cnpq.br/8121638031129636; Souza, Alberto Ferreira de; https://orcid.org/0000-0003-1561-8447; http://lattes.cnpq.br/7573837292080522; Paixão, Thiago Meireles; https://orcid.org/0000-0003-1554-6834; http://lattes.cnpq.br/2961730349897943
This work investigates how two specialized neural networks, a speech transcription model (Whisper) and a general audio captioning model (Prompteus), can be jointly leveraged to process mixed audio inputs containing both speech and non-speech events. We construct the Clotho Voice dataset by merging speech recordings from the Common Voice 5.1 corpus with general sounds from the Clotho 2.1 dataset. Through a series of controlled experiments, we examine how each model’s performance degrades when presented with overlapping speech and background sounds. Results show that Whisper excels at transcription when speech dominates the input signal, yet its accuracy diminishes in the presence of substantial non-speech noise. Conversely, Prompteus performs well in purely background-oriented settings but loses descriptive capability as speech levels increase. We also show how preprocessing steps, such as normalization and resampling, affect borderline cases, revealing that subtle audio features are crucial for robust event detection in challenging acoustic environments. Our findings underscore the importance of tailored training and data augmentation strategies to mitigate performance loss in mixed-audio scenarios.
By integrating the complementary strengths of speech-focused and background-focused models, we offer a pathway toward more comprehensive audio understanding systems suitable for noisy, real-world applications, including industrial automation and assistive technologies. This research paves the way for hybrid frameworks that capture both spoken language and context-rich environmental cues in a single, unified approach.
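The abstract above does not say how speech and background sounds were combined or how their relative levels were controlled. As a rough illustration only, the sketch below assumes the mix is controlled by a target signal-to-noise ratio (SNR) in decibels; the function names `rms` and `mix_at_snr` are hypothetical and are not taken from the thesis.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech, background, snr_db):
    """Add background to speech so that speech sits snr_db dB above it.

    Assumption: both inputs are equal-length sequences of float samples
    at the same sampling rate. This is an illustrative mixing rule, not
    the procedure used to build the Clotho Voice dataset.
    """
    if len(speech) != len(background):
        raise ValueError("signals must have the same length")
    background_rms = rms(background)
    if background_rms == 0:
        # Silent background: nothing to scale, return the speech unchanged.
        return list(speech)
    # Gain that places the background snr_db below the speech level.
    gain = rms(speech) / (background_rms * 10 ** (snr_db / 20))
    return [s + gain * b for s, b in zip(speech, background)]


if __name__ == "__main__":
    # Two synthetic tones stand in for a speech clip and a background clip.
    rate = 16000
    speech = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    background = [0.3 * math.sin(2 * math.pi * 97 * n / rate) for n in range(rate)]
    mixed = mix_at_snr(speech, background, snr_db=10)
    print(len(mixed))
```

A high positive `snr_db` corresponds to the speech-dominated case described in the abstract, and a strongly negative value to the background-dominated case.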
- TRAJES: um arcabouço para geração e avaliação de modelos de predição de trajetórias veiculares [TRAJES: a framework for generating and evaluating vehicle trajectory prediction models] (Universidade Federal do Espírito Santo, 2024-12-12) Krohling, Breno Aguiar; Comarela, Giovanni Ventorim; Mota, Vinícius Fernandes Soares; https://orcid.org/0000-0002-8341-8183; Dias, Diego Roberto Colombo; Rettore, Paulo Henrique Lopes
Vehicle trajectory prediction enables traffic management optimization and facilitates solutions that require knowledge of where a vehicle, or its driver, is heading. To use such information on a large scale, it is necessary to employ models capable of generalizing complex movement patterns across an entire region or city. To achieve this, an end-to-end framework called TRAJES (Trajectory Estimator) was proposed to generate models from urban vehicle mobility data, using trajectories consisting only of geolocation information. Model generation and selection are based on concrete metrics, such as the actual distance between predicted and real points, and the proposed Hit Race Accuracy metric, which evaluates model performance based on regions of interest throughout the entire city. The framework was used to create models capable of predicting vehicle positions in both the near and distant future, tested on real-world datasets collected in the cities of Porto and San Francisco. The results demonstrated the ability to produce effective, generalizable models for both prediction scenarios, indicating their viability as an intermediate step for external solutions, particularly those requiring knowledge of a vehicle’s future region.
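The TRAJES abstract mentions the distance between predicted and real points as a selection metric but does not give a formula. A minimal sketch, assuming points are (latitude, longitude) pairs in degrees and that great-circle (haversine) distance is an acceptable stand-in; the names `haversine_m` and `mean_prediction_error_m` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def mean_prediction_error_m(predicted, actual):
    """Average distance in metres between paired predicted and actual points.

    Assumption: both arguments are equal-length, non-empty lists of
    (lat, lon) tuples. This is an illustrative metric, not the exact one
    used by TRAJES.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(haversine_m(p[0], p[1], a[0], a[1])
                for p, a in zip(predicted, actual))
    return total / len(predicted)


if __name__ == "__main__":
    # Illustrative coordinates near Porto; not taken from the thesis datasets.
    predicted = [(41.150, -8.610), (41.160, -8.620)]
    actual = [(41.151, -8.611), (41.158, -8.623)]
    print(round(mean_prediction_error_m(predicted, actual), 1))
```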
- Analysis of bias in GPT language models through fine-tuning with anti-vaccination speech (Universidade Federal do Espírito Santo, 2024-12-02) Turi, Leandro Furlam; Badue, Claudine; Souza, Alberto Ferreira de; https://orcid.org/0000-0003-1561-8447; Pacheco, Andre Georghton Cardoso; Almeida Junior, Jurandy Gomes de
We examined the effects of integrating data containing divergent information, particularly anti-vaccination narratives, into the training of a GPT-2 language model by fine-tuning it on content from anti-vaccination groups and channels on Telegram. Our objective was to analyze the model’s ability to generate coherent and rationalized texts compared to a model pre-trained on OpenAI’s WebText dataset. The results demonstrate that fine-tuning a GPT-2 model with biased data leads the model to perpetuate these biases in its responses, albeit with a certain degree of rationalization. This highlights the importance of using reliable, high-quality data when training natural language processing models and underscores the implications for information dissemination through these models. We also explored the impact of data poisoning by combining anti-vaccination messages with general group messages in different proportions, aiming to understand how exposure to biased data can influence text generation and introduce harmful biases. The experiments highlight the change in frequency and intensity of anti-vaccination content generated by the model and elucidate the broader implications for reliability and ethics in using language models in sensitive applications. This study provides social scientists with a tool to explore and understand the complexities and challenges associated with misinformation in public health through the use of language models, particularly in the context of vaccine misinformation.
- FrameWeb-LD: uma abordagem baseada em ontologias para a Integração de Sistemas de Informação Web e a Web Semântica [FrameWeb-LD: an ontology-based approach for the integration of Web Information Systems and the Semantic Web] (Universidade Federal do Espírito Santo, 2017-11-20) Celino, Danillo Ricardo; Vítor Estêvão Silva Souza; https://orcid.org/0000-0003-1869-5704; http://lattes.cnpq.br/2762374760685577; https://orcid.org/0000-0002-6570-2164; http://lattes.cnpq.br/6786947145681297; Almeida, João Paulo Andrade; https://orcid.org/0000-0002-9819-3781; http://lattes.cnpq.br/4332944687727598; Siqueira, Frank Augusto; https://orcid.org/0000-0002-8275-5751; http://lattes.cnpq.br/6246567808516505
With the enormous amount of data available on the Web, Linked Data technologies have been proposed to try to achieve the vision of the Semantic Web, enabling efficient access, discovery, and combination of the available data. Such data should be published in a structured way and bound to known vocabularies, so that it can be understood by software agents. Moreover, the abstract conceptual models behind this data, i.e., their ontologies, can also have a great influence on the adoption of a Linked Data set and its vocabularies. In 2007, a Web Engineering method for the design and development of Web applications based on frameworks, named FrameWeb, was proposed, along with an extension of the method, called S-FrameWeb, that proposed the subsequent integration of the application’s data with the Semantic Web. Given the advances of the literature in this area of research, such as well-founded ontologies and the evolution of Linked Data technologies, we propose an evolution of S-FrameWeb called FrameWeb-LD, an approach for the integration of Web-based Information Systems with the Semantic Web.
Our proposal uses well-founded languages and methods for the construction of ontologies and aids developers in publishing the application’s data and services in the Web of Data. It offers a systematic process that captures, in architectural design models, how the system’s data is integrated with Semantic Web vocabularies, and a tool that automatically generates most of the source code related to Linked Data publishing.
- Lane marking detection and classification using spatial-temporal feature pooling (Universidade Federal do Espírito Santo, 2023-07-31) Torres, Lucas Tabelini; Santos, Thiago Oliveira dos; https://orcid.org/0000-0001-7607-635X; http://lattes.cnpq.br/5117339495064254; https://orcid.org/0000-0001-5371-6692; http://lattes.cnpq.br/0954275990134963; Moreira, Gladston Juliano Prates; https://orcid.org/0000-0001-7747-5926; http://lattes.cnpq.br/9902619084565293; Varejão, Flavio Miguel; https://orcid.org/0000-0002-5444-1974; http://lattes.cnpq.br/6501574961643171
The lane detection problem has been extensively researched in the past decades, especially since the advent of deep learning. Despite the numerous works proposing solutions to the localization task (i.e., localizing the lane boundaries in an input image), the classification task has not received the same attention. Nonetheless, knowing the type of lane boundary, particularly that of the ego lane, can be very useful for many applications. For instance, depending on the type of the ego lane boundary, a vehicle might not be allowed by law to overtake. Beyond that, very few works take advantage of the temporal information available in the videos captured by the vehicles: most methods employ a single-frame approach. In this work, building upon the recent deep learning-based model LaneATT, we propose an approach to exploit the temporal information and integrate the classification task into the model. This is accomplished by extracting features from multiple frames using a deep neural network (instead of only one frame, as in LaneATT). Our results show that the proposed modifications can improve the detection performance on the most recent benchmark (VIL-100) by 2.34%, establishing a new state of the art. Finally, an extensive evaluation shows that the approach achieves a high classification performance (89.37%) that can serve as a future benchmark for the field.