Exploring variable selection methods and data fusion in support vector regression: an application in petroleomics
Date
2024-03-28
Authors
Cunha, Pedro Henrique Pereira da
Publisher
Universidade Federal do Espírito Santo
Abstract
Support vector regression (SVR) is considered a black-box machine learning method and has stood out in chemometrics over the past decades, achieving results equal or superior to those of well-established methods. Because it is a black box, the cause-and-effect relationship between the input variables and the predicted property is difficult to interpret. Variable selection addresses this by identifying the variables that most influence the model. This work proposes the development of two variable selection methods, Subwindow Permutation Analysis (SPA) and Noise-Incorporated Subwindow Permutation Analysis (NISPA), for use in SVR combined with infrared spectroscopy. SPA and NISPA provided the most accurate models for kinematic viscosity, saturates content, and aromatic content. The root mean square errors of prediction (RMSEP) for SPA and NISPA were, respectively, 14.3% and 14.6% for kinematic viscosity, 4.7% and 4.4% for saturates content, and 3.4% and 3.1% for aromatic content. Therefore, in addition to generally yielding faster, more accurate, and more parsimonious models, SPA and NISPA revealed the most important variables for building SVR models.
Another way to improve a model is data fusion, but this strategy has been little studied in SVR. Data fusion was therefore studied using NIR, MIR, and ¹H and ¹³C NMR data combined through low-, medium-, and high-level fusion. For most tests, the models generated by data fusion were superior to the models without fusion. For API gravity, medium-level fusion using PCA to combine MIR and NIR produced a model with better parameters than the model without data fusion. For pour point, medium-level fusion with GA combining NIR and ¹H NMR surpassed both the models without fusion and models found in the literature. For total nitrogen, high-level fusion with MIR and ¹H NMR proved statistically better than models without data fusion. This demonstrates that data fusion can extract new information for SVR modeling and yield statistically better models than those derived from isolated analytical sources.
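As a rough illustration of two of the fusion levels and the error metric named in the abstract, the sketch below uses only the Python standard library. It is not the author's implementation. The function names, the toy numbers, and the use of a plain average as the high-level decision rule are all assumptions made for the example. Medium-level fusion (extracting features with PCA or GA before concatenation) and the SVR models themselves are omitted.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def low_level_fusion(*blocks):
    """Low-level fusion: concatenate the variables of each block per sample.

    Each block is a list of rows (one row per sample). All blocks must
    describe the same samples in the same order.
    """
    n = len(blocks[0])
    if any(len(block) != n for block in blocks):
        raise ValueError("all blocks need the same number of samples")
    return [[value for block in blocks for value in block[i]] for i in range(n)]


def high_level_fusion(*predictions):
    """High-level fusion: combine predictions of separately trained models.

    A simple mean is used here as the decision rule (an assumption).
    """
    n = len(predictions[0])
    if any(len(pred) != n for pred in predictions):
        raise ValueError("all prediction lists need the same length")
    return [sum(pred[i] for pred in predictions) / len(predictions) for i in range(n)]


# Hypothetical toy data: two samples measured by two techniques.
nir = [[0.1, 0.2], [0.3, 0.4]]   # two NIR variables per sample
mir = [[1.0], [2.0]]             # one MIR variable per sample
fused = low_level_fusion(nir, mir)

# Hypothetical predictions from two single-source models.
pred_a = [10.0, 20.0]
pred_b = [12.0, 22.0]
combined = high_level_fusion(pred_a, pred_b)

error = rmsep([11.0, 23.0], combined)
```

In practice, the blocks are usually scaled before low-level concatenation so that no single technique dominates the fused matrix, and the fused matrix (or each separate block, for high-level fusion) would then be passed to an SVR model.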
Keywords
Support vector machine, Variable selection, Data fusion, Petroleum, Machine learning