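Machine learning for solving chemical problems: random forest applied to infrared spectrometry (original title: Aprendizagem de máquina na solução de problemas químicos: floresta aleatória aplicada à espectrometria na região do infravermelho)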
Date
2023-02-13
Authors
Nascimento, Márcia Helena Cassago
Publisher
Universidade Federal do Espírito Santo
Abstract
Chemometrics began in the 1970s with the publication of a series of studies entitled "Computerized Learning Machines Applied to Chemical Problems", which expresses the motivation for the emergence of this field: the need for multivariate methods developed by chemists to solve chemical problems. Over 54 years, the area has expanded and offered solutions for the increasingly complex data generated by modern Analytical Chemistry. Among the machine learning methods adapted to chemical problems, this study contributes to a greater understanding, adaptation, and application of the random forest (RF) method, an ensemble learning method that combines multiple classifiers. RF can be used as a multivariate calibration model or for pattern recognition, the latter being the focus of this thesis. In addition to the historical context, we describe adaptations proposed for the RF method to solve Chemistry problems with different analytical techniques and approaches. In this study, we applied unsupervised RF (URF) for pattern recognition as a screening method in a case study of suspected fuel fraud, in which diesel samples were analyzed by Fourier transform mid-infrared spectroscopy (FT-MIR). Interpreting the URF results through a principal coordinate analysis (PCoA) plot allowed the screening of samples whose adulteration was confirmed by physicochemical tests. In addition, we adapted and applied the URF method to contribute to another field of study: biospectroscopy. Many studies in this field aim to develop alternative diagnostic methods or liquid biopsies. Using biofluids and spectroscopy combined with chemometric methods, it is possible to extract information about the biochemical changes caused by a disease or infectious agent.
We adapted URF to identify a discriminant structure in spectroscopic data from two studies: a noninvasive diagnosis of COVID-19 from saliva samples analyzed by FT-MIR, and a proposal for pattern recognition and diagnosis of COVID-19 from nasopharyngeal swabs analyzed by FT-MIR. In the first, an ensemble of classification models distinguished saliva samples from COVID-19-infected people with an accuracy of 85%, a sensitivity of 93%, and a specificity of 74%. In the second, URF was used in a comprehensive and innovative way: as a starting point for selecting relevant variables and as input data for classification models. With the URF output as input data for classification models, we classified biofluid samples collected with two types of swabs with 87.6% accuracy, 93.6% sensitivity, 79.4% specificity, and an F-score of 0.898. The different approaches in this study help to disseminate the versatility and efficiency of the RF method and to innovate in its adaptation, taking advantage of its potential for the different problems addressed.
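The abstract does not describe how the URF proximities or the PCoA coordinates are computed. As a minimal sketch only, the following assumes one common formulation of unsupervised random forest: a synthetic copy of the data is built by permuting each variable independently, a forest is trained to separate real from synthetic rows, and the proximity of two samples is the fraction of trees in which they share a leaf. The function names, parameters, and the random placeholder "spectra" are hypothetical and are not taken from the thesis; scikit-learn and NumPy are assumed to be available.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def urf_proximity(X, n_trees=200, seed=0):
    """Proximity matrix from an unsupervised random forest (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Synthetic class: permute each column independently, which keeps the
    # marginal distributions but destroys the correlation between variables.
    X_syn = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
    X_all = np.vstack([X, X_syn])
    y_all = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X_all, y_all)
    # Leaf index of every real sample in every tree: shape (n, n_trees).
    leaves = forest.apply(X)
    # Fraction of trees in which two samples fall in the same leaf.
    # This builds an (n, n, n_trees) array, so it only suits small n.
    return (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)


def pcoa(dist, k=2):
    """Principal coordinate analysis (classical scaling) of a distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:k]
    # Negative eigenvalues can occur for non-Euclidean distances; clip them.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Placeholder data standing in for spectra: two groups of 20 samples, 50 variables.
demo_rng = np.random.default_rng(42)
X_demo = np.vstack([
    demo_rng.normal(0.0, 1.0, size=(20, 50)),
    demo_rng.normal(1.5, 1.0, size=(20, 50)),
])
prox = urf_proximity(X_demo)
coords = pcoa(np.sqrt(1.0 - prox))  # one common proximity-to-distance conversion
print(coords.shape)
```

Plotting the two columns of `coords` against each other gives a PCoA score plot of the kind the abstract refers to; the conversion `sqrt(1 - proximity)` is one conventional choice and may differ from the one used in the thesis.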
Keywords
Chemometrics, Variable selection, Mid-infrared region