REDE NORDESTE DE BIOTECNOLOGIA
UNIVERSIDADE FEDERAL DO ESPÍRITO SANTO
CENTRO DE CIÊNCIAS DA SAÚDE
PROGRAMA DE PÓS-GRADUAÇÃO EM BIOTECNOLOGIA

CHRISTIANE MARA GOULART

UNOBTRUSIVE TECHNIQUE BASED ON INFRARED THERMAL IMAGING FOR EMOTION RECOGNITION IN CHILDREN-WITH-ASD-ROBOT INTERACTION

VITÓRIA
2019

Doctoral thesis presented to the Graduate Program in Biotechnology of the Rede Nordeste de Biotecnologia (RENORBIO), Espírito Santo focal point, Universidade Federal do Espírito Santo (UFES), as a partial requirement for obtaining the degree of Doctor in Biotechnology.

Advisor: Prof. Dr. Teodiano Freire Bastos-Filho
Co-advisor: Prof. Dr. Eliete Maria de Oliveira Caldeira

Thesis defended on February 19, 2019.

Examining committee:
Prof. Dr. Teodiano Freire Bastos-Filho, Universidade Federal do Espírito Santo (Advisor)
Prof. Dr. Eliete Maria de Oliveira Caldeira, Universidade Federal do Espírito Santo (Co-advisor)
Prof. Dr. Adriana Madeira Alvares da Silva, Universidade Federal do Espírito Santo (Examiner)
Prof. Dr. Sônia Alves Gouvêa, Universidade Federal do Espírito Santo (Examiner)
Prof. Dr. Adriano de Oliveira Andrade, Universidade Federal de Uberlândia (Examiner)
Prof. Dr. Kimberley Adams, University of Alberta, Canada (Examiner)

I dedicate this work to all people with Autism Spectrum Disorder, their families and the professionals in the field.
ACKNOWLEDGMENTS

I thank God for His constant love and presence in my life, for all the opportunities granted for my learning and growth, and for enabling me to achieve my goals. I thank my family for their unconditional support: my parents, Bolivar and Simone, my sister, Adriana, and my husband, Samuel, who always encourage me and stand by my side. They are my treasures! I thank my advisors, Teodiano and Eliete, for their guidance and for believing in me and in my work. I especially thank my friends Carlos, Denis and Vinícius for their teaching and constant support. I am also grateful to Hamilton, Douglas, Alvaro and Guilherme for their valuable contributions to the research project. I am grateful to the Municipal Elementary Schools of Vitória (Éber Louzada Zippinotti, Álvaro de Castro Mattos and Marechal Mascarenhas de Moraes), the Associação dos Amigos dos Autistas do ES (AMAES), the professionals in the field, the parents and all the children for their participation, their trust in my work and the wonderful experiences we shared. Thanks to all my colleagues at the Núcleo de Tecnologia Assistiva (NTA) of UFES for all the help, advice and laughter, and for the pleasant work environment. Finally, I thank RENORBIO, PPG-Biotec/UFES, CAPES and FAPES for financial support, and everyone who contributed to this work directly or indirectly.

"What counts in life is not the mere fact that we have lived. It is what difference we have made to the lives of others that will determine the significance of the life we lead." Nelson Mandela

RESUMO

Emotions are relevant to social relationships, and individuals with Autism Spectrum Disorder (ASD) have impaired understanding and expression of emotions.
This thesis comprises studies on emotion analysis in typically developing children and children with ASD (aged 7 to 12 years) through infrared thermal imaging (IRTI), a safe and unobtrusive (contact-free) technique used to record temperature variations in facial regions of interest (ROIs), such as the forehead, nose, cheeks, chin, and the periorbital and perinasal regions. A social robot called N-MARIA (New-Mobile Autonomous Robot for Interaction with Autistics) was used as an emotional stimulus and as a mediator of social and pedagogical tasks. The first study evaluated facial thermal variation for five emotions (happiness, sadness, fear, disgust and surprise), elicited by affective audiovisual stimuli, in typically developing children. The second study evaluated facial thermal variation for three emotions (happiness, surprise and fear), elicited by the social robot N-MARIA, in typically developing children. In the third study, two sessions were held with children with ASD, in which social and pedagogical tasks were evaluated with the robot N-MARIA as a tool and mediator of the interaction with the children. An emotional analysis based on facial thermal variation was possible in the second session, in which the robot was the stimulus used to elicit happiness, surprise or fear. In addition, professionals (teachers, an occupational therapist and a psychologist) evaluated the usability of the social robot. Overall, the results showed IRTI to be an efficient technique for evaluating emotions through thermal variations. In the first study, thermal decreases predominated in most ROIs, with the largest emissivity variations induced by disgust, happiness and surprise, and an accuracy above 85% for the classification of the five emotions.
In the second study, the emotions detected with the highest probabilities by the classification system were surprise and happiness, and a significant temperature increase predominated in the chin and nose. The third study, conducted with children with ASD, found significant thermal increases in all ROIs and a classification with the highest probability for surprise. N-MARIA was a promising stimulus capable of eliciting positive emotions in children. The child-with-ASD-robot interaction was positive, with social skills and pedagogical tasks successfully performed by the children. Moreover, the usability of the robot, as assessed by professionals, achieved a satisfactory score, indicating N-MARIA as a potential tool for therapies.

Keywords: Autism Spectrum Disorder. Emotions. Infrared Thermal Imaging. Social Robot.

ABSTRACT

Emotions are relevant to social relationships, and individuals with Autism Spectrum Disorder (ASD) have impaired understanding and expression of emotions. This thesis consists of studies on emotion analysis in typically developing children and children with ASD (aged 7 to 12 years) through infrared thermal imaging (IRTI), a safe and unobtrusive (contact-free) technique used to record temperature variations in facial regions of interest (ROIs), such as the forehead, nose, cheeks, chin, and the periorbital and perinasal regions. A social robot called N-MARIA (New-Mobile Autonomous Robot for Interaction with Autistics) was used as an emotional stimulus and as a mediator of social and pedagogical tasks. The first study evaluated facial thermal variations for five emotions (happiness, sadness, fear, disgust and surprise), triggered by affective audiovisual stimuli, in typically developing children. The second study evaluated facial thermal variation for three emotions (happiness, surprise and fear), triggered by the social robot N-MARIA, in typically developing children.
In the third study, two sessions were carried out with children with ASD, in which social and pedagogical tasks were evaluated with the robot N-MARIA as a tool and mediator of the interaction with the children. An emotional analysis through facial thermal variation was possible in the second session, in which the robot was the stimulus used to trigger happiness, surprise or fear. Moreover, professionals (teachers, an occupational therapist and a psychologist) evaluated the usability of the social robot. In general, the results showed IRTI to be an efficient technique for evaluating emotions through thermal variations. In the first study, thermal decreases predominated in most ROIs, with the largest emissivity variations induced by disgust, happiness and surprise, and an accuracy greater than 85% for the classification of the five emotions. In the second study, the emotions detected with the highest probabilities by the classification system were surprise and happiness, and a significant temperature increase predominated in the chin and nose. The third study, performed with children with ASD, found significant thermal increases in all ROIs and a classification with the highest probability for surprise. N-MARIA was a promising stimulus able to trigger positive emotions in children. The child-with-ASD-robot interaction was positive, with social skills and pedagogical tasks successfully performed by the children. In addition, the usability of the robot, as assessed by professionals, achieved a high score, indicating N-MARIA as a potential tool for therapies.

Keywords: Autism Spectrum Disorder. Emotions. Infrared Thermal Imaging. Social Robot.

CONTENTS
INTRODUCTION
BIOTECHNOLOGY AND ASSISTIVE TECHNOLOGY
CONTEXT OF THE THESIS
HYPOTHESIS
CONTRIBUTION
GOALS
STRUCTURE
LITERATURE REVIEW
AUTISM SPECTRUM DISORDER
PHYSIOLOGICAL SIGNALS AND EMOTIONS
EMOTIONS IN SOCIAL RELATIONSHIPS
SENSORS FOR RECORDING PHYSIOLOGICAL SIGNALS
SOCIAL ROBOTS AND ASD
N-MARIA
CHAPTER 1
CHAPTER 2
CHAPTER 3
CONCLUSIONS
FUTURE WORKS
SOCIAL RESPONSIBILITY
SCIENTIFIC PUBLICATIONS
REFERENCES
ANNEX
APPENDIX

Introduction

Biotechnology and Assistive Technology

Biotechnology is a broad area in which biological processes, organisms, cells or cellular components are exploited to develop new technologies for useful applications in research, agriculture, industry and the clinic (NATURE.COM, 2017). In line with this definition, a range of biological signals can be used in studies that aim to improve people's quality of life in the health and medical fields. Such studies may also fall within Assistive Technology, a field that is receiving growing attention in Brazil. Assistive Technology refers to the devices used to support or replace an impaired function, ranging from simple, low-cost, low-tech gadgets (e.g., buttonhooks and reachers) to complex high-tech equipment (e.g., power wheelchairs and computer-aided speech devices) (COOK and HUSSEY, 2002).
Assistive Technology may also be defined as an interdisciplinary area of knowledge that encompasses products, resources, methodologies, strategies, practices and services that aim to promote functionality, in terms of activity and participation, for people with disabilities, incapacities or reduced mobility, aiming at their autonomy, independence, quality of life and social inclusion (BRASIL, 2009). There are many categories of Assistive Technology, according to different forms of organization and application (BRASIL, 2009). Basically, these categories comprise (BERSCH, 2013): aids for daily living and practical life; aids to enhance visual function and resources that convey information to people with low vision or blindness; aids for deaf or hearing-impaired people; architectural designs for accessibility; vehicle adaptations; mobility aids; orthoses and prostheses; postural adequacy; environmental control systems; computer accessibility resources; aids for sport and leisure; and augmentative and alternative communication.

The Assistive Technology Group (NTA, from the Portuguese Núcleo de Tecnologia Assistiva) at the Federal University of Espírito Santo (UFES) carries out research projects that apply knowledge of robotics, prostheses, virtual environments, brain-computer interfaces and smart environments, as well as biological signals such as brain and muscle signals, eye tracking, thermal images and facial expressions, among others, aimed at rehabilitation, monitoring and therapies for people with several kinds of disorders (NTA - UFES, 2017).

Context of the thesis

Emotions shape our behavior and social relations, and understanding them is very important and increasingly demanded today under the theme of emotional intelligence, since quality of life and the balance among all pillars of human life depend on it (GOLEMAN, 1995; VIEIRA, 2017).
Therefore, emotions are studied in multidisciplinary areas of research (neurology, psychology, sociology and computer science) (KROUPI, YAZDANI and EBRAHIMI, 2011). Understanding emotions allows people to identify the intentions and possible emotions of other individuals (Theory of Mind) and then adopt appropriate responses (HAPPÉ, 1994). Individuals with Autism Spectrum Disorder have difficulty interpreting others' emotions, expressing their own and communicating (APA, 2013; HAPPÉ, 1994). Robots have been used to aid the social and cognitive development of children with ASD (CABIBIHAN et al., 2013; SCASSELLATI, ADMONI and MATARIĆ, 2012). Several studies on human-machine interaction discuss the possibility of emotion identification and recognition by computational systems or robots (PICARD, VYZAS and HEALEY, 2001; PICARD, 2003; SCHEUTZ, SCHERMERHORN and KRAMER, 2006). In this context, the proposed work consists of the unobtrusive analysis of emotions in children with ASD during their interaction with a new social robot used as an emotional stimulus, developed at UFES and named N-MARIA (New-Mobile Autonomous Robot for Interaction with Autistics). In addition to stimulating social skills and recording physiological signals of children with ASD, the robot N-MARIA will be evaluated by professionals regarding its usability as a potential support tool in therapies. This study was approved by the Ethics Committee of the Federal University of Espírito Santo under number 1,121,638 (ANNEX A).

Hypothesis

From the aforementioned, the following hypotheses of the thesis were formulated:

1. Obtrusive methods, such as electrocardiography (ECG) and electroencephalography (EEG), have been used to acquire physiological signals in order to identify and recognize emotions (VALENZA et al., 2014; NASEHI and POURGHASSEM, 2012). However, they cause discomfort (RUSLI et al., 2016).
Individuals with ASD may be sensitive to touch (MINSHEW and HOBSON, 2008), which makes it difficult to perform exams. A thermal camera is an unobtrusive (contact-free) and efficient sensor for recording body thermal variations through infrared thermal imaging (IRTI), which enables the analysis of emotions in children with ASD.

2. Several international works report successful interaction between children with ASD and robots (ROBINS et al., 2010 (a); KIM et al., 2013; WON and ZHONG, 2016; BOCCANFUSO et al., 2017). In addition, our previous studies demonstrated a positive child-with-ASD-robot interaction (VALADÃO et al., 2016; GOULART et al., 2018). The social robot N-MARIA is a useful emotional stimulus for triggering emotions in children with ASD and is able to record physiological signals for emotion analysis.

Contribution

The contribution of this work is to present a social robot equipped with a thermal camera for emotion analysis in children with ASD. The robot is able to mediate social and pedagogical tasks, act as an emotional stimulus and record facial thermal images of children with ASD in an unobtrusive way, which represents a significant contribution to the areas of emotion recognition and social robotics, focusing on their application to the ASD field.

Goals

The main goals of this thesis are:
1) Evaluate the thermal variations involved in emotion expression;
2) Evaluate the robot as an affective stimulus in the interaction with children with ASD;
3) Evaluate the social and pedagogical tasks performed by children with ASD during their interaction with the robot in at least two sessions;
4) Assess the social robot as a potential tool in therapies, through surveys applied to professionals in this area.
Structure

This thesis comprises the literature review; Chapter 1, which addresses emotion recognition in typically developing children through the variation of facial emissivity recorded by infrared thermal imaging (IRTI); Chapter 2, which discusses a method for the automatic detection of facial regions of interest (ROIs) for emotion recognition through IRTI in typically developing children during their interaction with the robot N-MARIA; Chapter 3, which describes the evaluation of social skills and pedagogical tasks and the emotional analysis of children with ASD during their interaction with the robot N-MARIA; and the general conclusions.

Literature review

Autism Spectrum Disorder

Autism Spectrum Disorder comprises a broad spectrum of clinical manifestations, commonly characterized by a dyad of general symptoms: 1) impairments in reciprocal social communication and interaction; 2) restricted/repetitive patterns of behavior, interests or activities (WIEGIEL et al., 2010; APA, 2013). Depending on the variety and severity of symptoms, the spectrum can assume mild, moderate or severe levels (APA, 2013). Considered a neurobehavioral condition, ASD has no cure, and its occurrence is related to multifactorial conditions in which a set of genetic factors and environmental stressors act at particular times during brain development, triggering an autistic phenotype (CASANOVA, 2015). Studies suggest that different genes associated with different brain regions, cognitive impairments and functional abnormalities can generate the distinct levels of ASD (WIEGIEL et al., 2010). The recent burgeoning interest in this condition may be due to the rising prevalence rates of ASD and the concomitant societal, educational and financial problems (CASANOVA, 2015). The estimated global average prevalence of ASD is 62/10,000, according to studies by Elsabbagh et al.
(2012), which implies that about 1 in 160 children has an Autism Spectrum Disorder, affecting more boys than girls at a ratio of approximately 4:1 (WHO, 2017; FOMBONNE, 2009). The reported frequency of ASD is 1% of the world population, and the increase in its rates may be related to an expansion of the diagnostic criteria, increased awareness of the disorder, diagnosis at earlier ages, differences in study methodology, recognition that ASD is a lifelong condition or, simply, a true rise in its frequency (APA, 2013; MATSON and KOZLOWSKI, 2011; SUN and ALLISON, 2010). In Brazil, about 500 thousand people were estimated to have ASD in 2010 (GOMES et al., 2015); nevertheless, there is a lack of epidemiological studies able to estimate the exact number and location of people with ASD in the country in order to establish more effective action policies (MELLO et al., 2013; CANO, 2016). Prevalence studies of ASD are important to quantify the increase in cases of this disorder (MELLO et al., 2013). Some prevalence estimates from other countries are shown in Table 1. The variation in prevalence rates between countries is possibly linked to studies conducted with distinct methodologies, diagnostic procedures and population sizes (SUN and ALLISON, 2010).

Table 1. Estimated median prevalence (%) of ASD reported in some countries.
Country | Year | Prevalence (%) | Sample | Source
United States of America | 2010 | 1.5 | Children aged 8 years | CHRISTENSEN et al., 2016
Canada | 2007 | 0.61 | Children and adults | SSCSAST, 2007
Canada | 2007 | 0.22 | Children aged 0-19 years | SSCSAST, 2007
Canada | 2007 | 0.43 | Adults | SSCSAST, 2007
United Kingdom | 1988 to 2001 | 1.2 | Children aged 9-10 years | BAIRD et al., 2006
Norway | - | 0.9 | Children aged 7-9 years | POSSERUD et al., 2010
China | 1987 to 2008 | 0.1 | Children aged 0-18 years | SUN and ALLISON, 2010
Japan | 1971 to 2008 | 0.2 | Children aged 0-18 years | SUN and ALLISON, 2010
South Korea | 2005 to 2008 | 2.6 | Children aged 7-12 years | KIM et al., 2011
Venezuela | 2005 to 2006 | 0.2 | Children aged 3-9 years | MONTIEL-NAVA and PEÑA, 2008

Physiological signals and emotions

Physiological patterns originate from the Central Nervous System (CNS) and the Peripheral Nervous System (PNS) (KOELSTRA et al., 2012), and their recognition is potentially useful for evaluating and quantifying stress, anger and other emotions that influence health, with important applications in medicine, entertainment and human-computer interaction (PICARD et al., 2001). The emotional state is defined as a set of changes in neurophysiological and hormonal responses and in facial, bodily and vocal behaviors, triggered by somatic and/or neurophysiological activity (LEWIS, 2008). Physiological signals, such as heartbeat, breathing, body temperature, perspiration, muscle tension, brain signals and pupil diameter, among others, may vary in response to impactful events or stimuli, such as harmful events, attacks, threats or surprises, and are thus able to characterize emotional states. When a person is positively or negatively aroused, the sympathetic nerves of the Autonomic Nervous System (ANS) are activated, triggering physiological responses whose patterns are detectable and largely involuntary, i.e., less susceptible to conscious control.
In contrast, speech, gestures and body expressions are responses that humans can voluntarily modify, masking emotion expressions (PAVLIDIS et al., 2007; NHAN and CHAU, 2010; JERRITTA et al., 2011). Evolutionarily, to ensure the survival of the individual, physiological responses were and still are modulated by the "fight-or-flight" reactions that result from stressful situations. During such reactions, the sympathetic system elicits changes in organ and tissue functions, such as an increase in the delivery of well-oxygenated, nutrient-rich blood to the working skeletal muscles; increased heart rate and myocardial contractility, so that the heart pumps more blood per minute; and stimulation of vascular smooth muscle to trigger widespread vasoconstriction, particularly in the organs of the gastrointestinal system and in the kidneys. The vasoconstriction caused by sympathetic stimulation redistributes blood away from these metabolically inactive tissues and toward the contracting muscles, whereas bronchodilation eases air movement in and out of the lungs to maximize the uptake of oxygen from the atmosphere and the elimination of carbon dioxide from the body. In the liver, the rates of glycogen breakdown into its component glucose molecules (glycogenolysis) and of new glucose formation from noncarbohydrate sources (gluconeogenesis) increase, raising the concentration of glucose in the blood. This is necessary for the brain, since glucose is the only nutrient molecule it can use to form metabolic energy. Moreover, the rate of lipolysis in adipose tissue increases, raising the concentration of fatty acids in the blood, which skeletal muscles then consume to form metabolic energy for contraction.
The sympathetic system also elicits generalized sweating, which enables thermoregulation during these conditions of increased physical activity and heat production. Lastly, the eyes adjust so that the pupil dilates, letting more light reach the retina (mydriasis), and the lens adapts for distance vision (MCCORRY, 2007). Throughout human evolution, emotion has played an essential role at decisive moments, such as facing impasses or making decisions that are too important to be left to the intellect alone (GOLEMAN, 1995). As recurring challenges repeated themselves, an emotional repertoire was used to guarantee the survival of the human species and, consequently, this repertoire became imprinted in the human nervous system as innate and automatic inclinations of the heart (GOLEMAN, 1995). Many research findings from neuroscience and psychology highlight the critical role of emotion in rational and intelligent behavior: each type of emotion experienced predisposes the person to an immediate action or signals a direction to take (PICARD, VYZAS and HEALEY, 2001; GOLEMAN, 1995).

Emotions in social relationships

Emotion is present in all aspects of human life and is a continuous adaptive mechanism related to the purpose of human interaction and expression, as a reaction to stimuli or events (KOELSTRA et al., 2012). Therefore, emotions attract a great deal of interest and attention in many areas of research, such as neurology, psychology, sociology and computer science (KROUPI, YAZDANI and EBRAHIMI, 2011). Moreover, studies mention the relevance of emotions in interpersonal relationships, as potent facilitators of cognitive processes (in decision-making, for example) and important contributors to many illnesses (stress, for example) (PAVLIDIS et al., 2007). In many aspects of day-to-day life, they contribute to communication between humans (NIE et al., 2011). Emotional states frequently shape social relationships.
Thus, understanding them allows people to identify the intentions of other individuals and to adopt appropriate responses. The ability to recognize and label emotions is a social competence that develops from early childhood onward and supports adaptive social behavior (BAL et al., 2010). Individuals with ASD lack the ability to recognize and differentiate emotions in themselves or in the displays of others (HAPPÉ, 1994; RUSLI et al., 2016). This difficulty in estimating the emotional states of other people may be related to cortical impairments in discriminating between stimuli, found in early event-related potential peaks. Such impairments could result from deficits in the early stages of signal perception (YANG et al., 2011). Besides isolated neural causes, emotion recognition difficulties are associated with altered attentional, perceptual and cognitive processes, as individuals with ASD process faces differently and show reduced attention to faces and facial expressions. This may be because the mentalistic and emotional information conveyed by the eyes and facial expressions can be hard for people with ASD to read (GOLAN et al., 2010). Many studies have explored emotion recognition in individuals with ASD, given the core ASD deficits in reciprocal social interaction and social behavior (BAL et al., 2010).

Sensors for recording physiological signals

A considerable limitation of current physiological approaches is the need for implantation or direct contact of sensors with the user, which makes many of them impractical for most routine user environments (PURI et al., 2005). Sensors able to record physiological signals can generally be classified as invasive or non-invasive.
Invasive sensors penetrate the tissues of interest in order to eliminate interference from the mechanical barriers formed by layers of skin, tissue or bone and, consequently, to obtain a higher-quality signal. However, invasive sensors cause pain and health risks to the individual, which considerably limits their use for recording physiological signals (PAVLIDIS et al., 2007; MERLA and ROMANI, 2007). On the other hand, sensors that do not penetrate tissues are classified as non-invasive and are widely used in studies to record and quantify various types of biological signals. Such non-invasive devices may be intrusive, such as probes that enter through body openings, causing some discomfort to the patient; or obtrusive, such as ECG (electrocardiogram) electrodes, which are placed on the body, i.e., with some physical contact. Non-intrusive or unobtrusive sensors are contact-free. Figure 1 shows common examples of non-invasive devices widely used in clinical and research settings. Human physiological information obtained through non-invasive devices has been useful for drawing biophysiological inferences about a variety of health symptoms and psychological states, in addition to applications in biometrics, security and surveillance, criminal investigation and human-computer interaction (PAVLIDIS et al., 2007; KHAN, WARD and INGLEBY, 2009). In the literature, numerous works describe physiological analysis methods that use such sensors to evaluate and recognize emotions and stress levels, to monitor patients in physical rehabilitation who need more frequent healthcare, and to detect information concealed by guilty persons during investigations through the polygraph (popularly known as a lie detector), among others (KREIBIG, 2010; JERRITTA et al., 2011; JOVANOV et al., 2005; POLLINA et al., 2006).

Figure 1.
Non-invasive devices and sensors used for physiological signal estimation: brain rhythms (EEG), heart rate (ECG), respiratory rate, temperature, peripheral and skin conductance. Source: Goulart et al. (2014)

Considering the possible sensitivity to touch of individuals with ASD (MINSHEW and HOBSON, 2008), unobtrusive devices may be a useful alternative that enables biological and behavioral analysis in studies on the recognition of emotions and common behavior patterns in individuals with ASD. It is worth noting that there is a paucity of studies evaluating autonomic activity in children with ASD through unobtrusive sensors (BAL et al., 2010).

Thermal camera

The unobtrusive device chosen for our study is the Therm-App®, a low-cost infrared thermal camera able to record body temperature variations. This camera can be attached to Android devices, enabling one to display, record and share infrared thermal images for 'night vision' and 'thermography' applications. Table 2 lists some technical specifications of the thermal device (THERM-APP, 2017).

Table 2. Technical specifications of the Therm-App®.
Parameter | Specification
Weight | 138 g
Size | 55 × 65 × 40 mm
Minimal requirements | Android 4.1 and above
Resolution | 384 × 288 pixels (> 110,000 pixels), 17 µm thermal detector
Spectral band | Long-wavelength infrared, 7.5–14 μm
Range of lens options | 6.8 mm (55° × 41°); 13 mm (29° × 22°); 19 mm (19° × 14°); 35 mm (11° × 8°)
Focus | Manual, 0.2 m to infinity
Frame rate | 8.7 Hz
Operating temperature | −10 °C to +50 °C
Accuracy | ±3 °C
Sensitivity | Noise equivalent temperature difference (NETD) < 0.07 °C
Temperature range calibration | 5–90 °C
Color palettes | Hot white, hot black, iron, rainbow, grey, vivid
Image processing modes (viewing modes) | Night Vision (optimizes hot object detection); Thermography (provides a clean and accurate basic temperature reading)

Infrared thermal imaging (IRTI) records physiological parameters associated with affective states, such as skin (cutaneous) temperature, blood flow, cardiac pulse and breathing pattern (RUSLI et al., 2016). IRTI is considered an emerging, promising and ecologically valid method, and it has been adopted in a variety of studies involving human emotions (IOANNOU, GALLESE and MERLA, 2014; DI GIACINTO et al., 2014). Thermal image generation is expressed by the following equation (the Stefan–Boltzmann law):

$W = \varepsilon \sigma T^{4}$

where W is the radiant emittance (W/cm²); ε is the emissivity (estimated at 0.98–0.99 for human skin); σ is the Stefan–Boltzmann constant (5.6705 × 10⁻¹² W/(cm²·K⁴)); and T is the absolute temperature (K) (SUGIMOTO, YOSHITOMI and TOMITA, 2000). Thermography is an unobtrusive (contact-free) and highly accurate technique. It also has the further advantage of reduced noise, which often affects other physiological measures (NHAN and CHAU, 2009; STEMBERGER, ALLISON and SCHNELL, 2010).
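The relation W = εσT⁴ can be evaluated and inverted numerically, which shows how small the radiometric signal of a facial temperature change is. The sketch below is illustrative. It assumes an emissivity of 0.98 (the lower end of the quoted range) and a skin temperature of 34 °C; neither value is a parameter reported for this study. It uses the constant in the same W/(cm²·K⁴) units as the text.

```python
"""Sketch of the Stefan-Boltzmann relation W = eps * sigma * T**4 for skin.

Assumptions (illustrative only): emissivity 0.98, the lower end of the
0.98-0.99 range quoted for human skin, and a skin temperature of 34 degC.
"""

# Stefan-Boltzmann constant in W/(cm^2 K^4), the units used in the text.
SIGMA_W_PER_CM2_K4 = 5.6705e-12


def radiant_emittance(temp_k: float, emissivity: float = 0.98) -> float:
    """Radiant emittance W (W/cm^2) of skin at absolute temperature temp_k (K)."""
    return emissivity * SIGMA_W_PER_CM2_K4 * temp_k ** 4


def temperature_from_emittance(w: float, emissivity: float = 0.98) -> float:
    """Invert W = eps * sigma * T^4 to recover the absolute temperature (K)."""
    return (w / (emissivity * SIGMA_W_PER_CM2_K4)) ** 0.25


def emittance_slope(temp_k: float, emissivity: float = 0.98) -> float:
    """Derivative dW/dT = 4 * eps * sigma * T^3 (W/cm^2 per kelvin)."""
    return 4.0 * emissivity * SIGMA_W_PER_CM2_K4 * temp_k ** 3


if __name__ == "__main__":
    t_skin = 273.15 + 34.0  # 34 degC in kelvin
    w = radiant_emittance(t_skin)
    print(f"W at 34 degC: {w:.5f} W/cm^2")
    print(f"Recovered temperature: {temperature_from_emittance(w) - 273.15:.2f} degC")
    # Relative change in W caused by a 0.1 K change in skin temperature.
    print(f"Relative change for 0.1 K: {emittance_slope(t_skin) * 0.1 / w:.3%}")
```

Because T scales with W^(1/4), a 1% error in the assumed emissivity shifts the recovered temperature by roughly 0.25%. At skin temperature this is about 0.8 K, which is why the emissivity setting matters when absolute temperatures are reported.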
In addition, thermography can be applied to both sides of the body. This enables the assessment of temperature asymmetries and the evaluation of larger areas of skin, rather than restricting the analysis to small regions, as happens with the electrodes used in other physiological analyses (RIMM-KAUFMAN and KAGAN, 1996). Moreover, dynamic infrared imaging may provide a physiological access pathway. It is becoming a powerful tool for inferring psychophysiological signs and for differentiating baseline states from affective states, while preserving an ecological and natural context (NHAN and CHAU, 2009; IOANNOU et al., 2013). On the other hand, thermal imaging has disadvantages. These include artifacts from environmental effects and from the metabolic effects of digestion, and occlusion of regions of interest (ROIs) by eyeglasses or hair bands (STEMBERGER, ALLISON and SCHNELL, 2010).

In bodily thermoregulation, receptors constantly monitor body and ambient temperatures in order to keep the internal human body temperature between 36.5 and 37 °C. These receptors are either peripheral (in the skin) or central (e.g., in the spinal cord) (GUYTON and HALL, 2006; BRUNO, MELNYK and VÖLCKNER, 2017). Thermoregulation is performed by temperature-regulating centers located in the hypothalamus, through neural feedback mechanisms (GUYTON and HALL, 2006). Blood vessels are distributed deep under the skin, and their arrangement can be seen in Figure 2. This arrangement comprises the continuous venous plexus and the arteriovenous anastomoses, which supply blood to the venous plexus in the most exposed parts of the body. Blood conduction of heat to the skin is controlled by the degree of vasoconstriction (cooling) or vasodilation (warming) of arterioles and arteriovenous anastomoses. Vasoconstriction is caused mainly by stimulation, and vasodilation by inhibition, of the sympathetic centers in the hypothalamus in response to changes in central body temperature and in the environment (GUYTON and HALL, 2006). Figure 2.
Circulation of the skin. Source: Guyton and Hall (2006).

In fact, the average diameter of skin blood vessels is around 10–15 μm, which is too small to be detected by infrared cameras because of their limited spatial resolution. However, the skin directly above blood vessels is on average 0.1 °C warmer than the adjacent skin, a difference that current infrared cameras can resolve thermally (WU, LIN and XIE, 2008). Thermal images of the face may provide biometric measurements of human emotions, based on the idea that temperature varies in different regions of the face according to emotional experience (SALAZAR-LÓPEZ et al., 2015; PAVLIDIS, EBERHARDT and LEVINE, 2002). The human face and body emit in both the mid-infrared (3–5 µm) and far-infrared (8–12 µm) bands. Thermal cameras can record these emissions as 2D infrared images (thermograms), which allow temperature variations in the face to be sensed at a distance (PAVLIDIS, LEVINE and BAUKOL, 2000).

Physiological variables such as superficial blood flow and cardiac pulse are related to the heat transfer mechanisms of the human body (PAVLIDIS et al., 2007). Thermography may therefore measure the association between cardiovascular physiology and the mental and emotional states reflected in the body (STEMBERGER, ALLISON and SCHNELL, 2010). The autonomic nervous system's response to stress triggers variations in skin temperature: during an emotional or physical threat, a complex pattern of cutaneous heat variation occurs. This pattern involves the skin and inner tissues, the local vasculature and metabolic activity (SALAZAR-LÓPEZ et al., 2015; IOANNOU et al., 2013).
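A simple pinhole-camera calculation shows why individual skin vessels cannot be resolved and why the camera instead sees the warmer skin above them. The sketch below estimates the width of skin covered by one pixel for each Therm-App lens in Table 2. It uses the 384-pixel horizontal resolution and the horizontal field of view of each lens. The 1 m camera-to-face distance and the 15 µm vessel diameter are assumptions chosen for illustration.

```python
import math

# Horizontal field of view (degrees) for each Therm-App lens listed in Table 2.
LENS_HFOV_DEG = {"6.8 mm": 55.0, "13 mm": 29.0, "19 mm": 19.0, "35 mm": 11.0}
SENSOR_COLUMNS = 384  # horizontal resolution of the detector


def pixel_footprint_mm(hfov_deg: float, distance_m: float,
                       n_columns: int = SENSOR_COLUMNS) -> float:
    """Approximate width of skin (mm) seen by one pixel at a given distance.

    Uses a simple pinhole model: scene width = 2 * d * tan(HFOV / 2).
    """
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)
    return scene_width_m / n_columns * 1000.0


if __name__ == "__main__":
    distance_m = 1.0      # hypothetical camera-to-face distance
    vessel_mm = 0.015     # assumed vessel diameter of 15 micrometres
    for lens, hfov in LENS_HFOV_DEG.items():
        footprint = pixel_footprint_mm(hfov, distance_m)
        print(f"{lens} lens: {footprint:.2f} mm per pixel "
              f"(about {footprint / vessel_mm:.0f} times the vessel diameter)")
```

Under these assumptions, even the narrowest lens covers about 0.5 mm of skin per pixel, more than 30 times the assumed vessel diameter. Each pixel therefore averages the temperature of the skin surface and does not image the vessel itself.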
When individuals experience elevated feelings of alertness, anxiety or fear, high levels of adrenaline regulate blood flow. This causes abrupt changes in local skin temperature through the redistribution of blood flow in superficial vessels and the conduction of heat from the blood to the skin surface. The effect is apparent in the human face, where the layer of flesh is very thin (STEMBERGER, ALLISON and SCHNELL, 2010; PAVLIDIS, LEVINE and BAUKOL, 2000). The activation of facial muscles requires blood flow. The branches and sub-branches of the vessels that supply the facial muscles account for the heating of the skin, which may be qualified and quantified by infrared thermography (ZHU, TSIAMYRTZIS and PAVLIDIS, 2007; CRUZ-ALBARRAN et al., 2017). More than twenty muscles make up the face, and their contractions and relaxations are responsible for generating the various facial expressions. The muscles, arteries and veins of the face are shown anatomically in Figures 3, 4 and 5, respectively.
Figure 3. Facial muscles. Source: Putz and Pabst (2006)
Figure 4. Head arteries. Source: Putz and Pabst (2006)
Figure 5. Head veins. Source: Putz and Pabst (2006)
Figure 6 shows the thermal representation used to identify the regions of interest (ROIs), along with a vascular representation of the major vessels affecting the subcutaneous temperature of the face.
Figure 6. Most common vessels related to facial regions of interest in thermography studies. Source: Berkovitz et al. (2013) and Ioannou, Gallese and Merla (2014).
Facial expressions are complex muscular patterns resulting from one or more motions or positions of the facial muscles. Recognizing them is indispensable for understanding feelings or mental states (JARLIER et al., 2011; SUGIMOTO, YOSHITOMI and TOMITA, 2000).
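The studies summarized in Table 3 express facial thermal responses as an average temperature change (ΔT) in an ROI such as the nose or forehead. The sketch below shows one straightforward way to compute such a value: the mean ROI temperature during a stimulus minus the mean during a baseline period. The ROI coordinates, the synthetic frames and the simulated 0.3 °C nasal cooling are all hypothetical. This is not the processing pipeline of any cited study.

```python
import numpy as np

# Hypothetical ROI for the nose tip in a 288 x 384 (rows x cols) thermogram.
NOSE_ROI = (slice(150, 180), slice(180, 204))


def roi_mean_temperature(frames: np.ndarray, roi: tuple) -> float:
    """Mean temperature (degC) inside `roi` over a stack of frames (n, rows, cols)."""
    rows, cols = roi
    return float(frames[:, rows, cols].mean())


def roi_delta_t(baseline: np.ndarray, stimulus: np.ndarray, roi: tuple) -> float:
    """Average ROI temperature during the stimulus minus the baseline average.

    Negative values are decrements (e.g. nasal cooling); positive are increments.
    """
    return roi_mean_temperature(stimulus, roi) - roi_mean_temperature(baseline, roi)


def synthetic_session(rng: np.random.Generator, n_frames: int = 10,
                      nose_shift: float = 0.0) -> np.ndarray:
    """Synthetic 34 degC faces with small sensor noise; optionally shift the nose ROI."""
    frames = 34.0 + rng.normal(0.0, 0.05, size=(n_frames, 288, 384))
    rows, cols = NOSE_ROI
    frames[:, rows, cols] += nose_shift
    return frames


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    baseline = synthetic_session(rng)
    stimulus = synthetic_session(rng, nose_shift=-0.3)  # simulated nasal cooling
    print(f"Nose dT: {roi_delta_t(baseline, stimulus, NOSE_ROI):+.2f} degC")
```

Averaging over many pixels and frames suppresses per-pixel sensor noise, so small ROI changes of a few tenths of a degree can be estimated reliably.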
The influence of muscles on temperature can be observed in the work of Sugimoto, Yoshitomi and Tomita (2000). During a game, the temperature of the cheek region increased when unconscious (natural) smiles were frequent, and it tended to be maintained or to decrease when such smiles were infrequent. In contrast, an artificial smile did not increase the temperature of the cheek region. Jarlier et al. (2011) examined the relation between the intensity and speed of muscle contractions and the specificity of the heat pattern produced. They detected a temperature increase in the zygomaticus region during a smile (requested action) and a temperature decrease in the frontalis region during the raising of the brows (requested action). During stress, blood flow to the forehead region increases: the activation of the corrugator muscle requires more blood, drawn from the supraorbital vessels, which raises the temperature in that region (PURI et al., 2005; ZHU, TSIAMYRTZIS and PAVLIDIS, 2007). Another indicator of stress is elevated perfusion in the periorbital area, which manifests as higher skin temperature (PURI et al., 2005). In thermography, the most studied facial ROIs are the forehead, cheeks, periorbital region and nose. Many studies report the most pronounced thermal variations on the nose, which makes it the most reliable region for detecting psychophysiological arousal (IOANNOU et al., 2013). Table 3 lists some studies of facial thermal variations according to emotional states in humans and animals.

Table 3. Facial temperature according to emotional states.
Authors | Experimental situation | Emotional state | Skin temperature (average ΔT, °C) | Facial location
Mizukami et al. (1990) | Simple mother–infant separation; mother replaced by a stranger (infants aged 2–4 months) | Mental stress | Decrement (0.3) | Forehead
Pavlidis, Levine and Baukol (2000) | 1) Loud noise; 2) Gum chewing; 3) Leisure walking | Patterns of anxiety, alertness and/or fearfulness | 1) Increment over the periorbital area, decrement over the cheeks and increment over the neck (over the carotid); 2) Increment in the chin area; 3) Decrement in the nasal area (*pixel average variation values) | Periorbital area; nasal area; cheeks; chin area; neck area
Levine, Pavlidis and Cooper (2001) | Sudden loud sound | Fear | Decrement; Increment | Cheek; Around the eye
Pavlidis, Eberhardt and Levine (2002) | Volunteers commit a mock crime and then testify to their innocence | Anxiety | Increment | Eye area
Nakayama et al. (2005) | Monkeys facing a threatening person | Negative emotional states | Decrement (±0.2) | Nose
Nozawa et al. (2006) | Loud and explosive sound | Fight-or-flight reaction (high stress) | Increment (0.18) | Procerus muscle and cheek
Merla and Romani (2007) | 1) Electric stimulation; 2) Presence of strangers while wrongly performing a Stroop test task; 3) Watching movies | 1) Fear of feeling pain; 2) Embarrassment; 3) Sexual arousal | 1) Decrement (0.6 ± 0.3); 2) Decrement; 3) Increment | 1) Face (particularly the perioral region); 2) Face (particularly the perioral region); 3) Periorbital, forehead, mouth and nose
Zhu, Tsiamyrtzis and Pavlidis (2007) | Lie detection in a mock crime scenario | Anxiety | Increment | Forehead (supraorbital vessels)
Nakanishi and Imai-Matsumura (2008) | Laughter (when a mother plays with an infant) | Pleasant emotion (infants aged 2–10 months) | Decrement (between 0.5 and 2.0) | Forehead, cheek and nose (more pronounced decrease in the nose)
Kuraoka and Nakamura (2011) | Audiovisual stimuli; auditory or visual stimuli alone (in monkeys) | Negative emotions (low valence and high arousal) | Stronger decrement (0.76, 0.35 and 0.45); Decrement (0.17 for both stimuli) | Nose; Nose
Robinson et al. (2012) | Participants received feedback (about their social skills) | Negative and positive emotions | Increment (0.23, 0.21, 0.28, 0.54) | Brow, eyes, cheeks, mouth
Ioannou et al. (2013) | Toy mishap | Guilt | Decrement (0.05) | Nose
Di Giacinto et al. (2014) | Patients with mild post-traumatic stress disorder (PTSD) and control subjects under a sudden acoustic stimulus | Fear | Decrement (up to 2.0) | Nose
Salazar-López et al. (2015) | IAPS images; video clips (contagious laughter condition); video clip (watching a person suffering pain); suffering pain; religious video clips + Lord's Prayer and personal prayers; portraits of loved people | Negative valence – low arousal; negative valence – high arousal; positive valence – high arousal; positive valence – low arousal; high empathy; low empathy; high empathy; low empathy; high empathy; low empathy; love; love | Decrement (0.85); increment (0.96); increment (1.66); increment (1.01); decrement (1.4); decrement (0.7); decrement (1.1); decrement (0.7); decrement (1.3); decrement (0.9); decrement (1.1); increment (1.6); increment (1.5) | Nose; nose and mouth; nose, forehead and mouth; general; nose; nose; nose; nose; nose; nose; nose; face; face

Social robots and ASD

Social robots have been designed to interact with humans, evoking social behaviors and perceptions in the people with whom they interact (KIM et al., 2013), and thus fostering more natural and engaging contact. Robots classified as socially assistive focus on assistance through social interaction, with the aim of automating supervision, coaching, motivation and companionship (TAPUS, MATARIĆ and SCASSELLATI, 2007). They have three aims. First, they seek to establish a relationship with the user that leads toward the intended therapeutic goals. Second, they benefit caregivers by monitoring multiple aspects of the patient and providing ongoing quantitative assessments. Third, they foster engagement so that the user enjoys interacting with the robot (FEIL-SEIFER and MATARIĆ, 2011). Socially assistive robots can aid several kinds of therapies for individuals affected by stroke, incapacitating aging, dementia and Autism Spectrum Disorder (ASD) (SCASSELLATI, ADMONI and MATARIĆ, 2012; FEIL-SEIFER and MATARIĆ, 2011).
IROMEC (Interactive Robotic Social Mediators as Companions) is a project that investigates how autonomous and interactive robotic toys can become social mediators. These toys encourage children with different special needs (autism, mild mental retardation and severe motor impairment) to explore a variety of individual play styles and collaborative games, involving interaction with peers, caregivers, teachers, parents and others (ROBINS et al., 2010b). When designed for interaction with children with ASD, these robots might assist therapists and caretakers in developing the children's cognitive, behavioral and social abilities. Such robots are promising as an intervention tool because children with ASD tend to interact positively with robots, which are more predictable, simpler and easier to understand than humans (DUQUETTE et al., 2008; ROBINS et al., 2010a). These robots aim to be useful in pedagogical treatments. They enable positive interaction with the children, attract their attention and stimulate them to engage with the surrounding environment (ROBINS et al., 2010b; SCASSELLATI, ADMONI and MATARIĆ, 2012). Studies that investigate the use of socially assistive robots in therapies for ASD usually emphasize specific goals for an ideal human–robot interaction. These goals include increased joint attention, eye contact, child-initiated interactions, verbal and non-verbal communication, turn-taking, imitative play and overall use of language (BOCCANFUSO et al., 2017). These robots can take several shapes and are classified into three groups (CABIBIHAN et al., 2013). Anthropomorphic robots resemble humans (humanoids). Non-anthropomorphic robots resemble animals or cartoon-like toys. Non-biomimetic robots do not resemble any biological species.
Anthropomorphic, or humanoid, robots are used to interact with humans by mimicking some human activities, such as playing soccer, dancing, speaking and playing instruments (SCASSELLATI, ADMONI and MATARIĆ, 2012; ROBINS et al., 2010b; DUQUETTE, MICHAUD and MERCIER, 2008). Examples of these robots are described below and shown in Figure 7. KASPAR is a humanoid robot that moves its head and arms, making gestures to interact with children with ASD. It has touch sensors that measure the child's tactile interaction with it (ROBINS et al., 2010a). Another robot is the doll-like Robota, which performs bodily interaction through imitative games using its legs, arms and head. It can also stimulate other social interaction skills, such as eye gaze, touch, joint attention, turn-taking and communicative competence (BILLARD et al., 2007). Tito is a humanoid robot with wheels, arms, eyes, a nose, a mouth (for smiling), a movable head and hair that can be illuminated. It emits vocal messages and can act autonomously or be teleoperated by therapists. It acts as a mediator to stimulate shared attention, physical proximity and imitation of facial expressions and gestures (DUQUETTE, MICHAUD and MERCIER, 2008). CHARLIE (Child-centred Adaptative Robot for Learning in an Interactive Environment) is a low-cost prototype of a small, autonomous, interactive, toy-like robot designed for assisted intervention. It has a few degrees of freedom in its hands and head, and a speaker. These features support interactive games that therapists can teleoperate to stimulate imitation, shared attention and turn-taking (BOCCANFUSO et al., 2017).
Another example of a humanoid robot is the Robokind™ Zeno R25. Its face displays several reasonably human-like facial expressions in real time, and it has a set of sensors and cameras to emit stimuli, capture the child's behavioral signals and provide him/her with reinforcement. It stimulates eye contact, joint attention, body imitation and facial imitation, promoting basic emotion recognition (PALESTRA et al., 2016). The robot Ono has a face capable of displaying a variety of emotions, touch sensors throughout its body and verbal ability, allowing it to support a therapist in a therapeutic environment (ZUBRYCKI and GRANOSIK, 2016). NAO is a humanoid social robot, well known among institutions and researchers for a variety of applications, including in the field of ASD. It can move its body (with LED lighting) and speak. This allows it to stimulate movement imitation in interactive games, proximity, physical touch and the following of instructions, among other applications (LI, JIA and FENG, 2016; SUZUKI and LEE, 2016).
Figure 7. Examples of humanoid robots: a. KASPAR; b. Robota; c. Tito; d. CHARLIE; e. Zeno R25; f. Ono; g. NAO. Source: a. Robins et al. (2010b); b. Billard et al. (2007); Cabibihan et al. (2013); c. Duquette, Michaud and Mercier (2008); d. Boccanfuso et al. (2017); e. Palestra et al. (2016); f. Zubrycki and Granosik (2016); g. Suzuki and Lee (2016).
PLEO, Paro and KEEPON are examples of non-anthropomorphic robots and are shown in Figure 8. PLEO is a dinosaur robot designed to express emotions through body movements and simple vocalizations. It triggers verbalization and interaction with another person during games in which it acts as a mediator (KIM et al., 2013). Paro is a seal robot able to recognize speech and detect the direction of a sound source.
It has tactile sensors and mobile parts (neck, flippers and eyelids) and displays facial expressions (happy and sad). It also has four senses: sight, hearing, balance and touch. Paro was designed to coexist with people, providing them with joy and relaxation through physical interaction (SHIBATA, KAWAGUCHI and WADA, 2012). KEEPON is a small yellow robot designed for emotional and attentional exchanges with children with ASD (KOZIMA, NAKAGAWA and YASUDA, 2005). It helps and encourages them to communicate interpersonally in a playful way and a relaxed mood, stimulating social interactions with robots, peers and caretakers (KOZIMA, NAKAGAWA and YASUDA, 2005; KOZIMA, MICHALOWSKI and NAKAGAWA, 2009).
Figure 8. Examples of non-anthropomorphic robots: a. PLEO; b. Paro; c. KEEPON. Source: a. Kim et al. (2013); b. Shibata, Kawaguchi and Wada (2012); c. Kozima, Nakagawa and Yasuda (2005).
Non-biomimetic robots are creature-like robots, such as ROBUS, Roball, Pekee and Labo-1, which are shown in Figure 9. A further feature common to these robots is mobility. ROBUS (ROBot Université de Sherbrooke) is a mobile robotic platform on which several shapes can be implemented with different functionalities, such as moving parts, people following and interactive games. It aims to increase the child's attention and to make the environment around her/him more interactive (MICHAUD and CLAVET, 2001). Roball is a spherical mobile robot with autonomous movements, such as spinning, shaking or pushing. Its proprioceptive sensors enable it to adapt its behavior, through vocalizations and motions, to the mode of interaction with the children (MICHAUD et al., 2007). Pekee is a mobile robot designed to engage children and encourage interaction, stimulating approach, tactile interaction and shared engagement with the mediator (SALTER, DAUTENHAHN and BOEKHORST, 2006).
Labo-1 is a mobile robot used in trials of interactive games with children with ASD. It is equipped with heat sensors to detect the child and an optional voice generation device to produce spoken phrases (DAUTENHAHN, 2007). Robot locomotion is a good way to catch the child's attention, since children with ASD in particular tend to be more attracted to moving things (CABIBIHAN et al., 2013). The robot's mobility opens many possible ways of interacting with the children (SALTER, DAUTENHAHN and BOEKHORST, 2006; CABIBIHAN et al., 2013).
Figure 9. Examples of non-biomimetic robots: a. ROBUS; b. Roball; c. Pekee; d. Labo-1. Source: a. Michaud and Clavet (2001); b. Michaud et al. (2007); c. Cabibihan et al. (2013); Salter, Dautenhahn and Boekhorst (2006); d. Dautenhahn (2007).
With the increasing development of machine emotional intelligence, robots are able to recognize human emotions (PICARD, VYZAS and HEALEY, 2001) and also to express them. Socially assistive robots might therefore act as social actors, exhibiting emotional cues that can affect people in the same way as the emotional expressions of other people (MATSUMOTO et al., 2015). This is very relevant for making human-robot interaction more natural and for better understanding human emotion and behavior. Children with ASD are a case in point, since they have difficulty expressing emotions but interact well with robots (DUQUETTE, MICHAUD and MERCIER, 2008).
N-MARIA

Several aspects of the aforementioned robots were taken into consideration in building the new version of the robot N-MARIA (New-Mobile Autonomous Robot for Interaction with Autistics):
-dynamic face with expressions of six basic emotions (happiness, sadness, anger, disgust, fear and surprise), in addition to neutral;
-vocalizations with ready-made dialogues;
-autonomous movements (locomotion);
-safe and entertaining structure composed of touch sensors and a thermal camera.
With these aspects, the robot N-MARIA is expected to catch the child's attention easily. Children with ASD prefer predictable, stable environments and have difficulty understanding facial expressions and other social cues. For this reason, the aspects above were carefully researched in the literature before being applied to the robot N-MARIA (CABIBIHAN et al., 2013; GIULLIAN et al., 2010; WOODS, 2006; ROBINS et al., 2007; MICHAUD, DUQUETTE and NADEAU, 2003). To learn about children's expectations of robots and to guide the new structure of N-MARIA, drawings were requested from 187 typically developing children (78 girls and 109 boys) aged between 6 and 12 years (X̅ = 8.4, SD = 1.43). These children were students at the partner schools of this research, and they drew robots according to their own ideas about the shape, functions and contexts of robots. The 187 drawings were analyzed considering the following physical aspects, based on the research of Woods (2006):
-Appearance (human-machine, animal-machine, machine-like);
-Mode of locomotion;
-Shape;
-Design categories (overall appearance: car, animal, human, machine…);
-Facial features;
-Gender;
-Functionality (toy, friend, machine);
-Other features: accessories or extra characteristics.
The analysis showed that most of the robots drawn had eyes, nose, mouth and teeth with humanoid aspects. They appeared to be male (24%) or female (22%), while the others had undefined gender.
Seventy-eight percent of the drawings showed humanoid robots with a geometric body (square, 59%) and with legs and feet for locomotion (53%). Seventy-five percent of the robots were depicted as friendly. Examples of some drawings are shown in Figure 10.
Figure 10. Some examples of robots drawn by children.
According to the literature, a robot that is attractive to children should have a playful shape with balanced human-like and mechanical-like features. If it looks too human, children may fear it or lose interest in it. If it looks too mechanical, the child may be more interested in examining it than in interacting with it (GIULLIAN et al., 2010; VALADÃO et al., 2016). Moreover, the robot's height should be similar to the children's height, in order to allow eye-level interaction (VALADÃO et al., 2016).
An important part of the body developed for the N-MARIA robot was the head. It consists of a tablet, on which six emotional facial expressions are dynamically displayed, and a Bluetooth speaker. To make child-robot interaction more entertaining and friendly, the face consists of simple caricatures with blinking eyes, eyelashes, eyebrows, a nose and a mouth. Each emotional expression has its own color, inspired by the film "Inside Out" (Portuguese title: "Divertidamente") (Figure 11). Thus, the sad face is blue, disgust is green, angry is red, fear is purple, happy is yellow and surprise is orange. The neutral face is white. When the robot speaks, the face becomes neutral and the mouth moves in sync with the preprogrammed dialogues. These features of the robot's face can increase the child's interest and attention and help ensure a natural interaction (LEE, KIM and KANG, 2012).
Figure 11. Emotional facial expressions of the N-MARIA robot. From left to right, the upper pictures show happiness, disgust, surprise and neutral; the lower pictures show sadness, anger and fear.
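The color-per-emotion scheme and the speak-to-neutral rule above amount to a small lookup table plus one condition. The Python sketch below is only illustrative. The robot's face was actually built with Piskel and Unity/C#, and the names `FaceExpression` and `displayed_face` are hypothetical.

```python
from enum import Enum


class FaceExpression(Enum):
    """Face colours described for N-MARIA (value = colour name)."""
    HAPPINESS = "yellow"
    SADNESS = "blue"
    ANGER = "red"
    DISGUST = "green"
    FEAR = "purple"
    SURPRISE = "orange"
    NEUTRAL = "white"


def displayed_face(requested: FaceExpression, speaking: bool) -> FaceExpression:
    """Expression actually shown: the face reverts to neutral while the robot speaks."""
    return FaceExpression.NEUTRAL if speaking else requested


if __name__ == "__main__":
    for expr in FaceExpression:
        print(f"{expr.name.lower():>9}: {expr.value}")
    shown = displayed_face(FaceExpression.FEAR, speaking=True)
    print("While speaking, the face shows:", shown.name.lower())
```

Storing the color as the enum value keeps each expression and its color in one place, so the mapping cannot drift out of sync between the animation assets and the control logic.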
The facial animation was created using two software tools. The first is Piskel, free software used to draw all the images; their transitions are built from a set of layers so that the animation is smooth. The second is Unity 3D, which enables the creation of animations triggered by taps on the screen and can deploy a program written in C# to Android. Details about the development and assessment of N-MARIA's dynamic faces can be found in Santos et al. (2017).
N-MARIA is 1.41 m tall, close to the typical height of a nine- to ten-year-old child. Its body was built with soft and malleable materials, such as EVA (Ethylene Vinyl Acetate), foams and fabrics, molded to protect the internal devices and sensors as well as the children. The mobile platform (Pioneer 3 DX) provides locomotion for N-MARIA, and a 360° laser sensor (LiDAR, Light Detection and Ranging) enables the child to be located in the environment. Two NUCs (Next Unit of Computing) control the robot. One controls the Pioneer and the LiDAR, and the other acquires the thermal and RGB images and the touch sensor data. A tablet displays the facial expressions and plays the preprogrammed dialogues through the speaker, which is connected to the tablet via Bluetooth. The power source for the two NUCs is a 60 Ah battery placed on the Pioneer. The Pioneer, the battery and one of the NUCs are hidden by a colorful skirt. N-MARIA and its interior can be seen in Figure 12.
Figure 12. Robot N-MARIA (left) and its inner part (right).
In addition to interacting with the child, the robot can also be controlled by the therapist through another tablet for social pedagogical activities with children. The therapist controls the robot through a dedicated interface that enables it to perform all the preprogrammed commands. The interface was developed in C# with Unity.
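The therapist's interface triggers a fixed set of preprogrammed commands. The Python sketch below shows one way such commands could be stored in a log and then executed at once. It is an illustration only, not the robot's implementation, which was written in C# with Unity. The class name `CommandDispatcher` and all command names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


@dataclass
class CommandDispatcher:
    """Stores every accepted therapist command and runs its handler immediately."""
    handlers: Dict[str, Callable[[], str]]
    log: List[Tuple[datetime, str]] = field(default_factory=list)

    def execute(self, name: str) -> str:
        # Reject unknown commands before logging so the log only holds valid ones.
        if name not in self.handlers:
            raise KeyError(f"unknown command: {name}")
        self.log.append((datetime.now(timezone.utc), name))
        return self.handlers[name]()


if __name__ == "__main__":
    # Hypothetical command names and actions, for illustration only.
    robot = CommandDispatcher(handlers={
        "show_happiness": lambda: "face -> happiness",
        "say_greeting": lambda: "speaker -> greeting dialogue",
        "move_forward": lambda: "base -> forward",
    })
    for command in ("show_happiness", "say_greeting"):
        print(robot.execute(command))
    print(f"{len(robot.log)} commands stored")
```

A timestamped log of accepted commands also provides a record of what the robot was asked to do during a session, which can later be aligned with the recorded thermal data.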
A computational server was created to establish communication between a main NUC and the therapist's tablet. The commands issued by the therapist via the tablet are stored and executed instantly. The server does not require an Internet connection, since it runs locally on the NUC, and communication takes place over a Wireless Local Area Network (WLAN). Figure 13 shows the graphical user interface (GUI) installed on the control tablet, with some of its commands.
Figure 13. Graphical user interface installed on the tablet for controlling the robot.

Chapter 1

Emotion Analysis in Children through Facial Emissivity of Infrared Thermal Imaging
Christiane Goulart, Carlos Valadão, Denis Delisle-Rodríguez, Eliete Caldeira, Teodiano Bastos-Filho

Abstract
Physiological signals may be used as objective markers to identify emotions, which play relevant roles in social and daily life. Contact-free techniques such as Infrared Thermal Imaging (IRTI) are indispensable for measuring these signals in individuals with sensory sensitivity. The goal of this study is to propose an experimental design to analyze five emotions (disgust, fear, happiness, sadness and surprise) from facial thermal images of typically developing (TD) children aged 7–11 years, using emissivity variation as recorded by IRTI. The emotion analysis dataset covered emotional dimensions (valence and arousal), bilateral facial sides and emotion classification accuracy. The results demonstrate the efficiency of the experimental design and yield several findings. Valence correlated with the thermal decrement in the nose. Disgust and happiness were potent triggers of facial emissivity variations. Significant emissivity variations in the nose, cheeks and periorbital regions were associated with different emotions.
Moreover, facial thermal asymmetry was revealed, with a distinct thermal tendency in the cheeks, and the mean classification accuracy exceeded 85%. The results show that emissivity variations are an efficient marker for analyzing emotions in facial thermal images, and they confirm IRTI as an outstanding technique for studying emotions. This study contributes a robust dataset for analyzing the emotions of 7–11-year-old TD children, an age range for which there is a gap in the literature.
Keywords: Emissivity Variation. Emotion Analysis. Emotion Classification. Facial Thermal Asymmetry. Infrared Thermal Imaging. Valence.

Introduction
Studies focused on emotions have increased worldwide, mainly because of the importance of emotions for interpersonal relationships, but also because they are considered potent facilitators of cognitive processes and contributors to many illnesses [1]. In many aspects of daily life, emotions frequently shape social relationships. They contribute to communication between humans [2] and make it possible to identify a person's intentions so that appropriate responses can be adopted [3]. In the field of emotion analysis, valence (pleasure) and arousal (intensity) are consistent dimensions of emotional perception. Both have shown correlation with physiological signals such as facial muscle activity, skin conductance, heart rate, startle response and brain waves [4–6]. Moreover, many studies have pursued emotion recognition through physiological markers, reporting accuracies between 60% and 90% using electrocardiography (ECG) [7], electroencephalography (EEG) [8–10] and thermography signals [11–13]. Human faces play an important role in conveying facial expressions and revealing emotions. Faces are not totally symmetric, and emotions are more strongly expressed on the left side of the face [14].
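Facial thermal asymmetry can be quantified by comparing homologous ROIs on the two sides of the face. The sketch below shows one simple index, assumed here for illustration: the mean temperature of the subject's left ROI minus that of the subject's right ROI. It also assumes a frontal, horizontally centered face, so that mirroring a column range about the image midline gives the homologous region. Note that in a frontal thermogram the subject's right side appears on the image's left. The ROI coordinates and the synthetic frame are hypothetical.

```python
import numpy as np


def mirror_columns(cols: slice, width: int) -> slice:
    """Mirror a column slice [start, stop) about the vertical midline of the image."""
    return slice(width - cols.stop, width - cols.start)


def bilateral_difference(frame: np.ndarray, rows: slice, image_left_cols: slice) -> float:
    """Subject's left ROI mean minus subject's right ROI mean (degC).

    In a frontal thermogram the subject's right side appears on the image's
    left, so `image_left_cols` is the subject's right ROI; the mirrored slice
    is assumed to cover the homologous region on the subject's left.
    """
    width = frame.shape[1]
    subject_right = frame[rows, image_left_cols].mean()
    subject_left = frame[rows, mirror_columns(image_left_cols, width)].mean()
    return float(subject_left - subject_right)


if __name__ == "__main__":
    frame = np.full((288, 384), 34.0)   # synthetic face at a uniform 34 degC
    cheek_rows, cheek_cols = slice(140, 170), slice(100, 140)
    frame[cheek_rows, mirror_columns(cheek_cols, 384)] += 0.4  # warmer left cheek
    print(f"Left-minus-right cheek difference: "
          f"{bilateral_difference(frame, cheek_rows, cheek_cols):+.2f} degC")
```

In real recordings the face is rarely perfectly centered. A landmark-based ROI placement on each side would be more robust than simple mirroring, at the cost of a face-alignment step.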
Facial expressions derive from both facial muscle activation and autonomic responses (pallor, blush, pupil size, sweat), and they reveal threatening or attractive events experienced by the person [15,16]. The network of vessel branches and sub-branches that supplies the face governs skin heating, which may be related to emotions and can be studied through Infrared Thermal Imaging (IRTI) [13,17]. IRTI is an emerging, promising and ecologically valid technique that has been increasingly adopted in studies of human emotions, which may be associated with physiological parameters [18–20]. In addition, as a contact-free technique, it preserves people's comfort, since no sensors need to be placed on the body [21]. IRTI has been widely employed for emotion analysis and recognition in adults [12,13,22,23], but similar studies in children are rare [18,24,25]. To the best of our knowledge, no experimental design for emotion analysis by IRTI in typically developing (TD) children aged 7 to 11 years has been reported to date. The goal of this work is to propose an experimental design to analyze five emotions (disgust, fear, happiness, sadness and surprise) evoked in 7-11-year-old TD children through facial emissivity changes detected by IRTI. The analysis considered emotional dimensions (valence and arousal), bilateral facial regions and emotion classification accuracy.

Materials and Methods

Participants

This study was approved by the Ethics Committee of the Federal University of Espirito Santo (UFES) (Brazil) under number 1,121,638. Twenty-eight children participated in the experiments (12 females and 16 males; age range: 7-11 years; M = 9.46 years; SD = 1.04 years). Eleven percent were between 7 and 8 years old, and eighty-nine percent were between 9 and 11 years old.
The age range of the recruited group was chosen to correspond mainly to middle childhood. The children were recruited through cooperation agreements established with three elementary schools in Vitória, Brazil. The children's teachers helped select the children according to our inclusion and exclusion criteria:

- Inclusion criteria: age between 7 and 11 years, and absence of traumatic experiences and phobias.
- Exclusion criteria: other neurological disorders that affect brain development, use of glasses, and use of any medication.

The parents or legal guardians of the children gave written informed consent in accordance with the Declaration of Helsinki. In addition, the children who wanted to participate also signed a written assent form.

Recording

Thermal data were acquired with a Therm-App® infrared thermal imaging camera. The camera has a spatial resolution of 384 × 288 pixels, a temperature sensitivity < 0.07°C, and a frame rate of 8.7 Hz. Image normalization was configured to map lower temperatures to darker pixels (lower emissivity) and higher temperatures to lighter pixels (higher emissivity), with pixel intensities ranging from 0 to 255.

Stimuli

To evoke emotions in the children, audio-visual stimuli were used, as these are considered the most popular and effective way to elicit emotions [26]. A psychologist supervised the selection of the affective videos, which were obtained from the Internet. Five videos (40 to 130 s long) were selected to evoke the following five emotions:

- Happiness: funny scenes, compilation of babies laughing.
- Disgust: revolting scenes, such as larvae eaten by humans (beetle larva).
- Fear: dark scenes, with the sporadic appearance of a haunting (lights out).
- Sadness: compilation of abandoned dogs.
- Surprise: unexpected scenes, such as an animal doing improbable things (mouse trap survivor, commercial).
An additional video with positive emotional content (full movie trailer – Toy Story 3) was also selected to help the children understand the experiment. The Self-Assessment Manikin (SAM) [27] was used by the children to self-assess their affective response to each video. It consists of a 9-point scale (1 to 9) for the valence (pleasure) and arousal (intensity) dimensions [4,27].

Procedure

The experiments were carried out in the morning, between 7 a.m. and 12 p.m., with the room temperature held at 22°C. The experimental procedure is described in Figure 1. In the test room, the child was invited to sit comfortably and was asked about her/his health condition:

- How do you feel today?
- Any pain in the body?
- Did you take any medicine these days?
- Did you practice any physical activity this morning?

These questions were asked because the thermal analysis of the face may be influenced by symptoms such as a stuffy or runny nose, sneezing, headache and fever. It may also be influenced by physiological changes due to activity of the autonomic nervous system [28,29]. Next, the tasks of the experiment were briefly explained, and the child was asked whether she/he really wanted to participate. Once the answer was confirmed, the child read the assent form together with the examiner and signed it. To avoid interference during the facial thermal recording, long hair was tied back, fringes were held with a clip, and jewelry and headbands were removed. Then, the SAM emotion self-assessment scale was explained.

Figure 1. General scheme of the experimental design.

The child was then kept in a calm state for at least 20 min. This allowed her/his body to adapt to the room temperature and her/his skin temperature to stabilize for the baseline recordings [12,20]. The researcher asked the child not to move and to keep breathing quietly.
Meanwhile, the researcher asked brief questions, for example about the child's daily routine. This helped the child relax, gain confidence and feel closer to the researcher, so that she/he became more comfortable and less shy or excited during the test.

Next, the child was invited to sit comfortably on another chair facing a 19-inch screen, with the thermal camera at a distance of approximately 85 cm. The thermal camera was connected to a tablet running Android 4.4.2, which acquired the facial thermal images through the Therm-App® application. The child's head was not fixed in position, in order to preserve the spontaneity of emotional expression and avoid any discomfort [29]. However, the child was asked to avoid moving her/his head and putting the hands on the face. If a scene was unwanted, she/he was told to close the eyes if she/he wished.

The baseline period was recorded before the videos were displayed. The affective videos were then shown on the screen in the following order, according to the guidelines of our research group's psychologist: positive content video (for training), disgust, fear, happiness, sadness and surprise. To avoid a predominance of negative emotions, the psychologist also recommended that the last video be a positive stimulus. This was intended to positively sensitize the children at the end of the experiment (a mechanism called empathy) [30]. Each video was preceded by a black screen (displayed for 4 s) and a beep sound, and SAM was completed after each one. The child indicated which SAM figure corresponded to her/his feeling, and the examiner recorded this information. Each experiment was performed individually, and the thermal image recording lasted approximately 11 min in total.
Thermal data analysis

Data pre-processing

Figure 2 shows the pre-processing outcomes for a subject who gave written informed consent to publish his thermal images according to the PLOS consent form.

Figure 2. Representation of the pre-processing sequence of the facial thermal images and the eleven regions of interest (ROIs). i) ROIs: LF - forehead (left side); RF - forehead (right side); LPO - left periorbital region; RPO - right periorbital region; TN - tip of nose; LR - left cheek; RC - right cheek; LPN - left perinasal region; RPN - right perinasal region; LCh - chin (left side) and RCh - chin (right side).

The face was extracted from each thermal image as follows:

1. Median and Gaussian filters were applied, followed by binarization to convert the grayscale image into a pure black-and-white (BW) image.
2. Small groups of non-connected pixels were deleted to improve image quality, facilitating the separation of the foreground (face, lighter) from the background (darker).
3. In the BW image, the face boundaries were detected by finding the uppermost point of the head (the uppermost white pixel) and the left and right limits (the leftmost and rightmost white pixels, respectively, near the centroid area).
4. The bottom limit of the head was inferred using a distance proportional to that between the uppermost head point and the centroid.
5. This bounding box was applied to the original image, and the cropped region was used to calculate the statistics.

The forehead, tip of the nose, cheeks, chin, periorbital and perinasal regions were chosen as the facial regions of interest (ROIs) from which to extract affective information, as indicated in [20,31]. Squares were manually positioned on the facial ROIs in the first frame of each thermal image sequence, which enabled automatic placement of the squares on the subsequent frames of the same video.
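The face-extraction steps above (binarization, removal of small non-connected pixel groups, and a bounding box derived from the uppermost white pixel and the centroid) can be illustrated with a minimal pure-Python sketch. This is not the thesis implementation. It assumes that median and Gaussian filtering has already been applied and that pixel connectivity is 4-neighbour. The binarization threshold, the minimum blob size (`min_size`) and the bottom-limit factor (`bottom_ratio`) are hypothetical parameters, since their values are not given in the text.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # rows of 8-bit pixel values (lighter = warmer)


def binarize(img: Image, threshold: int) -> Image:
    """Map pixels brighter than `threshold` (face) to 1 and the rest to 0."""
    return [[1 if v > threshold else 0 for v in row] for row in img]


def remove_small_components(bw: Image, min_size: int) -> Image:
    """Delete 4-connected white blobs with fewer than `min_size` pixels."""
    h, w = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not out[r][c] or seen[r][c]:
                continue
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one blob
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and out[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) < min_size:
                for y, x in blob:
                    out[y][x] = 0
    return out


def face_bounding_box(bw: Image, bottom_ratio: float = 2.0) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right); assumes the centroid row contains face pixels."""
    white = [(r, c) for r, row in enumerate(bw) for c, v in enumerate(row) if v]
    top = min(r for r, _ in white)                      # uppermost white pixel
    cy = round(sum(r for r, _ in white) / len(white))   # centroid row
    cols = [c for r, c in white if r == cy]
    left, right = min(cols), max(cols)                  # limits on the centroid row
    # Bottom limit: distance proportional to (centroid - top), clipped to the image.
    bottom = min(len(bw) - 1, top + round(bottom_ratio * (cy - top)))
    return top, bottom, left, right


def crop(img: Image, box: Tuple[int, int, int, int]) -> Image:
    """Apply the bounding box to the original (non-binarized) image."""
    top, bottom, left, right = box
    return [row[left:right + 1] for row in img[top:bottom + 1]]
```

Consider an 8 × 8 image containing a warm 5 × 4 block and one isolated warm pixel. Once the isolated pixel is removed, the sketch returns the box (2, 6, 2, 5). Taking the left and right limits on the centroid row is only one possible reading of "near the centroid area".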
Then, the ROI bounds were visually inspected throughout the recordings to ensure that the squares were correctly positioned on the thermal image. Each ROI had fixed dimensions (width and height) proportional to the child's face width [13]:

- Nose: 6.49%
- Forehead: 14.28%
- Periorbital region: 3.24%
- Cheek: 9.74%
- Perinasal region: 3.24%
- Chin: 5.19%

The ROIs were defined bilaterally in order to analyze facial thermal symmetry [32,33]. For this purpose, a virtual line dividing the face into two symmetric sides was used, taking the procerus muscle and the nose as reference. Thus, eleven ROIs were used in our study, as shown in Figure 2i.

Feature extraction

For each child, specific segments of the thermal recordings were selected for the emotion analysis:

- 3 s of the baseline period, recorded before the audio-visual stimuli, with the face in a neutral state and without emotional stimulus.
- 30 s of each affective video, selected from the moments of highest emotional climax and elicitation.

During the baseline period, the child sat comfortably and relaxed, looking at the camera and not moving. The thermal images were processed by cropping the ROIs in each frame and calculating their mean, variance and median values (features), as described below.

Let $\mathbf{R}_k \in \mathbb{R}^{m \times n}$ be the $k$-th ROI, composed of pixels $R_{ij}$ with values from 0 to 255 (8-bit grayscale). Here, $k$ is the index of the ROI being processed and $K$ is the number of ROIs. A feature vector $\mathbf{F}_k = \{f_1, f_2, \ldots, f_{14}\}$ is extracted from each ROI to obtain patterns related to the emotions. The first feature is given by Equation 1:

$$ f_1 = \bar{R} = \frac{1}{m \cdot n} \sum_{i=1}^{m} \sum_{j=1}^{n} R_{ij}, \tag{1} $$

where $f_1$ is the mean emissivity over all pixels $R_{ij}$, and $i$ and $j$ index the rows and columns of $\mathbf{R}$, respectively.
$$ f_2 = \sigma^2 = \frac{1}{(m \cdot n) - 1} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( R_{ij} - \bar{R} \right)^2, \tag{2} $$

where $f_2$ is the emissivity variance over all pixels. Similarly, features $f_3$ and $f_4$ are averages of variances, calculated by rows and by columns, respectively (Equations 3 and 4). Here, $\bar{R}_i$ is the mean value of row $i$, while $\bar{R}_j$ is the mean value of column $j$.

$$ f_3 = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{n-1} \sum_{j=1}^{n} \left( R_{ij} - \bar{R}_i \right)^2, \tag{3} $$

$$ f_4 = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{m-1} \sum_{i=1}^{m} \left( R_{ij} - \bar{R}_j \right)^2. \tag{4} $$

Additionally, features $f_5$, $f_6$ and $f_7$ were computed with the median operator, as shown in Equations 5 to 7, respectively. $f_5$ is the median of all pixels $R_{ij}$ arranged in a single column vector. $f_6$ and $f_7$ average the medians of the rows $\mathbf{R}_i$ and of the columns $\mathbf{R}_j$, respectively.

$$ f_5 = \operatorname{median}(\mathbf{R}), \tag{5} $$

$$ f_6 = \frac{1}{m} \sum_{i=1}^{m} \operatorname{median}(\mathbf{R}_i), \tag{6} $$

$$ f_7 = \frac{1}{n} \sum_{j=1}^{n} \operatorname{median}(\mathbf{R}_j). \tag{7} $$

Finally, the seven features $f_1$ to $f_7$ were used to compute seven further features by subtracting the corresponding features of consecutive ROIs, as shown in Equation 8, where $k$ is the current ROI being processed:

$$ f_{c+7}(k) = \Delta f_c(k) = f_c(k) - f_c(k-1), \quad c = 1, \ldots, 7, \quad 2 \le k \le K. \tag{8} $$

Since each frame has 11 ROIs and each ROI has 14 features, the total number of features was 154.

Feature selection

For feature selection, the feature vectors of the training set were analyzed to find the features that minimize the classification errors. To this end, the Neighborhood Component Analysis (NCA) method was used to learn the feature weights through a regularization process [34,35].

Let $T = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_i, y_i), \ldots, (\mathbf{x}_N, y_N)\}$ be the training set. Here, $N$ is the number of samples and $\mathbf{x}_i$ is a $d$-dimensional feature vector with class label $y_i \in \{1, 2, \ldots, C\}$. The Mahalanobis distance between points $\mathbf{x}_i$ and $\mathbf{x}_j$ is given by Equation 9 [34]:

$$ d(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i - \mathbf{x}_j)^{T} \mathbf{W}^{T} \mathbf{W} (\mathbf{x}_i - \mathbf{x}_j), \tag{9} $$

where $\mathbf{W}$ is the transformation matrix and $d(\cdot, \cdot)$ is the Mahalanobis distance.
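Equations 1 to 8 can be illustrated with a short pure-Python sketch. The function names are hypothetical, and this is not the authors' code. Each ROI is a list of pixel rows with at least two rows and two columns, as required by the sample variances of Equations 2 to 4. Equation 8 is defined only for $k \ge 2$, while the text counts 14 features for each of the 11 ROIs. Because the treatment of the first ROI's difference features is not stated, the sketch fills them with zeros purely as a placeholder assumption.

```python
from statistics import mean, median, variance
from typing import List, Sequence

ROI = Sequence[Sequence[float]]  # m rows x n columns of 0-255 pixel values


def roi_features(R: ROI) -> List[float]:
    """Features f1..f7 of one ROI (Equations 1-7)."""
    pixels = [v for row in R for v in row]   # all R_ij as a single vector
    cols = [list(col) for col in zip(*R)]    # columns R_j
    return [
        float(mean(pixels)),                       # f1, Eq. 1: mean emissivity
        float(variance(pixels)),                   # f2, Eq. 2: divisor (m*n) - 1
        float(mean(variance(row) for row in R)),   # f3, Eq. 3: mean of row variances
        float(mean(variance(c) for c in cols)),    # f4, Eq. 4: mean of column variances
        float(median(pixels)),                     # f5, Eq. 5: median of all pixels
        float(mean(median(row) for row in R)),     # f6, Eq. 6: mean of row medians
        float(mean(median(c) for c in cols)),      # f7, Eq. 7: mean of column medians
    ]


def frame_features(rois: Sequence[ROI]) -> List[float]:
    """Concatenate [f1..f7, delta f1..delta f7] for every ROI of one frame (Eq. 8)."""
    base = [roi_features(R) for R in rois]
    vector: List[float] = []
    for k, fk in enumerate(base):
        if k == 0:
            delta = [0.0] * 7  # ASSUMPTION: the first ROI has no predecessor in Eq. 8
        else:
            delta = [a - b for a, b in zip(fk, base[k - 1])]  # f_c(k) - f_c(k-1)
        vector.extend(fk + delta)
    return vector
```

For a frame with 11 ROIs, `frame_features` returns a 154-element vector, which matches the count given above.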
If $\mathbf{W}$ is restricted to a diagonal matrix, with one weight $w_l$ per feature, the distance takes the feature-weighted form of Equation 10 [35]:

$$ d(\mathbf{x}_i, \mathbf{x}_j) = \sum_{l=1}^{d} w_l^2 \left| x_{il} - x_{jl} \right|, \tag{10} $$

where $w_l$ is the weight associated with the $l$-th feature. Each point $\mathbf{x}_i$ selects another point $\mathbf{x}_j$ as its neighbor with probability $p_{ij}$. A differentiable cost function can then be defined from this stochastic ("soft") neighbor assignment in the transformed space (Equations 11 to 13). Its gradient with respect to each feature weight is given in Equation 14:

$$ p_{ij} = \frac{\exp\left(-d(\mathbf{x}_i, \mathbf{x}_j)/\sigma\right)}{\sum_{k \neq i} \exp\left(-d(\mathbf{x}_i, \mathbf{x}_k)/\sigma\right)}, \quad p_{ii} = 0, \tag{11} $$

$$ p_i = \sum_{j} y_{ij}\, p_{ij}, \tag{12} $$

$$ \xi(\mathbf{W}) = \sum_{i} \sum_{j} y_{ij}\, p_{ij} - \lambda \sum_{l=1}^{d} w_l^2, \tag{13} $$

$$ \frac{\partial \xi(\mathbf{W})}{\partial w_l} = 2 \left( \frac{1}{\sigma} \sum_{i} \left( p_i \sum_{j \neq i} p_{ij} \left| x_{il} - x_{jl} \right| - \sum_{j} y_{ij}\, p_{ij} \left| x_{il} - x_{jl} \right| \right) - \lambda \right) w_l, \tag{14} $$

where:

- $p_{ij}$ is the probability that $\mathbf{x}_i$ selects $\mathbf{x}_j$ as its nearest neighbor;
- $p_i$ is the probability that $\mathbf{x}_i$ is correctly classified;
- $y_{ij} = 1$ if $y_i = y_j$, and $y_{ij} = 0$ otherwise;
- $\lambda$ is a regularization parameter that can be fitted by cross-validation;
- $\sigma$ is the kernel width (the width of the probability distribution).

The kernel function used in Equation 11 is $\kappa(z) = \exp(-z/\sigma)$. If $\sigma \to 0$, only the nearest neighbor of the query sample can be selected as its reference point. If $\sigma \to \infty$, all points other than the query point have the same chance of being selected. More details can be found in [35].

Emotion classification

The five videos used to evoke the five emotions (disgust, fear, happiness, sadness and surprise) were labeled. For this purpose, 30-s segments of each video were labeled as having a high potential to trigger the desired emotion. Thus, groups of patterns linked to the same emotion were obtained. To evaluate emotion recognition, training and validation sets were chosen over several runs of cross-validation (k = 3). In each run, both the training and the validation sets were formed by selecting only patterns that correspond to the same segment. Then, in each set, the 11 ROIs were located on the face to obtain the feature vectors (154 features; see the Feature extraction section).
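The regularized NCA weighting of Equations 10 to 14, applied to the training feature vectors of each run, can be sketched in plain Python as a gradient-ascent loop. The thesis does not state which NCA implementation was used, and this sketch is not it. The function names, the kernel width `sigma`, the regularization `lam`, the learning rate `lr` and the number of iterations are illustrative assumptions. The objective follows Equation 13 as written, with no division by $N$. Comparing `nca_gradient` against a finite difference of `nca_objective` is a simple way to check that Equation 14 is the gradient of Equation 13.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def _neighbor_probs(w: Sequence[float], X: Matrix, i: int, sigma: float) -> List[float]:
    """Eqs. 10-11: soft-neighbour probabilities p_ij of query point i (p_ii = 0)."""
    n, d = len(X), len(w)
    dist = [sum(w[l] ** 2 * abs(X[i][l] - X[j][l]) for l in range(d)) for j in range(n)]
    d_min = min(dist[j] for j in range(n) if j != i)  # shift for numerical stability
    kern = [0.0 if j == i else math.exp(-(dist[j] - d_min) / sigma) for j in range(n)]
    z = sum(kern)
    return [k / z for k in kern]


def nca_objective(w: Sequence[float], X: Matrix, y: Sequence[int],
                  sigma: float = 1.0, lam: float = 0.01) -> float:
    """Eq. 13: sum_i sum_j y_ij p_ij - lam * sum_l w_l^2."""
    total = 0.0
    for i in range(len(X)):
        p = _neighbor_probs(w, X, i, sigma)
        total += sum(p[j] for j in range(len(X)) if y[j] == y[i])  # p_i, Eq. 12
    return total - lam * sum(wl ** 2 for wl in w)


def nca_gradient(w: Sequence[float], X: Matrix, y: Sequence[int],
                 sigma: float = 1.0, lam: float = 0.01) -> List[float]:
    """Eq. 14, evaluated for every feature weight w_l."""
    n, d = len(X), len(w)
    acc = [0.0] * d
    for i in range(n):
        p = _neighbor_probs(w, X, i, sigma)
        p_i = sum(p[j] for j in range(n) if y[j] == y[i])
        for l in range(d):
            all_term = sum(p[j] * abs(X[i][l] - X[j][l]) for j in range(n) if j != i)
            same_term = sum(p[j] * abs(X[i][l] - X[j][l])
                            for j in range(n) if j != i and y[j] == y[i])
            acc[l] += p_i * all_term - same_term
    return [2.0 * (acc[l] / sigma - lam) * w[l] for l in range(d)]


def nca_feature_weights(X: Matrix, y: Sequence[int], sigma: float = 1.0, lam: float = 0.01,
                        lr: float = 0.1, n_iter: int = 100) -> List[float]:
    """Maximize Eq. 13 by plain gradient ascent, starting from unit weights."""
    w = [1.0] * len(X[0])
    for _ in range(n_iter):
        g = nca_gradient(w, X, y, sigma, lam)
        w = [wl + lr * gl for wl, gl in zip(w, g)]
    return w
```

In a typical NCA-based selection, features whose learned weights shrink towards zero are discarded. The selection threshold used in the thesis is not reported.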
The feature vectors of the training set were analyzed with a supervised, low-computational-cost feature selection method based on NCA [34,35]. This reduced the dimensionality of the feature vectors, enhancing class separation and the classification stage. The training feature vectors were then normalized, using their mean and standard deviation values as reference. Next, the validation set was reduced to the same relevant features and normalized with the reference values (mean and standard deviation) obtained from the training set. Finally, Linear Discriminant Analysis (LDA) was used as the classifier [36,37].

Statistical analysis

Means and standard deviations were calculated for the valence and arousal dimensions of SAM. Average emissivity variations were calculated to obtain the mean heat signature of each ROI during emotion elicitation, taking the baseline period as reference. Thus, emotion data from the affective videos were compared with baseline data. The bilateral ROI data of the face were also compared to verify facial thermal asymmetry related to the evoked emotions. Student's t test (α = 0.05) with Bonferroni correction was used to verify the significance of these emissivity variations in the facial ROIs. The normality of the data was checked with the Kolmogorov-Smirnov test, which rejected the null hypothesis (normal distribution) at the 5% significance level. For this reason, a base-2 logarithmic transformation was applied to the data before Student's t test. Furthermore, accuracy, true positive rate (TPR), Kappa and false positive rate (FPR) were used to evaluate the emotion classification.

Results

Valence and arousal analysis

The mean SAM scores for the valence and arousal dimensions of the 28 participating children are shown in Table 1.
Table 1 shows that the valence scores were pronounced, being close to the extreme values of the scale (1 and 9, corresponding to negative and positive emotions, respectively). The arousal scores, close to 5, indicated a generally moderate intensity. Although the arousal scores were not high, the valence scores suggest that the affective videos did trigger specific emotions in the TD children.

Table 1. Means and standard deviations (SD) of the SAM scores reported by the 28 children.

Emotion      Valence, mean (SD)   Arousal, mean (SD)
Disgust      2.64 (1.64)          4.93 (2.69)
Fear         2.96 (1.69)          5.71 (3.28)
Happiness    8.68 (0.82)          5.36 (3.03)
Sadness      1.68 (1.44)          3.25 (2.69)
Surprise     7.93 (1.49)          5.68 (3.29)

Thermal data analysis

Feature selection

Figure 3 shows the feature selection frequency of each ROI for each child, considering three cross-validation runs with random selection of the affective segments. The tip of the nose stands out for its high feature selection frequency, shown by the white squares in Figure 3a. Figure 4 shows the contribution of each ROI to the feature selection frequency. Seven features were selected:

- mean;
- median;
- variance;
- mean of the medians on columns;
- mean of the medians on rows;
- mean of the variances on columns;
- mean of the variances on rows.

The highest mean selection frequencies corresponded to the mean (2.31) and the mean of the medians (2.22 and 2.08 for columns and rows, respectively).

Figure 3. Feature selection maps for each ROI.

Figure 4. Mean frequency of feature selection per ROI per subject.

Both Figures 3 and 4 show that the bilateral ROIs contributed independently to feature selection, suggesting facial thermal asymmetry.
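The classification indices listed in the Statistical analysis section (accuracy, TPR, FPR and Kappa) can be computed from a confusion matrix as in the sketch below. The text does not give their formulas, so standard definitions are assumed here: one-vs-rest TPR and FPR for each class, and Cohen's Kappa for multiclass agreement. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def classification_indices(cm: Sequence[Sequence[int]]) -> Tuple[float, float, List[float], List[float]]:
    """Accuracy, Cohen's kappa, per-class TPR and per-class FPR.

    cm[i][j] counts the samples of true class i predicted as class j.
    """
    c = len(cm)
    total = sum(sum(row) for row in cm)
    actual = [sum(cm[k]) for k in range(c)]                          # row sums
    predicted = [sum(cm[i][k] for i in range(c)) for k in range(c)]  # column sums
    accuracy = sum(cm[k][k] for k in range(c)) / total
    chance = sum(actual[k] * predicted[k] for k in range(c)) / total ** 2  # expected agreement
    kappa = (accuracy - chance) / (1.0 - chance)
    tpr = [cm[k][k] / actual[k] for k in range(c)]                           # one-vs-rest recall
    fpr = [(predicted[k] - cm[k][k]) / (total - actual[k]) for k in range(c)]  # false alarms / negatives
    return accuracy, kappa, tpr, fpr
```

For the five-emotion problem, `cm` would be a 5 × 5 matrix. The text does not specify how the thesis aggregated the indices across folds and subjects.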
Emotion classification

The classification of the five emotions reached a mean accuracy higher than 85% for the 28 subjects (Figure 5a), with a Kappa higher than 81% (Figure 5b). Moreover, the true positive rate was higher than 80% for all emotions except sadness (Figure 5c). The classifier reached accuracies of 89.88% for disgust, 86.57% for fear, 88.22% for happiness, 74.70% for sadness and 86.93% for surprise. The false positive rate, in turn, had a mean value of 3.62% and stayed below 5% for every emotion: 3.27% for disgust, 4.18% for fear, 4.54% for happiness, 3.50% for sadness and 2.63% for surprise.

Figure 5. Performance of the emotion classification.

Facial emissivity variation

For the emotion analysis relative to the baseline period, the mean emissivity variations were calculated for the eleven ROIs (LF: forehead (left side); RF: forehead (right side); LPO: left periorbital region; RPO: right periorbital region; TN: tip of nose; LR: left cheek; RC: right cheek; LPN: left perinasal region; RPN: right perinasal region; LCh: chin (left side) and RCh: chin (right side)). The pixel values of all ROIs were used for each analyzed frame. Figure 6 shows the significant emissivity variations generated by the emotions in the ROIs relative to the baseline period, considering the thermal tendency (increasing, decreasing or stable). It also shows the significant variations generated by the emotions between the bilateral ROI pairs. Significant emissivity decreases were observed in the following regions (Figure 6):

- tip of the nose: disgust, fear and happiness;
- periorbital regions: happiness, sadness and surprise;
- perinasal regions: disgust and happiness;
- chin: happiness, sadness and surprise;
- forehead: disgust and surprise.

Figure 6 also shows significant emissivity variations between all ROI pairs for the five emotions, evidencing the thermal asymmetry of the face.
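The significance test described in the Statistical analysis section (base-2 logarithmic transformation, Student's t test and Bonferroni correction at α = 0.05) can be sketched without external libraries as follows. The sketch assumes a paired, two-sided test between each child's emotion and baseline values for one ROI, which the text does not state explicitly. It also assumes strictly positive values, as the logarithm requires. The two-sided p-value uses the standard identity $p = I_{\nu/(\nu+t^2)}(\nu/2, 1/2)$, with the regularized incomplete beta function $I$ evaluated by a continued fraction (modified Lentz method). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction, in that order.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last multiplicative update is close to 1
            break
    return h


def _beta_inc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b


def two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of Student's t distribution with `df` degrees of freedom."""
    return _beta_inc(df / 2.0, 0.5, df / (df + t * t))


def paired_t_test_log2(emotion: Sequence[float], baseline: Sequence[float]) -> Tuple[float, float]:
    """Paired t test on log2-transformed values; returns (t, p)."""
    diffs = [math.log2(e) - math.log2(b) for e, b in zip(emotion, baseline)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))  # assumes sd > 0
    t = mean_d / (sd / math.sqrt(n))
    return t, two_sided_p(t, n - 1)


def bonferroni_significant(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag p-values that survive a Bonferroni correction over len(p_values) tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]
```

For instance, if 11 ROIs were tested for one emotion, `bonferroni_significant` would compare each p-value with 0.05/11. The number of comparisons actually corrected for in the thesis is not stated.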
In the ROI asymmetry analysis, significant variations can be seen between the cheek pairs, with divergent thermal tendencies (emissivity increases in the left cheek and decreases in the right one) for all five emotions. Moreover, divergent thermal tendencies were observed between the periorbital region pairs for disgust and between the perinasal region pairs for sadness and surprise. Figure 6 also shows significant variations in the right periorbital region and right cheek (thermal decreases) and in the left cheek (thermal increase) for all emotions.

Figure 6. Emissivity variation analysis in the selected ROIs, considering the five emotions, taking as reference the baseline (in green) and the ROI pairs (in blue). The green highlights (below line 0) indicate the significance of emissivity variations in the ROIs triggered by each emotion in relation to the baseline. The blue highlights (above line 0) indicate the significance of emissivity variations between the bilateral ROIs, used to verify facial thermal asymmetry. Legend: EM vs BL means Emotion versus Baseline; ns means no significant emissivity variation (p-value > 0.05), while * (p-value ≤ 0.05), ** (p-value ≤ 0.01) and *** (p-value ≤ 0.001) indicate significant emissivity variation.

Figure 7 shows the emissivity decrease (in relation to baseline) in the nose of a child during the emotions. The child's parents gave written informed consent to publish his thermal images according to the PLOS consent form.

Figure 7. Representative frames of the emissivity decrease (in relation to baseline) in the nose of a child during the emotions. Pixel intensity: 0-255.

Discussion

Feature selection and emotion classification

According to Figures 3 and 4, the tip of the nose, the cheeks (especially the right one) and the chin (mainly its left side) contributed more features across all cross-validation runs.
Accordingly, the mean and the mean of the medians were the most frequently selected features, as shown in Figure 4, with similar contributions from the right cheek and the left periorbital and perinasal regions. A growing number of studies aim at the automatic classification of emotions from information extracted from thermal images [11–13]. The LDA-based classification method used here to recognize five emotions (disgust, fear, happiness, sadness and surprise) reached a mean accuracy of 85.25% and a Kappa of 81.26%. Specifically, accuracies of 89.88% for disgust, 88.22% for happiness, 86.93% for surprise, 86.57% for fear and 74.70% for sadness were achieved, confirming the effectiveness of our proposed experimental design.

Our classification performance was similar to that of other IRTI-based emotion identification and classification studies with adults:

- In [13], the authors proposed a system that achieved an accuracy of 89.90% with a different emotion set (anger, happiness, sadness, disgust and fear) and twenty-five adult subjects.
- In [12], the authors distinguished between baseline and affective states (high and low levels of arousal and valence) of twelve adult subjects using visual stimuli (static images). They found an accuracy of approximately 80% for baseline versus high arousal and valence levels, and of 75% for baseline versus low arousal and valence levels.
- In [38], a deep Boltzmann machine (DBM) was used for emotion recognition from thermal infrared facial images in thermal databases of facial expressions, considering 38 adult subjects and evaluating valence recognition (positive versus negative). An accuracy of 62.9% was obtained for the classification of negative and positive valence.
- In [22], a computational model for facial expression recognition from thermal face images was proposed. It used eigenfaces obtained through Principal Component Analysis (PCA) to extract features from a face image dataset of only one adult subject. The evaluated emotions were anger, happiness, disgust, sadness and neutral, and the proposed system reached an accuracy of approximately 97%.
- In [39], histogram feature extraction and a multiclass Support Vector Machine (SVM) were used to classify four emotions (happiness, sadness, anger and fear) from thermal images of 22 subjects in the Kotani Thermal Facial Expression (KTFE) Database, achieving an average classification accuracy of 81.95%.
- In [40], the authors used thermal image processing and a Neural Network (NN) trained with Back Propagation (BP) to recognize the neutral, happy, surprise and sad facial expressions of one female subject, obtaining a mean accuracy of 90%.

Valence and arousal analysis and ROIs

The valence dimension represents the range between unhappiness and happiness, whereas arousal represents the range between relaxation and activation [41]. In our study, the valence values were pronounced towards the extremes (1 and 9) of the SAM scale (Table 1). Figure 6 showed that happiness, surprise and disgust induced the greatest emissivity variations in the children's faces relative to the baseline. These emotions triggered significant emissivity variations in the periorbital and cheek pairs and in the nose. In general, the ROIs with the most significant emissivity variations were the cheeks, periorbital regions and nose. Some studies have highlighted the association between facial thermal variation and the emotional dimensions (valence and arousal). For example, the authors of [32] reported a higher correlation of facial thermal changes with arousal than with valence, using stimuli from a picture database.
Indeed, many studies report temperature changes under high-arousal conditions, mainly associated with high anxiety levels [17,25,42]. Regarding the valence dimension, temperature increases in the brows, cheeks and around the eyes have been observed in adults, with brows and cheeks related to negative valence and the eyes related to positive valence [16]. In contrast, temperature decreases in the forehead, cheeks and nose were found in babies during pleasant emotions [24]. The work in [43] evaluated thermal variation in the perinasal region, considering facial expressions. Its aim was to distinguish negative (unpleasant) from positive (pleasant) arousal, that is, distress from eustress conditions. For eustress (positive facial expressions), the authors found a locally elevated perinasal signal, whereas for distress (negative facial expressions) they found a fluctuating perinasal signal. These differences may be related to muscle deformation during facial expression, in addition to perspiration (found in distress conditions). Therefore, valence effects could be related to muscular deformations caused by facial expressions. In general, decreased facial temperature is considered a sign of negative emotions [24], which is confirmed by Figure 6. The predominant decrease observed in most ROIs may have two possible causes. It may reflect subcutaneous vasoconstriction controlled by sympathetic activation mediating central activation [15,20]. Alternatively, it may reflect perspiration, a physiological phenomenon of the sympathetic autonomic nervous system: activated sweat pores absorb latent heat, which decreases local thermal emission [44].
Such decreases were detected in the forehead ROI pair during disgust and surprise; in the periorbital region during happiness, sadness and surprise; in the perinasal region during disgust and happiness; in the chin during surprise, happiness and sadness; and finally, in the nose during disgust, fear and happiness. In contrast, the forehead showed the smallest emissivity variation among the ROIs. This tendency towards no significant changes in the forehead is consistent with the fact that the forehead has the most stable temperature [20,32]. For negative emotions (disgust, fear and sadness), emissivity decreases were found mainly in the right cheek and on the right side of the periorbital and perinasal regions. For positive emotions (happiness and surprise), they were found mainly in the periorbital region and chin. Decreases in the right cheek and increases in the left cheek were observed for all emotions.

For negative (disgust and fear) and positive (happiness) emotions, a significant emissivity decrement in the nose was particularly noticeable (see Figures 6 and 7), in accordance with findings in the literature that indicate nasal temperature decr