MULTIMODAL EMOTION RECOGNITION

The design of many collaborative learning activities has not been informed by the affective dynamics that learners pass through. Emotions have long been assumed to influence learning during collaborative activities, yet researchers have found them very hard to study and model. Recent advances in neuroscience, biomedical engineering and data mining have drawn researchers' attention to this subject. We are now at a point where meaningful accuracy in recognizing fundamental emotional states is achievable through a number of approaches. The recognition of affective and mental states offers a magnifying glass into the processes involved in collaborative learning activities.

Multimedia indexing is concerned with developing techniques that allow people to find media effectively. Content-based methods become essential when dealing with huge databases, since people cannot possibly annotate all the relevant content manually. Emotions are intrinsic to human beings and are known to be very important for natural communication, memory, decision making and many other cognitive activities. Present technology makes it possible to explore the emotional space through content-based analysis of audio and video, and also through other modalities such as human physiology.


Multimedia information indexing and retrieval research is about developing algorithms, tools and interfaces that allow people to search for and find content in all its possible forms. Although research in this field has taken major steps forward, enabling computers to search text quickly and accurately, difficulties still exist when dealing with other media such as images, videos or audio. Current commercial search methods generally rely on metadata such as captions or keywords. On the web this metadata is typically extracted from the text surrounding the media, assuming a direct semantic connection between the two. In many cases, however, this information is incomplete or inaccurate; in other cases it is not available at all. Content-based methods are developed to search through the semantic information intrinsically carried by the media themselves. One of the major challenges in content-based multimedia retrieval remains bridging the semantic gap: the difference in abstraction between the low-level features that can be extracted automatically and the high-level concepts expressed in humans' natural queries. The ability to identify emotions is intrinsic to human beings and is known to be very significant for natural interaction, memory, decision making and many other cognitive activities. We therefore argue that the emotion expressed by a piece of media, such as a movie or a song, can be used for indexing, retrieval or automatic summarization tasks.
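
As a concrete illustration of the low-level side of this gap, the following sketch extracts a few standard low-level audio descriptors (MFCCs and frame energy) from a clip. It assumes the librosa library and a placeholder file path, neither of which is prescribed by the text; it is a minimal sketch, not the method described here.

    # Minimal sketch: extracting low-level audio descriptors (MFCCs and
    # frame energy) of the kind content-based indexing typically starts from.
    # Assumes the librosa library; the file path is a placeholder.
    import numpy as np
    import librosa

    def extract_low_level_features(path, n_mfcc=13):
        y, sr = librosa.load(path, sr=None, mono=True)          # load the clip
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # spectral shape per frame
        rms = librosa.feature.rms(y=y)                          # short-time energy per frame
        # Summarize over time so a single vector describes the whole clip.
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1), rms.mean(axis=1)])

    # Example: features = extract_low_level_features("movie_audio.wav")

Descriptors like these sit at the bottom of the semantic gap; the indexing task is to relate them to high-level concepts such as genre or mood.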


Information about the emotions that best describe a movie, for example, can be used to index that movie into genre-like categories. In other scenarios, such as adventure or musical movies, the connection between emotions and film genre is less clear. Even in these cases, however, there may be a connection between the way emotions develop over the course of a film and its classification.

Although the indexing and retrieval community acknowledges that emotions are a significant characteristic of media, and that they could be used in various interesting ways, for example as semantic tags, only limited effort has been made to connect emotions to content-based indexing and retrieval of multimedia. Existing works demonstrate the interest in this kind of emotional content-based retrieval system, but they generally lack an appropriate evaluation study. In addition, we argue that emotions should not be the only characterization of the media: other tags describing the content should be used together with the emotions in a complete system.

The following example illustrates the value of a multidisciplinary approach. Suppose one is trying to retrieve an action movie. One possibility is to look for explosions or gunfights, but explosions also appear in documentaries about controlled building demolitions, and gunfights may be recorded at a shooting range. Another possibility is to recognize an action movie only through its emotional evolution, but that recognition may be too complex. In both cases the unimodal systems are likely to fail at the task and retrieve non-relevant movies. Combining the two systems, as sketched below, can make it possible to obtain good results with relatively low complexity: videos containing explosions can be selected, and documentaries can be discarded because their typical mood and emotion dynamics are generally very different from those of action movies.
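
A minimal late-fusion sketch of this combination, assuming the two unimodal systems have already produced per-video scores in [0, 1]; the variable names, weights and toy scores are illustrative, not from the text.

    # Combine per-video scores from two unimodal systems: one for
    # explosion/gunfight content, one for emotion dynamics.
    def fuse_scores(content_scores, emotion_scores, w_content=0.5, w_emotion=0.5):
        """Rank videos by a weighted sum of the two unimodal scores."""
        fused = {vid: w_content * content_scores[vid]
                      + w_emotion * emotion_scores.get(vid, 0.0)
                 for vid in content_scores}
        return sorted(fused, key=fused.get, reverse=True)

    # Toy scores: the documentary has explosions but the wrong mood.
    content = {"action_clip": 0.9, "demolition_doc": 0.8, "drama": 0.1}
    emotion = {"action_clip": 0.8, "demolition_doc": 0.2, "drama": 0.3}
    print(fuse_scores(content, emotion))  # action_clip ranks first

A weighted sum is the simplest fusion rule: a video has to score reasonably on both modalities to rank highly, which is exactly what filters out the demolition documentary.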

  • Anger is best represented by the x coordinates of the eyes and of the upper lip, together with information about the relative alignment of the eyebrows; for audio, the energy and the first LPC coefficient are the most useful features.
  • Disgust is represented by the x coordinates of the eyes, the upper lip and the nose; the distances of the eye region should be avoided, and among the audio features only the first MFCC appears useful.
  • Fear is recognized primarily through the video features; among the audio features only the pitch appears to give good results.
  • Happiness is recognized by the coordinates of the mouth and the y coordinates of the mouth corners; the chin-to-mouth distance may be used as well. For the audio features we generally rely on the third formant.
  • Sadness is recognized using both kinds of features, and audio generally seems to discriminate better between sadness and all the other emotions.
  • Surprise is recognized by the x coordinates of the eyes, the upper lip and the nose, the mean x displacement of the face, and the alignment of the right eyebrow; a feature-selection sketch based on this mapping follows below.
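
The following sketch shows how such an emotion-specific mapping could drive feature selection before classification. The mapping is partial and the feature names are labels invented for this illustration, not identifiers from the original study.

    import numpy as np

    # Illustrative (partial) mapping from emotion to the features reported
    # as most discriminative in the list above.
    EMOTION_FEATURES = {
        "anger":     ["eye_x", "upper_lip_x", "eyebrow_alignment", "energy", "lpc_1"],
        "happiness": ["mouth_x", "mouth_corner_y", "chin_to_mouth_dist", "formant_3"],
        "surprise":  ["eye_x", "upper_lip_x", "nose_x", "face_mean_x_disp",
                      "right_eyebrow_alignment"],
    }

    def select_features(sample, feature_index, emotion):
        """Keep only the columns reported as useful for the given emotion."""
        cols = [feature_index[name] for name in EMOTION_FEATURES[emotion]
                if name in feature_index]
        return np.asarray(sample)[cols]

    # Toy example: a sample vector indexed by feature name.
    feature_index = {"eye_x": 0, "upper_lip_x": 1, "nose_x": 2,
                     "energy": 3, "lpc_1": 4, "eyebrow_alignment": 5}
    sample = [0.1, 0.4, 0.2, 0.7, 0.3, 0.9]
    print(select_features(sample, feature_index, "anger"))

Restricting each emotion's classifier to its most informative features keeps the per-emotion models small and avoids features that, as noted above, hurt discrimination for particular classes.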

Emotions are essential to human lives and play a vital role in communication between people. This interactional phenomenon has been studied extensively by research groups worldwide. A variety of applications, including human-computer interfaces, benefit from enabling computers to read emotions. Affective computing is an active area of research that covers both recognizing emotions and generating appropriate responses. There are several approaches to recognizing emotions from static face images, speech, and keystroke patterns, and each of these modalities has been investigated individually as a unimodal system. Recognizing the limitations of unimodal approaches, this study also explores a multimodal emotion recognition framework that combines these modalities. The work involves identifying and extracting appropriate features from each modality and applying several classification algorithms to map them to emotions. The features extracted from the face include the positions of the eyes, mouth, eyebrows and nose. We have used classification algorithms such as Bayesian classifiers and neural networks to classify the emotions.
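
A minimal sketch of the feature-level fusion idea, assuming pre-extracted per-modality feature matrices and using scikit-learn implementations of the classifier families mentioned above; the array shapes and random data are purely illustrative, not the original experimental setup.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neural_network import MLPClassifier

    # Toy per-modality feature matrices (rows = samples), stand-ins for the
    # face, speech and keystroke features described above.
    rng = np.random.default_rng(0)
    n = 60
    face = rng.normal(size=(n, 8))       # e.g. eye/mouth/eyebrow/nose positions
    speech = rng.normal(size=(n, 13))    # e.g. spectral descriptors
    keystroke = rng.normal(size=(n, 4))  # e.g. typing-rhythm statistics
    labels = rng.integers(0, 6, size=n)  # six basic emotion classes

    # Feature-level fusion: concatenate the modalities into one vector per sample.
    X = np.hstack([face, speech, keystroke])

    # Two of the classifier families mentioned in the text.
    for clf in (GaussianNB(), MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)):
        clf.fit(X[:40], labels[:40])
        print(type(clf).__name__, "accuracy:", clf.score(X[40:], labels[40:]))

With real extracted features, the same concatenate-then-classify pipeline lets the classifier exploit complementary cues across modalities instead of relying on any single one.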