Multimodal Emotion Recognition Using Visual-Text Fusion with ResNet-50 and SVM on the MELD Dataset
-
Author(s):
Ayushi Parmar | Prof. Chandni Sikarwar
-
Keywords:
Multimodal Emotion Recognition, ResNet-50, TF-IDF, Feature Fusion, Support Vector Machine, MELD Dataset, Affective Computing, Computer Vision, Natural Language Processing.
-
Abstract:
This paper presents a multimodal emotion recognition framework that integrates visual and textual modalities to improve the accuracy and robustness of emotion classification systems. The visual modality is processed using a pre-trained ResNet-50 convolutional neural network to extract high-level spatial features from video frames, while the textual modality is represented using TF-IDF-based embeddings derived from transcribed utterances. The extracted features are concatenated through feature-level fusion and classified using a Support Vector Machine (SVM) optimized for high-dimensional data. The proposed approach is evaluated on the MELD dataset, which contains synchronized video and text samples annotated with seven emotion classes. Experimental results demonstrate that the fusion of visual and textual features significantly outperforms unimodal baselines, achieving an overall accuracy of 89.7%, with strong performance across precision, recall, and F1-score metrics. Additional qualitative analysis confirms the framework's applicability in real-world interactive systems, supported by an interface that displays real-time predictions alongside actual labels.
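The fusion pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the utterances and labels are toy stand-ins for MELD transcripts, and the 2048-dimensional visual vectors are random placeholders for the pooled ResNet-50 frame features the paper extracts.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Toy utterances and emotion labels standing in for MELD samples (hypothetical data).
utterances = [
    "i am so happy you came today",
    "this is terrible awful news",
    "what a wonderful happy surprise",
    "that was a terrible awful mistake",
]
labels = ["joy", "anger", "joy", "anger"]

# Textual modality: TF-IDF embeddings of the transcribed utterances.
tfidf = TfidfVectorizer()
text_feats = tfidf.fit_transform(utterances).toarray()

# Visual modality: placeholder for ResNet-50 pooled features (2048-dim per clip).
# In the actual pipeline these would come from the pre-trained CNN applied to video frames.
rng = np.random.default_rng(0)
visual_feats = rng.normal(size=(len(utterances), 2048))

# Feature-level fusion: concatenate the two modality representations.
fused = np.hstack([visual_feats, text_feats])

# SVM classifier over the fused high-dimensional feature vectors.
clf = SVC(kernel="linear")
clf.fit(fused, labels)
preds = clf.predict(fused)
print(fused.shape, list(preds))
```

A linear kernel is a common choice for such high-dimensional, small-sample settings; the paper's tuned SVM configuration may differ.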
Other Details
-
Paper id:
IJSARTV11I11104280
-
Published in:
Volume: 11 Issue: 11 November 2025
-
Publication Date:
2025-11-12