An Intelligent Multi-modal Interview Simulation System Using Large Language Models, Automatic Speech Recognition, and Neural Text-to-Speech Synthesis

Manohar Chaudhari; Atharv Kulkarni; Siddhedh Shelar; Sanika Dhanve; Sanmesh Satpute

Volume 12, Issue 6 (June 2026)

Under License Of

Creative Commons Attribution-NonCommercial 4.0 International License

Author(s)

Manohar Chaudhari; Atharv Kulkarni; Siddhedh Shelar; Sanika Dhanve; Sanmesh Satpute

Abstract

Preparing for technical employment interviews is a high-stakes endeavor that demands both domain expertise and practiced verbal communication. Conventional preparation strategies (textbook study, static question banks, and peer mock sessions) suffer from well-documented limitations: they are non-personalised, require scheduling coordination, and provide no systematic feedback on performance. This paper presents the AI Interview Assistant (AIIA), a full-stack, multi-modal web platform that automates the entire interview simulation life-cycle. AIIA integrates three distinct AI services: (1) Google Gemini, a large language model (LLM) responsible for context-aware question generation, adaptive conversational follow-up, code evaluation, and structured feedback synthesis; (2) AssemblyAI Universal-2, a state-of-the-art automatic speech recognition (ASR) engine for real-time candidate voice transcription; and (3) Murf AI FALCON, a neural text-to-speech (TTS) synthesiser that voices the AI interviewer, Natalie. The system supports eight technical roles, three difficulty tiers, and three code challenge formats (write, fix, and explain) across four programming languages. Interview sessions are stored in a MongoDB document database, enabling longitudinal progress tracking. A five-category, LLM-generated feedback report is delivered upon session completion. Empirical observations demonstrate that the five-prompt LLM orchestration architecture produces contextually coherent question sets and qualitatively discriminative performance assessments. The AIIA system establishes a replicable architectural template for deploying conversational AI agents in high-stakes educational assessment contexts.
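
As an illustration of the session model described above, the following is a minimal TypeScript sketch of how a session configuration might be turned into a question-generation prompt. It is not the authors' implementation. The type names, field names, difficulty labels, role string, and prompt wording are all assumptions made for this example; only the three challenge formats (write, fix, explain) come from the abstract, and no Gemini, AssemblyAI, or Murf API is called.

```typescript
// Hypothetical sketch: every identifier below is an assumption, not taken from the paper.

type Difficulty = "easy" | "medium" | "hard";        // three tiers; these labels are assumed
type ChallengeFormat = "write" | "fix" | "explain";  // formats named in the abstract

interface SessionConfig {
  role: string;                 // one of the eight roles (names are not given in the abstract)
  difficulty: Difficulty;
  challengeFormat: ChallengeFormat;
  language: string;             // one of the four languages (not listed in the abstract)
}

interface Turn {
  question: string;
  transcript: string;           // text produced by the ASR service for the spoken answer
}

// Shape of a document that could be stored per session for progress tracking.
interface SessionRecord {
  config: SessionConfig;
  turns: Turn[];
  startedAt: string;            // ISO 8601 timestamp
}

// Create an empty session record for the given configuration.
function newSession(config: SessionConfig, now: Date = new Date()): SessionRecord {
  return { config, turns: [], startedAt: now.toISOString() };
}

// Build the text prompt for the next question. With no history it asks for an
// opening question; otherwise it asks for a follow-up based on the last turn.
function buildQuestionPrompt(config: SessionConfig, history: Turn[]): string {
  const lines: string[] = [
    `You are interviewing a candidate for the role: ${config.role}.`,
    `Difficulty: ${config.difficulty}.`,
    `Coding challenge type: ${config.challengeFormat} (${config.language}).`,
  ];
  if (history.length === 0) {
    lines.push("Ask the opening question.");
  } else {
    const last = history[history.length - 1];
    lines.push(`Previous question: ${last.question}`);
    lines.push(`Candidate answer: ${last.transcript}`);
    lines.push("Ask one follow-up question that builds on this answer.");
  }
  return lines.join("\n");
}
```

In a design like this, the returned string would be sent to the LLM, and each answered question would be appended to `turns` before the record is saved, so that the follow-up prompt always sees the most recent exchange.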

Keywords

Interview Simulation; Large Language Models; Automatic Speech Recognition; Text-to-speech Synthesis; MERN Stack; Educational AI; Prompt Engineering; Conversational Agents

Paper ID

IJSARTV12I6105593

Publication Date

June 2, 2026

Research Area

Computer Engineering
