A Comparative Study of SVD and NMF for Semantic Text Clustering Using KMeans
-
Author(s):
Danish Khan | Prof. Arpana Jaiswal
-
Keywords:
Text Clustering, Singular Value Decomposition (SVD), Non-negative Matrix Factorisation (NMF), KMeans, Dimensionality Reduction, TF-IDF, Semantic Analysis, 20 Newsgroups Dataset, Normalised Mutual Information (NMI).
-
Abstract:
The exponential growth of unstructured text data, such as news articles, social media posts, and online discussions, has created the need for effective methods of semantic organisation. Text clustering plays a vital role in this context by grouping similar documents without labelled data. However, the inherent challenges of high dimensionality and sparsity in textual representations hinder clustering performance. To address this, dimensionality reduction techniques are integrated with clustering algorithms. This paper presents a comparative study of two approaches: Singular Value Decomposition (SVD) combined with KMeans and Non-negative Matrix Factorisation (NMF) combined with KMeans. The experiments were conducted using the 20 Newsgroups dataset in MATLAB R2024b, with TF-IDF employed as the feature extraction technique. Results demonstrate that SVD + KMeans achieved superior clustering accuracy with a Normalised Mutual Information (NMI) score of approximately 0.55 on the training set and 0.50 on the test set, whereas NMF + KMeans attained moderate accuracy (NMI ≈ 0.45) but offered more interpretable, topic-based clusters. These findings confirm the trade-off between accuracy and interpretability, suggesting that method selection should be based on specific application requirements. The study contributes by providing a reproducible MATLAB-based framework and offering insights into the suitability of dimensionality reduction strategies for large-scale text clustering.
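The abstract reports clustering quality as Normalised Mutual Information (NMI). As a rough illustration of what that score measures, the sketch below computes NMI between reference labels and cluster assignments using only the Python standard library. It is not the paper's MATLAB code. The function names (`entropy`, `mutual_information`, `nmi`) are hypothetical, and the normalisation by the arithmetic mean of the two entropies is an assumption, since the abstract does not say which NMI variant was used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(true_labels, cluster_labels):
    """Mutual information between two labelings of the same documents."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    true_counts = Counter(true_labels)
    cluster_counts = Counter(cluster_labels)
    mi = 0.0
    for (t, c), n_tc in joint.items():
        # p(t, c) * log( p(t, c) / (p(t) * p(c)) )
        mi += (n_tc / n) * math.log(n * n_tc / (true_counts[t] * cluster_counts[c]))
    return mi


def nmi(true_labels, cluster_labels):
    """NMI normalised by the arithmetic mean of the two entropies.

    Assumption: other normalisations (geometric mean, min, max) exist and
    would give different values for the same clustering.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    denom = (entropy(true_labels) + entropy(cluster_labels)) / 2
    if denom == 0:
        # Both labelings put everything in a single group.
        return 1.0
    return mutual_information(true_labels, cluster_labels) / denom


if __name__ == "__main__":
    reference = ["sci", "sci", "rec", "rec"]
    # Cluster ids are arbitrary, so a relabelled perfect clustering scores close to 1.
    print(nmi(reference, [1, 1, 0, 0]))
    # A clustering that carries no information about the reference scores 0.
    print(nmi(reference, [0, 1, 0, 1]))
```

Because NMI ignores the actual cluster ids, it can compare KMeans output against newsgroup labels without any label matching. That makes it usable for both pipelines regardless of how the reduced representation was produced.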
Other Details
-
Paper id:
IJSARTV11I11104274
-
Published in:
Volume: 11 Issue: 11 November 2025
-
Publication Date:
2025-11-12