Forming Dataset of The Undergraduate Thesis using Simple Clustering Methods

Authors

  • Tio Dharmawan University of Jember, Indonesia
  • Chinta ’Aliyyah Candramaya University of Jember, Indonesia
  • Vandha Pradwiyasma Widharta Pukyong National University, Korea, Republic of

DOI:

https://doi.org/10.25124/ijies.v7i01.187

Keywords:

Document Clustering; Text Mining; Relevant Term; Information Retrieval; Topic Identification

Abstract

Each university collects data on many undergraduate theses but has yet to process it so that students
can easily find the references they need. This study aims to classify thesis documents and to compare
the groupings produced by experts with those produced by simple clustering methods. The experts built
the ground truth using OR Boolean Retrieval and keyword generation. The best clustering was identified
through experiments with the K-Means, K-Medoids, and DBSCAN clustering methods combined with the
Euclidean, Manhattan, City Block, and Cosine Similarity metrics. The clustering with the best
Silhouette Score was then compared with the expert categorization of each document to measure its
accuracy. The K-Means clustering method with the Cosine Similarity metric gave the best results, with
a Silhouette Score of 0.105534. Compared with the ground truth, this best clustering achieved an
accuracy of 33.42%. These results show that simple clustering methods cannot handle data with
negative skewness and leptokurtic kurtosis.
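
The abstract ranks candidate clusterings by Silhouette Score under cosine similarity. The sketch below shows how that score can be computed in plain Python, assuming documents are already vectorized and that cosine distance is defined as 1 minus cosine similarity (the convention scikit-learn uses for `silhouette_score(X, labels, metric="cosine")`). The four term-count vectors are hypothetical, not data from the study, and the sketch only covers the evaluation step, not K-Means itself.

```python
import math

def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity (vectors must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def silhouette_score(points, labels, dist=cosine_distance):
    """Mean silhouette coefficient over all points.

    For point i: a = mean distance to the other members of its own cluster,
    b = lowest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b). Points in singleton clusters get s(i) = 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster
            scores.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)

# Hypothetical term-count vectors for four thesis titles (not data from the study).
docs = [[3, 1, 0, 0], [2, 1, 0, 0], [0, 0, 2, 3], [0, 1, 3, 2]]
good = silhouette_score(docs, [0, 0, 1, 1])  # topic-consistent split
bad = silhouette_score(docs, [0, 1, 0, 1])   # mixed split
print(f"good={good:.3f} bad={bad:.3f}")
```

Scores range from -1 to 1, and values near 0 are generally read as overlapping clusters, which is one way to interpret a best score of 0.105534.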
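
The ground truth and the accuracy figure can be illustrated together. This is a minimal sketch under hypothetical assumptions: each category is defined by an expert keyword list, a document receives the first category whose OR query (any keyword present) matches, and each cluster is mapped to the majority ground-truth category among its members before counting agreements. The category names, keywords, titles, and cluster assignments are invented for illustration; the study's actual keywords and mapping rule are not stated here.

```python
from collections import Counter

# Hypothetical category -> keyword lists (the experts' real keywords are not given).
CATEGORY_KEYWORDS = {
    "networking": ["network", "routing", "wireless"],
    "data_mining": ["clustering", "classification", "mining"],
    "software": ["testing", "requirements", "framework"],
}

def or_boolean_label(text, category_keywords):
    """Return the first category whose query kw1 OR kw2 OR ... matches the text."""
    tokens = set(text.lower().split())
    for category, keywords in category_keywords.items():
        if any(kw in tokens for kw in keywords):
            return category
    return None  # no keyword matched any category

def cluster_accuracy(cluster_ids, true_labels):
    """Map each cluster to its majority true label, then return the share of matches."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cid, []).append(label)
    majority = {cid: Counter(labels).most_common(1)[0][0]
                for cid, labels in by_cluster.items()}
    hits = sum(majority[cid] == label for cid, label in zip(cluster_ids, true_labels))
    return hits / len(true_labels)

# Hypothetical thesis titles and a hypothetical clustering result.
titles = [
    "wireless sensor network routing protocol",
    "document clustering with k-means",
    "classification of student data",
    "automated testing framework for web apps",
]
truth = [or_boolean_label(t, CATEGORY_KEYWORDS) for t in titles]
clusters = [0, 1, 1, 1]
print(truth, cluster_accuracy(clusters, truth))
```

Majority mapping lets two clusters share a category; a one-to-one assignment between clusters and categories is a stricter alternative.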
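
The closing claim refers to two distribution-shape statistics. A minimal sketch, assuming the population (biased) moment estimators: negative skewness means a longer left tail, and a leptokurtic distribution has positive excess kurtosis (heavier tails than a normal distribution). The sample values are hypothetical; the abstract does not say which quantity's distribution was measured.

```python
def _central_moment(xs, k):
    """k-th central moment using the population (divide-by-n) convention."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def skewness(xs):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 (negative = long left tail)."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution, > 0 = leptokurtic)."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3.0

# Hypothetical values: mostly high, with one low outlier.
values = [1, 8, 9, 9, 10, 10, 10, 10, 10]
print(f"skewness={skewness(values):.3f} excess_kurtosis={excess_kurtosis(values):.3f}")
```

With their default `bias=True` (and `fisher=True` for kurtosis), `scipy.stats.skew` and `scipy.stats.kurtosis` return these same estimators.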

Published

2024-10-16

How to Cite

Tio Dharmawan, Chinta ’Aliyyah Candramaya, & Vandha Pradwiyasma Widharta. (2024). Forming Dataset of The Undergraduate Thesis using Simple Clustering Methods. International Journal of Innovation in Enterprise System, 7(1), 31–40. https://doi.org/10.25124/ijies.v7i01.187