Implementation of the TF-IDF Algorithm for Document Similarity Measurement

Adi Ryansyah¹ and Sri Andayani²


Authors

Adi Ryansyah
Sri Andayani, Universitas Katolik Musi Charitas

Keywords:

document similarity measure, TF-IDF, vector, cosine similarity

Abstract

Measuring document similarity is time-consuming. The large number of documents, and the large number of pages per document, make similarity measurement a complicated task that is hard to do manually. In this research, a system that automatically measures the similarity between documents is built by implementing TF-IDF. Measurement is carried out by first creating a vector representation of each document being compared; this vector contains the weight of each term in the document. The similarity value is then calculated using cosine similarity. The finished system can compare documents in PDF or Word format. The comparison can use all chapters of a report, or only a few selected chapters that are considered significant. Based on the experiments, it can be concluded that TF-IDF needs at least three documents to be present in the document collection being processed. A correlation test shows that, for documents in PDF format, there is a significant correlation between the number of characters in a document and its processing time.
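The abstract describes a two-step pipeline: build a TF-IDF weight vector for each document, then compare vectors with cosine similarity. It does not state the exact weighting formula, so the sketch below assumes the common variant w(t, d) = tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t, and cosine(a, b) = (a · b) / (|a| |b|). Tokenization here is deliberately naive (lowercase alphanumeric runs, no stemming or stop-word removal), and the function names `tokenize`, `tfidf_vectors`, and `cosine_similarity` are hypothetical illustrations, not the authors' system. PDF and Word extraction is out of scope; the inputs are plain strings.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase, keep runs of ASCII letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse vector per document, weighting each term by
    raw term frequency times log(N / df) (assumed IDF variant)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df: Counter[str] = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    # Dot product over terms of `a`; terms missing from `b` contribute 0.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An all-zero vector has no direction; treat as dissimilar.
    return dot / (norm_a * norm_b)


docs = [
    "tf idf measures term importance",
    "tf idf weights each term in a document",
    "cosine similarity compares document vectors",
]

vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # shares "tf", "idf", "term": positive
print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms

# The same two documents, but processed as a collection of only two.
pair = tfidf_vectors(docs[:2])
print(cosine_similarity(pair[0], pair[1]))  # prints 0.0
```

The last line illustrates a property of the assumed weighting: with only two documents, every shared term has df = N = 2 and therefore weight log(2/2) = 0, while each unshared term is non-zero in only one vector, so the dot product, and hence the similarity, is always 0. Under this variant that is one plausible reason a collection needs at least three documents; the abstract reports the finding but does not give the reason.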
