Journal of Cloud Computing: Advances, Systems and Applications

Table 1 Summary of datasets used to evaluate the hierarchical clustering algorithms

From: A dockerized framework for hierarchical frequency-based document clustering on cloud computing infrastructures

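| Dataset | Domain | # of docs | # of terms | # of terms after preprocessing | # of classes | Source |
|---|---|---|---|---|---|---|
| Classic4 | Abstracts | 7095 | 7749 | 6576 | 4 | [43] |
| Reviews | News articles | 4069 | 22,927 | 12,431 | 5 | [43] |
| Tr23 | TREC documents | 204 | 5833 | 4384 | 6 | [43] |
| LATimes | News articles | 6279 | 10,020 | 6389 | 6 | [43] |
| Tr31 | TREC documents | 927 | 10,129 | 6946 | 7 | [43] |
| La2s | News articles | 3075 | 12,433 | 8517 | 7 | [43] |
| WebKb | Web pages | 8282 | 22,892 | 11,009 | 7 | [43] |
| Tr12 | TREC documents | 313 | 5805 | 4283 | 8 | [43] |
| Re8 | News articles | 7674 | 8901 | 5379 | 8 | [43] |
| Tr11 | TREC documents | 414 | 6430 | 4632 | 9 | [43] |
| Tr45 | TREC documents | 690 | 8262 | 6016 | 10 | [43] |
| Tr41 | TREC documents | 878 | 7455 | 5406 | 10 | [43] |
| Oh10 | Medical documents | 1050 | 3239 | 2425 | 10 | [43] |
| Dmoz-Science | Web pages | 6000 | 5011 | 3719 | 12 | [43] |
| Dmoz-Health | Web pages | 3500 | 4217 | 3172 | 13 | [43] |
| Re0 | Articles | 1504 | 2886 | 2209 | 13 | [43] |
| Dmoz-Computers | Web pages | 9500 | 5011 | 3527 | 19 | [43] |
| Wap | Web pages | 1560 | 8460 | 5988 | 20 | [43] |
| 20 Newsgroups | E-mails | 18,808 | 45,434 | 16,499 | 20 | [43] |
| Re1 | Articles | 1657 | 3758 | 2863 | 25 | [43] |
| ACM | Digital library | 3493 | 60,768 | 16,315 | 40 | [43] |
| New3 | News articles | 9558 | 26,833 | 14,483 | 44 | [43] |
| Opinosis | Reviews | 6457 | 2693 | 2201 | 51 | [43] |
| NYTimes | News articles | 300,000 | 102,660 | 18,001 | - | [44] |
| PubMed | Abstracts | 8,200,000 | 141,043 | 21,451 | - | [44] |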