General configurations

Basic configuration


Sample size - doc.


Distance metric

Vectorize

Process

Preprocessed data


Process Doc2vec

Process TFIDF

Different clustering data

Advanced configuration


n_clustering_processes


n_evaluation_processes


location


indexing


Result folder


Raw data folder


Raw clustering data folder




Sample clusterings


Sampling fitness

Min samples per cluster


Max sample size


Max iterations


Evaluate


Clustering
Kmeans | DBSCAN | Agglomerative

Vectorizing
TFIDF | Doc2vec | PACT

Evaluation measure
Hom-Com-V | Silhouette | Co-oc

Clustering algorithms

K-MEANS

K-means configuration

Number of clusters



Use Minibatch

Batch size


More configurations

kmeans_init


kmeans_n_init


kmeans_n_job


kmeans_max_iter


kmeans_verbose

DBSCAN

DBSCAN configuration

Minimum points



Epsilon



More configurations

dbscan_algorithm


dbscan_leaf_size


dbscan_p

Agglomerative

Agglomerative configuration

Number of clusters



Linkage


Vectorizing algorithms

Doc2vec

Doc2vec configuration

Vector size



Window size



More configurations

doc2vec_dm


doc2vec_alpha


doc2vec_min_alpha


doc2vec_min_count


doc2vec_iter


doc2vec_negative

TFIDF

TFIDF configuration

Vector size



Use PCA

PCA Vector size

PACT

PACT configuration

Use P feature

Use A feature

Use C feature

Generated Configuration

Please select your parameters
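
As a rough illustration of how the selections on this page could be gathered into one generated configuration, the sketch below collects a few of the fields into a Python dataclass and serializes them to JSON. The class names, the `validate` and `to_json` helpers, the JSON layout, and all default values are assumptions made for this example; only the option lists and the `kmeans_*` field names are taken from the form above. The tool's actual output format may differ.

```python
import json
from dataclasses import dataclass, asdict, field

# Option lists copied from the "Evaluate" section of the form.
CLUSTERING_OPTIONS = ("Kmeans", "DBSCAN", "Agglomerative")
VECTORIZING_OPTIONS = ("TFIDF", "Doc2vec", "PACT")
EVALUATION_OPTIONS = ("Hom-Com-V", "Silhouette", "Co-oc")


@dataclass
class KMeansConfig:
    # Hypothetical container; the default values are placeholders, not the tool's.
    n_clusters: int = 8            # "Number of clusters"
    use_minibatch: bool = False    # "Use Minibatch"
    batch_size: int = 100          # "Batch size"
    kmeans_init: str = "k-means++"
    kmeans_n_init: int = 10
    kmeans_max_iter: int = 300
    kmeans_verbose: int = 0


@dataclass
class GeneratedConfig:
    # Hypothetical top-level structure for the "Generated Configuration" panel.
    clustering: str = "Kmeans"
    vectorizing: str = "TFIDF"
    evaluation_measure: str = "Silhouette"
    kmeans: KMeansConfig = field(default_factory=KMeansConfig)

    def validate(self) -> None:
        """Raise ValueError if a selection is not one of the form's options."""
        checks = (
            (self.clustering, CLUSTERING_OPTIONS, "clustering"),
            (self.vectorizing, VECTORIZING_OPTIONS, "vectorizing"),
            (self.evaluation_measure, EVALUATION_OPTIONS, "evaluation_measure"),
        )
        for value, allowed, name in checks:
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {allowed}")
        if self.kmeans.n_clusters < 1:
            raise ValueError("n_clusters must be at least 1")


def to_json(config: GeneratedConfig) -> str:
    """Validate the configuration and return it as an indented JSON string."""
    config.validate()
    return json.dumps(asdict(config), indent=2)


if __name__ == "__main__":
    print(to_json(GeneratedConfig(clustering="Kmeans", vectorizing="Doc2vec")))
```

Validating against the option lists before serializing means an unsupported choice is rejected when the configuration is generated, not later when clustering starts. The DBSCAN, Agglomerative, Doc2vec, TFIDF, and PACT sections could be added as further nested dataclasses in the same way.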