IMPACT OF PARAMETERS CHARACTERIZING CLUSTERING ON DATA ANALYSIS RESULTS
DOI:
https://doi.org/10.17770/lner2012vol1.4.1828
Keywords:
clustering algorithms, metrics, k-means, cluster validity
Abstract
Clustering algorithms group objects described by a set of numerical properties so that objects within a group are more similar to one another than to objects in other groups. All clustering algorithms share common parameters whose choice determines the effectiveness of clustering. The most important of these are the metric (the distance between cluster elements and the cluster centre), the number of clusters k, and the cluster validity criteria. The goal of the paper is to evaluate the validity of the choice of metric, to describe how the results change with the number of clusters on experimental data, and to evaluate the credibility of the clustering results. The input data is the table of ratings of Latvian state higher educational institutions for 2011; the goal of the experiment was to show how clustering methods make it possible to analyse these data in an alternative way.
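The roles of the three parameters named in the abstract (metric, number of clusters k, validity criterion) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy points are hypothetical and are not the rating table used in the paper, the function names are invented, and the within-cluster scatter is only a simple stand-in for a validity criterion, since the abstract does not say which criteria the paper applies. Using the mean as the centre under the Manhattan metric is also a simplification; a k-medians variant would use the median instead.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))

def kmeans(points, k, metric, iterations=100):
    """Lloyd-style k-means; the first k points seed the centres (deterministic)."""
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the nearest centre under `metric`.
        new_labels = [min(range(k), key=lambda c: metric(p, centres[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centres are final
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

def scatter(points, labels, centres, metric):
    """Sum of distances from each point to its own cluster centre."""
    return sum(metric(p, centres[l]) for p, l in zip(points, labels))

# Hypothetical toy data: two visually separate groups of three points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

for name, metric in (("euclidean", euclidean), ("manhattan", manhattan)):
    for k in (1, 2, 3):
        labels, centres = kmeans(points, k, metric)
        print(name, k, labels, round(scatter(points, labels, centres, metric), 3))
```

Running the loop shows how the assignments and the scatter value respond to the choice of metric and of k. The scatter generally shrinks as k grows, which is why it cannot select k on its own and why separate validity criteria are needed.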