|3/2015 - 20|
Evaluation of Subspace Clustering Using Internal Validity Measures
OSZUST, M., KOSTKA, M.
pattern recognition, data mining, subspace clustering, clustering validation, distance metrics
About this article
Date of Publication: 2015-08-31
Volume 15, Issue 3, Year 2015, On page(s): 141 - 146
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2015.03020
Web of Science Accession Number: 000360171500020
SCOPUS ID: 84940728824
Different clustering algorithms, or even the same algorithm with different input parameters, can produce different partitions of the same data. Clustering validity measures are then applied to determine which results are of better quality. External measures can be used to evaluate clustering algorithms on datasets with a known division of the data. In a real scenario, however, such information is not available, and internal measures are often applied instead. Subspace clustering techniques can create clusters that use different subsets of the full feature space. For this reason, calculating internal measures with full-feature-space distance metrics (e.g., Euclidean distance) is not justified. In this paper, we propose a novel approach to evaluating subspace clustering with internal quality measures: we apply distance metrics that can handle missing attribute values or that are used in dimensionality reduction techniques. The approach is verified on eight publicly available, widely used datasets. The results are promising and suggest that the proposed distance metrics are suitable for calculating the examined internal validation measures.
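The idea of scoring subspace clusters with a missing-value-aware distance can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the exact metrics, so the code assumes a partial Euclidean distance (computed over shared attributes only and rescaled by the share of attributes used) and plugs it into the silhouette index. The names `partial_distance`, `silhouette`, and `subspaces`, as well as the toy data, are hypothetical. Attributes outside a cluster's subspace are treated as missing for that cluster's points; as a limitation of this sketch, two clusters with no shared attributes raise an error.

```python
import math

def partial_distance(x, y, mask_x, mask_y):
    """Euclidean distance over attributes present in both points,
    rescaled by (total attributes / shared attributes)."""
    shared = [i for i, (mx, my) in enumerate(zip(mask_x, mask_y)) if mx and my]
    if not shared:
        raise ValueError("points share no attributes")
    squared = sum((x[i] - y[i]) ** 2 for i in shared)
    return math.sqrt(len(x) / len(shared) * squared)

def silhouette(points, labels, subspaces):
    """Mean silhouette width; each point is masked by its cluster's subspace."""
    clusters = sorted(set(labels))
    scores = []
    for i, (p, lp) in enumerate(zip(points, labels)):
        mean_dist = {}
        for c in clusters:
            dists = [partial_distance(p, q, subspaces[lp], subspaces[lq])
                     for j, (q, lq) in enumerate(zip(points, labels))
                     if lq == c and j != i]
            mean_dist[c] = sum(dists) / len(dists) if dists else None
        a = mean_dist[lp]  # mean distance to the point's own cluster
        others = [m for c, m in mean_dist.items() if c != lp and m is not None]
        if a is None or not others:
            scores.append(0.0)  # singleton cluster or a single cluster
            continue
        b = min(others)  # mean distance to the nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data (assumed): the third attribute is noise for both clusters.
points = [(0.0, 0.0, 5.0), (0.0, 1.0, -5.0), (4.0, 0.0, 9.0), (4.0, 1.0, -9.0)]
labels = [0, 0, 1, 1]
subspaces = {0: (True, True, False), 1: (True, False, False)}
full_space = {0: (True, True, True), 1: (True, True, True)}

subspace_score = silhouette(points, labels, subspaces)
full_score = silhouette(points, labels, full_space)
print(subspace_score, full_score)
```

On this toy data the full-space score is pulled down by the noisy third attribute, while the subspace-aware score reflects the separation along the attributes the clusters actually use.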
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania
All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.