2. What is k-means clustering?
3. When performing k-means clustering, what attributes should you look for in the inputs used to create the clusters?
Unsupervised classification is one of the major categories of software-based image classification techniques. Its outcome, groups of pixels with common characteristics, is based on the software's analysis of the image and does not require the user to provide sample classes. The computer determines which pixels are related and groups them by their common characteristics. The resulting groups then have to be related to actual features on the ground, such as developed areas, wetlands and coniferous forests. Since no sample classes are provided, the user specifies the number of classes to generate and the bands to use. The software then clusters the pixels into that number of classes, and the user afterwards identifies the land cover class that each cluster represents.
Unsupervised classification comprises three steps: activating the Spatial Analyst extension, generating clusters, and assigning classes. The classes it identifies may or may not correspond well to the land cover types of interest. Where land cover is heterogeneous, the classification often produces too many land cover classes. It is nevertheless useful when no aerial photographs are available for the image area and users cannot accurately specify training areas for the cover types.
K-means is regarded as one of the simplest unsupervised learning algorithms for solving the well-known clustering problem. The procedure classifies a given data set into a certain number of clusters, k, fixed in advance. In the first step, k centroids are defined, one for each cluster. Since different starting locations lead to different results, the centroids should be placed carefully, as far away from each other as possible. In the second step, each point in the data set is associated with its nearest centroid. When no point is pending, this first pass is complete. Third, k new centroids are recalculated as the barycenters of the clusters resulting from the previous step. A new binding is then made between the same data set points and the nearest new centroid. This produces a loop in which the k centroids change their location step by step until no more changes occur.
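The advice to place the starting centroids far from each other can be sketched in Python as follows. This is only one possible reading of that advice, not a prescribed method: the function name `farthest_first_centroids` is hypothetical, and the choice of the first data point as the first centroid is an assumption made for illustration.

```python
import math


def farthest_first_centroids(points, k):
    """Pick k starting centroids that are spread far apart.

    Assumption: the first centroid is simply points[0]; each later one is
    the point whose distance to its nearest already-chosen centroid is
    the largest.
    """
    if not points or k < 1 or k > len(points):
        raise ValueError("need 1 <= k <= number of points")
    centroids = [points[0]]
    while len(centroids) < k:
        # For every point, measure the distance to its closest chosen
        # centroid, then take the point for which that distance is greatest.
        next_point = max(
            points,
            key=lambda p: min(math.dist(p, c) for c in centroids),
        )
        centroids.append(next_point)
    return centroids


points = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0), (9.5, 10.2), (0.0, 10.0)]
print(farthest_first_centroids(points, 3))
# [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
```

The three chosen points come from three different regions of the data, which is the property the first step asks for. Other starting strategies exist, and the final clusters can still depend on which one is used.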
An example illustrating k-means clustering is as follows:
Assume there are n sample feature vectors y1, y2, y3, ..., yn, all from the same class, and that they are known to fall into k compact clusters, with k < n. Let ui be the mean of the vectors in cluster i. If the clusters are well separated, a minimum-distance classifier can be used to separate them: y is in cluster i if || y - ui || is the minimum of all k distances. The following procedure can be used to find the k means.
- Make initial guesses for the means u1, u2, u3, ..., uk.
- Until there are no changes in any mean: use the estimated means to classify the samples into clusters, then, for each cluster i, replace ui with the mean of all the samples in cluster i.
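The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names `nearest_mean` and `k_means`, the `max_iter` safety cap, and the rule that an empty cluster keeps its previous mean are all choices made here for the example.

```python
import math


def nearest_mean(y, means):
    """Minimum-distance classifier: index of the mean closest to sample y."""
    return min(range(len(means)), key=lambda i: math.dist(y, means[i]))


def k_means(samples, initial_means, max_iter=100):
    """Repeat classify-then-update until no mean changes."""
    means = [tuple(m) for m in initial_means]
    labels = []
    for _ in range(max_iter):
        # Classify every sample using the current estimates of the means.
        labels = [nearest_mean(y, means) for y in samples]
        new_means = []
        for i, old_mean in enumerate(means):
            members = [y for y, lab in zip(samples, labels) if lab == i]
            if not members:
                # Assumption: a cluster with no samples keeps its old mean.
                new_means.append(old_mean)
                continue
            # Replace the mean of cluster i with the mean of its samples,
            # computed coordinate by coordinate.
            new_means.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
        if new_means == means:
            break  # no mean changed, so the procedure has converged
        means = new_means
    return means, labels


samples = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
           (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
means, labels = k_means(samples, initial_means=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(means)
```

With these two well-separated groups the means stop changing after the second pass, and each sample ends up with the label of the group it visibly belongs to.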
The inputs to k-means clustering should satisfy several requirements. The number of clusters must be chosen appropriately, since this choice determines whether the results are meaningful. The algorithm partitions the input data set, which does not need to be ordered and whose values should not change while the algorithm runs. The inputs consist of numeric attribute values, and periodic attributes are commonly encoded trigonometrically. This trigonometric encoding nonetheless introduces a systematic error, whatever input is supplied to the algorithm. The input data should also allow presorting, with the sorted array accessed through indirect indexing. The data may contain mixed elements, but each data point must carry a collection of features, as is usual for a machine learning algorithm. Input points must be expressed as coordinates so that they can be grouped into the chosen number of clusters. Empty or null inputs result in exceptions. The inputs must make it possible to initialize the cluster centers and to assign each data point to its closest cluster. Unsupervised classification does not require users to have prior knowledge of the classes, yet it can still account for multivariate spread and yield accurate mean vectors and covariance matrices.
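Two of the input requirements above, trigonometric encoding of periodic attributes and rejection of empty or null inputs, can be illustrated with a short Python sketch. The function names `encode_periodic` and `validate_points` are hypothetical, and the hour-of-day attribute with a period of 24 is an assumed example rather than something taken from a particular data set.

```python
import math


def encode_periodic(value, period):
    """Map a periodic value (e.g. hour of day, period=24) to (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return (math.sin(angle), math.cos(angle))


def validate_points(points, k):
    """Hypothetical pre-flight check on k-means inputs."""
    if not points:
        raise ValueError("input data set is empty")
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    for p in points:
        if p is None or any(v is None for v in p):
            raise ValueError("null value in input")
        if any(not isinstance(v, (int, float)) for v in p):
            raise TypeError("attributes must be numeric")
    if len({len(p) for p in points}) != 1:
        raise ValueError("all points need the same number of features")


# Raw hours treat 23:00 and 01:00 as 22 units apart, although they are
# only two hours apart on the clock. The encoded values are close.
late, early, noon = (encode_periodic(h, 24) for h in (23, 1, 12))
print(math.dist(late, early) < math.dist(late, noon))  # True

validate_points([late, early, noon], k=2)  # passes silently
```

The Euclidean distance between two encoded values is the straight-line chord across the circle, not the arc along it, so encoded distances are not proportional to the true time differences. This may be one source of the systematic error mentioned above.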