Clustering is a machine learning technique for analyzing data and dividing it into groups of similar data points. These groups, or sets of similar data, are known as clusters. Cluster analysis studies clustering algorithms that can identify clusters automatically. Hierarchical and partitional clustering are two such classes of clustering algorithms. Hierarchical clustering algorithms break the data up into a hierarchy of clusters, while partitional algorithms divide the data set into mutually disjoint partitions.

**What is Hierarchical Clustering?**

Hierarchical clustering algorithms repeat the cycle of either merging smaller clusters into larger ones or dividing larger clusters into smaller ones. Either way, the result is a hierarchy of clusters called a dendrogram. The agglomerative strategy uses the bottom-up approach of merging clusters into larger ones, while the divisive strategy uses the top-down approach of splitting clusters into smaller ones. Typically, a greedy approach is used in deciding which clusters to merge or divide next. Euclidean distance, Manhattan distance and cosine similarity are some of the most commonly used similarity metrics for numeric data. For non-numeric data, metrics such as the Hamming distance are used. It is important to note that the actual observations (instances) are not needed for hierarchical clustering; the matrix of distances between them is sufficient. A dendrogram is a visual representation of the clusters that displays the hierarchy very clearly, and the user can obtain different clusterings depending on the level at which the dendrogram is cut.
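The agglomerative, distance-matrix-only approach described above can be sketched in a few lines of plain Python. This is a minimal single-linkage sketch, not a production implementation; the function name and the stopping rule (merge until a chosen number of clusters remains, which mimics cutting the dendrogram at a level) are illustrative choices, and note that only the distance matrix is ever consulted, never the raw observations.

```python
# A minimal sketch of bottom-up (agglomerative) clustering with
# single linkage, assuming a precomputed distance matrix `dist`.
def agglomerate(dist, target_k):
    """Greedily merge the two closest clusters until target_k remain."""
    # Start with every instance in its own singleton cluster.
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > target_k:
        # Greedy step: find the pair of clusters with the smallest
        # single-linkage distance (distance between closest members).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge cluster b into cluster a.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Pairwise (Manhattan) distances for four 1-D points at 0, 1, 10, 11.
points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
print(agglomerate(dist, 2))  # -> [[0, 1], [2, 3]]
```

Stopping at a different `target_k` corresponds to cutting the dendrogram at a different level, which is how one clustering run yields many possible groupings.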

**What is Partitional Clustering?**

Partitional clustering algorithms generate various partitions and then evaluate them by some criterion. They are also referred to as nonhierarchical, as each instance is placed in exactly one of k mutually exclusive clusters. Because a typical partitional clustering algorithm outputs only a single set of clusters, the user is required to input the desired number of clusters (usually called k). One of the most commonly used partitional clustering algorithms is the k-means clustering algorithm. The user provides the number of clusters (k) up front, and the algorithm first initializes the centers (or centroids) of the k partitions. In a nutshell, the k-means algorithm then assigns members based on the current centers and re-estimates centers based on the current members. These two steps are repeated until an objective function balancing intra-cluster similarity and inter-cluster dissimilarity is optimized. Because this process converges only to a local optimum, sensible initialization of the centers is a very important factor in obtaining quality results from partitional clustering algorithms.
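The two alternating steps above can be sketched as a small pure-Python version of Lloyd's k-means algorithm on 1-D data. This is an illustrative sketch under simplifying assumptions (1-D points, user-supplied initial centers, a fixed iteration count rather than a convergence test); the names `kmeans`, `points` and `centers` are not from the article.

```python
# A minimal sketch of k-means: assign members to the nearest current
# center, then re-estimate each center from its current members.
def kmeans(points, centers, iters=10):
    groups = [[] for _ in centers]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its previous center).
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

# 1-D data with two obvious clusters; k = 2 is fixed in advance and
# the initial centers are supplied by the user, as noted above.
points = [0.0, 1.0, 10.0, 11.0]
centers, groups = kmeans(points, centers=[0.0, 5.0])
print(centers)  # -> [0.5, 10.5]
```

With a poor initialization (e.g. both starting centers inside the same true cluster), the same data can converge to a worse partition, which is why initialization matters so much in practice.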

**What is the difference between Hierarchical and Partitional Clustering?**

Hierarchical and partitional clustering have key differences in running time, assumptions, input parameters and the resulting clusters. Typically, partitional clustering is faster than hierarchical clustering. Hierarchical clustering requires only a similarity measure, while partitional clustering requires stronger assumptions, such as the number of clusters and the initial centers. Hierarchical clustering does not need the number of clusters as an input parameter, while partitional clustering algorithms require k before they can start running. Hierarchical clustering returns a more flexible division of the data, since the user chooses where to cut the dendrogram, whereas partitional clustering results in exactly k clusters. Hierarchical clustering algorithms are also more suitable for categorical data, as long as a similarity measure can be defined accordingly.