Principal Components Analysis (in short, PCA, see here and here) is a linear decomposition method that transforms a set of variables (for instance spectra) into an equivalent set of transformed variables called principal components (PCs). The PCs are uncorrelated with one another and are sorted in order of decreasing explained variance.

The correlation circle is a visualisation displaying how much the original variables are correlated with the first two principal components. In this tutorial, we are going to learn how to plot a correlation circle and, most importantly, how we can make good use of it in exploratory analysis. We will start with a toy example using the iris dataset and then move on to an example using PCA decomposition of a series of NIR spectra.

Before starting, I’d like to thank all our Patreon supporters for their generous contributions, which support the writing of these posts and make sure the blog stays free for everyone. If you’d like to support the blog, check out our Patreon page.

PCA correlation circle of the iris dataset

The iris flower dataset consists of data about four features of three species of iris flowers. The three species are setosa, virginica and versicolor, and the four features are sepal length, sepal width, petal length and petal width. The dataset is nowadays mostly of historical significance, but it’s widely used as a toy dataset to test or display classification algorithms.

For this reason, the iris dataset is an excellent starting point to discuss the correlation circle for PCA. We are interested both in understanding its meaning and in working out a strategy to calculate it.

As always, let’s import all the required libraries (some of which will be needed later).

The original iris dataset contains 150 samples (50 samples per species) and 4 features. After PCA decomposition, since we decided to keep the first two principal components only, we are left with 150 samples and 2 features.

The idea behind the correlation circle is to calculate the correlation of each of the 4 original features with the two principal components. By correlation we mean the Pearson correlation coefficient, which has a value between 1 (perfect correlation) and -1 (perfect anti-correlation). Once the correlation coefficients have been calculated, the correlation circle is simply a handy way to visualise them on a graph.

For each feature, we calculate two correlation coefficients: one with the first principal component and one with the second. We then treat these two values as coordinates in the plane spanned by PC1 and PC2. In this way, each feature can be described by a point in this plane, the point being defined by the two correlation coefficients. As a further leap, instead of a point, we can plot an arrow, starting from the origin and ending on that point. By repeating the same process for the four features, we will draw 4 arrows in the plane. These arrows are bounded by a circle of unit radius centred at the origin: because the principal components are uncorrelated with one another, the squared correlations of a feature with PC1 and PC2 cannot add up to more than 1.
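
The recipe for the correlation circle (correlate each original feature with PC1 and PC2, then draw the two coefficients as an arrow from the origin) can be sketched in a few lines of NumPy. This is only a minimal sketch under stated assumptions, not the code used in this post: the PCA is done with a plain SVD of the mean-centred data rather than with a PCA library class, no feature scaling is applied, the function names `correlation_circle_coords` and `plot_correlation_circle` are made up for illustration, and the demo matrix is random stand-in data with the same shape as iris (150 samples, 4 features), not the real measurements.

```python
import numpy as np


def correlation_circle_coords(X, n_components=2):
    """Pearson correlation of every original feature with each PC score.

    Returns an array of shape (n_features, n_components); row i holds the
    (PC1, PC2, ...) coordinates of the arrow for feature i.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # PCA works on centred data
    # Rows of Vt are the principal directions, sorted by explained variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # (n_samples, n_components)

    n_features = X.shape[1]
    coords = np.empty((n_features, n_components))
    for i in range(n_features):
        for j in range(n_components):
            # Off-diagonal entry of the 2x2 correlation matrix
            coords[i, j] = np.corrcoef(X[:, i], scores[:, j])[0, 1]
    return coords


def plot_correlation_circle(coords, feature_names):
    """Draw one arrow per feature inside the unit circle (PC1 vs PC2)."""
    import matplotlib.pyplot as plt              # only needed for plotting

    fig, ax = plt.subplots(figsize=(5, 5))
    ax.add_patch(plt.Circle((0, 0), 1.0, fill=False))
    for (x, y), name in zip(coords[:, :2], feature_names):
        ax.arrow(0, 0, x, y, head_width=0.03, length_includes_head=True)
        ax.text(1.05 * x, 1.05 * y, name)
    ax.set_xlim(-1.1, 1.1)
    ax.set_ylim(-1.1, 1.1)
    ax.set_aspect("equal")
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    return ax


# Hypothetical stand-in data: 150 samples, 4 correlated features
rng = np.random.default_rng(0)
latent = rng.normal(size=(150, 2))
mixing = rng.normal(size=(2, 4))
X_demo = latent @ mixing + 0.1 * rng.normal(size=(150, 4))

coords = correlation_circle_coords(X_demo)
```

To try this on the real data, one option would be to pass the 150 x 4 iris matrix (for instance the `data` array returned by scikit-learn's `load_iris`) to `correlation_circle_coords`, and then call `plot_correlation_circle(coords, feature_names)` with the four feature names. If the features are standardised before PCA, the standardised matrix should be passed instead.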