Correlation and the correlation matrix are two terms that you will often come across in statistics, probability theory, and machine learning. The notation, formulae and numbers involved may seem overwhelming at first. But once you get the hang of them, you’ll see how useful the correlation matrix is for getting a deeper understanding of your data. So what is a correlation matrix? Read on to find out.
The Correlation Coefficient
To understand the correlation matrix, we first need to know more about the term ‘correlation’. Correlation is a measure of how two variables move together and how strongly they are related. Does an increase in one variable tend to come with an increase in the other?
Correlation is expressed in the form of a correlation coefficient. Let’s take an example. Below is a table that shows the weekly salary of employees of a software company and their job satisfaction levels. To find out if these two variables are related and to what degree, the correlation coefficient can be calculated between the two. Please note that this is only a part of the whole dataset.
To calculate the correlation coefficient, you can use the built-in functions of common spreadsheet applications, such as CORREL in Excel. In this way, we found the correlation coefficient between variables x (salary) and y (job satisfaction) to be 0.83. The value depends on which observations are included, so it may change if it is computed on a larger or different sample.
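For readers who prefer code to a spreadsheet, here is a minimal sketch of the same calculation in Python, assuming the standard Pearson correlation coefficient is the one intended. The function name `pearson` and the sample values are illustrative assumptions; they are not the article's dataset, so the result will not be 0.83.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its variable's mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_deviation = sum(a * b for a, b in zip(dx, dy))
    sq_dev_x = sum(a * a for a in dx)
    sq_dev_y = sum(b * b for b in dy)
    if sq_dev_x == 0 or sq_dev_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_deviation / math.sqrt(sq_dev_x * sq_dev_y)

# Hypothetical sample values, NOT the data from the article's table
weekly_salary = [500, 650, 700, 820, 900, 1100]
job_satisfaction = [3, 4, 4, 6, 5, 8]

r = pearson(weekly_salary, job_satisfaction)
print(round(r, 2))
```

The function divides the summed product of deviations by the square root of the product of the summed squared deviations, which keeps the result between -1 and +1.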
What does the Correlation Coefficient signify?
The correlation coefficient is measured on a scale from -1 to +1. A correlation coefficient of +1 means that there is a perfect positive relationship between the two variables, whereby they both move together in the same direction. An increase in one is accompanied by an increase in the other and a reduction in one is accompanied by a reduction in the other. A correlation coefficient of -1 represents a perfect negative correlation. This means when one increases, the other decreases and vice versa. A value of 0, however, means that there is no linear relationship between the two, although they may still be related in some non-linear way. In our example, we got a positive number for the correlation coefficient, which indicates that a higher salary tends to go along with higher job satisfaction.
While the sign of the correlation coefficient gives an idea of the direction of the relationship, the absolute value of the coefficient tells us about the strength of the relationship, be it positive or negative. The closer the value is to +1 or -1, the more closely the two variables are related. As a rule of thumb, values above 0.8 or below -0.8 mean that the variables are strongly related. In our case, we got a correlation coefficient of 0.83, which means there is quite a high correlation between the salary of an employee and job satisfaction.
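The reading of sign and magnitude described above can be sketched as a small helper. The function name `describe_correlation` and the labels it returns are hypothetical, and the 0.8 cutoff is the rule of thumb used in this article rather than a universal standard.

```python
def describe_correlation(r, strong_threshold=0.8):
    """Return (direction, strength) labels for a correlation coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    # The sign gives the direction of the relationship
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    # The absolute value gives the strength (threshold is an assumed rule of thumb)
    strength = "strong" if abs(r) > strong_threshold else "weaker"
    return direction, strength

print(describe_correlation(0.83))   # the coefficient from the salary example
print(describe_correlation(-0.9))
print(describe_correlation(0.3))
```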
The Correlation Matrix
The correlation coefficient describes the relationship between two variables. However, in a real-world scenario, many variables come into play. Building a correlation matrix is a great way to summarize the relationships between all of them at once. You can then pick the variables (features) most strongly related to the outcome you care about and use them for further processing of your data.
Below is a correlation matrix to find out which factors are most strongly related to employee job satisfaction. All the variables involved are listed along both the column headers and the row headers of the table. The correlation coefficient for each pair of variables has been calculated and placed at the intersection of the corresponding row and column.
One glance at this table (matrix) tells a lot about the relationships between the variables involved. You will find a correlation of 1.0 along the diagonal of the matrix. This is because each variable is perfectly and positively correlated with itself. You can also see that the correlation between age and promotions is -0.8. This means that as employees age, their chances of getting promoted tend to decrease. Similarly, we find that higher salary and more training go along with higher job satisfaction. On the other hand, as unmanaged stress and years in service increase, employee job satisfaction tends to decrease. We also see that the age of the employee and the frequency of feedback show no notable correlation with job satisfaction.
As such, the correlation matrix suggests that the organization should work on increasing salaries and training employees more, and that it needs to find ways to cut down on stress levels in the office environment. Moreover, employees who have been in the organization longer need to be kept interested and motivated by keeping them challenged. Since correlation alone does not prove that one factor causes another, these are best treated as leads to investigate.
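The construction described above, with every variable on both the rows and the columns and a coefficient at each intersection, can be sketched in a few lines of Python. The column names and values below are hypothetical stand-ins, not the article's employee data, and `correlation_matrix` is an illustrative helper rather than a library function. Libraries such as pandas offer a comparable built-in (`DataFrame.corr`).

```python
import math

def pearson(xs, ys):
    """Pearson correlation; assumes equal lengths and no constant column."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def correlation_matrix(columns):
    """Map each pair of column names to the correlation of their values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical data: one list of observations per variable
data = {
    "salary": [500, 650, 700, 820, 900, 1100],
    "stress": [8, 7, 7, 4, 5, 2],
    "job_satisfaction": [3, 4, 4, 6, 5, 8],
}

matrix = correlation_matrix(data)

# Print the matrix with variable names as row and column headers
names = list(matrix)
print(" " * 18 + "".join(f"{n:>18}" for n in names))
for a in names:
    print(f"{a:>18}" + "".join(f"{matrix[a][b]:>18.2f}" for b in names))
```

The matrix is symmetric, since the correlation of a with b equals that of b with a, and its diagonal is 1.0 because each variable is compared with itself.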
Application of the Correlation Matrix in Finance
The correlation matrix is widely used in Modern Portfolio Theory. The theory stresses that investors can reduce risk by lowering the correlation between the returns of the securities in their portfolio. This can be done by measuring the correlation coefficients between the returns of different assets and then carefully selecting those that are the least correlated. Such assets are less likely to lose value at the same time, which helps reduce the portfolio's overall risk and volatility.
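As a rough illustration of the selection step, the sketch below compares every pair of assets and picks the pair whose returns have the lowest correlation coefficient. The asset names and return figures are made up, `least_correlated_pair` is a hypothetical helper, and treating "least correlated" as "lowest coefficient" is an assumption. A real portfolio decision would also weigh expected returns and volatility.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation; assumes equal lengths and no constant series."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def least_correlated_pair(returns):
    """Return the pair of asset names whose returns have the lowest correlation."""
    return min(
        combinations(returns, 2),
        key=lambda pair: pearson(returns[pair[0]], returns[pair[1]]),
    )

# Hypothetical periodic returns for three assets
returns = {
    "asset_a": [0.02, -0.01, 0.03, 0.01, -0.02],
    "asset_b": [0.03, -0.02, 0.04, 0.02, -0.01],
    "asset_c": [-0.01, 0.02, -0.02, 0.00, 0.01],
}

pair = least_correlated_pair(returns)
print(pair, round(pearson(returns[pair[0]], returns[pair[1]]), 2))
```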