In Machine Learning, we frequently express the similarity between two vectors as cosine similarity. What exactly does that mean? What has ‘cosine’ got to do with similarity?

Well, there is Math in ML, and cosine similarity between vectors has everything to do with Math. Let us first refresh what a vector is, and then come to what the cosine of the angle between two vectors means.

**What is a Vector?**

A vector is a column of numbers enclosed in brackets, like this.

As a special case, a two dimensional vector is a column of two numbers enclosed in brackets, like this.
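For concreteness, here is a small sketch (my own illustration, assuming NumPy) of a two dimensional vector held as an array of two numbers:

```python
import numpy as np

# A two dimensional vector: a column of two numbers.
u = np.array([3, 4])

print(u)        # [3 4]
print(u.shape)  # (2,)
```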

When you use a variable to represent a vector, you typically differentiate it from a variable representing a scalar (an ordinary number) by using a boldface letter in print or electronic media (for example, **u**), or by drawing an arrow above the letter in handwriting.

A two dimensional vector is represented by an arrow in the two dimensional Cartesian space, like this.

Two vectors in Cartesian space are considered to be equal if

- Both have the same length
- Both are parallel and point in the same direction

Hence, we can have another arrow in the same Cartesian space equal to **u** if it has the same length and direction and is parallel to **u**. It could start at a point other than O.

Similarly, a three dimensional vector is represented by an arrow in the three dimensional Cartesian space, like this.

As in two dimensions, here also any other vector that is of the same length as **u**, parallel to **u** and pointing in the same direction would be considered equal to **u**, even if it starts at a point other than O.

For any positive integer n, an n-dimensional vector is a column of n numbers enclosed in brackets.

An n-dimensional vector still exists, though we are unable to represent it using an arrow in the Cartesian space for n greater than 3.

**Vector Addition**

The addition of two vectors is defined component-wise:

**u** + **v** = (u₁ + v₁, u₂ + v₂, …, uₙ + vₙ)

You can see the triangle law of vector addition in the above diagram. You can plot them in two or three dimensions and convince yourself that the x, y (and z, if present) components of the vectors indeed add up this way.
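As a quick numeric check of the component-wise rule (my own example, assuming NumPy):

```python
import numpy as np

u = np.array([1, 2])
v = np.array([3, 4])

# Addition is component-wise: (1 + 3, 2 + 4).
w = u + v
print(w)  # [4 6]
```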

**Vector Subtraction**

Vector subtraction is defined very similarly to vector addition: the components are subtracted instead of added, so **u** – **v** = (u₁ – v₁, u₂ – v₂, …, uₙ – vₙ).

In two and three dimensions, the subtraction of two vectors follows from the way we do addition.
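Again a small sketch (my own example, assuming NumPy):

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Subtraction is also component-wise: (4 - 1, 5 - 2, 6 - 3).
d = v - u
print(d)  # [3 3 3]
```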

**Length of a Vector**

Let us come back to a two dimensional vector. We said a two dimensional vector can be represented by an arrow. What will be the length of that arrow?

By the Pythagoras theorem, the length of **u**, which is denoted by ||**u**||, will be

||**u**|| = √(u₁² + u₂²)

What about a three dimensional vector? The same reasoning gives ||**u**|| = √(u₁² + u₂² + u₃²).

Let us return to the n dimensional vector **u**. Its length is defined as

||**u**|| = √(u₁² + u₂² + … + uₙ²)

We have already seen that for n = 2 and n = 3, the above equation actually gives the length of the arrow representing **u**. But for n > 3, the length of the vector is just a definition.
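A small numeric check (my own example, assuming NumPy, whose `np.linalg.norm` computes exactly this definition by default):

```python
import numpy as np

# Two dimensions: Pythagoras gives sqrt(3**2 + 4**2) = 5.
u = np.array([3.0, 4.0])
print(np.linalg.norm(u))  # 5.0

# The same definition works for any n, here n = 4:
w = np.array([1.0, 2.0, 2.0, 4.0])
print(np.linalg.norm(w))  # sqrt(1 + 4 + 4 + 16) = 5.0
```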

**Cosine of the Angle between Vectors**

Let us start with the simple case of two dimensional vectors, which can be represented by arrows. Because they can be represented by arrows, the angle between them can be visualized, and we can find the cosine of that angle using geometry. We will use the cosine rule from trigonometry. The Wikipedia article I have linked to in the previous sentence contains the mathematical proof of the rule, and the diagram below is also from the same article.

The image below applies the same cosine rule to the two dimensional vectors **u** and **v**.

The quantity u₁v₁ + u₂v₂ is called the dot product of the two vectors **u** and **v**.

**u.v** = u₁v₁ + u₂v₂

So the equation from the diagram becomes,

**u.v** = ||**u**|| ||**v**|| cos θ
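To see the equation in action, here is a sketch (my own values, assuming NumPy) that computes cos θ from the dot product and the two lengths:

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

# cos(theta) = u.v / (||u|| ||v||)
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos_theta)  # ~0.7071, the cosine of 45 degrees
```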

What if **u** and **v** are three dimensional vectors? The cosine rule applied to the vector difference in the diagram above still holds true.

||**v** – **u**||² = ||**v**||² + ||**u**||² – 2 ||**v**|| ||**u**|| cos θ

Let us start with that premise.

For the three dimensional vectors **u** and **v**, the quantity u₁v₁ + u₂v₂ + u₃v₃ is called the dot product of the two vectors **u** and **v**.

**u.v** = u₁v₁ + u₂v₂ + u₃v₃

Hence the equation from the diagram becomes

**u.v** = ||**u**|| ||**v**|| cos θ

This is the same as what it was for two dimensions.

So both for two dimensions and for three dimensions, cos θ = **u.v** / (||**u**|| ||**v**||).

For the n dimensional vectors **u** and **v**, the dot product of the two vectors is defined as

**u.v** = u₁v₁ + u₂v₂ + … + uₙvₙ

For n dimensional vectors, we cannot draw arrows representing those vectors, and so we cannot visualize the angle between them. Instead, we simply define the cosine of the angle between the two vectors to be cos θ = **u.v** / (||**u**|| ||**v**||), just as in two and three dimensions.
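A sketch of the definition for n = 5 (my own values, assuming NumPy):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v = np.array([2.0, 3.0, 4.0, 5.0, 6.0])

# The definition carried over from two and three dimensions:
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos_theta)  # a value between -1 and 1
```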

For the definition to be valid, the value must, like any cosine in trigonometry, be less than or equal to 1 and greater than or equal to -1. That this is actually the case is proven by the Cauchy-Schwarz inequality. The article I have linked to also provides a proof of the inequality.

Cosine similarity of two vectors is just the cosine of the angle between the two vectors. Two vectors with the same orientation have a cosine similarity of 1 (cos 0 = 1). Two vectors with opposite orientations have a cosine similarity of -1 (cos π = -1), whereas two perpendicular vectors have a cosine similarity of zero (cos π/2 = 0). So the value of cosine similarity ranges between -1 and 1.

It is also important to remember that cosine similarity expresses only similarity in orientation, not magnitude. So if **u** and **v** are parallel to each other but **u** is twice **v** in magnitude, the cosine similarity will still be 1.

**Back to ML**

The Python scikit-learn library provides a function to calculate the cosine similarity. Let us use that library and calculate the cosine similarity between two vectors.

The scikit-learn function takes two matrices, rather than two vectors, as parameters; it calculates the cosine similarity between every possible pair of vectors drawn from the two parameters and returns the result as a matrix. Since we have only two vectors and the result is a scalar, we use square brackets to pass the parameters as matrices and extract the scalar from the returned 1×1 matrix.
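A minimal sketch of that call (the vector values are my own illustration; `cosine_similarity` lives in `sklearn.metrics.pairwise`):

```python
from sklearn.metrics.pairwise import cosine_similarity

u = [1, 2, 3]
v = [4, 5, 6]

# Wrap each vector in a list so each parameter is a matrix;
# the result is a 1x1 matrix, so index [0][0] to get the scalar.
similarity = cosine_similarity([u], [v])[0][0]
print(similarity)  # ~0.9746
```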

Just to reiterate, if two vectors are parallel (in the example below, the second one is just twice the first one), the cosine similarity is 1, apart from rounding error.
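A sketch of that check (my own values; the second vector is twice the first):

```python
from sklearn.metrics.pairwise import cosine_similarity

u = [1, 2, 3]
v = [2, 4, 6]  # v is u scaled by 2, so the vectors are parallel

# Same orientation => cosine similarity of 1, up to rounding error.
print(cosine_similarity([u], [v])[0][0])
```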

To summarize, we looked at what vectors are, how to add and subtract them, and what the length of a vector is, and all of this led us to the angle between two vectors via the dot product. We also cross-checked that the programming results for cosine similarity match what we calculated using Maths.