Spearman's rank correlation coefficient facts for kids

Spearman's rank correlation coefficient is a special number used in mathematics and statistics. It helps us understand how closely two different sets of data are connected or "linked." This tool was created by a person named Charles Spearman, and it's often written as the Greek letter rho (\rho) or sometimes as r_s.

You can use Spearman's coefficient when your data can be put in order, like from highest to lowest, or best to worst. For example, if you have information about how much different computers cost and how fast they are, you could use r_s to see if there's a connection between price and speed.

How to Calculate Spearman's Coefficient

Calculating Spearman's rank correlation coefficient might look tricky at first, but it's just a few steps. We'll use the example of computer prices and speeds to show you how.

Step 1: Rank Your Data

The first thing you need to do is "rank" each piece of data. This means giving each item a number based on its position in a list, like 1st, 2nd, 3rd, and so on.

Let's take our computer example. You would rank the computers by price, giving the lowest price a rank of 1, the next lowest a 2, and so on. You do the same thing for the speed of the computers.

PC   Price ($)   Rank_1 (Price)   Speed (GHz)   Rank_2 (Speed)
A    200         1                1.80          2
B    275         2                1.60          1
C    300         3                2.20          4
D    350         4                2.10          3
E    600         5                4.00          5
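
If you'd like a computer to do the ranking, here is a small Python sketch of one way it could work. The function name rank_values is made up for this example, and the sketch assumes that no two values are the same.

```python
def rank_values(values):
    """Return the rank of each value, where the smallest value gets rank 1.

    Assumption: all values are different (no ties).
    """
    # List the positions 0, 1, 2, ... in order of the value found at each one.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position + 1  # ranks start at 1, not 0
    return ranks


prices = [200, 275, 300, 350, 600]       # PCs A to E, in dollars
speeds = [1.80, 1.60, 2.20, 2.10, 4.00]  # PCs A to E, in GHz

price_ranks = rank_values(prices)
speed_ranks = rank_values(speeds)
print(price_ranks)  # [1, 2, 3, 4, 5]
print(speed_ranks)  # [2, 1, 4, 3, 5]
```

The two printed lists match the Rank_1 and Rank_2 columns in the table above.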

Step 2: Find the Difference and Square It

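Next, we find the "difference" between the two ranks for each computer by subtracting Rank_2 from Rank_1. This difference is called d. After that, you multiply this difference by itself (which is called squaring it). The result is called d^2. Squaring always gives a number that is zero or positive, so a difference of -1 and a difference of 1 both give a d^2 of 1.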

Rank_1   Rank_2   d (Difference)   d^2 (Difference Squared)
1        2        -1               1
2        1        1                1
3        4        -1               1
4        3        1                1
5        5        0                0
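
The same step can be sketched in Python. The variable names here are just chosen for this example, and the ranks are typed in directly from the table.

```python
price_ranks = [1, 2, 3, 4, 5]  # Rank_1 for PCs A to E
speed_ranks = [2, 1, 4, 3, 5]  # Rank_2 for PCs A to E

# d is Rank_1 minus Rank_2 for each computer.
differences = [r1 - r2 for r1, r2 in zip(price_ranks, speed_ranks)]

# d^2 is each difference multiplied by itself.
squared = [d * d for d in differences]

print(differences)  # [-1, 1, -1, 1, 0]
print(squared)      # [1, 1, 1, 1, 0]
```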

Step 3: Count Your Data Points

Now, count how many pieces of data you have. In our example, we have 5 computers, so we have 5 pieces of data. This number is called n.

Step 4: Use the Formula

Finally, we put all the numbers we've found into the Spearman's rank correlation formula:  r_s=1-\cfrac {6\sum d^2}{n(n^2-1)}

Let's break down the formula:

  • \sum d^2 means you add up all the numbers in the d^2 column. The symbol \sum means "sum" or "total."

    * From our table, \sum d^2 is 1+1+1+1+0, which equals 4.
    * The formula says to multiply this by 6, so 6 \times 4 = 24.

  • n(n^2-1) means you take your count of data points (n), square it (n^2), subtract 1, and then multiply that result by n.

    * Since n is 5, n^2 is 5 \times 5 = 25.
    * Then, 25-1 = 24.
    * Finally, n(n^2-1) is 5 \times 24 = 120.

So, to find r_s, we do: r_s = 1-\cfrac {24}{120} = 1 - 0.2 = 0.8.

For our computer data, Spearman's rank correlation coefficient is 0.8.
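
Here is a short Python sketch that puts the whole formula together. The function name spearman_from_ranks is made up for this example, and it assumes you have already worked out the two lists of ranks and that there are no tied ranks.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Work out r_s = 1 - 6 * (sum of d^2) / (n * (n^2 - 1)).

    Assumption: both lists hold ranks for the same items, with no ties.
    """
    if len(ranks_1) != len(ranks_2):
        raise ValueError("both lists must have the same length")
    n = len(ranks_1)
    if n < 2:
        raise ValueError("need at least two data points")
    # Add up the squared differences between the two ranks.
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


price_ranks = [1, 2, 3, 4, 5]
speed_ranks = [2, 1, 4, 3, 5]

r_s = spearman_from_ranks(price_ranks, speed_ranks)
print(round(r_s, 2))  # 0.8
```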

What the Numbers Mean

Figure: a graph showing a positive correlation. The r_s value would be close to 1, for example 0.9. The red line is a line of best fit.

The value of r_s will always be a number between -1 and 1. This range acts like a scale that tells us how strong and what type of link exists between the two sets of data:

  • 1 means a perfect positive correlation. This means as one set of data goes up, the other set always goes up too.
  • -1 means a perfect negative correlation. This means as one set of data goes up, the other set always goes down.
  • 0 means there is no link or correlation between the data sets.

For our computer example, r_s was 0.8. Since 0.8 is close to 1, it means there's a strong positive link between computer price and speed. This suggests that generally, more expensive computers tend to be faster. If the result had been -0.8, it would mean that as price goes up, speed tends to go down.
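
To see the two ends of the scale, this Python sketch tries the formula on two made-up sets of ranks: one where both lists are in exactly the same order, and one where the second list is completely reversed. The function name is invented for this example, and it assumes there are no tied ranks.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """r_s = 1 - 6 * (sum of d^2) / (n * (n^2 - 1)), assuming no tied ranks."""
    n = len(ranks_1)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


same_order = spearman_from_ranks([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
reversed_order = spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])

print(same_order)      # 1.0
print(reversed_order)  # -1.0
```

When the ranks agree completely the answer is 1, and when they are exact opposites the answer is -1.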

When Numbers Are the Same

Sometimes, when you're ranking data, you might have two or more numbers that are exactly the same. These are called "tied" ranks. When this happens, you take the mean (or average) of the ranks that would have been used for those tied numbers.

For example, imagine we are ranking scores from a spelling test:

Test score   Original Rank   Rank (with tied scores)
4            1               1
6            2               \tfrac {2+3+4}{3} = 3
6            3               \tfrac {2+3+4}{3} = 3
6            4               \tfrac {2+3+4}{3} = 3
8            5               \tfrac {5+6}{2} = 5.5
8            6               \tfrac {5+6}{2} = 5.5

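You use these averaged ranks in Spearman's formula just like you would with normal ranks. When there are tied ranks, the formula gives an answer that is only approximate. The exact answer comes from working out the ordinary (Pearson) correlation between the two sets of ranks.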
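
Averaging tied ranks can also be sketched in Python. The function name average_ranks is made up for this example. It is written to be easy to read, not to be fast.

```python
def average_ranks(values):
    """Give each value its rank, sharing the average rank between ties."""
    sorted_values = sorted(values)
    ranks = []
    for value in values:
        first = sorted_values.index(value) + 1  # first rank this value takes up
        count = sorted_values.count(value)      # how many times the value appears
        last = first + count - 1                # last rank this value takes up
        ranks.append((first + last) / 2)        # average of the ranks first..last
    return ranks


scores = [4, 6, 6, 6, 8, 8]
print(average_ranks(scores))  # [1.0, 3.0, 3.0, 3.0, 5.5, 5.5]
```

The printed ranks match the last column of the spelling test table.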

Spearman's rank correlation coefficient Facts for Kids. Kiddle Encyclopedia.