Percentiles
Definition: Percentile
The \(p^{\text{th}}\) percentile is the value, \(v\), that divides a data set into two parts, such that \(p\) percent of the values in the data set are less than \(v\) and \(100 - p\) percent of the values are greater than \(v\). Percentiles can lie in the range \(0\le p\le 100\).
To understand percentiles properly, we need to distinguish between \(\text{3}\) different aspects of a datum: its value, its rank and its percentile:
- The value of a datum is what we measured and recorded during an experiment or survey.
- The rank of a datum is its position in the sorted data set (for example, first, second, third, and so on).
- The percentile of a datum tells us what percentage of the values in the full data set are less than this datum.
The table below summarises the value, rank and percentile of the data set:
\[\{\text{14.2}; \text{13.9}; \text{19.8}; \text{10.3}; \text{13.0}; \text{11.1}\}\]
| Value | Rank | Percentile |
|---|---|---|
| \(\text{10.3}\) | \(\text{1}\) | \(\text{0}\) |
| \(\text{11.1}\) | \(\text{2}\) | \(\text{20}\) |
| \(\text{13.0}\) | \(\text{3}\) | \(\text{40}\) |
| \(\text{13.9}\) | \(\text{4}\) | \(\text{60}\) |
| \(\text{14.2}\) | \(\text{5}\) | \(\text{80}\) |
| \(\text{19.8}\) | \(\text{6}\) | \(\text{100}\) |
As an example, \(\text{13.0}\) is at the \(40^{\text{th}}\) percentile since there are \(\text{2}\) values less than \(\text{13.0}\) and \(\text{3}\) values greater than \(\text{13.0}\).
\[\cfrac{2}{2+3} = \text{0.4} = \text{40}\%\]
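The counting argument above can be sketched in Python. This is a minimal illustration rather than a standard library routine: the function name `percentile_of` is our own, and it assumes the data set contains at least two distinct values so that the denominator is never zero.

```python
def percentile_of(data, value):
    """Percentile of `value`: the share of the other values that are smaller.

    Hypothetical helper for illustration; assumes at least two distinct values.
    """
    less = sum(1 for x in data if x < value)
    greater = sum(1 for x in data if x > value)
    return 100 * less / (less + greater)


data = [14.2, 13.9, 19.8, 10.3, 13.0, 11.1]

# Build (value, rank, percentile) rows in sorted order, like the table above.
table = [
    (v, rank, percentile_of(data, v))
    for rank, v in enumerate(sorted(data), start=1)
]

for value, rank, percentile in table:
    print(value, rank, percentile)
```

Each printed row should agree with the corresponding row of the table, for example \(\text{13.0}\) at rank \(\text{3}\) and the \(40^{\text{th}}\) percentile.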
In general, the formula for finding the \(p^{\text{th}}\) percentile in an ordered data set with \(n\) values is
\[r = \cfrac{p}{100}(n - 1) + 1\]
This gives us the rank, \(r\), of the \(p^{\text{th}}\) percentile. To find the value of the \(p^{\text{th}}\) percentile, we have to count from the first value in the ordered data set up to the \(r^{\text{th}}\) value.
Sometimes the rank will not be an integer. This means that the percentile lies between two values in the data set. The convention is to take the value halfway between the two values indicated by the rank.
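As a short illustration of this convention (our own example, using the six-value data set from the table above), the \(50^{\text{th}}\) percentile has a rank that is not an integer:

\begin{align*} r & = \cfrac{50}{100}(6 - 1) + 1 \\ & = \text{3.5} \end{align*}

The rank lies between the third value (\(\text{13.0}\)) and the fourth value (\(\text{13.9}\)), so under the halfway convention the \(50^{\text{th}}\) percentile is

\[\cfrac{\text{13.0} + \text{13.9}}{2} = \text{13.45}\]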
The figure below shows the relationship between rank and percentile graphically. We have already encountered three percentiles in this tutorial: the median (\(50^{\text{th}}\) percentile), the minimum (\(0^{\text{th}}\) percentile) and the maximum (\(100^{\text{th}}\)). The median is defined as the value halfway in a sorted data set.
Example
Question
Determine the minimum, maximum and median values of the following data set using the percentile formula.
\[\{14; 17; 45; 20; 19; 36; 7; 30; 8\}\]
Sort the values in the data set
Before we can use the rank to find values in the data set, we always have to order the values from the smallest to the largest. The sorted data set is
\[\{7; 8; 14; 17; 19; 20; 30; 36; 45\}\]
Find the minimum
We already know that the minimum value is the first value in the ordered data set. We will now confirm that the percentile formula gives the same answer. The minimum is equivalent to the \(0^{\text{th}}\) percentile. According to the percentile formula the rank, \(r\), of the \(p = 0^{\text{th}}\) percentile in a data set with \(n = 9\) values is:
\begin{align*} r & = \cfrac{p}{100}(n - 1) + 1 \\ & = \cfrac{0}{100}(9 - 1) + 1 \\ & = 1 \end{align*}
This confirms that the minimum value is the first value in the list, namely \(\text{7}\).
Find the maximum
We already know that the maximum value is the last value in the ordered data set. The maximum is also equivalent to the \(100^{\text{th}}\) percentile. Using the percentile formula with \(p = 100\) and \(n = 9\), we find the rank of the maximum value is:
\begin{align*} r & = \cfrac{p}{100}(n - 1) + 1 \\ & = \cfrac{100}{100}(9 - 1) + 1 \\ & = 9 \end{align*}
This confirms that the maximum value is the last (the ninth) value in the list, namely \(\text{45}\).
Find the median
The median is equivalent to the \(50^{\text{th}}\) percentile. Using the percentile formula with \(p = 50\) and \(n = 9\), we find the rank of the median value is:
\begin{align*} r & = \cfrac{50}{100}(n - 1) + 1 \\ & = \cfrac{50}{100}(9 - 1) + 1 \\ & = \cfrac{1}{2}(8) + 1 \\ & = 5 \end{align*}
This shows that the median is in the middle (at the fifth position) of the ordered data set. Therefore the median value is \(\text{19}\).
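The three rank calculations in this example can be checked with a few lines of Python. This is only a sketch: the function name `rank` and the `results` dictionary are our own, and the lookup assumes the rank comes out as a whole number, which holds for these three percentiles when \(n = 9\).

```python
def rank(p, n):
    """Rank of the p-th percentile in an ordered data set with n values."""
    return p / 100 * (n - 1) + 1


# Sort first: ranks only make sense for ordered data.
ordered = sorted([14, 17, 45, 20, 19, 36, 7, 30, 8])
n = len(ordered)

results = {}
for name, p in [("minimum", 0), ("median", 50), ("maximum", 100)]:
    r = rank(p, n)
    # Ranks count from 1, Python lists count from 0, hence the "- 1".
    results[name] = (r, ordered[int(r) - 1])

print(results)
```

The ranks come out as 1, 5 and 9, matching the values \(\text{7}\), \(\text{19}\) and \(\text{45}\) found above.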
Definition: Quartiles
The quartiles are the three data values that divide an ordered data set into four groups, where each group contains an equal number of data values. The median (\(50^{\text{th}}\) percentile) is the second quartile (\(Q2\)). The \(25^{\text{th}}\) percentile is also called the first or lower quartile (\(Q1\)). The \(75^{\text{th}}\) percentile is also called the third or upper quartile (\(Q3\)).
Example
Question
Determine the quartiles of the following data set:
\[\{7; 45; 11; 3; 9; 35; 31; 7; 16; 40; 12; 6\}\]
Sort the data set
\[\{3; 6; 7; 7; 9; 11; 12; 16; 31; 35; 40; 45\}\]
Find the ranks of the quartiles
Using the percentile formula with \(n = 12\), we can find the rank of the \(25^{\text{th}}\), \(50^{\text{th}}\) and \(75^{\text{th}}\) percentiles:
\begin{align*} {r}_{25} & = \cfrac{25}{100}(12 - 1) + 1 \\ & = \text{3.75} \\ {r}_{50} & = \cfrac{50}{100}(12 - 1) + 1 \\ & = \text{6.5} \\ {r}_{75} & = \cfrac{75}{100}(12 - 1) + 1 \\ & = \text{9.25} \end{align*}
Find the values of the quartiles
Note that each of these ranks is a fraction, meaning that the value for each percentile is somewhere in between two values from the data set.
For the \(25^{\text{th}}\) percentile the rank is \(\text{3.75}\), which is between the third and fourth values. Since both these values are equal to \(\text{7}\), the \(25^{\text{th}}\) percentile is \(\text{7}\).
For the \(50^{\text{th}}\) percentile (the median) the rank is \(\text{6.5}\), meaning halfway between the sixth and seventh values. The sixth value is \(\text{11}\) and the seventh value is \(\text{12}\), which means that the median is \(\cfrac{11 + 12}{2} = \text{11.5}\).
For the \(75^{\text{th}}\) percentile the rank is \(\text{9.25}\), meaning between the ninth and tenth values. The ninth value is \(\text{31}\) and the tenth value is \(\text{35}\), so taking the value halfway between them, the \(75^{\text{th}}\) percentile is \(\cfrac{31 + 35}{2} = 33\).
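The quartile calculation can be sketched in Python as follows. The function name `percentile_value` is our own, and the code implements the halfway convention used in this lesson. Statistical libraries often interpolate differently for non-integer ranks, so their answers may not match these.

```python
import math


def percentile_value(data, p):
    """Return (rank, value) of the p-th percentile, using the halfway convention.

    Illustrative helper: for a non-integer rank, the value is taken halfway
    between the two neighbouring data values.
    """
    ordered = sorted(data)
    n = len(ordered)
    r = p / 100 * (n - 1) + 1
    lower = ordered[math.floor(r) - 1]  # value at the rank rounded down
    upper = ordered[math.ceil(r) - 1]   # value at the rank rounded up
    # For an integer rank, lower and upper are the same value.
    return r, (lower + upper) / 2


data = [7, 45, 11, 3, 9, 35, 31, 7, 16, 40, 12, 6]
quartiles = {p: percentile_value(data, p) for p in (25, 50, 75)}

for p, (r, value) in quartiles.items():
    print(p, r, value)
```

The ranks \(\text{3.75}\), \(\text{6.5}\) and \(\text{9.25}\) and the values \(\text{7}\), \(\text{11.5}\) and \(\text{33}\) agree with the worked example.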
Deciles
The deciles are the nine data values that divide an ordered data set into ten groups, where each group contains an equal number of data values.
For example, consider the ordered data set:
\begin{align*} 28; 33; 35; 45; 57; 59; 61; 68; 69; 72; 75; 78; 80; 83; 86; 91; \\ 92; 95; 101; 105; 111; 117; 118; 125; 127; 131; 137; 139; 141 \end{align*}
The nine deciles are: \(35; 59; 69; 78; 86; 95; 111; 125; 137\).
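One way to check the "equal groups" property for this example is to look at where the listed deciles sit in the ordered data set. The sketch below is only an observation about this particular data set, not a general rule for computing deciles, and the variable names are our own. It relies on all the values being distinct.

```python
data = [
    28, 33, 35, 45, 57, 59, 61, 68, 69, 72, 75, 78, 80, 83, 86, 91,
    92, 95, 101, 105, 111, 117, 118, 125, 127, 131, 137, 139, 141,
]
listed_deciles = [35, 59, 69, 78, 86, 95, 111, 125, 137]

# 1-based position of each listed decile in the ordered data set.
positions = [data.index(d) + 1 for d in listed_deciles]

# Count the values strictly between consecutive deciles,
# including before the first decile and after the last one.
cuts = [0] + positions + [len(data) + 1]
group_sizes = [b - a - 1 for a, b in zip(cuts, cuts[1:])]

print(positions)
print(group_sizes)
```

In this data set the deciles fall at every third position, leaving ten groups of two values each.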