Identification of Outliers
An outlier in a data set is a value that is far away from the rest of the values in the data set. In a box and whisker diagram, outliers usually lie near the ends of the whiskers. This is because the box at the centre of the diagram represents the data between the first and third quartiles, which is where \(\text{50}\%\) of the data lie, while the ends of the whiskers represent the extremes of the data, namely the minimum and the maximum.
Example
Question
Find the outliers in the following data set by drawing a box and whisker diagram and locating the data values on the diagram.
\(\text{0.5}\) ; \(\text{1}\) ; \(\text{1.1}\) ; \(\text{1.4}\) ; \(\text{2.4}\) ; \(\text{2.8}\) ; \(\text{3.5}\) ; \(\text{5.1}\) ; \(\text{5.2}\) ; \(\text{6}\) ; \(\text{6.5}\) ; \(\text{9.5}\)
Determine the five number summary
The minimum of the data set is \(\text{0.5}\). The maximum of the data set is \(\text{9.5}\). Since there are \(\text{12}\) values in the data set, the median lies between the sixth and seventh values, making it equal to \(\cfrac{\text{2.8}+\text{3.5}}{2} = \text{3.15}\). The first quartile lies between the third and fourth values, making it equal to \(\cfrac{\text{1.1}+\text{1.4}}{2} = \text{1.25}\). The third quartile lies between the ninth and tenth values, making it equal to \(\cfrac{\text{5.2}+\text{6}}{2} = \text{5.6}\).
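The same five number summary can be checked with a short Python sketch. It is only an illustration and not part of the lesson's method: the function name `five_number_summary` is made up for this example, and it assumes the quartiles are the medians of the lower and upper halves of the sorted data, which matches the calculation above. Other software may use a different quartile convention and give slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data. For an odd number of values the
    middle value is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])


data = [0.5, 1, 1.1, 1.4, 2.4, 2.8, 3.5, 5.1, 5.2, 6, 6.5, 9.5]
minimum, q1, med, q3, maximum = five_number_summary(data)
print(minimum, q1, med, q3, maximum)
```

Up to floating-point rounding, the printed values should agree with the minimum, first quartile, median, third quartile and maximum found above.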
Draw the box and whisker diagram
In the figure above, each value in the data set is shown with a black dot.
Find the outliers
From the diagram we can see that most of the values are between \(\text{1}\) and \(\text{6}\). The only value that is very far away from this range is the maximum at \(\text{9.5}\). Therefore \(\text{9.5}\) is the only outlier in the data set.
You should also be able to identify outliers in plots of two variables. A scatter plot is a graph that shows the relationship between two random variables. We call these data bivariate (literally meaning two variables) and we plot the data for two different variables on one set of axes. The following example shows what a typical scatter plot looks like. For Grade \(\text{11}\) you do not need to learn how to draw these \(\text{2}\)-dimensional scatter plots, but you should be able to identify outliers on them. As before, an outlier is a value that is far removed from the main distribution of data.
Example
Question
We have a data set that relates the heights and weights of a number of people. The height is the first variable and its value is plotted along the horizontal axis. The weight is the second variable and its value is plotted along the vertical axis. The data values are shown on the plot below. Identify any outliers on the scatter plot.
We inspect the plot visually and notice that there are two points that lie far away from the main data distribution. These two points are circled in the plot below.