# Introduction and Sampling

**Statistics is the collection, organisation, presentation and analysis of numerical data.**

Before we crack on with the meat of statistics, we need to make sure we fully understand the terminology used by statisticians (don't yawn just yet!).

**There are 2 types of data:**

- Qualitative: where the data is not numerical.
- Quantitative: where the data is numerical.

**Quantitative data is the most useful set of data to us and can be in 2 forms:**

- Discrete data.
- Continuous data.

**Discrete data** - can only take certain values.

**For example:**

The number of students making an intelligent comment during an A-level maths lesson. You could get 4 or 5, say, but it isn't possible to get 13.546!

**Continuous data** - can take **any** value within a given range.

**For example:**

The lengths of worms found in a garden one morning.

**So, measuring will give us continuous data, whereas counting will give discrete data.**

Statisticians collect the data to investigate a characteristic of a **population**.

**A population is the set of all the possible items to be observed.**

**For example:**

Whilst investigating the heights of males in Wales, the population would be the heights of **all** the males in Wales.

**Or**, whilst carrying out a survey of the lifetimes of lightbulbs made by one manufacturer, the population would be the lifetimes of **all** the lightbulbs made by that manufacturer.

I'm hoping that you're thinking it would be pretty dumb to try and collect data on the whole population in either of the examples above. It wouldn't be very practical to measure the heights of **everyone** in a country! And if you tested the lifetime of each of your lightbulbs you'd have none left to sell. **Hence we take samples**.

A **sample** must represent the **whole population**. After taking a sample, it is assumed that the result for the sample reflects the whole population. For example, if 20% of a sample of 1000 people say they vote for 'The Conservative Party', then it is assumed that 20% of the whole population will vote in that way.

As you can imagine, this is subject to some error! You must always be aware of **bias** within your sample and must look carefully at how your sample is obtained. This is best done using a little common sense (heaven help us!).
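To see that error in action, here is a minimal Python sketch. Everything in it is made up for illustration: a pretend population of 100,000 voters in which exactly 20% support a party, and a random sample of 1000 drawn from it. The sample percentage will usually land close to 20%, but rarely exactly on it.

```python
import random

# Hypothetical population: 100,000 voters, exactly 20% of whom support the party.
# True = supporter, False = non-supporter. These figures are invented for the demo.
population = [True] * 20_000 + [False] * 80_000

rng = random.Random(42)  # fixed seed so the demo gives the same result each run

# Draw 1000 people without replacement, every person equally likely to be picked.
sample = rng.sample(population, 1000)

# Proportion of supporters in the sample (True counts as 1, False as 0).
sample_proportion = sum(sample) / len(sample)

print(f"Population proportion: {20_000 / len(population):.3f}")
print(f"Sample proportion:     {sample_proportion:.3f}")
```

Changing the seed gives a slightly different sample proportion each time, which is exactly the sampling error described above.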

**Random sampling:** this method gives every item of the population an equal chance of selection. This can be done in various ways, for example by simply picking names out of a hat or by using a random number generator on a calculator.
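If you'd rather let a computer do the hat-picking, a random sample might be sketched in Python like this. The function name `simple_random_sample` and the population of 200 numbered items are assumptions made up for the example; `random.sample` is the standard library call that picks items without replacement, giving each one an equal chance.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size items from population, each equally likely, no repeats."""
    rng = random.Random(seed)  # optional seed makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: 200 items labelled 1 to 200.
population_ids = list(range(1, 201))

# Choose 10 of them at random.
chosen = simple_random_sample(population_ids, 10, seed=1)
print(chosen)
```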

**Stratified sampling:** some populations are naturally split into a number of strata (kind of like subgroups). We can separate the strata and find what proportion of the population is in each stratum. We can then take a random sample from each stratum, with the number of items taken being proportional to the size of that stratum.

**For example:**

If doing a survey to calculate the average wage earned in the manufacturing industry, you would split your population into shopfloor workers, managers and directors. In your sample you would include a greater number of shopfloor workers, as there are significantly more of these.
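The wage survey above could be sketched in Python as follows. The headcounts (900 shopfloor workers, 90 managers, 10 directors) and the sample size of 100 are invented for illustration, as are the function names. Each stratum gets a share of the sample in proportion to its size, and a random sample is then drawn within each stratum.

```python
import random

def proportional_allocation(stratum_sizes, sample_size):
    """Share out sample_size between strata in proportion to their sizes.

    Rounding means the shares may not always add up to exactly sample_size.
    """
    total = sum(stratum_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}

def stratified_sample(strata, sample_size, seed=None):
    """Draw a random sample from each stratum, sized by proportional allocation."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(members, allocation[name])
            for name, members in strata.items()}

# Hypothetical workforce of 1000 people (made-up numbers).
strata = {
    "shopfloor": [f"shopfloor_{i}" for i in range(900)],
    "managers": [f"manager_{i}" for i in range(90)],
    "directors": [f"director_{i}" for i in range(10)],
}

result = stratified_sample(strata, 100, seed=7)
counts = {name: len(people) for name, people in result.items()}
print(counts)  # {'shopfloor': 90, 'managers': 9, 'directors': 1}
```

With these numbers the shares come out as whole numbers. With awkward proportions you would need to decide how to round so that the total still matches the sample size you want.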

**OK that's the dull bit over with! You are now allowed a little yawn!**