Introduction and Sampling
Statistics is the collection, organisation, presentation and analysis of numerical data.
Before we crack on with the meat of statistics, we need to make sure we fully understand the terminology used by statisticians (don't yawn just yet!).
There are 2 types of data:
- Qualitative: where the data is not numerical.
- Quantitative: where the data is numerical.
Quantitative data is the most useful type of data to us and comes in 2 forms:
- Discrete data.
- Continuous data.
Discrete data - this set of data can only take certain values.
For example, the number of students making an intelligent comment during an A-level maths lesson. You could get 4 or 5, say, but it isn't possible to get 13.546!
Continuous data - this set of data can take any value within a given range.
For example, the lengths of worms found in a garden one morning.
So, measuring will give us continuous data, whereas counting will give discrete data.
Statisticians collect data to investigate a characteristic of a population.
A population is the set of all the possible items to be observed.
Whilst investigating the heights of males in Wales, the population would be the heights of all the males in Wales.
Or, whilst carrying out a survey of the lifetimes of lightbulbs made by one manufacturer, the population would be the lifetimes of all the lightbulbs made by that manufacturer.
I'm hoping that you're thinking it would be pretty dumb to try and collect all the data described in the examples above. It wouldn't be very practical to measure the height of everyone in a country! And if you tested the lifetime of every one of your lightbulbs, you'd have none left to sell. Hence we take samples.
A sample should be representative of the whole population. Once a sample has been taken, we assume that the result for the sample reflects the whole population. For example, if 20% of a sample of 1000 people say they will vote for 'The Conservative Party', then we assume that 20% of the whole population will vote that way.
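To make the voting example concrete, here is a minimal Python sketch of the arithmetic. The function names `sample_proportion` and `estimate_population_count` are hypothetical, and the population size of 50,000 voters is an invented figure purely for illustration.

```python
def sample_proportion(responses, answer):
    """Fraction of the sample whose response equals `answer`."""
    if not responses:
        raise ValueError("sample must not be empty")
    return sum(1 for r in responses if r == answer) / len(responses)


def estimate_population_count(proportion, population_size):
    """Scale a sample proportion up to the whole population.

    Only sensible if the sample is representative of the population.
    """
    return proportion * population_size


# Hypothetical sample of 1000 answers: 200 for 'The Conservative Party'.
sample = ["The Conservative Party"] * 200 + ["Other"] * 800
p = sample_proportion(sample, "The Conservative Party")
print(p)  # 0.2
# Hypothetical population of 50,000 voters.
print(estimate_population_count(p, 50_000))  # 10000.0
```

The scaling step relies entirely on the assumption that the sample is representative.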
As you can imagine, this is subject to some error! You must always be aware of bias within your sample (where some part of the population is systematically over- or under-represented) and must look carefully at how your sample is obtained. This is best done using a little common sense (heaven help us!).
Random sampling: this method gives every item of the population an equal chance of selection. This can be done in various ways, for example by simply picking names out of a hat or by using the random number generator on a calculator.
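A random sample can also be drawn on a computer instead of a calculator. The sketch below is one way of doing it, assuming Python's standard `random` module. `random.Random.sample` picks items without replacement, so every item has the same chance of being chosen, just like names in a hat. The helper name `simple_random_sample` and the class of 30 students are hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Pick n items without replacement; each item is equally likely to be chosen."""
    if not 0 <= n <= len(population):
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)  # seeded generator so the draw can be repeated
    return rng.sample(population, n)


# Hypothetical population: 30 students numbered 1 to 30.
students = list(range(1, 31))
chosen = simple_random_sample(students, 5, seed=42)
print(chosen)  # 5 different student numbers
```

Fixing the `seed` only matters if you want to reproduce the same draw later. Leave it as `None` for a fresh sample each time.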
Stratified sampling: some populations are naturally split into a number of strata (kind of like subgroups). We can separate the strata and find what proportion of the population is in each stratum. We can then select a random sample from each stratum in proportion to its size.
For example, if you were doing a survey to calculate the average wage earned in the manufacturing industry, you would split your population into shopfloor workers, managers and directors. Your sample would then include more shopfloor workers than managers or directors, as there are significantly more of them in the population.
OK that's the dull bit over with! You are now allowed a little yawn!