Year 11 General Univariate Data Analysis

Problem-Solving Using The Statistical Investigation Process


Theory

The Statistical Investigation Process has five stages: Question → Collect → Display → Analyse → Conclude. A good question anticipates variability. Sampling should be representative (random or stratified, not convenience). Watch for bias, correlation vs causation, and overgeneralisation.

The Statistical Investigation Process is a structured way to answer real-world questions using data. It has five main stages.

Stage 1: Pose a statistical question. A statistical question anticipates variability. "How tall are Year 11 students?" is statistical (heights vary). "How tall is the principal?" is not. Vague questions like "study habits" need to be refined into something measurable, like "the average weekly study hours of Year 11 students".

Stage 2: Collect data (sampling). Identify the population (everyone you want to know about) and the sample (the smaller group you actually measure). Common sampling methods:

  • Random sampling: every member of the population has an equal chance of being chosen.
  • Stratified sampling: divide the population into groups and randomly sample from each, in proportion to group size.
  • Convenience sampling: pick whoever's easiest. Quick, but biased.
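
The three methods above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the course material: the school roll, the group sizes, and the function names are all invented for the example, and the convenience sample simply takes the first names on the list.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n, rng):
    """Randomly sample from each group, in proportion to group size."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        # This group's share of the sample. Rounding can make the total
        # differ slightly from n when the shares are not whole numbers.
        k = round(n * len(members) / total)
        sample.extend(rng.sample(members, k))
    return sample

def convenience_sample(population, n):
    """Take whoever is first in line: quick, but biased."""
    return population[:n]

# Hypothetical school roll: 120 Year 11 students and 80 Year 12 students.
strata = {
    "Year 11": [f"Y11-{i}" for i in range(120)],
    "Year 12": [f"Y12-{i}" for i in range(80)],
}
population = strata["Year 11"] + strata["Year 12"]
rng = random.Random(1)  # fixed seed so the run is repeatable

random_pick = simple_random_sample(population, 20, rng)
stratified_pick = stratified_sample(strata, 20, rng)
convenience_pick = convenience_sample(population, 20)

def count_year11(sample):
    """Count how many sampled students are in Year 11."""
    return sum(1 for student in sample if student.startswith("Y11"))

print(count_year11(stratified_pick))   # 12 of 20, matching Year 11's 60% share
print(count_year11(convenience_pick))  # 20 of 20, so Year 12 is missed entirely
```

In this made-up roll the convenience sample contains no Year 12 students at all, which is the kind of gap the "High bias" label warns about.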

Stage 3: Display the data using whichever chart suits the data type.

Stage 4: Analyse. Calculate appropriate summary statistics: mean/median/mode for centre; range/IQR/SD for spread. Symmetric data → mean + SD. Skewed data or data with outliers → median + IQR.

Stage 5: Conclude. State findings in plain English, in the context of the original question. Include a measure of centre, spread, and any limitations.

The first diagram shows the five stages of the Statistical Investigation Process. The second compares the three main sampling methods and their levels of bias.

[Figure: The five stages of the Statistical Investigation Process. A flow diagram with five connected boxes (Question, Collect, Display, Analyse, Conclude), each with a brief description below it.]
Five stages: Question → Collect → Display → Analyse → Conclude. Each builds on the previous.

[Figure: Three sampling methods. A side-by-side comparison of random, stratified, and convenience sampling, showing how each picks members from the population and the level of bias for each.]
Random and stratified sampling are low-bias. Convenience sampling is fast but unreliable.

This subtopic is about process and judgment; there are no calculation formulas. Use the references below.

The five stages

Stage | Goal
1. Question | Pose a question that anticipates variability
2. Collect | Sample the population fairly
3. Display | Choose the right chart for the data type
4. Analyse | Compute appropriate summary statistics
5. Conclude | Answer in plain English, in context, with limitations

Sampling methods

Method | How it works | Bias level
Random | Every member equally likely to be chosen | Low
Stratified | Random sample from each subgroup, proportional to size | Low
Convenience | Whoever is easiest to access | High

Display choices by data type

Data type | Display
Categorical | frequency table, bar chart
Discrete numerical (small) | dot plot
Mid-size numerical | stem-and-leaf plot
Large/continuous numerical | histogram, boxplot
Two groups | parallel boxplots, back-to-back stem-and-leaf
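
As one illustration of the displays listed above, a stem-and-leaf plot for a mid-size numerical data set can be sketched in Python. The marks are invented, the function name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative whole numbers with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot (stem = tens, leaf = units)."""
    leaves = defaultdict(list)
    for value in sorted(values):
        leaves[value // 10].append(value % 10)
    lines = []
    # Walk every stem from smallest to largest so that gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines

# Hypothetical test marks.
marks = [12, 15, 21, 24, 24, 28, 33, 37, 51]
for line in stem_and_leaf(marks):
    print(line)
# 1 | 2 5
# 2 | 1 4 4 8
# 3 | 3 7
# 4 |
# 5 | 1
```

Keeping the empty stem for the 40s makes the gap before the mark of 51 easy to see, which is one reason this display suits data sets of this size.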

Common traps in interpretation

Bias makes a sample unrepresentative. Correlation ≠ causation: two variables can be associated without one causing the other. Statistical significance ≠ practical significance: tiny differences can be real but not meaningful. Don't overgeneralise from one group to a much larger population.

Working through a statistical investigation

  1. Pose a precise statistical question. Replace vague phrasing with something measurable that anticipates variability.
  2. Choose a sampling method. Random or stratified for unbiased data. Define the population and sample clearly.
  3. Display the data using a chart suited to the type: bar chart, dot plot, stem-and-leaf, histogram, or parallel boxplots.
  4. Analyse with appropriate summary statistics. Symmetric → mean + SD. Skewed or outliers → median + IQR.
  5. Conclude in plain English, in the original units, and in context. State any limitations (small sample, possible bias, etc.).
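
Step 4 can be illustrated with Python's built-in `statistics` module. The study-hours data below are invented, and `summarise` is a hypothetical helper. Note that `statistics.quantiles` uses its default "exclusive" method here, which may give slightly different quartiles from the method taught in class.

```python
import statistics

def summarise(data):
    """Return both pairs of summary statistics for a list of numbers."""
    q1, _median, q3 = statistics.quantiles(data, n=4)  # the three quartiles
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),      # sample standard deviation
        "median": statistics.median(data),
        "iqr": q3 - q1,
    }

# Hypothetical weekly study hours; one student reports 30 hours.
hours = [4, 5, 5, 6, 6, 7, 7, 8, 30]
summary = summarise(hours)

print(f"mean = {summary['mean']:.1f}, SD = {summary['sd']:.1f}")
print(f"median = {summary['median']}, IQR = {summary['iqr']}")
```

With the outlier included, the mean (about 8.7 hours) is higher than every value except the outlier itself, while the median stays at 6 hours. That is why median + IQR is the safer pair for skewed data or data with outliers.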

Identifying bias

  1. Ask: which parts of the population are over- or under-represented?
  2. Common sources: convenience samples, voluntary response, sampling from only one location or time, leading survey questions.
  3. Name the type of bias if you can (e.g. selection bias, response bias).

Spotting correlation vs causation

  1. If a claim says one variable causes another, ask: could there be a third variable influencing both?
  2. Observational data shows association, not causation. Only a controlled experiment can establish causation.

EXAMPLE 1: STATISTICAL QUESTION
A principal wants to investigate "study habits". State a more precise statistical question.
SOLUTION

The phrase "study habits" is too vague to measure. A statistical question needs to anticipate variability and pin down what is being measured.

Answer: "How many hours per week, on average, do Year 11 students at this school study?" This is measurable (hours per week), anticipates variability (different students will report different hours), and specifies the population (Year 11 students at this school).

EXAMPLE 2: POPULATION vs SAMPLE
A researcher surveys 200 workers at Roma Street station between 8 and 9 am on a Tuesday to study Brisbane commute times. State the population and the sample, and note any concern.
SOLUTION

Identify whom the researcher is trying to learn about (population) vs whom they actually measured (sample).

Population = all Brisbane workers
Sample = the 200 workers surveyed

Concern: the sample is biased. It includes only train commuters at one station during peak hour, so car drivers, bus users, and off-peak commuters are excluded.

EXAMPLE 3: IDENTIFY THE BIAS
A gym surveys its own members about how much exercise Brisbane adults get. What's the bias?
SOLUTION

Gym members are not representative of all Brisbane adults: they're already a group that exercises more than average.

Answer: selection bias. The sample over-represents people who exercise. The gym's estimate of "Brisbane adults' exercise level" will be too high.
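
The effect of this selection bias can be seen in a small simulation. Everything in it is an assumption made for illustration: the population size, the split between gym members and non-members, and the typical exercise hours for each group are invented, not real Brisbane data.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population (all numbers invented for illustration):
# 2,000 gym members averaging about 5 hours of exercise a week,
# 8,000 non-members averaging about 2 hours. Negative draws are set to 0.
members = [max(0.0, rng.gauss(5, 1.5)) for _ in range(2000)]
non_members = [max(0.0, rng.gauss(2, 1.5)) for _ in range(8000)]
population = members + non_members

gym_survey = rng.sample(members, 200)        # the gym only asks its own members
random_survey = rng.sample(population, 200)  # every adult is equally likely

true_mean = statistics.mean(population)
gym_estimate = statistics.mean(gym_survey)
random_estimate = statistics.mean(random_survey)

print(f"population mean: {true_mean:.1f} h")
print(f"gym survey:      {gym_estimate:.1f} h")
print(f"random survey:   {random_estimate:.1f} h")
```

Under these assumptions the gym survey lands well above the population mean, while the random survey of the same size lands close to it. A larger gym survey would not fix the problem, because the bias comes from who is asked, not from how many.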

EXAMPLE 4: CORRELATION vs CAUSATION
"Tablet owners scored 8 marks higher on the maths exam." A teacher concludes: "Buying every student a tablet will raise scores." What's the flaw?
SOLUTION

The teacher confuses correlation (an observed association) with causation (one variable causing another). Tablet owners may also tend to be from families with more resources, more study support, or more general access to learning materials.

Answer: the flaw is treating correlation as causation. Owning a tablet is associated with higher scores but doesn't necessarily cause them. Only a controlled experiment (e.g. randomly giving half the students tablets) could test whether the tablet itself raises scores.
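
A simulation can show how a hidden third variable produces this pattern. The model below is entirely hypothetical: each student gets an invented "family resources" value between 0 and 1, exam scores depend only on that value plus random noise, and the tablet has no effect on scores by construction.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

def exam_score(resources):
    # By construction, the tablet plays no part in the score.
    return 50 + 24 * resources + rng.gauss(0, 5)

# Hidden "family resources" value for each of 4,000 hypothetical students.
students = [rng.random() for _ in range(4000)]

def gap(has_tablet):
    """Mean score of the tablet group minus mean score of everyone else."""
    with_tablet, without = [], []
    for resources in students:
        group = with_tablet if has_tablet(resources) else without
        group.append(exam_score(resources))
    return statistics.mean(with_tablet) - statistics.mean(without)

# Observational data: better-resourced families are more likely to own a tablet.
observed_gap = gap(lambda resources: rng.random() < resources)

# Controlled experiment: a coin flip decides who gets a tablet.
experiment_gap = gap(lambda resources: rng.random() < 0.5)

print(f"observed gap:   {observed_gap:.1f} marks")
print(f"experiment gap: {experiment_gap:.1f} marks")
```

In the observational data, tablet owners score several marks higher even though the tablet does nothing in this model. When tablets are assigned at random, the gap shrinks to roughly zero, because random assignment breaks the link between tablet ownership and family resources.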


Common pitfalls

Bias makes the sample unrepresentative. Surveying gym members about exercise is selection bias: gym-goers exercise more than the average person. Always check whether the sampling method excludes parts of the population.
Correlation ≠ causation. "Tablet owners score higher" does not mean tablets cause higher scores. There could be other factors (family income, study support) that influence both. Only an experiment can establish causation.
Statistical significance ≠ practical significance. A difference of 0.5 marks across thousands of students can be statistically significant but too small to matter in practice. Always ask whether the size of the effect is meaningful.
Don't overgeneralise. A sample from one private school in Sydney can't represent "all Australian teenagers". Limit your conclusion to the population you actually sampled from.
State limitations in your conclusion. Mention small sample size, possible bias, and any other factors that could affect reliability. Stating limitations is part of doing statistics well, not a sign of weakness.
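
The 0.5-mark pitfall above can be made concrete with a rough calculation. The group size and standard deviation are assumptions chosen for illustration, and the test statistic used (a two-sample z value, where values beyond about 2 are conventionally called statistically significant) goes slightly beyond what this subtopic requires.

```python
import math

# Assumed figures, not from the text: two groups of 5,000 students each,
# exam marks with a standard deviation of 10, group means 0.5 marks apart.
n_per_group = 5000
sd = 10.0
difference = 0.5

# Standard error of the difference between two independent group means.
standard_error = math.sqrt(sd**2 / n_per_group + sd**2 / n_per_group)

z = difference / standard_error  # how many standard errors apart the means are
effect_size = difference / sd    # the difference as a fraction of one SD

print(f"z = {z:.2f}")                      # z = 2.50
print(f"effect size = {effect_size:.2f}")  # effect size = 0.05
```

With groups this large, the difference is 2.5 standard errors from zero, which would usually count as statistically significant. Yet it is only one-twentieth of a standard deviation, or half a mark on the exam, which is unlikely to matter to any individual student.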

Frequently asked questions

What is a statistical question?

A statistical question is one that anticipates variability: there will be different answers from different people or measurements. 'How tall are Year 11 students?' is statistical (heights vary). 'How tall is the principal?' is not, because there's only one answer.

What is the difference between population and sample?

The population is everyone (or everything) you want to know about. The sample is the smaller group you actually measure. The sample needs to be representative of the population for conclusions to be reliable.

What is random sampling?

Random sampling means every member of the population has an equal chance of being chosen. It's the gold standard for unbiased data collection, but can be hard to do in practice.

What is stratified sampling?

Stratified sampling means dividing the population into groups (called strata, like year levels), then randomly sampling from each group in proportion to its size. It ensures every subgroup is represented.

What is selection bias?

Selection bias is when the sampling method makes the sample unrepresentative. For example, if you survey gym members about exercise habits, the estimate will be too high because gym members exercise more than average.

What does 'correlation is not causation' mean?

It means that even if two variables are associated, one does not necessarily cause the other. 'Tablet owners scored higher' might be true, but tablets did not necessarily cause the higher scores; there could be confounding factors like family income or study support.

Video Lessons

  • 6F Statistical Investigation (1 of 2)
  • The 6 Steps of a Statistical Investigation
