Year 11 General Univariate Data Analysis

Problem-Solving Using The Statistical Investigation Process


Theory

The Statistical Investigation Process has five stages: Question → Collect → Display → Analyse → Conclude. A good question anticipates variability. Sampling should be representative (random or stratified, not convenience). Watch for bias, correlation vs causation, and overgeneralisation.

The Statistical Investigation Process is a structured way to answer real-world questions using data. It has five main stages.

Stage 1 — Pose a statistical question. A statistical question anticipates variability. "How tall are Year 11 students?" is statistical (heights vary). "How tall is the principal?" is not. Vague questions like "study habits" need to be refined into something measurable like "the average weekly study hours of Year 11 students".

Stage 2 — Collect data (sampling). Identify the population (everyone you want to know about) and the sample (the smaller group you actually measure). Common sampling methods:

Random sampling — every member of the population has an equal chance of being chosen.
Stratified sampling — divide the population into groups and randomly sample from each, in proportion to group size.
Convenience sampling — pick whoever's easiest. Quick, but biased.
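The difference between random and stratified sampling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the school, its year-level sizes, and the function names `simple_random_sample` and `stratified_sample` are invented for the example and are not part of the lesson.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n members so that every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, n, seed=None):
    """Randomly sample each stratum in proportion to its size.

    strata: dict mapping stratum name -> list of members.
    Rounding can make the total differ slightly from n in general.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)  # this stratum's share of the sample
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical school: 120 Year 11 students and 80 Year 12 students.
school = {
    "Year 11": [f"Y11-{i}" for i in range(120)],
    "Year 12": [f"Y12-{i}" for i in range(80)],
}
everyone = school["Year 11"] + school["Year 12"]

random_pick = simple_random_sample(everyone, 20, seed=1)
stratified_pick = stratified_sample(school, 20, seed=1)
sizes = {name: len(chosen) for name, chosen in stratified_pick.items()}
print(sizes)  # {'Year 11': 12, 'Year 12': 8}
```

With a sample of 20, the stratified version always takes 12 Year 11 and 8 Year 12 students (the same 60:40 split as the school), while the simple random sample only matches that split on average.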

Stage 3 — Display the data using whichever chart suits the data type.

Stage 4 — Analyse. Calculate appropriate summary statistics: mean/median/mode for centre; range/IQR/SD for spread. Symmetric data → mean + SD. Skewed or outlier data → median + IQR.
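As a rough illustration of why skewed or outlier data calls for the median and IQR, the sketch below summarises a small invented data set of weekly study hours containing one extreme value. The data and the function name `summarise` are assumptions made for the example. Note also that Python's `statistics.quantiles` may give slightly different quartiles from the method taught in class, and `statistics.stdev` is the sample standard deviation.

```python
import statistics

def summarise(data):
    """Return measures of centre and spread for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (library's default method)
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),      # sample standard deviation
        "median": statistics.median(data),
        "iqr": q3 - q1,
        "range": max(data) - min(data),
    }

# Hypothetical weekly study hours; 30 is an outlier.
hours = [2, 3, 4, 4, 5, 5, 6, 7, 30]
summary = summarise(hours)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here the single value of 30 drags the mean well above the median of 5 and inflates the standard deviation and range, which is why the median and IQR describe this data set better.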

Stage 5 — Conclude. State findings in plain English, in the context of the original question. Include a measure of centre, spread, and any limitations.

[Diagram 1: the five stages of the Statistical Investigation Process. Question → Collect → Display → Analyse → Conclude. Each builds on the previous.]

[Diagram 2: the three main sampling methods and their levels of bias. Random and stratified sampling are low-bias. Convenience sampling is fast but unreliable.]

This subtopic is about process and judgment — no calculation formulas. Use the references below.

The five stages

Stage	Goal
1. Question	Pose a question that anticipates variability
2. Collect	Sample the population fairly
3. Display	Choose the right chart for the data type
4. Analyse	Compute appropriate summary statistics
5. Conclude	Answer in plain English, in context, with limitations

Sampling methods

Method	How it works	Bias level
Random	Every member equally likely to be chosen	Low
Stratified	Random sample from each subgroup, proportional to size	Low
Convenience	Whoever is easiest to access	High

Display choices by data type

Data type	Display
Categorical	frequency table, bar chart
Discrete numerical (small)	dot plot
Mid-size numerical	stem-and-leaf plot
Large/continuous numerical	histogram, boxplot
Two groups	parallel boxplots, back-to-back stem-and-leaf
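To make one of these displays concrete, here is a minimal sketch that builds an ordered stem-and-leaf plot as text. It assumes non-negative whole numbers with the tens as the stem and the units as the leaf. The exam scores and the function name `stem_and_leaf` are invented for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build an ordered stem-and-leaf plot for non-negative whole numbers.

    Stem = tens digit (and above), leaf = units digit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Include every stem between the smallest and largest, even if it is empty.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}".rstrip())
    return lines

# Hypothetical exam scores.
scores = [47, 52, 58, 61, 63, 63, 70, 85]
plot = stem_and_leaf(scores)
print("Key: 4 | 7 means 47")
print("\n".join(plot))
```

Empty stems are kept on purpose so that gaps in the data remain visible in the plot.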

Common traps in interpretation. Bias makes a sample unrepresentative. Correlation ≠ causation — two variables can be associated without one causing the other. Statistical significance ≠ practical significance — tiny differences can be real but not meaningful. Don't overgeneralise from one group to a much larger population.

Working through a statistical investigation

Pose a precise statistical question. Replace vague phrasing with something measurable that anticipates variability.
Choose a sampling method. Random or stratified for unbiased data. Define the population and sample clearly.
Display the data using a chart suited to the type — bar chart, dot plot, stem-and-leaf, histogram, or parallel boxplots.
Analyse with appropriate summary statistics. Symmetric → mean + SD. Skewed or outliers → median + IQR.
Conclude in plain English, in the original units, and in context. State any limitations (small sample, possible bias, etc.).

Identifying bias

Ask: which parts of the population are over- or under-represented?
Common sources: convenience samples, voluntary response, sampling from only one location or time, leading survey questions.
Name the type of bias if you can (e.g. selection bias, response bias).

Spotting correlation vs causation

If a claim says one variable causes another, ask: could there be a third variable influencing both?
Observational data can show an association, but it cannot by itself establish causation. Establishing causation generally requires a controlled experiment in which the treatment is assigned at random.

EXAMPLE 1 — STATISTICAL QUESTION

A principal wants to investigate "study habits". State a more precise statistical question.

SOLUTION

The phrase "study habits" is too vague to measure. A statistical question needs to anticipate variability and pin down what is being measured.

Answer: "What is the average number of hours per week that Year 11 students at this school spend studying?" This is measurable (hours per week), anticipates variability (different students will report different hours), and specifies the population (Year 11 students at this school).

statistical question

EXAMPLE 2 — POPULATION vs SAMPLE

A researcher surveys 200 workers at Roma Street station between 8–9 am Tuesday to study Brisbane commute times. State the population and the sample, and note any concern.

SOLUTION

Identify whom the researcher is trying to learn about (population) vs whom they actually measured (sample).

Population = all Brisbane workers
Sample = the 200 workers surveyed

Concern: the sample is biased — it includes only train commuters at one station during peak hour. Car drivers, bus users, and off-peak commuters are excluded.

population ≠ sample

EXAMPLE 3 — IDENTIFY THE BIAS

A gym surveys its own members about how much exercise Brisbane adults get. What's the bias?

SOLUTION

Gym members are not representative of all Brisbane adults — they're already a group that exercises more than average.

Answer: selection bias. The sample over-represents people who exercise, so the gym's estimate of "Brisbane adults' exercise level" is likely to be too high.

selection bias
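The size of this kind of bias can be illustrated with made-up numbers. The sketch below assumes a population of 1000 adults in which 200 gym members each exercise 5 hours a week and 800 non-members each exercise 2 hours a week. These figures are purely hypothetical and chosen to keep the arithmetic simple.

```python
import statistics

# Hypothetical population, invented for illustration:
# 200 gym members at 5 h/week and 800 non-members at 2 h/week.
gym_members = [5.0] * 200
non_members = [2.0] * 800
population = gym_members + non_members

true_mean = statistics.mean(population)      # the value we want to estimate
gym_estimate = statistics.mean(gym_members)  # what a gym-only survey reports

print(f"Population mean: {true_mean:.1f} h/week")
print(f"Gym-only estimate: {gym_estimate:.1f} h/week")
```

Under these assumptions the gym-only survey reports 5.0 hours a week when the population mean is 2.6, because the 800 non-members had no chance of being sampled.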

EXAMPLE 4 — CORRELATION vs CAUSATION

"Tablet owners scored 8 marks higher on the maths exam." A teacher concludes: "Buying every student a tablet will raise scores." What's the flaw?

SOLUTION

The teacher confuses correlation (an observed association) with causation (one variable causing another). Tablet owners may also tend to be from families with more resources, more study support, or more general access to learning materials.

Answer: the flaw is treating correlation as causation. Owning a tablet is associated with higher scores but doesn't necessarily cause them. Only a controlled experiment (e.g. randomly giving half the students tablets) could test whether the tablet itself raises scores.

correlation ≠ causation
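A small invented data set shows how a third variable could produce an 8-mark gap even if tablets have no effect at all. In the sketch below, every number is hypothetical: exam scores depend only on study support (75 with high support, 65 with low), and tablet ownership is simply more common among high-support students. The helper name `mean_score` is also just for illustration.

```python
import statistics

# Each record: (owns_tablet, study_support, exam_score).
# Invented data: the score depends ONLY on study support, never on the tablet.
students = (
    [(True, "high", 75)] * 45 + [(True, "low", 65)] * 5 +
    [(False, "high", 75)] * 5 + [(False, "low", 65)] * 45
)

def mean_score(records, owns, support=None):
    """Mean score of owners (or non-owners), optionally within one support group."""
    scores = [score for o, sup, score in records
              if o == owns and (support is None or sup == support)]
    return statistics.mean(scores)

overall_gap = mean_score(students, True) - mean_score(students, False)
gap_high = mean_score(students, True, "high") - mean_score(students, False, "high")
gap_low = mean_score(students, True, "low") - mean_score(students, False, "low")

print("Gap overall:", overall_gap)              # 8 marks
print("Gap within high support:", gap_high)     # 0 marks
print("Gap within low support:", gap_low)       # 0 marks
```

Owners score 8 marks higher overall, yet the gap disappears once students with the same level of study support are compared. Real data would rarely be this clean, but the pattern is what a confounding variable looks like.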

Common pitfalls

Bias makes the sample unrepresentative. Surveying gym members about exercise is selection bias — gym-goers exercise more than the average person. Always check whether the sampling method excludes parts of the population.

Correlation ≠ causation. "Tablet owners score higher" does not mean tablets cause higher scores. There could be other factors (family income, study support) that influence both. Only an experiment can establish causation.

Statistical significance ≠ practical significance. A difference of 0.5 marks across thousands of students can be statistically significant but too small to matter in practice. Always ask whether the size of the effect is meaningful.
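For readers who want to see the arithmetic behind this, here is a rough sketch that goes a little beyond this subtopic. It assumes two groups of 5000 students, each with a standard deviation of 10 marks, and compares the 0.5-mark difference with its standard error. The group size and standard deviation are hypothetical figures chosen for the example.

```python
import math

# Hypothetical figures, not from the lesson: two groups of 5000 students,
# each with a standard deviation of 10 marks, whose means differ by 0.5 marks.
difference = 0.5
sd = 10.0
n_per_group = 5000

# Standard error of the difference between two independent group means.
standard_error = math.sqrt(sd**2 / n_per_group + sd**2 / n_per_group)

z = difference / standard_error  # how many standard errors apart the means are
effect_size = difference / sd    # the difference as a fraction of one SD

print(f"z = {z:.2f}, effect size = {effect_size:.2f}")
```

Under these assumptions the difference is 2.5 standard errors, which is beyond the commonly used cut-off of about 2, so it would usually be called statistically significant. It is still only 0.05 of a standard deviation, which is very small in practical terms.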

Don't overgeneralise. A sample from one private school in Sydney can't represent "all Australian teenagers". Limit your conclusion to the population you actually sampled from.

State limitations in your conclusion. Mention small sample size, possible bias, and any other factors that could affect reliability. Stating limitations is part of doing statistics well — not a sign of weakness.

Frequently asked questions

What is a statistical question?

A statistical question is one that anticipates variability — there will be different answers from different people or measurements. 'How tall are Year 11 students?' is statistical (heights vary). 'How tall is the principal?' is not — there's only one answer.

What is the difference between population and sample?

The population is everyone (or everything) you want to know about. The sample is the smaller group you actually measure. The sample needs to be representative of the population for conclusions to be reliable.

What is random sampling?

Random sampling means every member of the population has an equal chance of being chosen. It's the gold standard for unbiased data collection, but can be hard to do in practice.

What is stratified sampling?

Stratified sampling means dividing the population into groups (called strata, like year levels), then randomly sampling from each group in proportion to its size. It ensures every subgroup is represented.

What is selection bias?

Selection bias is when the sampling method makes the sample unrepresentative. For example, surveying gym members about exercise habits — they exercise more than average, so the estimate will be too high.

What does 'correlation is not causation' mean?

It means that even if two variables are associated, one does not necessarily cause the other. 'Tablet owners scored higher' might be true, but tablets did not necessarily cause the higher scores — there could be confounding factors like family income or study support.

