Introduction to Data Analysis
Course At a Glance
Category: Data Science
Level: Beginner
Age Group: 14–17 years
Prerequisite: Basic Python Knowledge
Duration: 30 Hours
Modules: 4 Modules
Program Outcomes
By the end of this course, students will be able to:
1. Understand basic concepts of data and how to analyse datasets using Python.
2. Perform simple data manipulation and analysis using Python lists, dictionaries, NumPy, and pandas.
3. Visualise data using charts to identify patterns and present findings clearly.
Introduction to Data & Python Review
Students learn what data is and how it is used in the real world. A structured Python review (lists, dicts, functions, CSV files) builds readiness for data analysis. NumPy is introduced as the numerical foundation for data work.
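As a rough illustration of the Python review described above, the sketch below stores student records in a list of dictionaries and wraps the calculations in small functions. The function names follow the 'Student Record Store' activity; the sample names and scores are made up for illustration.

```python
# Hypothetical sample data: names and scores are invented for this sketch.
students = [
    {"name": "Aisha", "score": 82},
    {"name": "Ben", "score": 67},
    {"name": "Chloe", "score": 91},
    {"name": "Dev", "score": 74},
]


def get_average(records):
    """Return the mean score across all records."""
    scores = [r["score"] for r in records]
    return sum(scores) / len(scores)


def get_top_student(records):
    """Return the record with the highest score."""
    return max(records, key=lambda r: r["score"])


def print_report(records):
    """Print one line per student, then a short summary."""
    for r in records:
        print(f"{r['name']:<8}{r['score']:>4}")
    top = get_top_student(records)
    print(f"Average: {get_average(records):.1f}")
    print(f"Top student: {top['name']} ({top['score']})")


print_report(students)
```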
| # | Lesson Title | What Students Learn | Activity / Project | Key Tools / Concepts |
|---|---|---|---|---|
| 1.1 | What is Data & Why Does It Matter? | Define data and explore how it is used. Distinguish between quantitative and qualitative data. Understand the data analysis pipeline: Collect → Clean → Analyse → Visualise → Interpret. | Discussion & Exploration: Find 3 real-world datasets online. Identify data types and questions they could answer. | Data types: int, float, str, bool, categorical |
| 1.2 | Python Review: Variables & Data Types | Refresh variables, strings, integers, floats, booleans, and type conversion. Use type() and f-strings. Perform basic calculations. | Warm-Up Exercises: Calculate total, average, highest, and lowest scores manually using Python arithmetic. Print formatted results. | int, float, str, type(), round(), f-strings |
| 1.3 | Python Review: Lists & Loops | Revisit lists. Create, index, slice, and iterate over lists with for loops. Apply built-in functions: sum(), len(), min(), max(), sorted(). | Build: 'Class Score Analyser' – compute total, count, average, min, and max of a list using functions. | [], sum(), len(), min(), max(), sorted(), for loop |
| 1.4 | Python Review: Dictionaries & Functions | Revisit dictionaries (key: value pairs) for row representation. Write functions to organise code into reusable tools. | Build: 'Student Record Store' – store records in a list of dicts. Write functions: get_average(), get_top_student(), and print_report(). | dict, list of dicts, def, return, .items() |
| 1.5 | Loading & Inspecting Data from CSV Files | Introduce CSV format. Use Python's csv module to load a dataset into a list of dictionaries and inspect the data. | Guided Exercise: Load 'students.csv'. Print row count, column names, first 5 rows, and data types. | import csv, csv.DictReader, list(), keys() |
| 1.6 | Data Quality: Missing & Inconsistent Data | Understand 'dirty data': missing values, formatting issues, duplicates, outliers. Learn basic cleaning strategies. | Clean-Up Lab: Write a Python script to read a messy CSV, fix formatting, skip missing rows, and remove duplicates. | .strip(), .lower(), if val != '', set() for duplicates |
| 1.7 | Introduction to NumPy | Install and import NumPy. Understand NumPy arrays. Use vectorised arithmetic and compute statistics such as mean, median, and standard deviation. | Exercises: Convert a list to a NumPy array. Compute mean, median, standard deviation, and weighted average. | import numpy as np, np.array(), np.mean(), np.median(), np.std(), np.average() |
| 1.8 | Module 1 Review & Mini Challenge | Consolidate Module 1 skills: Python basics, CSV loading, data cleaning, and NumPy basics. | Mini Challenge: Load a real CSV dataset, clean it, compute stats with NumPy, and print a summary. | Full Module 1 – csv, numpy, cleaning, stats |
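
The load, clean, and summarise flow from lessons 1.5 to 1.8 might look something like the sketch below. It is only one possible approach: the CSV text is a made-up stand-in for a file such as 'students.csv' (read through io.StringIO so the example runs on its own), and the helper names load_rows and clean_rows are hypothetical.

```python
import csv
import io

import numpy as np

# Hypothetical messy data standing in for a file such as 'students.csv'.
RAW_CSV = """name,score
 Aisha ,82
ben,67
Chloe,
Dev,74
BEN,67
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Tidy names, skip rows with a missing score, and drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip().lower()
        score = row["score"].strip()
        if score == "":
            continue  # missing value: skip this row
        key = (name, score)
        if key in seen:
            continue  # duplicate once the name is normalised
        seen.add(key)
        cleaned.append({"name": name, "score": float(score)})
    return cleaned


rows = load_rows(RAW_CSV)
clean = clean_rows(rows)
scores = np.array([r["score"] for r in clean])

print(f"Rows loaded: {len(rows)}, rows kept: {len(clean)}")
print(f"Columns: {list(rows[0].keys())}")
print(f"Mean: {np.mean(scores):.1f}, median: {np.median(scores):.1f}, "
      f"std: {np.std(scores):.1f}")
```

With a real file, the io.StringIO call would be replaced by open('students.csv', newline='') inside a with block.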
Working with Data in Python
Students learn to manipulate and analyse datasets using descriptive statistics, filtering, grouping, and pandas DataFrames. The module progresses from pure Python techniques to professional-grade pandas operations.
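Before pandas is introduced, the filtering, sorting, and grouping steps can be done in plain Python. The following is a minimal sketch under assumed sample data (the records and the pass mark of 60 are invented for illustration), using a list comprehension, sorted() with a lambda, and a defaultdict for split-apply-combine.

```python
from collections import defaultdict

# Hypothetical records: one dictionary per student result.
records = [
    {"name": "Aisha", "subject": "Maths", "score": 82},
    {"name": "Ben", "subject": "Science", "score": 67},
    {"name": "Chloe", "subject": "Maths", "score": 91},
    {"name": "Dev", "subject": "Science", "score": 74},
    {"name": "Ella", "subject": "Maths", "score": 58},
]

# Filter: keep only rows that meet a condition (assumed pass mark of 60).
passing = [r for r in records if r["score"] >= 60]

# Sort: highest score first.
ranked = sorted(passing, key=lambda r: r["score"], reverse=True)

# Split: collect the scores for each subject.
by_subject = defaultdict(list)
for r in records:
    by_subject[r["subject"]].append(r["score"])

# Apply and combine: one summary dictionary per subject.
summary = {
    subject: {
        "average": sum(scores) / len(scores),
        "min": min(scores),
        "max": max(scores),
    }
    for subject, scores in by_subject.items()
}

for subject, stats in summary.items():
    print(f"{subject}: avg {stats['average']:.1f}, "
          f"min {stats['min']}, max {stats['max']}")
```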
| # | Lesson Title | What Students Learn | Activity / Project | Key Tools / Concepts |
|---|---|---|---|---|
| 2.1 | Descriptive Statistics | Understand measures of centre (mean, median, mode) and spread (range, variance, std dev). Know when to use mean vs. median. | Analysis Task: Compute mean and median for two datasets (one with outliers). Explain differences. | np.mean(), np.median(), statistics.mode(), np.std(), np.var() |
| 2.2 | Filtering & Sorting Data | Filter a dataset using list comprehension and boolean indexing. Sort lists of dictionaries using sorted() with a lambda. | Build: 'Data Detective' – filter students by score, sort descending, and answer questions. | [x for x in data if cond], sorted(key=lambda), np boolean indexing |
| 2.3 | Grouping & Aggregating Data | Group data by category and compute aggregate stats. Build a frequency counter and introduce split-apply-combine. | Build: 'Subject Analyser' – group student records by subject to compute average, min, and max scores. | dict grouping, defaultdict, frequency counter, group aggregation |
| 2.4 | Introduction to Pandas | Understand pandas DataFrames. Load CSVs with pd.read_csv() and inspect with .head(), .info(), and .describe(). | Guided Exploration: Load a dataset, run inspection methods, filter rows, and select columns. | import pandas as pd, pd.read_csv(), .head(), .info(), .describe(), .loc[], .iloc[] |
| 2.5 | Pandas: Filtering, Sorting & Adding Columns | Apply boolean filtering (&, \|), sort with .sort_values(), and add new computed columns. | Build: 'Grade Report Generator' – filter passing students, add a grade band column, and save using .to_csv(). | df[df['col'] > val], &, \|, .sort_values(), df['new'] = expr, .to_csv() |
| 2.6 | Pandas: Grouping & Aggregation | Use .groupby() to split a DataFrame and compute statistics with .agg(), .mean(), and .sum(). | Build: 'Department Summary Report' – group an employee dataset by department and calculate averages. | .groupby(), .agg(), .reset_index(), .mean(), .count() |
| 2.7 | Handling Missing Data & Data Types | Detect missing values. Decide whether to drop (.dropna()) or fill (.fillna()). Convert data types and rename columns. | Clean-Up Project: Given a messy dataset, write a full cleaning pipeline (detect, handle missing, fix types, rename). | .isnull(), .dropna(), .fillna(), .astype(), .rename() |
| 2.8 | Exploratory Data Analysis (EDA) Mini Project | Apply all Module 2 skills to perform a complete Exploratory Data Analysis on a real-world dataset. | EDA Project: Load, inspect, clean, group, and compute statistics for a chosen dataset. Write a summary of 3 findings. | Full Module 2 – pandas EDA workflow |
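
The pandas operations from lessons 2.5 and 2.6 could be combined along the lines of the sketch below. The DataFrame is built in memory from invented data rather than loaded with pd.read_csv(), and the grade_band helper and its thresholds are assumptions made for this example.

```python
import pandas as pd

# Hypothetical data; in class this would usually come from pd.read_csv().
df = pd.DataFrame({
    "name": ["Aisha", "Ben", "Chloe", "Dev", "Ella"],
    "subject": ["Maths", "Science", "Maths", "Science", "Maths"],
    "score": [82, 67, 91, 74, 58],
})


def grade_band(score):
    """Map a score to a band (thresholds are assumed for illustration)."""
    if score >= 80:
        return "A"
    if score >= 60:
        return "B"
    return "C"


# Add a computed column.
df["band"] = [grade_band(s) for s in df["score"]]

# Boolean filtering: each condition needs its own parentheses around &.
passing_maths = df[(df["score"] >= 60) & (df["subject"] == "Maths")]

# Sort by score, highest first.
ranked = df.sort_values("score", ascending=False)

# Group by subject and compute several statistics at once.
summary = (
    df.groupby("subject")["score"]
    .agg(["mean", "min", "max"])
    .reset_index()
)

print(passing_maths)
print(ranked.head(3))
print(summary)
```

Comparing this with the pure-Python version of the same task is one way to show students what a DataFrame handles for them.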
Data Visualisation Basics
Students learn to create, customise, and interpret data visualisations using matplotlib and seaborn. Chart types covered include line, bar, pie, histogram, scatter, and heatmap.
| # | Lesson Title | What Students Learn | Activity / Project | Key Tools / Concepts |
|---|---|---|---|---|
| 3.1 | Introduction to Matplotlib | Understand the anatomy of a matplotlib figure. Create, customise, and save basic line charts. | Build: 'Temperature Trend Chart' – plot a week of daily temperatures as a line chart with labels and a grid. | import matplotlib.pyplot as plt, plt.plot(), plt.title(), plt.savefig() |
| 3.2 | Bar Charts | Create vertical and horizontal bar charts. Customise colours and add data labels using plt.text(). | Build: 'Subject Average Scores' – plot vertical bar charts for scores and a horizontal chart for top countries. | plt.bar(), plt.barh(), plt.text(), color, edgecolor, width |
| 3.3 | Pie Charts & Donut Charts | Create pie and donut charts. Understand when to use part-to-whole charts and how to explode slices. | Build: 'Grade Distribution Pie Chart' – visualise grade bands and explode a slice. Convert to a donut chart. | plt.pie(), labels, autopct, explode, wedgeprops (donut) |
| 3.4 | Histograms & Distributions | Create histograms with plt.hist(). Choose bins and understand skewness, spread, and outliers. | Build: 'Score Distribution' – plot a histogram with 10 bins. Compare two classes using transparency. | plt.hist(), bins, alpha, edgecolor, density |
| 3.5 | Scatter Plots & Correlation | Create scatter plots to explore relationships between two numerical variables. Add a linear trend line. | Build: 'Study Hours vs Score' – scatter plot study hours against test scores. Add a linear trend line. | plt.scatter(), s, c, alpha, np.polyfit(), np.poly1d() |
| 3.6 | Multiple Plots & Subplots | Create dashboard-style figures with plt.subplots(). Arrange charts in rows and columns with a shared title. | Build: 'Data Dashboard' – a 2x2 subplot figure showing a bar chart, histogram, pie chart, and scatter plot. | plt.subplots(), fig, ax, plt.suptitle(), figsize, tight_layout() |
| 3.7 | Customisation & Styling | Apply matplotlib style sheets. Customise tick labels, add annotations, and set axis limits. | Style Sprint: Apply 'seaborn-v0_8' to the Data Dashboard. Add descriptive annotations and rotate labels. | plt.style.use(), plt.annotate(), plt.xticks(rotation=), cmap |
| 3.8 | Introduction to Seaborn | Install and import seaborn. Use seaborn for cleaner defaults and statistical charts like histplots and heatmaps. | Comparison Exercise: Recreate matplotlib charts using seaborn. Create a correlation heatmap. | import seaborn as sns, sns.barplot(), sns.histplot(), sns.heatmap(), hue= |
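
To give a flavour of lessons 3.4 to 3.6, here is a small two-panel sketch with a scatter plot, a linear trend line, and a histogram. The study-hours data and the output filename are invented, and the Agg backend line is an assumption that lets the script run without a display; it is not needed inside Jupyter.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; omit this line in Jupyter

import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: study hours and test scores for eight students.
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
scores = np.array([52, 55, 61, 64, 70, 74, 79, 85])

# Fit a straight line (degree 1) and wrap it as a callable polynomial.
slope, intercept = np.polyfit(hours, scores, 1)
trend = np.poly1d([slope, intercept])

# One row, two columns of charts in a single figure.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

axes[0].scatter(hours, scores, alpha=0.8)
axes[0].plot(hours, trend(hours), color="red", label="Trend line")
axes[0].set_title("Study Hours vs Score")
axes[0].set_xlabel("Hours studied")
axes[0].set_ylabel("Test score")
axes[0].legend()

axes[1].hist(scores, bins=5, edgecolor="black")
axes[1].set_title("Score Distribution")
axes[1].set_xlabel("Test score")
axes[1].set_ylabel("Number of students")

fig.suptitle("Mini Data Dashboard")
fig.tight_layout()
fig.savefig("study_dashboard.png")  # hypothetical output filename
```

The same pattern extends to the 2x2 'Data Dashboard' by calling plt.subplots(2, 2) and indexing the axes as axes[row, column].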
Mini Data Project
Students apply all skills to a self-chosen real-world dataset. The full data analysis pipeline (loading, cleaning, exploring, visualising, and interpreting) is completed and presented.
| # | Lesson Title | What Students Learn | Activity / Project | Key Tools / Concepts |
|---|---|---|---|---|
| 4.1 | Project Briefing & Dataset Selection | Choose a real-world dataset. Explore datasets briefly with .head() and .describe() to ensure viability. | Dataset Exploration: Load 4 provided datasets, explore, select one, and write 2 questions it could answer. | pd.read_csv(), .head(), .describe(), .info() |
| 4.2 | Project Planning & Questions | Write 3–5 specific analysis questions. Plan pandas operations and chart types for each question. | Planning Deliverable: Complete a Project Plan Sheet specifying dataset, questions, analysis steps, and charts. | Planning – analysis questions, chart type mapping |
| 4.3 | Data Loading & Cleaning | Perform a full data quality audit and apply a cleaning pipeline (handle NaNs, fix types, drop duplicates). | Build Sprint: Clean dataset. Print 'before and after' comparisons and save 'clean_data.csv'. | .isnull().sum(), .dropna(), .fillna(), .astype(), .drop_duplicates() |
| 4.4 | Exploratory Analysis | Answer analysis questions using pandas filtering, grouping, aggregation, and sorting. Identify interesting patterns. | Build Sprint: Run pandas code to answer all questions. Annotate each result with a one-sentence interpretation. | .groupby(), .agg(), .sort_values(), .value_counts(), .corr() |
| 4.5 | Data Visualisation | Create one chart per question using matplotlib/seaborn. Combine all charts into a multi-panel figure. | Build Sprint: Produce charts, apply a consistent style, and add highlight annotations. Save the dashboard. | plt.subplots(), plt.savefig(), sns charts, plt.style.use(), plt.annotate() |
| 4.6 | Findings & Written Interpretation | Write a structured analysis report interpreting the visualisations. Practise data storytelling. | Report Writing: Complete a findings template. Write 2–3 sentences interpreting each chart clearly. | Interpretation, insight writing, data storytelling |
| 4.7 | Presentation Preparation | Structure a 5-minute presentation: dataset intro, questions, findings with charts, and dataset limitations. | Dress Rehearsal: Deliver a timed practice presentation. Teacher provides feedback on clarity and confidence. | Presentation structure, data storytelling |
| 4.8 | Final Presentation Day | Deliver the completed mini data project presentation, answering Q&A. | Final Presentation: 5-minute live presentation of charts and findings. Assessed on analysis depth, chart clarity, and interpretation. | Full course – Data Analysis pipeline |
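
For the cleaning and exploration sprints (lessons 4.3 and 4.4), a data quality audit with a 'before and after' comparison might be sketched as follows. The dataset is invented, and dropping rows with a missing score is just one of the two options the course covers; .fillna() would be the alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a duplicate row, a missing value,
# and a year column stored as text.
raw = pd.DataFrame({
    "country": ["Norway", "Kenya", "Kenya", "Brazil", "Japan"],
    "year": ["2021", "2021", "2021", "2022", "2022"],
    "score": [7.4, 4.6, 4.6, np.nan, 5.9],
})

# Audit: count missing values per column before changing anything.
print("Missing values before cleaning:")
print(raw.isnull().sum())

# Cleaning pipeline: drop duplicates, handle missing values, fix types.
clean = raw.drop_duplicates()
clean = clean.dropna(subset=["score"])  # choice made here: drop, not fill
clean = clean.astype({"year": int})

print(f"Rows before: {len(raw)}, rows after: {len(clean)}")

# Exploration: average score per year, highest first.
by_year = clean.groupby("year")["score"].mean().sort_values(ascending=False)
print(by_year)

# Save the cleaned data for the visualisation sprint.
clean.to_csv("clean_data.csv", index=False)
```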
Teaching Notes & Tips
Pacing Guidance
Each module contains 8 lessons of approximately 50–60 minutes each; Module 4 is shorter, at about 6 hours in total. Lessons 1.7 (NumPy) and 2.4 (pandas intro) often need extra time. Module 4 runs as project sprints.
Differentiation
Advanced students can explore: pandas .pivot_table(), time-series analysis with pd.to_datetime(), interactive charts with plotly, or basic linear regression with scikit-learn. Students needing support should focus on core pandas before seaborn.
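As a starting point for the extension topics, the sketch below pairs pd.to_datetime() with .pivot_table(). The sales data is made up for illustration and is not taken from any of the course datasets.

```python
import pandas as pd

# Hypothetical data: dates arrive as text, as they usually do from a CSV.
sales = pd.DataFrame({
    "date": ["2024-01-15", "2024-01-20", "2024-02-03", "2024-02-18"],
    "region": ["North", "South", "North", "South"],
    "units": [10, 14, 12, 9],
})

# Convert text to real datetime values, then pull out the month number.
sales["date"] = pd.to_datetime(sales["date"])
sales["month"] = sales["date"].dt.month

# Pivot: one row per month, one column per region, summing the units.
table = sales.pivot_table(
    index="month", columns="region", values="units", aggfunc="sum"
)
print(table)
```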
Assessment Criteria
Mini project assessed on: (1) Data Loading & Cleaning. (2) Analysis Depth. (3) Visualisation Quality. (4) Interpretation in plain English. (5) Presentation Confidence.
Tools & Environment
Recommended: Jupyter Notebook or JupyterLab (via Anaconda). Alternatively, VS Code with Jupyter extension. Required libraries: numpy, pandas, matplotlib, seaborn. Python 3.9+ recommended.
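Before the first lesson, a quick check along these lines can confirm that each student's environment has the required libraries. It is a sketch only; the check_environment helper is a hypothetical name, and it reports whether each package can be found without importing it.

```python
import importlib.util
import sys

# The four libraries listed as required for this course.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn"]


def check_environment(packages=REQUIRED):
    """Return a dict mapping each package name to True if it is installed."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }


status = check_environment()
version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {version} (3.9+ recommended)")
for name, installed in status.items():
    print(f"{name:<12}{'installed' if installed else 'MISSING'}")
```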
Suggested Datasets (Module 4)
World Happiness Report (Kaggle), Video Game Sales (Kaggle), Titanic Passengers, Premier League Results, COVID-19 Statistics, Student Performance Dataset.
Prior Knowledge Expected
Students must be confident with: Python variables, lists, for loops, dictionaries, writing functions, and reading files. Students who have completed Python Fundamentals are well prepared.
Introduction to Data Analysis · Beginner · Ages 14–17 · © Course Curriculum