📈Course Curriculum

Data Visualization & Pandas

Data Science· Intermediate· Ages 14–17· 28 Hours

Course At a Glance

Program Outcomes

By the end of this course, students will be able to:

1
Use pandas and Python to efficiently load, manipulate, and clean structured datasets.
2
Apply advanced data visualisation techniques to explore trends and relationships in data.
3
Develop data-driven insights and present findings through clear and informative visualisations.

Module 1

Introduction to Pandas & DataFrames

Students build a deep working knowledge of pandas — the professional standard for data manipulation in Python. Topics progress from Series and DataFrames to advanced selection, groupby, pivot tables, and cross-tabulation.

Approx. 7 hrs

#	Lesson Title	What Students Learn	Activity / Project	Key Methods / Libraries
1.1	Pandas Series & DataFrames	Review core pandas data structures. Understand Series as 1D array and DataFrame as 2D table. Explore the index.	Build: Create a DataFrame of 10 countries and a Series of GDP values from dicts/lists. Inspect with shape/dtypes.	`pd.Series(), pd.DataFrame(), .index, .dtype, .shape, .values`
1.2	Loading & Inspecting Real Datasets	Load CSV, Excel, and JSON files into DataFrames. Set column names, skip headers, and parse dates.	Guided Exploration: Load datasets in different formats. Print summary statistics and identify columns needing cleaning.	`pd.read_csv(), pd.read_excel(), pd.read_json(), parse_dates=, .info(), .describe()`
1.3	Indexing & Selecting Data	Master selection using [], .loc[], and .iloc[]. Select single cells, slices, and multiple columns.	Exercises: Use .loc and .iloc to select specific rows and columns from a 50-row student dataset.	`df['col'], df[['a','b']], .loc[row, col], .iloc[i, j]`
1.4	Boolean Filtering & Compound Conditions	Filter rows using boolean conditions and compound operators (&, \|, ~). Use .isin() and .between().	Data Detective: Load a cities dataset and filter using multiple progressive conditions (e.g., European cities > 1M population).	`df[mask], &, \|, ~, .isin(), .between(), .str.startswith()`
1.5	Sorting & Ranking	Sort with .sort_values(). Add ranks with .rank(). Use .nlargest() and .nsmallest() for quick top-N queries.	Build: 'Leaderboard Generator' — sort a video game sales dataset and find the top 3 games per genre.	`.sort_values(), ascending=, .rank(), .nlargest(), .nsmallest()`
1.6	Adding, Renaming & Dropping Columns	Add computed columns using direct assignment and np.where(). Rename columns, drop unwanted ones, and reorder.	Build: Add a pass/fail column to a student dataset using np.where(). Rename and reorder columns.	`df['new'] = expr, np.where(), .rename(), .drop(), .copy()`
1.7	GroupBy & Aggregation	Split DataFrames by categorical columns. Apply single/multiple aggregations (.sum, .mean) using .agg().	Build: 'Sales Summary Report' — group retail data by category/region to compute total revenue and max order.	`.groupby(), .agg({'col': ['mean','sum']}), named agg, .reset_index()`
1.8	Pivot Tables & Cross-Tabulation	Create pivot tables to reshape data. Use pd.crosstab() for frequency cross-tabulations and add margins.	Analysis: Create a pivot table and crosstab on a Titanic dataset to explore survival rates by class and gender.	`pd.pivot_table(), pd.crosstab(), values=, aggfunc=, margins=`

Module 2

Data Cleaning & Preparation

Students master professional-grade data cleaning techniques: missing value imputation, deduplication, standardisation, type conversion, merging, and reshaping with melt and stack.

Approx. 7 hrs

#	Lesson Title	What Students Learn	Activity / Project	Key Methods / Libraries
2.1	Detecting & Handling Missing Values	Audit missing data per column. Visualise patterns with sns.heatmap(). Choose to drop or fill values.	Audit Lab: Generate a missing data summary for a housing dataset. Visualise and justify a strategy.	`.isnull(), .isnull().sum(), .dropna(), .fillna(), sns.heatmap(df.isnull())`
2.2	Filling & Imputing Missing Values	Apply fill strategies (.fillna, .ffill, .bfill). Impute missing values based on group means using .transform().	Build: 'Smart Imputer' — fill missing age with median, and missing salary with group mean by job role.	`.fillna(df['col'].mean()), .ffill(), .bfill(), .transform('mean')`
2.3	Removing Duplicates & Inconsistent Data	Remove duplicate rows. Identify and fix inconsistent categorical values using .replace() and string methods.	Clean-Up Sprint: Fix duplicate entries, inconsistent gender labels, and trailing whitespace in a customer dataset.	`.duplicated(), .drop_duplicates(), .replace(), .str.lower(), .str.strip()`
2.4	Data Type Conversion & Parsing	Convert types with .astype(). Parse datetime strings with pd.to_datetime() and extract datetime components.	Build: Parse string prices to float, parse date strings to datetime, and extract month/day for an orders dataset.	`.astype(), pd.to_datetime(), .dt.year, .dt.month_name(), errors='coerce'`
2.5	String Operations on Text Data	Use the .str accessor for vectorised string operations (.split, .replace, .extract).	Text Mining Task: Extract review word counts, star ratings using regex, and count keyword mentions in product reviews.	`.str.contains(), .str.split(), .str.extract(r'pattern'), .str.replace()`
2.6	Merging & Joining DataFrames	Combine DataFrames using pd.merge() (inner, left, right, outer) and pd.concat() for stacking.	Build: 'Student Database Merge' — merge student info, scores, and attendance on student_id. Compare join types.	`pd.merge(how='inner/left/outer'), on=, pd.concat(axis=0/1)`
2.7	Reshaping Data: Melt & Stack	Transform wide-format to long-format using pd.melt(). Use .stack() and .unstack() to pivot axes.	Reshape Challenge: Melt a wide-format student score dataset into long format, group by subject, and compute averages.	`pd.melt(), id_vars=, value_vars=, .stack(), .unstack()`
2.8	Full Cleaning Pipeline Project	Apply end-to-end cleaning to a raw dataset: audit, handle missing, fix types, standardise, and merge.	Capstone Clean: Clean a deliberately messy dataset. Produce a documented pipeline and a before/after quality report.	`Full Module 2 — pandas cleaning pipeline`

Module 3

Advanced Data Visualisation

Students create publication-quality and interactive visualisations using pandas plotting, matplotlib, seaborn, and plotly. Includes grouped bars, KDE, pair plots, time series, and multi-panel dashboards.

Approx. 7 hrs

#	Lesson Title	What Students Learn	Activity / Project	Key Methods / Libraries
3.1	Pandas Built-in Plotting	Use pandas .plot() as a wrapper for matplotlib to quickly create line, bar, hist, and scatter plots.	Quick-Plot Session: Create 4 distinct charts from an economic dataset using only df.plot() with titles.	`df.plot(kind='bar/line/hist/box'), df['col'].plot(), title=, xlabel=`
3.2	Advanced Bar & Line Charts	Create grouped and stacked bar charts. Plot multi-line time series and add secondary y-axes.	Build: 'Grouped Comparison Chart' — plot grouped/stacked bars for multi-year sales data. Plot a multi-line time series.	`df.plot(kind='bar', stacked=True), ax.twinx(), plt.legend()`
3.3	Distribution Plots: Histograms, KDE & Box Plots	Overlay KDE curves on histograms. Create box plots and violin plots to show medians, IQRs, and outliers.	Distribution Analysis: Plot overlapping histograms, KDE comparisons, box plots, and violin plots for exam scores.	`sns.histplot(kde=True), sns.boxplot(), sns.violinplot(), hue=, alpha=`
3.4	Scatter Plots, Pair Plots & Correlation	Create scatter plots with hue/size mapping. Add regression lines. Use pairplots and correlation heatmaps.	Correlation Exploration: Visualise the Iris dataset using scatter plots, sns.pairplot(), and a correlation heatmap.	`sns.scatterplot(hue=, size=), sns.regplot(), sns.pairplot(), .corr(), sns.heatmap(annot=True)`
3.5	Time Series Visualisation	Set datetime indices. Resample data by month/year and compute rolling window moving averages.	Build: 'Trend Chart' — plot raw daily values, a 7-day rolling average, and resampled monthly totals with annotations.	`.set_index(), .resample('M').sum(), .rolling(7).mean(), plt.annotate()`
3.6	Categorical Plots & Count Charts	Create count plots, barplots with confidence intervals, and swarm plots for categorical frequency data.	Build: 'Survey Data Visualiser' — create count plots, bar plots, and swarm plots on a categorical survey dataset.	`sns.countplot(), sns.barplot(), sns.swarmplot(), order=, palette=`
3.7	Multi-Panel Dashboards & Annotation	Design multi-panel figures using plt.subplots(). Add figure-level annotations and share axes cleanly.	Build: 'Analytical Dashboard' — combine 6 different charts into a single styled dashboard figure with annotations.	`plt.subplots(2,3), sharex=, tight_layout(), ax.annotate(), plt.suptitle()`
3.8	Interactive Visualisation with Plotly	Create interactive bar, scatter, and choropleth (map) charts with hover and zoom functionality.	Build: Recreate charts using plotly.express and export as interactive HTML files, including a world choropleth map.	`import plotly.express as px, px.scatter(), px.choropleth(), .write_html()`

Module 4

Data Project / Mini Capstone

Students apply all skills to a self-chosen real-world dataset. The full pipeline — loading, cleaning, EDA, visualisation, interpretation, and presentation — is completed as a data story.

Approx. 7 hrs

#	Lesson Title	What Students Learn	Activity / Project	Key Methods / Libraries
4.1	Capstone Briefing & Dataset Selection	Choose a real-world dataset and define a specific analytical question for the capstone data story.	Dataset Audit: Browse 4 provided datasets, run .info(), and write a 1-paragraph project brief stating the central question.	`pd.read_csv(), .info(), .describe(), project brief`
4.2	Analysis Planning	Translate the central question into 4–6 sub-questions mapped to specific pandas operations and chart types.	Planning Deliverable: Submit a detailed Analysis Plan mapping questions to pandas/seaborn methods.	`Analysis planning, chart-type mapping, cleaning checklist`
4.3	Data Loading & Full Cleaning Pipeline	Execute a complete cleaning pipeline (missing data, types, text standardisation, duplicates) on the dataset.	Build Sprint: Output a data quality report (before vs after) and save the clean dataset as a CSV.	`.isnull(), .dropna(), .fillna(), .astype(), pd.to_datetime(), .drop_duplicates()`
4.4	Exploratory Data Analysis	Answer sub-questions using pandas filtering, groupby, and pivoting. Identify interesting patterns.	Build Sprint: Write pandas code to answer each sub-question. Add a one-sentence interpretation for each output.	`.groupby(), .agg(), .sort_values(), .corr(), pd.pivot_table()`
4.5	Visualisation Build	Create one carefully designed chart per sub-question. Combine them into a polished dashboard figure.	Build Sprint: Produce all visualisations with correct labels and annotations, combining them into a final dashboard.	`sns, plt.subplots(), plt.annotate(), plt.savefig(), px (plotly)`
4.6	Data Story & Written Findings	Write a structured findings narrative (introduction, methodology, findings, conclusion, limitations).	Report Writing: Write the data story referencing specific statistics from the charts. Peer-review for clarity.	`Data storytelling, insight writing, referencing statistics`
4.7	Presentation Preparation & Rehearsal	Structure a 6-minute presentation: overview, cleaning, key findings, and conclusion.	Dress Rehearsal: Deliver a timed practice presentation. Refine chart narration based on teacher feedback.	`Presentation structure, data storytelling, chart narration`
4.8	Final Capstone Presentation Day	Deliver the completed data project presentation, narrating the charts and handling Q&A.	Final Presentation: 6-minute live data project presentation. Assessed on cleaning, depth, visuals, and insight.	`Full course — Pandas & Data Visualisation`

Teaching Notes & Tips

Pacing Guidance

Each module contains 8 lessons of approximately 50–60 minutes, totalling ~28 hours. Module 2 (Cleaning) lessons 2.6 (merging) and 2.7 (reshaping) often need an extra 15–20 minutes. Module 4 runs as project sprints.

Differentiation

Advanced students can explore: pandas method chaining, pd.qcut(), advanced plotly Dash for interactive dashboards, or scikit-learn for basic regression. Core focus should be placed on fundamental pandas cleaning operations.

Assessment Criteria

Capstone assessed on: (1) Cleaning Quality — complete, documented pipeline. (2) Analysis Depth. (3) Visualisation Quality — proper formatting, annotations. (4) Interpretation — data-driven insights. (5) Presentation.

Tools & Environment

Recommended: JupyterLab (via Anaconda) for inline chart rendering and documentation. Required libraries: pandas, numpy, matplotlib, seaborn, plotly. Python 3.9+. Students should navigate Jupyter seamlessly.

Suggested Capstone Datasets

World Happiness Report (Kaggle), Netflix Titles (Kaggle), Airbnb Listings, Global CO₂ Emissions, Olympic Games History, Spotify Top Songs.

Prior Knowledge Expected

Students should be comfortable with: pandas basics (.read_csv, .head), matplotlib chart creation, Python functions, and loops. Completion of Introduction to Data Analysis is strongly recommended.

Enroll Your Child Now