Data Visualization & Pandas
Course At a Glance
Category
Data Science
Level
Intermediate
Age Group
14โ17 years
Prerequisite
Introduction to Data Analysis
Duration
28 Hours
Modules
4 Modules
Program Outcomes
By the end of this course, students will be able to:
- 1
Use pandas and Python to efficiently load, manipulate, and clean structured datasets.
- 2
Apply advanced data visualisation techniques to explore trends and relationships in data.
- 3
Develop data-driven insights and present findings through clear and informative visualisations.
Introduction to Pandas & DataFrames
Students build a deep working knowledge of pandas โ the professional standard for data manipulation in Python. Topics progress from Series and DataFrames to advanced selection, groupby, pivot tables, and cross-tabulation.
| # | Lesson Title | What Students Learn | Activity / Project | Key Methods / Libraries |
|---|---|---|---|---|
| 1.1 | Pandas Series & DataFrames | Review core pandas data structures. Understand Series as 1D array and DataFrame as 2D table. Explore the index. | Build: Create a DataFrame of 10 countries and a Series of GDP values from dicts/lists. Inspect with shape/dtypes. | pd.Series(), pd.DataFrame(), .index, .dtype, .shape, .values |
| 1.2 | Loading & Inspecting Real Datasets | Load CSV, Excel, and JSON files into DataFrames. Set column names, skip headers, and parse dates. | Guided Exploration: Load datasets in different formats. Print summary statistics and identify columns needing cleaning. | pd.read_csv(), pd.read_excel(), pd.read_json(), parse_dates=, .info(), .describe() |
| 1.3 | Indexing & Selecting Data | Master selection using [], .loc[], and .iloc[]. Select single cells, slices, and multiple columns. | Exercises: Use .loc and .iloc to select specific rows and columns from a 50-row student dataset. | df['col'], df[['a','b']], .loc[row, col], .iloc[i, j] |
| 1.4 | Boolean Filtering & Compound Conditions | Filter rows using boolean conditions and compound operators (&, |, ~). Use .isin() and .between(). | Data Detective: Load a cities dataset and filter using multiple progressive conditions (e.g., European cities > 1M population). | df[mask], &, |, ~, .isin(), .between(), .str.startswith() |
| 1.5 | Sorting & Ranking | Sort with .sort_values(). Add ranks with .rank(). Use .nlargest() and .nsmallest() for quick top-N queries. | Build: 'Leaderboard Generator' โ sort a video game sales dataset and find the top 3 games per genre. | .sort_values(), ascending=, .rank(), .nlargest(), .nsmallest() |
| 1.6 | Adding, Renaming & Dropping Columns | Add computed columns using direct assignment and np.where(). Rename columns, drop unwanted ones, and reorder. | Build: Add a pass/fail column to a student dataset using np.where(). Rename and reorder columns. | df['new'] = expr, np.where(), .rename(), .drop(), .copy() |
| 1.7 | GroupBy & Aggregation | Split DataFrames by categorical columns. Apply single/multiple aggregations (.sum, .mean) using .agg(). | Build: 'Sales Summary Report' โ group retail data by category/region to compute total revenue and max order. | .groupby(), .agg({'col': ['mean','sum']}), named agg, .reset_index() |
| 1.8 | Pivot Tables & Cross-Tabulation | Create pivot tables to reshape data. Use pd.crosstab() for frequency cross-tabulations and add margins. | Analysis: Create a pivot table and crosstab on a Titanic dataset to explore survival rates by class and gender. | pd.pivot_table(), pd.crosstab(), values=, aggfunc=, margins= |
Data Cleaning & Preparation
Students master professional-grade data cleaning techniques: missing value imputation, deduplication, standardisation, type conversion, merging, and reshaping with melt and stack.
| # | Lesson Title | What Students Learn | Activity / Project | Key Methods / Libraries |
|---|---|---|---|---|
| 2.1 | Detecting & Handling Missing Values | Audit missing data per column. Visualise patterns with sns.heatmap(). Choose to drop or fill values. | Audit Lab: Generate a missing data summary for a housing dataset. Visualise and justify a strategy. | .isnull(), .isnull().sum(), .dropna(), .fillna(), sns.heatmap(df.isnull()) |
| 2.2 | Filling & Imputing Missing Values | Apply fill strategies (.fillna, .ffill, .bfill). Impute missing values based on group means using .transform(). | Build: 'Smart Imputer' โ fill missing age with median, and missing salary with group mean by job role. | .fillna(df['col'].mean()), .ffill(), .bfill(), .transform('mean') |
| 2.3 | Removing Duplicates & Inconsistent Data | Remove duplicate rows. Identify and fix inconsistent categorical values using .replace() and string methods. | Clean-Up Sprint: Fix duplicate entries, inconsistent gender labels, and trailing whitespace in a customer dataset. | .duplicated(), .drop_duplicates(), .replace(), .str.lower(), .str.strip() |
| 2.4 | Data Type Conversion & Parsing | Convert types with .astype(). Parse datetime strings with pd.to_datetime() and extract datetime components. | Build: Parse string prices to float, parse date strings to datetime, and extract month/day for an orders dataset. | .astype(), pd.to_datetime(), .dt.year, .dt.month_name(), errors='coerce' |
| 2.5 | String Operations on Text Data | Use the .str accessor for vectorised string operations (.split, .replace, .extract). | Text Mining Task: Extract review word counts, star ratings using regex, and count keyword mentions in product reviews. | .str.contains(), .str.split(), .str.extract(r'pattern'), .str.replace() |
| 2.6 | Merging & Joining DataFrames | Combine DataFrames using pd.merge() (inner, left, right, outer) and pd.concat() for stacking. | Build: 'Student Database Merge' โ merge student info, scores, and attendance on student_id. Compare join types. | pd.merge(how='inner/left/outer'), on=, pd.concat(axis=0/1) |
| 2.7 | Reshaping Data: Melt & Stack | Transform wide-format to long-format using pd.melt(). Use .stack() and .unstack() to pivot axes. | Reshape Challenge: Melt a wide-format student score dataset into long format, group by subject, and compute averages. | pd.melt(), id_vars=, value_vars=, .stack(), .unstack() |
| 2.8 | Full Cleaning Pipeline Project | Apply end-to-end cleaning to a raw dataset: audit, handle missing, fix types, standardise, and merge. | Capstone Clean: Clean a deliberately messy dataset. Produce a documented pipeline and a before/after quality report. | Full Module 2 โ pandas cleaning pipeline |
Advanced Data Visualisation
Students create publication-quality and interactive visualisations using pandas plotting, matplotlib, seaborn, and plotly. Includes grouped bars, KDE, pair plots, time series, and multi-panel dashboards.
| # | Lesson Title | What Students Learn | Activity / Project | Key Methods / Libraries |
|---|---|---|---|---|
| 3.1 | Pandas Built-in Plotting | Use pandas .plot() as a wrapper for matplotlib to quickly create line, bar, hist, and scatter plots. | Quick-Plot Session: Create 4 distinct charts from an economic dataset using only df.plot() with titles. | df.plot(kind='bar/line/hist/box'), df['col'].plot(), title=, xlabel= |
| 3.2 | Advanced Bar & Line Charts | Create grouped and stacked bar charts. Plot multi-line time series and add secondary y-axes. | Build: 'Grouped Comparison Chart' โ plot grouped/stacked bars for multi-year sales data. Plot a multi-line time series. | df.plot(kind='bar', stacked=True), ax.twinx(), plt.legend() |
| 3.3 | Distribution Plots: Histograms, KDE & Box Plots | Overlay KDE curves on histograms. Create box plots and violin plots to show medians, IQRs, and outliers. | Distribution Analysis: Plot overlapping histograms, KDE comparisons, box plots, and violin plots for exam scores. | sns.histplot(kde=True), sns.boxplot(), sns.violinplot(), hue=, alpha= |
| 3.4 | Scatter Plots, Pair Plots & Correlation | Create scatter plots with hue/size mapping. Add regression lines. Use pairplots and correlation heatmaps. | Correlation Exploration: Visualise the Iris dataset using scatter plots, sns.pairplot(), and a correlation heatmap. | sns.scatterplot(hue=, size=), sns.regplot(), sns.pairplot(), .corr(), sns.heatmap(annot=True) |
| 3.5 | Time Series Visualisation | Set datetime indices. Resample data by month/year and compute rolling window moving averages. | Build: 'Trend Chart' โ plot raw daily values, a 7-day rolling average, and resampled monthly totals with annotations. | .set_index(), .resample('M').sum(), .rolling(7).mean(), plt.annotate() |
| 3.6 | Categorical Plots & Count Charts | Create count plots, barplots with confidence intervals, and swarm plots for categorical frequency data. | Build: 'Survey Data Visualiser' โ create count plots, bar plots, and swarm plots on a categorical survey dataset. | sns.countplot(), sns.barplot(), sns.swarmplot(), order=, palette= |
| 3.7 | Multi-Panel Dashboards & Annotation | Design multi-panel figures using plt.subplots(). Add figure-level annotations and share axes cleanly. | Build: 'Analytical Dashboard' โ combine 6 different charts into a single styled dashboard figure with annotations. | plt.subplots(2,3), sharex=, tight_layout(), ax.annotate(), plt.suptitle() |
| 3.8 | Interactive Visualisation with Plotly | Create interactive bar, scatter, and choropleth (map) charts with hover and zoom functionality. | Build: Recreate charts using plotly.express and export as interactive HTML files, including a world choropleth map. | import plotly.express as px, px.scatter(), px.choropleth(), .write_html() |
Data Project / Mini Capstone
Students apply all skills to a self-chosen real-world dataset. The full pipeline โ loading, cleaning, EDA, visualisation, interpretation, and presentation โ is completed as a data story.
| # | Lesson Title | What Students Learn | Activity / Project | Key Methods / Libraries |
|---|---|---|---|---|
| 4.1 | Capstone Briefing & Dataset Selection | Choose a real-world dataset and define a specific analytical question for the capstone data story. | Dataset Audit: Browse 4 provided datasets, run .info(), and write a 1-paragraph project brief stating the central question. | pd.read_csv(), .info(), .describe(), project brief |
| 4.2 | Analysis Planning | Translate the central question into 4โ6 sub-questions mapped to specific pandas operations and chart types. | Planning Deliverable: Submit a detailed Analysis Plan mapping questions to pandas/seaborn methods. | Analysis planning, chart-type mapping, cleaning checklist |
| 4.3 | Data Loading & Full Cleaning Pipeline | Execute a complete cleaning pipeline (missing data, types, text standardisation, duplicates) on the dataset. | Build Sprint: Output a data quality report (before vs after) and save the clean dataset as a CSV. | .isnull(), .dropna(), .fillna(), .astype(), pd.to_datetime(), .drop_duplicates() |
| 4.4 | Exploratory Data Analysis | Answer sub-questions using pandas filtering, groupby, and pivoting. Identify interesting patterns. | Build Sprint: Write pandas code to answer each sub-question. Add a one-sentence interpretation for each output. | .groupby(), .agg(), .sort_values(), .corr(), pd.pivot_table() |
| 4.5 | Visualisation Build | Create one carefully designed chart per sub-question. Combine them into a polished dashboard figure. | Build Sprint: Produce all visualisations with correct labels and annotations, combining them into a final dashboard. | sns, plt.subplots(), plt.annotate(), plt.savefig(), px (plotly) |
| 4.6 | Data Story & Written Findings | Write a structured findings narrative (introduction, methodology, findings, conclusion, limitations). | Report Writing: Write the data story referencing specific statistics from the charts. Peer-review for clarity. | Data storytelling, insight writing, referencing statistics |
| 4.7 | Presentation Preparation & Rehearsal | Structure a 6-minute presentation: overview, cleaning, key findings, and conclusion. | Dress Rehearsal: Deliver a timed practice presentation. Refine chart narration based on teacher feedback. | Presentation structure, data storytelling, chart narration |
| 4.8 | Final Capstone Presentation Day | Deliver the completed data project presentation, narrating the charts and handling Q&A. | Final Presentation: 6-minute live data project presentation. Assessed on cleaning, depth, visuals, and insight. | Full course โ Pandas & Data Visualisation |
Teaching Notes & Tips
Pacing Guidance
Each module contains 8 lessons of approximately 50โ60 minutes, totalling ~28 hours. Module 2 (Cleaning) lessons 2.6 (merging) and 2.7 (reshaping) often need an extra 15โ20 minutes. Module 4 runs as project sprints.
Differentiation
Advanced students can explore: pandas method chaining, pd.qcut(), advanced plotly Dash for interactive dashboards, or scikit-learn for basic regression. Core focus should be placed on fundamental pandas cleaning operations.
Assessment Criteria
Capstone assessed on: (1) Cleaning Quality โ complete, documented pipeline. (2) Analysis Depth. (3) Visualisation Quality โ proper formatting, annotations. (4) Interpretation โ data-driven insights. (5) Presentation.
Tools & Environment
Recommended: JupyterLab (via Anaconda) for inline chart rendering and documentation. Required libraries: pandas, numpy, matplotlib, seaborn, plotly. Python 3.9+. Students should navigate Jupyter seamlessly.
Suggested Capstone Datasets
World Happiness Report (Kaggle), Netflix Titles (Kaggle), Airbnb Listings, Global COโ Emissions, Olympic Games History, Spotify Top Songs.
Prior Knowledge Expected
Students should be comfortable with: pandas basics (.read_csv, .head), matplotlib chart creation, Python functions, and loops. Completion of Introduction to Data Analysis is strongly recommended.
Data Visualization & Pandas ยท Intermediate ยท Ages 14โ17 ยท ยฉ Course Curriculum
Enroll Your Child Now