โšกFREE Live Master Session: Scratch Game Development for Kids

    Register for Free โ†’
    Back to Programs
    ๐Ÿ“ˆCourse Curriculum

    Data Visualization & Pandas

    Data Scienceยท Intermediateยท Ages 14โ€“17ยท 28 Hours

    Course At a Glance

    Category

    Data Science

    Level

    Intermediate

    Age Group

    14โ€“17 years

    Prerequisite

    Introduction to Data Analysis

    Duration

    28 Hours

    Modules

    4 Modules

    Program Outcomes

    By the end of this course, students will be able to:

    • 1

      Use pandas and Python to efficiently load, manipulate, and clean structured datasets.

    • 2

      Apply advanced data visualisation techniques to explore trends and relationships in data.

    • 3

      Develop data-driven insights and present findings through clear and informative visualisations.

    Module 1

    Introduction to Pandas & DataFrames

    Students build a deep working knowledge of pandas โ€” the professional standard for data manipulation in Python. Topics progress from Series and DataFrames to advanced selection, groupby, pivot tables, and cross-tabulation.

    Approx. 7 hrs
    #Lesson TitleWhat Students LearnActivity / ProjectKey Methods / Libraries
    1.1Pandas Series & DataFramesReview core pandas data structures. Understand Series as 1D array and DataFrame as 2D table. Explore the index.Build: Create a DataFrame of 10 countries and a Series of GDP values from dicts/lists. Inspect with shape/dtypes.pd.Series(), pd.DataFrame(), .index, .dtype, .shape, .values
    1.2Loading & Inspecting Real DatasetsLoad CSV, Excel, and JSON files into DataFrames. Set column names, skip headers, and parse dates.Guided Exploration: Load datasets in different formats. Print summary statistics and identify columns needing cleaning.pd.read_csv(), pd.read_excel(), pd.read_json(), parse_dates=, .info(), .describe()
    1.3Indexing & Selecting DataMaster selection using [], .loc[], and .iloc[]. Select single cells, slices, and multiple columns.Exercises: Use .loc and .iloc to select specific rows and columns from a 50-row student dataset.df['col'], df[['a','b']], .loc[row, col], .iloc[i, j]
    1.4Boolean Filtering & Compound ConditionsFilter rows using boolean conditions and compound operators (&, |, ~). Use .isin() and .between().Data Detective: Load a cities dataset and filter using multiple progressive conditions (e.g., European cities > 1M population).df[mask], &, |, ~, .isin(), .between(), .str.startswith()
    1.5Sorting & RankingSort with .sort_values(). Add ranks with .rank(). Use .nlargest() and .nsmallest() for quick top-N queries.Build: 'Leaderboard Generator' โ€” sort a video game sales dataset and find the top 3 games per genre..sort_values(), ascending=, .rank(), .nlargest(), .nsmallest()
    1.6Adding, Renaming & Dropping ColumnsAdd computed columns using direct assignment and np.where(). Rename columns, drop unwanted ones, and reorder.Build: Add a pass/fail column to a student dataset using np.where(). Rename and reorder columns.df['new'] = expr, np.where(), .rename(), .drop(), .copy()
    1.7GroupBy & AggregationSplit DataFrames by categorical columns. Apply single/multiple aggregations (.sum, .mean) using .agg().Build: 'Sales Summary Report' โ€” group retail data by category/region to compute total revenue and max order..groupby(), .agg({'col': ['mean','sum']}), named agg, .reset_index()
    1.8Pivot Tables & Cross-TabulationCreate pivot tables to reshape data. Use pd.crosstab() for frequency cross-tabulations and add margins.Analysis: Create a pivot table and crosstab on a Titanic dataset to explore survival rates by class and gender.pd.pivot_table(), pd.crosstab(), values=, aggfunc=, margins=
    Module 2

    Data Cleaning & Preparation

    Students master professional-grade data cleaning techniques: missing value imputation, deduplication, standardisation, type conversion, merging, and reshaping with melt and stack.

    Approx. 7 hrs
    #Lesson TitleWhat Students LearnActivity / ProjectKey Methods / Libraries
    2.1Detecting & Handling Missing ValuesAudit missing data per column. Visualise patterns with sns.heatmap(). Choose to drop or fill values.Audit Lab: Generate a missing data summary for a housing dataset. Visualise and justify a strategy..isnull(), .isnull().sum(), .dropna(), .fillna(), sns.heatmap(df.isnull())
    2.2Filling & Imputing Missing ValuesApply fill strategies (.fillna, .ffill, .bfill). Impute missing values based on group means using .transform().Build: 'Smart Imputer' โ€” fill missing age with median, and missing salary with group mean by job role..fillna(df['col'].mean()), .ffill(), .bfill(), .transform('mean')
    2.3Removing Duplicates & Inconsistent DataRemove duplicate rows. Identify and fix inconsistent categorical values using .replace() and string methods.Clean-Up Sprint: Fix duplicate entries, inconsistent gender labels, and trailing whitespace in a customer dataset..duplicated(), .drop_duplicates(), .replace(), .str.lower(), .str.strip()
    2.4Data Type Conversion & ParsingConvert types with .astype(). Parse datetime strings with pd.to_datetime() and extract datetime components.Build: Parse string prices to float, parse date strings to datetime, and extract month/day for an orders dataset..astype(), pd.to_datetime(), .dt.year, .dt.month_name(), errors='coerce'
    2.5String Operations on Text DataUse the .str accessor for vectorised string operations (.split, .replace, .extract).Text Mining Task: Extract review word counts, star ratings using regex, and count keyword mentions in product reviews..str.contains(), .str.split(), .str.extract(r'pattern'), .str.replace()
    2.6Merging & Joining DataFramesCombine DataFrames using pd.merge() (inner, left, right, outer) and pd.concat() for stacking.Build: 'Student Database Merge' โ€” merge student info, scores, and attendance on student_id. Compare join types.pd.merge(how='inner/left/outer'), on=, pd.concat(axis=0/1)
    2.7Reshaping Data: Melt & StackTransform wide-format to long-format using pd.melt(). Use .stack() and .unstack() to pivot axes.Reshape Challenge: Melt a wide-format student score dataset into long format, group by subject, and compute averages.pd.melt(), id_vars=, value_vars=, .stack(), .unstack()
    2.8Full Cleaning Pipeline ProjectApply end-to-end cleaning to a raw dataset: audit, handle missing, fix types, standardise, and merge.Capstone Clean: Clean a deliberately messy dataset. Produce a documented pipeline and a before/after quality report.Full Module 2 โ€” pandas cleaning pipeline
    Module 3

    Advanced Data Visualisation

    Students create publication-quality and interactive visualisations using pandas plotting, matplotlib, seaborn, and plotly. Includes grouped bars, KDE, pair plots, time series, and multi-panel dashboards.

    Approx. 7 hrs
    #Lesson TitleWhat Students LearnActivity / ProjectKey Methods / Libraries
    3.1Pandas Built-in PlottingUse pandas .plot() as a wrapper for matplotlib to quickly create line, bar, hist, and scatter plots.Quick-Plot Session: Create 4 distinct charts from an economic dataset using only df.plot() with titles.df.plot(kind='bar/line/hist/box'), df['col'].plot(), title=, xlabel=
    3.2Advanced Bar & Line ChartsCreate grouped and stacked bar charts. Plot multi-line time series and add secondary y-axes.Build: 'Grouped Comparison Chart' โ€” plot grouped/stacked bars for multi-year sales data. Plot a multi-line time series.df.plot(kind='bar', stacked=True), ax.twinx(), plt.legend()
    3.3Distribution Plots: Histograms, KDE & Box PlotsOverlay KDE curves on histograms. Create box plots and violin plots to show medians, IQRs, and outliers.Distribution Analysis: Plot overlapping histograms, KDE comparisons, box plots, and violin plots for exam scores.sns.histplot(kde=True), sns.boxplot(), sns.violinplot(), hue=, alpha=
    3.4Scatter Plots, Pair Plots & CorrelationCreate scatter plots with hue/size mapping. Add regression lines. Use pairplots and correlation heatmaps.Correlation Exploration: Visualise the Iris dataset using scatter plots, sns.pairplot(), and a correlation heatmap.sns.scatterplot(hue=, size=), sns.regplot(), sns.pairplot(), .corr(), sns.heatmap(annot=True)
    3.5Time Series VisualisationSet datetime indices. Resample data by month/year and compute rolling window moving averages.Build: 'Trend Chart' โ€” plot raw daily values, a 7-day rolling average, and resampled monthly totals with annotations..set_index(), .resample('M').sum(), .rolling(7).mean(), plt.annotate()
    3.6Categorical Plots & Count ChartsCreate count plots, barplots with confidence intervals, and swarm plots for categorical frequency data.Build: 'Survey Data Visualiser' โ€” create count plots, bar plots, and swarm plots on a categorical survey dataset.sns.countplot(), sns.barplot(), sns.swarmplot(), order=, palette=
    3.7Multi-Panel Dashboards & AnnotationDesign multi-panel figures using plt.subplots(). Add figure-level annotations and share axes cleanly.Build: 'Analytical Dashboard' โ€” combine 6 different charts into a single styled dashboard figure with annotations.plt.subplots(2,3), sharex=, tight_layout(), ax.annotate(), plt.suptitle()
    3.8Interactive Visualisation with PlotlyCreate interactive bar, scatter, and choropleth (map) charts with hover and zoom functionality.Build: Recreate charts using plotly.express and export as interactive HTML files, including a world choropleth map.import plotly.express as px, px.scatter(), px.choropleth(), .write_html()
    Module 4

    Data Project / Mini Capstone

    Students apply all skills to a self-chosen real-world dataset. The full pipeline โ€” loading, cleaning, EDA, visualisation, interpretation, and presentation โ€” is completed as a data story.

    Approx. 7 hrs
    #Lesson TitleWhat Students LearnActivity / ProjectKey Methods / Libraries
    4.1Capstone Briefing & Dataset SelectionChoose a real-world dataset and define a specific analytical question for the capstone data story.Dataset Audit: Browse 4 provided datasets, run .info(), and write a 1-paragraph project brief stating the central question.pd.read_csv(), .info(), .describe(), project brief
    4.2Analysis PlanningTranslate the central question into 4โ€“6 sub-questions mapped to specific pandas operations and chart types.Planning Deliverable: Submit a detailed Analysis Plan mapping questions to pandas/seaborn methods.Analysis planning, chart-type mapping, cleaning checklist
    4.3Data Loading & Full Cleaning PipelineExecute a complete cleaning pipeline (missing data, types, text standardisation, duplicates) on the dataset.Build Sprint: Output a data quality report (before vs after) and save the clean dataset as a CSV..isnull(), .dropna(), .fillna(), .astype(), pd.to_datetime(), .drop_duplicates()
    4.4Exploratory Data AnalysisAnswer sub-questions using pandas filtering, groupby, and pivoting. Identify interesting patterns.Build Sprint: Write pandas code to answer each sub-question. Add a one-sentence interpretation for each output..groupby(), .agg(), .sort_values(), .corr(), pd.pivot_table()
    4.5Visualisation BuildCreate one carefully designed chart per sub-question. Combine them into a polished dashboard figure.Build Sprint: Produce all visualisations with correct labels and annotations, combining them into a final dashboard.sns, plt.subplots(), plt.annotate(), plt.savefig(), px (plotly)
    4.6Data Story & Written FindingsWrite a structured findings narrative (introduction, methodology, findings, conclusion, limitations).Report Writing: Write the data story referencing specific statistics from the charts. Peer-review for clarity.Data storytelling, insight writing, referencing statistics
    4.7Presentation Preparation & RehearsalStructure a 6-minute presentation: overview, cleaning, key findings, and conclusion.Dress Rehearsal: Deliver a timed practice presentation. Refine chart narration based on teacher feedback.Presentation structure, data storytelling, chart narration
    4.8Final Capstone Presentation DayDeliver the completed data project presentation, narrating the charts and handling Q&A.Final Presentation: 6-minute live data project presentation. Assessed on cleaning, depth, visuals, and insight.Full course โ€” Pandas & Data Visualisation

    Teaching Notes & Tips

    Pacing Guidance

    Each module contains 8 lessons of approximately 50โ€“60 minutes, totalling ~28 hours. Module 2 (Cleaning) lessons 2.6 (merging) and 2.7 (reshaping) often need an extra 15โ€“20 minutes. Module 4 runs as project sprints.

    Differentiation

    Advanced students can explore: pandas method chaining, pd.qcut(), advanced plotly Dash for interactive dashboards, or scikit-learn for basic regression. Core focus should be placed on fundamental pandas cleaning operations.

    Assessment Criteria

    Capstone assessed on: (1) Cleaning Quality โ€” complete, documented pipeline. (2) Analysis Depth. (3) Visualisation Quality โ€” proper formatting, annotations. (4) Interpretation โ€” data-driven insights. (5) Presentation.

    Tools & Environment

    Recommended: JupyterLab (via Anaconda) for inline chart rendering and documentation. Required libraries: pandas, numpy, matplotlib, seaborn, plotly. Python 3.9+. Students should navigate Jupyter seamlessly.

    Suggested Capstone Datasets

    World Happiness Report (Kaggle), Netflix Titles (Kaggle), Airbnb Listings, Global COโ‚‚ Emissions, Olympic Games History, Spotify Top Songs.

    Prior Knowledge Expected

    Students should be comfortable with: pandas basics (.read_csv, .head), matplotlib chart creation, Python functions, and loops. Completion of Introduction to Data Analysis is strongly recommended.

    Data Visualization & Pandas ยท Intermediate ยท Ages 14โ€“17 ยท ยฉ Course Curriculum

    Enroll Your Child Now