    Course Curriculum

    Introduction to Data Analysis

    Data Science · Beginner · Ages 14–17 · 30 Hours

    Course At a Glance

    Category: Data Science

    Level: Beginner

    Age Group: 14–17 years

    Prerequisite: Basic Python Knowledge

    Duration: 30 Hours

    Modules: 4 Modules

    Program Outcomes

    By the end of this course, students will be able to:

    1. Understand basic concepts of data and how to analyse datasets using Python.

    2. Perform simple data manipulation and analysis using Python lists, dictionaries, NumPy, and pandas.

    3. Visualise data using charts to identify patterns and present findings clearly.

    Module 1

    Introduction to Data & Python Review

    Students learn what data is and how it is used in the real world. A structured Python review (lists, dicts, functions, CSV files) builds readiness for data analysis. NumPy is introduced as the numerical foundation for data work.

    Approx. 8 hrs
    1.1 What is Data & Why Does It Matter?
      What Students Learn: Define data and explore how it is used. Distinguish between quantitative and qualitative data. Understand the data analysis pipeline: Collect → Clean → Analyse → Visualise → Interpret.
      Activity / Project: Discussion & Exploration: Find 3 real-world datasets online. Identify data types and questions they could answer.
      Key Tools / Concepts: Data types: int, float, str, bool, categorical

    1.2 Python Review: Variables & Data Types
      What Students Learn: Refresh variables, strings, integers, floats, booleans, and type conversion. Use type() and f-strings. Perform basic calculations.
      Activity / Project: Warm-Up Exercises: Calculate total, average, highest, and lowest scores manually using Python arithmetic. Print formatted results.
      Key Tools / Concepts: int, float, str, type(), round(), f-strings

    1.3 Python Review: Lists & Loops
      What Students Learn: Revisit lists. Create, index, slice, and iterate over lists with for loops. Apply built-in functions: sum(), len(), min(), max(), sorted().
      Activity / Project: Build: 'Class Score Analyser' - compute total, count, average, min, and max of a list using functions.
      Key Tools / Concepts: [], sum(), len(), min(), max(), sorted(), for loop

    1.4 Python Review: Dictionaries & Functions
      What Students Learn: Revisit dictionaries (key: value pairs) for row representation. Write functions to organise code into reusable tools.
      Activity / Project: Build: 'Student Record Store' - store records in a list of dicts. Write functions: get_average(), get_top_student(), and print_report().
      Key Tools / Concepts: dict, list of dicts, def, return, .items()

    1.5 Loading & Inspecting Data from CSV Files
      What Students Learn: Introduce CSV format. Use Python's csv module to load a dataset into a list of dictionaries and inspect the data.
      Activity / Project: Guided Exercise: Load 'students.csv'. Print row count, column names, first 5 rows, and data types.
      Key Tools / Concepts: import csv, csv.DictReader, list(), keys()

    1.6 Data Quality: Missing & Inconsistent Data
      What Students Learn: Understand 'dirty data': missing values, formatting issues, duplicates, outliers. Learn basic cleaning strategies.
      Activity / Project: Clean-Up Lab: Write a Python script to read a messy CSV, fix formatting, skip missing rows, and remove duplicates.
      Key Tools / Concepts: .strip(), .lower(), if val != '', set() for duplicates

    1.7 Introduction to NumPy
      What Students Learn: Install/import NumPy. Understand NumPy arrays. Use vectorised arithmetic and compute statistics like mean and standard deviation.
      Activity / Project: Exercises: Convert a list to a NumPy array. Compute mean, median, standard deviation, and weighted average.
      Key Tools / Concepts: import numpy as np, np.array(), np.mean(), np.std()

    1.8 Module 1 Review & Mini Challenge
      What Students Learn: Consolidate Module 1 skills: Python basics, CSV loading, data cleaning, and NumPy basics.
      Activity / Project: Mini Challenge: Load a real CSV dataset, clean it, compute stats with NumPy, and print a summary.
      Key Tools / Concepts: Full Module 1: csv, numpy, cleaning, stats
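
The loading and clean-up steps from Lessons 1.5 and 1.6 could look roughly like the sketch below. It uses only the standard library. The inline data, the column names `name` and `score`, and the helper `load_clean_rows` are hypothetical stand-ins for a file such as 'students.csv', not part of the course materials.

```python
import csv
import io

# Hypothetical messy data standing in for a file such as 'students.csv'.
MESSY_CSV = """name,score
 Asha ,82
ben,
Asha,82
Chen,91
"""


def load_clean_rows(text):
    """Read CSV text, tidy formatting, skip rows with missing values, drop duplicates."""
    reader = csv.DictReader(io.StringIO(text))
    seen = set()   # remembers rows already kept, to detect duplicates
    clean = []
    for row in reader:
        name = row["name"].strip().lower()   # fix stray spaces and mixed case
        score = row["score"].strip()
        if name == "" or score == "":
            continue                         # skip rows with missing values
        key = (name, score)
        if key in seen:
            continue                         # skip exact duplicates
        seen.add(key)
        clean.append({"name": name, "score": int(score)})
    return clean


rows = load_clean_rows(MESSY_CSV)
scores = [r["score"] for r in rows]
average = sum(scores) / len(scores)
print(rows)
print(f"Rows kept: {len(rows)}, average score: {average:.1f}")
```

With a real file, `io.StringIO(text)` would be replaced by a file opened with `open('students.csv', newline='')`.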
    Module 2

    Working with Data in Python

    Students learn to manipulate and analyse datasets using descriptive statistics, filtering, grouping, and pandas DataFrames. The module progresses from pure Python techniques to professional-grade pandas operations.

    Approx. 8 hrs
    2.1 Descriptive Statistics
      What Students Learn: Understand measures of centre (mean, median, mode) and spread (range, variance, std dev). Know when to use mean vs. median.
      Activity / Project: Analysis Task: Compute mean and median for two datasets (one with outliers). Explain differences.
      Key Tools / Concepts: np.mean(), np.median(), statistics.mode(), np.std(), np.var()

    2.2 Filtering & Sorting Data
      What Students Learn: Filter a dataset using list comprehension and boolean indexing. Sort lists of dictionaries using sorted() with a lambda.
      Activity / Project: Build: 'Data Detective' - filter students by score, sort descending, and answer questions.
      Key Tools / Concepts: [x for x in data if cond], sorted(key=lambda), np boolean indexing

    2.3 Grouping & Aggregating Data
      What Students Learn: Group data by category and compute aggregate stats. Build a frequency counter and introduce split-apply-combine.
      Activity / Project: Build: 'Subject Analyser' - group student records by subject to compute average, min, and max scores.
      Key Tools / Concepts: dict grouping, defaultdict, frequency counter, group aggregation

    2.4 Introduction to Pandas
      What Students Learn: Understand pandas DataFrames. Load CSVs with pd.read_csv() and inspect with .head(), .info(), and .describe().
      Activity / Project: Guided Exploration: Load a dataset, run inspection methods, filter rows, and select columns.
      Key Tools / Concepts: import pandas as pd, pd.read_csv(), .head(), .describe(), .loc[], .iloc[]

    2.5 Pandas: Filtering, Sorting & Adding Columns
      What Students Learn: Apply boolean filtering (&, |), sort with .sort_values(), and add new computed columns.
      Activity / Project: Build: 'Grade Report Generator' - filter passing students, add a grade band column, and save using .to_csv().
      Key Tools / Concepts: df[df['col'] > val], &, |, .sort_values(), df['new'] = expr, .to_csv()

    2.6 Pandas: Grouping & Aggregation
      What Students Learn: Use .groupby() to split a DataFrame and compute statistics with .agg(), .mean(), and .sum().
      Activity / Project: Build: 'Department Summary Report' - group an employee dataset by department and calculate averages.
      Key Tools / Concepts: .groupby(), .agg(), .reset_index(), .mean(), .count()

    2.7 Handling Missing Data & Data Types
      What Students Learn: Detect missing values. Decide whether to drop (.dropna()) or fill (.fillna()). Convert data types and rename columns.
      Activity / Project: Clean-Up Project: Given a messy dataset, write a full cleaning pipeline (detect, handle missing, fix types, rename).
      Key Tools / Concepts: .isnull(), .dropna(), .fillna(), .astype(), .rename()

    2.8 Exploratory Data Analysis (EDA) Mini Project
      What Students Learn: Apply all Module 2 skills to perform a complete Exploratory Data Analysis on a real-world dataset.
      Activity / Project: EDA Project: Load, inspect, clean, group, and compute statistics for a chosen dataset. Write a summary of 3 findings.
      Key Tools / Concepts: Full Module 2: pandas EDA workflow
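
As an illustration of the pure-Python grouping in Lesson 2.3 and the mean vs. median comparison in Lesson 2.1, here is a minimal sketch using `defaultdict` and the standard `statistics` module. The records, names, and scores are invented for the example.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records; students, subjects and scores are made up for illustration.
records = [
    {"student": "Asha", "subject": "Maths", "score": 82},
    {"student": "Ben", "subject": "Maths", "score": 64},
    {"student": "Chen", "subject": "Science", "score": 91},
    {"student": "Dina", "subject": "Science", "score": 75},
    {"student": "Eli", "subject": "Maths", "score": 70},
]

# Split: collect the scores under each subject.
by_subject = defaultdict(list)
for rec in records:
    by_subject[rec["subject"]].append(rec["score"])

# Apply and combine: one summary entry per subject.
summary = {
    subject: {"average": mean(scores), "min": min(scores), "max": max(scores)}
    for subject, scores in by_subject.items()
}
for subject, stats in summary.items():
    print(subject, stats)

# Mean vs. median: a single outlier pulls the mean far more than the median.
with_outlier = [70, 72, 74, 76, 300]
outlier_mean = mean(with_outlier)
outlier_median = median(with_outlier)
print(outlier_mean, outlier_median)
```

The same split-apply-combine idea is what pandas `.groupby()` performs in Lesson 2.6, so this version can serve as a bridge between the two.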
    Module 3

    Data Visualisation Basics

    Students learn to create, customise, and interpret data visualisations using matplotlib and seaborn. Chart types covered include line, bar, pie, histogram, scatter, and heatmap.

    Approx. 8 hrs
    3.1 Introduction to Matplotlib
      What Students Learn: Understand the anatomy of a matplotlib figure. Create, customise, and save basic line charts.
      Activity / Project: Build: 'Temperature Trend Chart' - plot a week of daily temperatures as a line chart with labels and a grid.
      Key Tools / Concepts: import matplotlib.pyplot as plt, plt.plot(), plt.title(), plt.savefig()

    3.2 Bar Charts
      What Students Learn: Create vertical and horizontal bar charts. Customise colours and add data labels using plt.text().
      Activity / Project: Build: 'Subject Average Scores' - plot vertical bar charts for scores and a horizontal chart for top countries.
      Key Tools / Concepts: plt.bar(), plt.barh(), plt.text(), color, edgecolor, width

    3.3 Pie Charts & Donut Charts
      What Students Learn: Create pie and donut charts. Understand when to use part-to-whole charts and how to explode slices.
      Activity / Project: Build: 'Grade Distribution Pie Chart' - visualise grade bands and explode a slice. Convert to a donut chart.
      Key Tools / Concepts: plt.pie(), labels, autopct, explode, wedgeprops (donut)

    3.4 Histograms & Distributions
      What Students Learn: Create histograms with plt.hist(). Choose bins and understand skewness, spread, and outliers.
      Activity / Project: Build: 'Score Distribution' - plot a histogram with 10 bins. Compare two classes using transparency.
      Key Tools / Concepts: plt.hist(), bins, alpha, edgecolor, density

    3.5 Scatter Plots & Correlation
      What Students Learn: Create scatter plots to explore relationships between two numerical variables. Add a linear trend line.
      Activity / Project: Build: 'Study Hours vs Score' - scatter plot study hours against test scores. Add a linear trend line.
      Key Tools / Concepts: plt.scatter(), s, c, alpha, np.polyfit(), np.poly1d()

    3.6 Multiple Plots & Subplots
      What Students Learn: Create dashboard-style figures with plt.subplots(). Arrange charts in rows and columns with a shared title.
      Activity / Project: Build: 'Data Dashboard' - a 2x2 subplot figure showing a bar chart, histogram, pie chart, and scatter plot.
      Key Tools / Concepts: plt.subplots(), fig, ax, plt.suptitle(), figsize, tight_layout()

    3.7 Customisation & Styling
      What Students Learn: Apply matplotlib style sheets. Customise tick labels, add annotations, and set axis limits.
      Activity / Project: Style Sprint: Apply 'seaborn-v0_8' to the Data Dashboard. Add descriptive annotations and rotate labels.
      Key Tools / Concepts: plt.style.use(), plt.annotate(), plt.xticks(rotation=), cmap

    3.8 Introduction to Seaborn
      What Students Learn: Install and import seaborn. Use seaborn for cleaner defaults and statistical charts like histplots and heatmaps.
      Activity / Project: Comparison Exercise: Recreate matplotlib charts using seaborn. Create a correlation heatmap.
      Key Tools / Concepts: import seaborn as sns, sns.barplot(), sns.histplot(), sns.heatmap(), hue=
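
One possible shape for the 2x2 'Data Dashboard' from Lesson 3.6, including the trend line from Lesson 3.5, is sketched below. All data values and the output filename `dashboard.png` are assumptions made for the example. The sketch uses the figure and axes methods (`fig.suptitle()`, `ax.set_title()`) rather than the `plt.` equivalents listed in the lesson.

```python
import matplotlib
matplotlib.use("Agg")  # draw to a file without opening a window; omit this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data, made up for illustration.
subjects = ["Maths", "Science", "English"]
averages = [72, 83, 78]
scores = [55, 61, 64, 70, 72, 75, 78, 82, 88, 91]
grade_counts = [3, 5, 2]  # number of students in grade bands A, B, C
hours = np.array([1, 2, 3, 4, 5, 6])
results = np.array([52, 58, 65, 70, 74, 83])

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Top left: bar chart of average scores.
axes[0, 0].bar(subjects, averages, color="steelblue", edgecolor="black")
axes[0, 0].set_title("Average score by subject")

# Top right: histogram of individual scores.
axes[0, 1].hist(scores, bins=5, edgecolor="black")
axes[0, 1].set_title("Score distribution")

# Bottom left: pie chart of grade bands.
axes[1, 0].pie(grade_counts, labels=["A", "B", "C"], autopct="%1.0f%%")
axes[1, 0].set_title("Grade bands")

# Bottom right: scatter plot with a straight trend line fitted by least squares.
slope, intercept = np.polyfit(hours, results, 1)
axes[1, 1].scatter(hours, results, alpha=0.8)
axes[1, 1].plot(hours, slope * hours + intercept, color="red")
axes[1, 1].set_title("Study hours vs score")
axes[1, 1].set_xlabel("Hours")
axes[1, 1].set_ylabel("Score")

fig.suptitle("Data Dashboard")
fig.tight_layout()
fig.savefig("dashboard.png")
```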
    Module 4

    Mini Data Project

    Students apply all skills to a self-chosen real-world dataset. The full data analysis pipeline (loading, cleaning, exploring, visualising, and interpreting) is completed and presented.

    Approx. 6 hrs
    4.1 Project Briefing & Dataset Selection
      What Students Learn: Choose a real-world dataset. Explore datasets briefly with .head() and .describe() to ensure viability.
      Activity / Project: Dataset Exploration: Load 4 provided datasets, explore, select one, and write 2 questions it could answer.
      Key Tools / Concepts: pd.read_csv(), .head(), .describe(), .info()

    4.2 Project Planning & Questions
      What Students Learn: Write 3–5 specific analysis questions. Plan pandas operations and chart types for each question.
      Activity / Project: Planning Deliverable: Complete a Project Plan Sheet specifying dataset, questions, analysis steps, and charts.
      Key Tools / Concepts: Planning: analysis questions, chart type mapping

    4.3 Data Loading & Cleaning
      What Students Learn: Perform a full data quality audit and apply a cleaning pipeline (handle NaNs, fix types, drop duplicates).
      Activity / Project: Build Sprint: Clean dataset. Print 'before and after' comparisons and save 'clean_data.csv'.
      Key Tools / Concepts: .isnull().sum(), .dropna(), .fillna(), .astype(), .drop_duplicates()

    4.4 Exploratory Analysis
      What Students Learn: Answer analysis questions using pandas filtering, grouping, aggregation, and sorting. Identify interesting patterns.
      Activity / Project: Build Sprint: Run pandas code to answer all questions. Annotate each result with a one-sentence interpretation.
      Key Tools / Concepts: .groupby(), .agg(), .sort_values(), .value_counts(), .corr()

    4.5 Data Visualisation
      What Students Learn: Create one chart per question using matplotlib/seaborn. Combine all charts into a multi-panel figure.
      Activity / Project: Build Sprint: Produce charts, apply a consistent style, and add highlight annotations. Save the dashboard.
      Key Tools / Concepts: plt.subplots(), plt.savefig(), sns charts, plt.style.use(), plt.annotate()

    4.6 Findings & Written Interpretation
      What Students Learn: Write a structured analysis report interpreting the visualisations. Practise data storytelling.
      Activity / Project: Report Writing: Complete a findings template. Write 2–3 sentences interpreting each chart clearly.
      Key Tools / Concepts: Interpretation, insight writing, data storytelling

    4.7 Presentation Preparation
      What Students Learn: Structure a 5-minute presentation: dataset intro, questions, findings with charts, and dataset limitations.
      Activity / Project: Dress Rehearsal: Deliver a timed practice presentation. Teacher provides feedback on clarity and confidence.
      Key Tools / Concepts: Presentation structure, data storytelling

    4.8 Final Presentation Day
      What Students Learn: Deliver the completed mini data project presentation, answering Q&A.
      Activity / Project: Final Presentation: 5-minute live presentation of charts and findings. Assessed on analysis depth, chart clarity, and interpretation.
      Key Tools / Concepts: Full course: Data Analysis pipeline
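
The cleaning and exploration sprints (Lessons 4.3 and 4.4) might follow a pattern like the sketch below. The DataFrame, its column names (`department`, `salary`, `start_year`), and the choice to fill missing salaries with the median are assumptions for illustration. A real project would load its data with `pd.read_csv()` and pick a fill strategy that suits the dataset.

```python
import pandas as pd

# Hypothetical raw data standing in for a chosen project dataset.
raw = pd.DataFrame({
    "department": ["Sales", "Sales", "IT", "IT", "IT", None],
    "salary": [3000.0, 3200.0, 4100.0, None, 4100.0, 2800.0],
    "start_year": ["2019", "2020", "2018", "2021", "2018", "2022"],
})

# Audit: count missing values per column before changing anything.
print(raw.isnull().sum())

# A row with no department cannot be grouped, so drop it.
clean = raw.dropna(subset=["department"]).copy()

# Fill the missing salary with the median of the known salaries.
clean["salary"] = clean["salary"].fillna(clean["salary"].median())

# Fix types: the years were read as text.
clean["start_year"] = clean["start_year"].astype(int)

# Remove rows that are exact copies of an earlier row.
clean = clean.drop_duplicates()

# Explore: average salary and head count per department, highest first.
summary = (
    clean.groupby("department")["salary"]
         .agg(["mean", "count"])
         .reset_index()
         .sort_values("mean", ascending=False)
)

print(f"Rows before: {len(raw)}, after: {len(clean)}")
print(summary)
```

Printing the row counts before and after cleaning gives students the 'before and after' comparison asked for in Lesson 4.3.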

    Teaching Notes & Tips

    Pacing Guidance

    Each module contains 8 lessons. In Modules 1–3, lessons run approximately 50–60 minutes (about 8 hours per module). Module 4 is shorter at about 6 hours in total and runs as project sprints. Lessons 1.7 (NumPy) and 2.4 (pandas intro) often need extra time.

    Differentiation

    Advanced students can explore: pandas .pivot_table(), time-series analysis with pd.to_datetime(), interactive charts with plotly, or basic linear regression with scikit-learn. Students needing support should focus on core pandas before seaborn.

    Assessment Criteria

    Mini project assessed on: (1) Data Loading & Cleaning. (2) Analysis Depth. (3) Visualisation Quality. (4) Interpretation in plain English. (5) Presentation Confidence.

    Tools & Environment

    Recommended: Jupyter Notebook or JupyterLab (via Anaconda). Alternatively, VS Code with Jupyter extension. Required libraries: numpy, pandas, matplotlib, seaborn. Python 3.9+ recommended.
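
Before the first lesson, a short script along these lines could confirm that a student's environment meets the requirements above. The helper name `check_environment` is hypothetical. The script only checks that each library can be found, not which version is installed.

```python
import sys
from importlib.util import find_spec

# Libraries listed as required for the course.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn"]


def check_environment(packages=REQUIRED, min_version=(3, 9)):
    """Return a dict mapping each check to True (ok) or False (missing or too old)."""
    report = {"python": sys.version_info >= min_version}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = find_spec(name) is not None
    return report


for item, ok in check_environment().items():
    print(f"{item:<12}{'OK' if ok else 'MISSING'}")
```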

    Suggested Datasets (Module 4)

    World Happiness Report (Kaggle), Video Game Sales (Kaggle), Titanic Passengers, Premier League Results, COVID-19 Statistics, Student Performance Dataset.

    Prior Knowledge Expected

    Students must be confident with: Python variables, lists, for loops, dictionaries, writing functions, and reading files. Students who have completed Python Fundamentals are well prepared.

