About the Project
This notebook analyzes a Kaggle dataset with 1,000 student records and 16 variables, including study time, sleep, attendance, internet quality, and exam scores. The analysis focuses on how study habits relate to performance.
Featured Code
Cleaning Missing Values
The parental education column had missing values, so those rows were removed before calculating results.
df.dropna(subset=['parental_education_level'], inplace=True)
df.isnull().sum()
Remaining missing values after cleaning: 0
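As a minimal sketch of the same cleaning step, the example below runs dropna on a small made-up DataFrame. The column names mirror the notebook, but the toy values and the helper names (toy, cleaned, rows_dropped) are assumptions for illustration only. It returns a new DataFrame rather than using inplace=True, so the original can still be compared against the cleaned copy.

```python
import pandas as pd

# Hypothetical toy data: column names follow the notebook, values are made up.
toy = pd.DataFrame({
    "parental_education_level": ["Bachelor", None, "High School", None],
    "exam_score": [88.0, 72.5, 64.0, 91.0],
})

# Count missing entries in the one column we care about.
missing_before = int(toy["parental_education_level"].isnull().sum())

# Drop only the rows where that column is missing; other columns are not checked.
cleaned = toy.dropna(subset=["parental_education_level"])

# isnull().sum() gives one count per column; a second sum() totals them.
missing_after = int(cleaned.isnull().sum().sum())
rows_dropped = len(toy) - len(cleaned)

print(f"Missing before: {missing_before}, after: {missing_after}, rows dropped: {rows_dropped}")
```

Reporting the number of dropped rows alongside the missing-value count makes it clear how many records the later percentages are based on.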
Vectorized Filtering
A comparison on the study-hours column was evaluated for every row at once and returned a boolean Series, so summing it counted the students who met the condition.
total_students = len(df)  # assumed: number of rows remaining after cleaning
count_over_6 = (df['study_hours_per_day'] > 6).sum()
percentage_more_than_6_hours = (count_over_6 / total_students) * 100
Students studying more than 6 hours/day: 40
Percentage studying more than 6 hours/day: 4.40%
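The counting idea can be sketched on a toy column as follows. The data and the names over_6 and share_over_6 are hypothetical. Because True counts as 1 and False as 0, the sum of a boolean mask is a count and its mean is a proportion.

```python
import pandas as pd

# Hypothetical study-hours values, made up for illustration.
toy = pd.DataFrame({"study_hours_per_day": [2.5, 6.0, 7.1, 4.0, 6.5]})

# The comparison runs on the whole column and yields a boolean Series.
over_6 = toy["study_hours_per_day"] > 6

# Summing the mask counts the True entries.
count_over_6 = int(over_6.sum())

# The mean of the mask is the fraction of True entries; scale to a percentage.
share_over_6 = over_6.mean() * 100

print(f"Students over 6 hours/day: {count_over_6}")
print(f"Percentage over 6 hours/day: {share_over_6:.2f}%")
```

Note that the comparison is strict, so a student at exactly 6.0 hours is not counted. Taking the mean of the mask avoids dividing by a separately tracked total, which is one less variable to keep in sync with the cleaned data.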
Score Comparison
This code uses vectorized pandas filtering to compare average exam scores between students who study more than five hours per day and those who study five hours or less.
average_exam_score_more_than_5 = df[df['study_hours_per_day'] > 5]['exam_score'].mean()
average_exam_score_5_or_less = df[df['study_hours_per_day'] <= 5]['exam_score'].mean()
print(f"Average exam score for students studying more than 5 hours/day: {average_exam_score_more_than_5:.2f}")
print(f"Average exam score for students studying 5 hours/day or less: {average_exam_score_5_or_less:.2f}")
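One possible variation on the comparison above builds the mask once and reuses it with .loc, using ~ for the complementary group. The toy data and the names studies_over_5, mean_over_5, and mean_5_or_less are assumptions for illustration, not the notebook's own code.

```python
import pandas as pd

# Hypothetical toy data: column names follow the notebook, values are made up.
toy = pd.DataFrame({
    "study_hours_per_day": [1.0, 3.0, 5.0, 5.5, 7.0],
    "exam_score": [50.0, 60.0, 70.0, 90.0, 100.0],
})

# Build the boolean mask once so both groups come from the same condition.
studies_over_5 = toy["study_hours_per_day"] > 5

# .loc[mask, column] selects the matching rows and one column in a single step.
mean_over_5 = toy.loc[studies_over_5, "exam_score"].mean()

# ~ inverts the mask, giving every row not in the first group.
mean_5_or_less = toy.loc[~studies_over_5, "exam_score"].mean()

print(f"More than 5 hours/day: {mean_over_5:.2f}")
print(f"5 hours/day or less: {mean_5_or_less:.2f}")
print(f"Difference: {mean_over_5 - mean_5_or_less:.2f}")
```

One caveat with this approach: if the study-hours column contained missing values, ~studies_over_5 would include those rows, whereas an explicit <= 5 comparison would leave them out. With no missing study hours, the two approaches select the same groups.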
Key Takeaway
More Study Time Was Linked to Higher Scores
Students who studied more than five hours per day averaged about 25 points higher on the exam than students who studied five hours or less.
More than 5 hours/day: 91.12
5 hours/day or less: 65.67