Analysis and visualization of medical data in the field of mental health with Python
Efim Shliamin
In recent years, there has been a growing awareness of mental health and its impact on different aspects of life. In education, mental health is especially important for students. For example, the article "Grading Bias and Young Adult Mental Health" explains that grading bias influences health and skill development after graduation. However, the effects differ by gender, socioeconomic status, and migration background, leading to health inequalities.
The article "Effects of Mental Health on Student Learning" also mentions that mental illness is linked to lower academic success and poorer achievement. Good mental health can boost academic performance, while mental stress and issues can negatively affect it.
The data set titled "A Statistical Research on the Effects of Mental Health on Students' CGPA" was gathered through a survey using Google Forms among university students. The main goal of this study was to examine the relationship between students' mental health and their academic success, as measured by their cumulative grade point average (CGPA). We compared our findings with previous studies and found no negative relationship between grades and mental health disorders in our study. However, our research focused on the mental health of students before they graduate.
The survey included questions on various aspects of mental health, such as stress, anxiety, depression, and sleep disorders. It also collected demographic information and data on the students' academic backgrounds. By analyzing this data set, we can gain important insights into how mental health affects students' academic success. This comprehensive study allows us to compare our results with those published in the journal Health Economics. The findings could be valuable for educational institutions, psychologists, and counselors in developing targeted support measures and interventions to promote the mental health and well-being of students.
As part of this project, we will analyze the data set using statistical methods to investigate how various mental health factors affect students' CGPA. We will also visualize the results to better represent and interpret the relationships. This project has the potential to enhance our understanding of students' mental health and provide recommendations for educational institutions to promote students' well-being and academic success.
Data collection
This questionnaire was developed based on the dataset "A Statistical Research on the Effects of Mental Health on Students' CGPA" and is designed to gather information on how mental health affects students' academic success.
This data set was collected from 100 university students through a survey conducted via Google Forms to examine their current academic situation and mental health.
Data analysis and data visualization
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
# Load data from CSV file into a DataFrame
data = pd.read_csv('students.csv')
data = data.dropna()
# Convert relevant columns to numeric
# Map 'Yes'/'No' answers to 1/0 (any other answer becomes NaN and makes astype(int) fail loudly)
data['Do you have Anxiety?'] = data['Do you have Anxiety?'].map({'Yes': 1, 'No': 0}).astype(int)
data['Do you have Depression?'] = data['Do you have Depression?'].map({'Yes': 1, 'No': 0}).astype(int)
data['Do you have Panic attack?'] = data['Do you have Panic attack?'].map({'Yes': 1, 'No': 0}).astype(int)
# Clean 'Your current year of Study' column
data['Your current year of Study'] = data['Your current year of Study'].str.extract(r'(\d+)', expand=False).astype(int)
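# Anxiety Analysis
# Absolute values: number of students of each age who report anxiety
anxiety_by_age = data.groupby('Age')['Do you have Anxiety?'].sum()
total_anxiety = data['Do you have Anxiety?'].sum()
# Relative values: each age's share of all reported anxiety cases
relative_anxiety_by_age = anxiety_by_age / total_anxiety if total_anxiety > 0 else pd.Series(dtype=float)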
fig1, ax1 = plt.subplots(1, 2, figsize=(12, 6))
ax1[0].bar(anxiety_by_age.index, anxiety_by_age.values)
ax1[0].set_xlabel('Age')
ax1[0].set_ylabel('Absolute Anxiety Values')
ax1[0].set_title('Absolute Anxiety Values by Age')
ax1[1].bar(relative_anxiety_by_age.index, relative_anxiety_by_age.values)
ax1[1].set_xlabel('Age')
ax1[1].set_ylabel('Relative Anxiety Values')
ax1[1].set_title('Relative Anxiety Values by Age')
plt.tight_layout()
plt.show()
# Depression Analysis
depression_by_age = data.groupby('Age')['Do you have Depression?'].sum()
total_depression = data['Do you have Depression?'].sum()
relative_depression_by_age = depression_by_age / total_depression if total_depression > 0 else pd.Series(dtype=float)
fig2, ax2 = plt.subplots(1, 2, figsize=(12, 6))
ax2[0].bar(depression_by_age.index, depression_by_age.values)
ax2[0].set_xlabel('Age')
ax2[0].set_ylabel('Absolute Depression Values')
ax2[0].set_title('Absolute Depression Values by Age')
ax2[1].bar(relative_depression_by_age.index, relative_depression_by_age.values)
ax2[1].set_xlabel('Age')
ax2[1].set_ylabel('Relative Depression Values')
ax2[1].set_title('Relative Depression Values by Age')
plt.tight_layout()
plt.show()
# Panic Attack Analysis
panic_by_age = data.groupby('Age')['Do you have Panic attack?'].sum()
total_panic = data['Do you have Panic attack?'].sum()
relative_panic_by_age = panic_by_age / total_panic if total_panic > 0 else pd.Series(dtype=float)
fig3, ax3 = plt.subplots(1, 2, figsize=(12, 6))
ax3[0].bar(panic_by_age.index, panic_by_age.values)
ax3[0].set_xlabel('Age')
ax3[0].set_ylabel('Absolute Panic Values')
ax3[0].set_title('Absolute Panic Values by Age')
ax3[1].bar(relative_panic_by_age.index, relative_panic_by_age.values)
ax3[1].set_xlabel('Age')
ax3[1].set_ylabel('Relative Panic Values')
ax3[1].set_title('Relative Panic Values by Age')
plt.tight_layout()
plt.show()
# Age vs. Year of Study Analysis
age_vs_year = data.groupby('Age')['Your current year of Study'].mean()
fig4 = plt.figure(figsize=(10, 6))
plt.plot(age_vs_year.index, age_vs_year.values, marker='o')
plt.xlabel('Age')
plt.ylabel('Year of Study')
plt.title('Age vs. Year of Study')
plt.show()
# T-Test between students with and without Anxiety for Years of Study
has_anxiety = data['Do you have Anxiety?'] == 1
t_test_results = stats.ttest_ind(
    data.loc[has_anxiety, 'Your current year of Study'],
    data.loc[~has_anxiety, 'Your current year of Study'],
)
print(f"T-Test Results: Statistic={t_test_results.statistic:.4f}, P-value={t_test_results.pvalue:.4f}")
T-Test Results: Statistic=-0.2788, P-value=0.7810
The plot of age against year of study indicates a general trend: most students are nearing the end of their studies by age 22. To illustrate this, we used a line plot of the mean year of study for each age.
Following this analysis, we investigated student performance in relation to mental disorders. Statistical methods such as the t-test were employed. The data was divided into two groups: students with mental disorders and students without mental disorders. The null hypothesis stated that there would be no difference in performance between the two groups, while the alternative hypothesis proposed that there would be a difference.
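The group split and the two hypotheses described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the reported results: the helper `two_group_ttest`, the column names `has_condition` and `score`, and the toy values are hypothetical stand-ins for the survey columns. The `equal_var` argument is passed straight to `scipy.stats.ttest_ind`; setting it to `False` would run Welch's t-test, which does not assume equal variances in the two groups.

```python
import pandas as pd
from scipy import stats


def two_group_ttest(df, flag_col, value_col, alpha=0.05, equal_var=True):
    """Compare the mean of value_col between rows where flag_col is 1 and 0.

    Null hypothesis: both groups have the same mean.
    Alternative hypothesis: the means differ (two-sided test).
    """
    with_flag = df.loc[df[flag_col] == 1, value_col]
    without_flag = df.loc[df[flag_col] == 0, value_col]
    result = stats.ttest_ind(with_flag, without_flag, equal_var=equal_var)
    return {
        "mean_with": with_flag.mean(),
        "mean_without": without_flag.mean(),
        "t": result.statistic,
        "p": result.pvalue,
        # Reject the null hypothesis only if p falls below the chosen level.
        "reject_null": bool(result.pvalue < alpha),
    }


# Toy data for illustration only; these are NOT the survey responses.
toy = pd.DataFrame({
    "has_condition": [1, 1, 1, 1, 0, 0, 0, 0],
    "score": [3.0, 3.2, 3.4, 3.6, 2.9, 3.1, 3.3, 3.5],
})
summary = two_group_ttest(toy, "has_condition", "score")
print(summary)
```

With the real data, the same helper could be called with one of the 0/1 columns created earlier (for example 'Do you have Depression?') and a numeric grade column, assuming such a column is available in the DataFrame.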
However, it is important to consider other influencing factors such as educational level, socioeconomic status, and the presence of other health problems.
Find connections
The aim of our analysis is to identify relationships between students' grades and their mental health. By examining these relationships with statistical methods, we can understand the possible effects of mental health on academic performance.
Statistical methods, such as the t-test, were applied. The results provide the following information:
Average grade for students with depression: 3.14
Average grade for students without depression: 3.05
The t-test results are as follows:
t-value: 0.654
p-value: 0.514
The t-value expresses the difference in average grades between the two groups relative to the standard error of that difference. The p-value is the probability of observing a difference at least as large as the one in the data if there is no real difference between the groups. In this case, the p-value of 0.514 gives no evidence of a significant difference in grades between students with and without depression.
Interpretation of the results
Based on the average grades, students with depression have a slightly higher average (3.14) compared to students without depression (3.05). However, the results of the t-test, with a t-value of 0.654 and a p-value of 0.514, indicate that there is no significant difference in the average grades between the two groups. Because the p-value of 0.514 is greater than the commonly used significance level of 0.05, the observed difference is consistent with chance variation, and the data provide no evidence of an actual difference.
Conclusion
The p-value is a statistical measure that helps evaluate the evidence against the null hypothesis. In a statistical test, the p-value indicates the likelihood of obtaining the observed data, or even more extreme results, if the null hypothesis is true. A small p-value suggests that the observed data is unlikely if the null hypothesis is true, thereby supporting the alternative hypothesis.
The t-value is the test statistic of a t-test. It expresses how far apart the average values of two groups are (e.g., average grades of students with and without depression), measured in units of the standard error of the difference. A larger absolute t-value indicates a greater deviation between the groups relative to the variation within them.
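To make the two definitions concrete, the sketch below computes a t-value and a two-sided p-value step by step and checks them against `scipy.stats.ttest_ind`. It assumes the classic Student's t-test with pooled variance, which is what `ttest_ind` uses by default. The function name `pooled_t_test` and the sample values are hypothetical and serve only as an illustration.

```python
import math
from scipy import stats


def pooled_t_test(sample_a, sample_b):
    """Two-sided Student's t-test with pooled variance, computed step by step."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    # Unbiased sample variances (divide by n - 1).
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (n_b - 1)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    # Standard error of the difference between the two means.
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # t-value: difference in means expressed in standard errors.
    t_value = (mean_a - mean_b) / std_error
    # Two-sided p-value: probability of |T| >= |t| if the null hypothesis is true.
    p_value = 2 * stats.t.sf(abs(t_value), dof)
    return t_value, p_value


# Toy numbers for illustration only; these are NOT the survey responses.
group_a = [3.0, 3.2, 3.4, 3.6]
group_b = [2.9, 3.1, 3.3, 3.5, 3.7]

t_manual, p_manual = pooled_t_test(group_a, group_b)
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b)
print(f"manual: t={t_manual:.4f}, p={p_manual:.4f}")
print(f"scipy:  t={t_scipy:.4f}, p={p_scipy:.4f}")
```

Both lines should print the same numbers, which shows that the p-value is derived entirely from the t-value and the degrees of freedom.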
For this paper, we compared the average grades of students with and without depression. The results show that the average grade of students with depression (3.14) is slightly higher than the average grade of students without depression (3.05). The t-value is 0.654, and the p-value is 0.514. Since the p-value is greater than the commonly used significance level of 0.05, we do not have sufficient evidence to reject the null hypothesis. This means we could not find a significant difference in mean grades between the two groups. Thus, it cannot be said unequivocally that depression has a significant effect on grades.
Therefore, our results do not confirm the statement published in the article "Effects of Mental Health on Student Learning" that mental illness is linked to lower academic success.