Charity Data Exploration

A Python data analysis project exploring fundraising efficiency among the Forbes top 100 charities.

About the Project

This notebook uses pandas, matplotlib, and seaborn to clean and explore a real-world charity dataset from Kaggle. The guiding question is: what factors are associated with efficient fundraising?

Featured Code

The project required converting financial strings, such as dollar amounts in millions or billions, into numeric values before analysis and visualization.

def clean_and_convert_financial(series):
    """Convert strings such as '$1.2 B', '450 M', or '87%' to floats."""
    # Work on strings, then strip currency symbols, thousands separators, and spaces.
    cleaned = (
        series.astype(str)
        .str.replace('$', '', regex=False)
        .str.replace(',', '', regex=False)
        .str.replace(' ', '', regex=False)
        .str.upper()
    )

    def convert_to_numeric(value):
        # Scale by the magnitude suffix; percentages keep their 0-100 scale.
        if value.endswith('B'):
            number, multiplier = value[:-1], 1_000_000_000
        elif value.endswith('M'):
            number, multiplier = value[:-1], 1_000_000
        elif value.endswith('%'):
            number, multiplier = value[:-1], 1
        else:
            number, multiplier = value, 1
        try:
            return float(number) * multiplier
        except ValueError:
            # Placeholders such as 'Na' and any other unparseable text become NaN.
            return float('nan')

    return cleaned.apply(convert_to_numeric)
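
For comparison, the same conversion can be sketched with pandas string methods instead of a per-value Python function. This is an illustrative alternative, not the notebook's code: the name `convert_financial_vectorized`, the `SUFFIX_MULTIPLIERS` table, and the sample values are assumptions, and the sketch expects any suffix to sit at the end of the value.

```python
import pandas as pd

# Multipliers for the magnitude suffixes; '%' keeps the 0-100 scale.
SUFFIX_MULTIPLIERS = {"B": 1_000_000_000, "M": 1_000_000, "%": 1, "": 1}


def convert_financial_vectorized(series: pd.Series) -> pd.Series:
    """Hypothetical vectorized variant: parse '$1.5 B', '250 M', '87%' to floats."""
    # Drop dollar signs, commas, and whitespace, then normalize case.
    cleaned = (
        series.astype(str)
        .str.replace(r"[$,\s]", "", regex=True)
        .str.upper()
    )
    # Split each value into a numeric part and an optional trailing suffix.
    # Values that do not match the pattern (for example 'NA') yield NaN.
    parts = cleaned.str.extract(r"^(?P<number>-?\d+(?:\.\d+)?)(?P<suffix>[BM%]?)$")
    numbers = pd.to_numeric(parts["number"], errors="coerce")
    multipliers = parts["suffix"].map(SUFFIX_MULTIPLIERS)
    return numbers * multipliers


# Made-up sample values for illustration only, not rows from the dataset.
sample = pd.Series(["$1.5 B", "250 M", "87%", "Na", "1,200"])
result = convert_financial_vectorized(sample)
print(result)
```

The explicit pattern is stricter than a substring check: a value only counts as billions or millions when a number is followed directly by the suffix, and anything else becomes NaN rather than raising an error.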

Key Takeaways

Fundraising Efficiency Was Generally High

The distribution shows that most organizations clustered near the upper end of the fundraising efficiency range.

[Figure: Histogram showing the distribution of fundraising efficiency percentages]

Donor Dependency Was Not a Clear Predictor

The scatterplot does not show a strong visual relationship between donor dependency and fundraising efficiency.

[Figure: Scatterplot comparing fundraising efficiency and donor dependency by charity category]
