Data Analysis

Data Cleaning Assistant

Describe your messy dataset. Get a step-by-step cleaning plan with Python or SQL code for handling missing values, duplicates, and inconsistent formats.

By The Prompt Black Magic Team

Describe your data (column names, data types, known problems). Get ready-to-run code for your preferred tool.

Act as a senior data analyst. I need help cleaning and preprocessing a dataset before analysis.

Dataset description:
- Source: [WHERE THE DATA COMES FROM]
- Number of rows: [APPROXIMATE]
- Number of columns: [NUMBER]
- Column names and types: [LIST THEM, e.g., name (text), date (mixed formats), revenue (numbers with $ signs)]
- Tool I am using: [PYTHON PANDAS / SQL / EXCEL / R]

Known issues:
[DESCRIBE YOUR DATA PROBLEMS, e.g.:
- Date column has mixed formats (MM/DD/YYYY and YYYY-MM-DD)
- Price column has some entries with $ signs and commas
- About 15% of the email column is blank
- Duplicate rows based on customer_id
- Some names have extra whitespace or inconsistent capitalization]

For each issue, provide:
1. What the problem is and why it matters for analysis
2. The recommended approach to fix it
3. The exact code to implement the fix (in my preferred tool)
4. A validation check to confirm the fix worked

Also provide:
- A summary statistics check I should run after cleaning
- A data quality report template I can reuse
- Suggestions for any additional cleaning steps I might have missed
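
Example of the kind of code to expect:

For the example issues listed above, a response for Python pandas might resemble the minimal sketch below. It is an illustration only, not a fixed output of the prompt. The sample rows, the column names (customer_id, name, date, price, email), and the helper names (parse_mixed_dates, clean, validate) are all hypothetical, and the choices it makes (keeping the first row per customer_id, title-casing names, leaving blank emails as missing rather than filling them) are assumptions you should adjust to your data.

```python
import pandas as pd

# Hypothetical sample rows that reproduce the example issues from the prompt.
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "name": ["  alice SMITH ", "Bob Jones", "Bob Jones", "carol  lee"],
    "date": ["01/15/2024", "2024-02-03", "2024-02-03", "03/20/2024"],
    "price": ["$1,200.50", "300", "300", "$45.00"],
    "email": ["a@example.com", "", "", None],
})


def parse_mixed_dates(s: pd.Series) -> pd.Series:
    """Try MM/DD/YYYY first, then fall back to YYYY-MM-DD."""
    us = pd.to_datetime(s, format="%m/%d/%Y", errors="coerce")
    iso = pd.to_datetime(s, format="%Y-%m-%d", errors="coerce")
    return us.fillna(iso)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Names: trim, collapse repeated spaces, normalize capitalization.
    # Assumption: title case is acceptable (it mangles names like "McDonald").
    out["name"] = (
        out["name"].str.strip().str.replace(r"\s+", " ", regex=True).str.title()
    )

    # Dates: convert both known formats to a single datetime column.
    out["date"] = parse_mixed_dates(out["date"])

    # Prices: strip "$" and "," and convert to numbers.
    out["price"] = pd.to_numeric(
        out["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
    )

    # Emails: treat blank strings as missing instead of inventing values.
    email = out["email"].str.strip()
    out["email"] = email.where(email.ne(""))

    # Duplicates: keep the first row per customer_id (an assumption).
    out = out.drop_duplicates(subset="customer_id", keep="first")
    return out.reset_index(drop=True)


def validate(df: pd.DataFrame) -> dict:
    """Validation checks: each count should be 0 if the fix worked."""
    return {
        "rows": len(df),
        "duplicate_ids": int(df["customer_id"].duplicated().sum()),
        "unparsed_dates": int(df["date"].isna().sum()),
        "non_numeric_prices": int(df["price"].isna().sum()),
        "missing_emails": int(df["email"].isna().sum()),
    }


cleaned = clean(raw)
checks = validate(cleaned)

# Reusable data quality report: one row per column.
report = pd.DataFrame({
    "dtype": cleaned.dtypes.astype(str),
    "missing": cleaned.isna().sum(),
    "unique": cleaned.nunique(),
})

print(checks)
print(report)
print(cleaned.describe(include="all"))  # summary statistics after cleaning
```

Parsing each date format explicitly, rather than letting pandas guess, keeps ambiguous values such as 03/04/2024 from being silently read as day-first. Any value that matches neither format shows up in the unparsed_dates count instead of being misread.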