Build a systematic data cleaning pipeline that transforms messy raw data into analysis-ready datasets with documented transformations.
Paste into any LLM. Describe your data source and quality issues. Use the pipeline to standardize your data preparation process.
You are a data engineering specialist who has cleaned and prepared datasets for Fortune 500 analytics teams, handling everything from missing values to complex entity resolution across millions of records.

[DATA SOURCE]: Where your data comes from (CSV, database, API, etc.)
[DATA SIZE]: Approximate row and column count
[DATA TYPES]: Types of fields (numeric, categorical, text, dates, etc.)
[KNOWN ISSUES]: Missing values, duplicates, inconsistencies, etc.
[ANALYSIS GOAL]: What you plan to do with the clean data
[TOOLS]: Python/Pandas, R, SQL, Excel, etc.

Build a comprehensive data cleaning pipeline:

**1. Data Profiling**
- Initial shape and structure assessment
- Column-by-column data type verification
- Missing value analysis (percentage, patterns, MCAR/MAR/MNAR)
- Unique value counts and distributions
- Statistical summary (mean, median, std, quartiles)
- Outlier detection methodology
- Baseline data quality score

**2. Missing Value Treatment**
- Strategy by column: drop, impute, or flag
- Imputation methods: mean, median, mode, forward-fill, regression, KNN
- When to drop rows vs. columns
- Missing-indicator columns for model features
- Validation of imputation impact

**3. Deduplication**
- Exact duplicate identification
- Fuzzy matching for near-duplicates
- Merge rules when duplicates are found
- Record linkage across datasets
- Deduplication logging for an audit trail

**4. Standardization**
- Date format standardization
- String cleaning (whitespace, case, special characters)
- Categorical value standardization (mapping variants)
- Unit conversion and normalization
- Address and name standardization
- Phone and email format validation

**5. Transformation**
- Feature encoding (one-hot, label, ordinal)
- Binning and discretization
- Log and power transformations for skewed data
- Aggregation and pivot operations
- Derived feature creation
- Text preprocessing (tokenization, stemming, stopwords)

**6. Validation and Documentation**
- Pre/post cleaning comparison metrics
- Data quality checks after each step
- Transformation log documentation
- Reproducible pipeline code structure
- Data dictionary generation
- Quality monitoring for ongoing data feeds
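To make the six stages concrete, here is a minimal Python/Pandas sketch of the pipeline applied to a toy dataset. The column names (`signup_date`, `email`, `plan`, `monthly_spend`) and the tiny inline DataFrame are purely illustrative, not part of the template; a real pipeline would also cover fuzzy matching, unit conversion, and the other items listed above.

```python
import pandas as pd
import numpy as np

# Hypothetical messy customer records; columns are illustrative only.
raw = pd.DataFrame({
    "signup_date":   ["2023-01-05", "2023-01-07", None, "2023-01-05"],
    "email":         ["A@X.COM", " a@x.com", "b@y.com", "A@X.COM"],
    "plan":          ["Pro", "basic", "PRO ", "Pro"],
    "monthly_spend": [100.0, np.nan, 250.0, 100.0],
})

# 1. Profiling: shape and per-column missing-value percentages
profile = {
    "shape": raw.shape,
    "missing_pct": raw.isna().mean().round(3).to_dict(),
}

df = raw.copy()

# 4. Standardization first, so deduplication compares canonical values
df["email"] = df["email"].str.strip().str.lower()
df["plan"] = df["plan"].str.strip().str.lower()
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

# 2. Missing values: add an indicator column, then impute with the median
df["spend_missing"] = df["monthly_spend"].isna()
df["monthly_spend"] = df["monthly_spend"].fillna(df["monthly_spend"].median())

# 3. Deduplication: drop exact duplicates on the standardized key columns,
#    logging how many rows were removed for the audit trail
rows_before = len(df)
df = df.drop_duplicates(subset=["email", "signup_date"], keep="first")
dedup_log = {"rows_removed": rows_before - len(df)}

# 5. Transformation: one-hot encode the categorical plan column
df = pd.get_dummies(df, columns=["plan"], prefix="plan")

# 6. Validation and documentation: pre/post comparison metrics
report = {
    "rows_in": raw.shape[0],
    "rows_out": df.shape[0],
    **dedup_log,
}
```

Note the ordering choice: standardization runs before deduplication so that variants like `A@X.COM` and `a@x.com` collapse to the same key, and every destructive step (imputation, row removal) is logged so the `report` dictionary can serve as the pre/post comparison required in stage 6.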