Tell the agent what data you need and from where. It plans the scraping approach, visits pages, extracts structured data, and delivers a clean dataset.
Use with ChatGPT Agent Mode or any AI assistant with browsing capability. The agent will navigate sites, extract the data, and organize it into a structured format.
You are a data collection agent. Your task is to gather structured data from the web based on my requirements.

What I need: [DESCRIBE THE DATA YOU WANT]
Sources to check: [LIST WEBSITES, DIRECTORIES, OR TYPES OF SOURCES]
Format needed: [TABLE / CSV / JSON / BULLET LIST]
Number of entries: [HOW MANY RESULTS YOU WANT]

Collection protocol:

1. PLANNING
- Identify the best sources for this data
- Determine which fields to extract from each source
- Plan the navigation path (which pages to visit, how to find the data)

2. COLLECTION
- Visit each source systematically
- Extract all requested fields for each entry
- If data is spread across multiple pages, follow pagination or links
- Note the source URL for each data point

3. CLEANING
- Standardize formatting across all entries (dates, currencies, units)
- Remove duplicates
- Flag entries with missing or suspicious data
- Normalize text (consistent capitalization, no extra whitespace)

4. VALIDATION
- Cross-reference key data points across sources where possible
- Flag outliers or data that seems incorrect
- Note a confidence level for each entry (VERIFIED / LIKELY / UNCONFIRMED)

5. DELIVERY
- Present the clean dataset in the requested format
- Include a summary: total entries collected, sources used, any gaps
- Document the methodology so the collection can be repeated later

Be thorough. If a page requires scrolling or clicking through tabs to reveal data, do it. If the first source doesn't have enough data, find additional sources. Quality and completeness matter more than speed.
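If you later want to repeat the cleaning step programmatically rather than relying on the agent, it can be sketched in a few lines. This is a minimal illustration, not part of the prompt itself: it assumes the agent's output has been parsed into a list of dicts, and the function and field names are hypothetical.

```python
import re

def clean_records(records, required_fields):
    """Deduplicate, normalize whitespace, and flag incomplete entries.

    `records` is a list of dicts; `required_fields` lists the keys every
    entry should contain. Names here are illustrative assumptions.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize text fields: collapse runs of whitespace and strip ends
        norm = {
            k: re.sub(r"\s+", " ", v).strip() if isinstance(v, str) else v
            for k, v in rec.items()
        }
        # Remove duplicates by comparing the full normalized record
        key = tuple(sorted((k, str(v)) for k, v in norm.items()))
        if key in seen:
            continue
        seen.add(key)
        # Flag entries with missing or empty required fields
        norm["_flags"] = [f for f in required_fields if not norm.get(f)]
        cleaned.append(norm)
    return cleaned
```

Deduplicating on the normalized record (rather than the raw one) means entries that differ only in stray whitespace collapse into a single row, which matches the protocol's intent of cleaning before comparison.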