Connect Lobstr to ChatGPT: Automate Scraper Squids and Data Runs
Learn how to connect Lobstr to ChatGPT using a managed MCP server. Automate data extraction, orchestrate scraper squids, and track runs with AI.
If you need to connect Lobstr to ChatGPT to automate data extraction, manage scraper squids, and orchestrate massive data runs, you need a Model Context Protocol (MCP) server. This server acts as the translation layer between ChatGPT's tool calls and Lobstr's REST APIs. You can either build and maintain this infrastructure yourself, or use a managed integration platform like Truto to dynamically generate a secure, authenticated MCP server URL. If your team uses Claude, check out our guide on connecting Lobstr to Claude or explore our broader architectural overview on connecting Lobstr to AI Agents.
Giving a Large Language Model (LLM) read and write access to a sprawling data extraction ecosystem like Lobstr is an engineering challenge. You have to handle API authentication, map dynamic schema outputs to MCP tool definitions, and deal with Lobstr's specific asynchronous execution patterns. Every time you need a new crawler, you have to update your server code, redeploy, and test the integration. This guide breaks down exactly how to use Truto to generate a secure, managed MCP server for Lobstr, connect it natively to ChatGPT, and execute complex workflows using natural language.
The Engineering Reality of the Lobstr API
A custom MCP server is a self-hosted integration layer. While the open MCP standard provides a predictable way for models to discover tools, implementing it against Lobstr's APIs is painful. You aren't just integrating a standard CRM; you are integrating an asynchronous job execution engine with dynamic return schemas and complex account synchronization requirements.
If you decide to build a custom MCP server for Lobstr, you own the entire API lifecycle. Here are the specific integration challenges that break standard CRUD assumptions when working with Lobstr:
Asynchronous Execution Lifecycles
Scraping data is not a synchronous operation. If an LLM wants to extract data from a directory, it cannot simply call a GET endpoint and receive rows. It must construct a sequence: create a squid (job definition), push tasks (URLs or search queries), launch a run, poll the run status until completion, and finally fetch the paginated results. If your MCP server does not expose these atomic steps clearly, the LLM will hallucinate the execution state or time out waiting for a synchronous response.
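The sequence above can be sketched as a single function that walks the steps in order. This is a minimal illustration, not Lobstr's or Truto's implementation: `call_tool` is a hypothetical callable standing in for whatever MCP client you use, and the argument names (`crawler`, `squid`, `tasks`, `run`) and response fields (`id`, `result`) are assumptions. Only the tool names and the `is_done` flag come from this guide.

```python
import time
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed JSON result.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_squid_lifecycle(
    call_tool: ToolCaller,
    crawler_id: str,
    squid_name: str,
    targets: list[str],
    poll_interval_s: float = 5.0,
    max_polls: int = 120,
) -> list[dict[str, Any]]:
    """Walk the create -> queue -> run -> poll -> fetch sequence in order."""
    # Argument and response field names below are assumptions for illustration.
    squid = call_tool("create_a_lobstr_squid", {"crawler": crawler_id, "name": squid_name})
    call_tool("create_a_lobstr_task_upload", {"squid": squid["id"], "tasks": targets})
    run = call_tool("create_a_lobstr_run", {"squid": squid["id"]})

    # Poll a bounded number of times instead of waiting on one long request.
    for _ in range(max_polls):
        stats = call_tool("get_single_lobstr_run_stat_by_id", {"id": run["id"]})
        if stats.get("is_done"):
            break
        time.sleep(poll_interval_s)
    else:
        raise TimeoutError(f"run {run['id']} did not finish after {max_polls} polls")

    results = call_tool("list_all_lobstr_results", {"run": run["id"]})
    return results.get("result", [])
```

Bounding the loop with `max_polls` makes a stalled run surface as an explicit error rather than an agent that waits indefinitely.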
Dynamic Result Schemas
A Google Maps scraper yields fundamentally different JSON data than a LinkedIn jobs scraper. Standardizing these outputs into a single typed schema for an LLM is impossible. Your integration layer must be capable of passing dynamic, untyped arrays of scraped rows back to the LLM and trusting the model to interpret the structural variance on the fly.
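One way to cope with rows whose keys differ from scraper to scraper is to avoid a fixed schema entirely and derive the columns from the data itself. The sketch below is illustrative only; the sample field names in the usage note are invented, not taken from any Lobstr crawler.

```python
from typing import Any


def collect_columns(rows: list[dict[str, Any]]) -> list[str]:
    """Return the union of keys across all rows, in first-seen order."""
    columns: dict[str, None] = {}
    for row in rows:
        for key in row:
            columns.setdefault(key, None)
    return list(columns)


def to_markdown_table(rows: list[dict[str, Any]]) -> str:
    """Render rows with differing keys as one table; missing cells stay empty."""
    columns = collect_columns(rows)
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(str(row.get(col, "")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])
```

For example, `to_markdown_table([{"name": "Acme", "rating": 4.5}, {"name": "Globex", "employees": 120}])` produces a three-column table in which each row leaves the field it lacks blank.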
Raw Cookie Authentication for Target Platforms
Lobstr requires connected platform accounts to perform authenticated scraping. Connecting these accounts via API requires submitting raw session cookies (create_a_lobstr_account_cooky). Handling raw authentication cookies securely within an LLM context requires strict parameter validation so the LLM does not inadvertently log or expose sensitive session identifiers.
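A common safeguard is to mask secret-bearing arguments before tool calls are written to logs or traces. The following is a minimal sketch under stated assumptions: the key names in `SENSITIVE_KEYS` are hypothetical and should be replaced with whatever the real tool schema uses for cookie payloads.

```python
from typing import Any

# Hypothetical argument names treated as secrets; adjust to the real tool schema.
SENSITIVE_KEYS = {"cookies", "cookie", "session", "authorization", "token"}
MASK = "***REDACTED***"


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive fields masked, safe for logging."""
    if isinstance(value, dict):
        return {
            key: MASK if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Because `redact` returns a new structure, the original arguments can still be forwarded to the tool unchanged while only the masked copy reaches the log.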
Strict Rate Limiting and 429s
Scraping engines enforce aggressive concurrency and rate limits to protect their infrastructure. Truto does not retry, throttle, or apply backoff on rate limit errors: when the upstream Lobstr API returns an HTTP 429, Truto passes that error directly to the caller. It does normalize the upstream rate limit information into standardized headers (ratelimit-limit, ratelimit-remaining, ratelimit-reset) per the IETF specification. The caller (or the AI agent framework) is entirely responsible for retry and exponential backoff logic, so do not expect the managed MCP server to absorb these errors.
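Since retries are the caller's job, a client-side wrapper might look like the sketch below. It assumes that `ratelimit-reset` carries the number of seconds until the window resets and that header names arrive in lowercase; `send` is a hypothetical callable returning `(status_code, headers, body)` for one attempt. Verify both assumptions against real responses before relying on them.

```python
import random
import time
from typing import Any, Callable, Mapping, Tuple

# Hypothetical transport result: (status_code, headers, parsed_body).
Response = Tuple[int, Mapping[str, str], Any]


def backoff_delay(headers: Mapping[str, str], attempt: int,
                  base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Prefer the ratelimit-reset hint; otherwise use capped exponential backoff."""
    reset = headers.get("ratelimit-reset")
    if reset is not None:
        try:
            return min(cap_s, max(0.0, float(reset)))
        except ValueError:
            pass  # Unparseable hint: fall through to exponential backoff.
    return min(cap_s, base_s * (2 ** attempt))


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Retry on HTTP 429 only; return the last response if attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            # Small random jitter keeps parallel callers from retrying in lockstep.
            sleep(backoff_delay(headers, attempt) + random.uniform(0, 0.25))
    return status, headers, body
```

Passing `sleep` as a parameter keeps the wrapper easy to test and lets an agent framework substitute its own scheduler.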
Generating the Managed MCP Server
Instead of forcing your engineering team to build a custom JSON-RPC router, maintain tool definitions, and handle reverse proxy logic, you can use Truto to generate an MCP server instantly. Truto derives tool definitions dynamically from the integration's resource configurations and API documentation records.
There are two ways to generate your Lobstr MCP server in Truto.
Method 1: Via the Truto UI
For immediate access and testing, the Truto dashboard provides a direct interface:
- Navigate to the Integrated Accounts page in your Truto dashboard and select your connected Lobstr account.
- Click the MCP Servers tab.
- Click Create MCP Server.
- Select your desired configuration. You can filter the server to only expose specific methods (e.g., only "read" methods) or specific tags.
- Click generate and copy the provided MCP server URL (e.g., https://api.truto.one/mcp/a1b2c3d4e5f6...).
Method 2: Via the API
For dynamic agent provisioning, you can generate MCP servers programmatically. This endpoint generates a cryptographic token, stores the configuration in a managed database, syncs metadata to distributed KV storage, and returns a ready-to-use URL.
curl -X POST https://api.truto.one/integrated-account/{integrated_account_id}/mcp \
  -H "Authorization: Bearer YOUR_TRUTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Lobstr Extraction Agent",
    "config": {
      "methods": ["read", "write", "custom"]
    },
    "expires_at": "2026-12-31T23:59:59Z"
  }'
The response contains your secure connection URL.
Connecting the MCP Server to ChatGPT
Once you have your Truto MCP URL, connecting it to your AI assistant takes less than a minute. You can do this via the application UI or a configuration file.
Method A: Via the ChatGPT UI
- Open ChatGPT and navigate to Settings -> Apps -> Advanced settings.
- Enable Developer mode (MCP support requires this flag to be active).
- Under the MCP servers / Custom connectors section, click to add a new server.
- Name: Enter a descriptive label (e.g., "Lobstr Data Engine").
- Server URL: Paste the Truto MCP URL you generated earlier.
- Click Save. ChatGPT will immediately perform a JSON-RPC handshake, pull the integration's schemas, and list the available tools.
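For orientation, the handshake in the final step follows the general MCP message flow over JSON-RPC 2.0: an `initialize` request, an `initialized` notification, then `tools/list` and `tools/call`. The sketch below only builds those message shapes; it is not a record of what ChatGPT actually sends, and the protocol version string and client name are example values.

```python
import json
from itertools import count
from typing import Any, Optional

_ids = count(1)  # JSON-RPC requests need unique ids; notifications have none.


def rpc_request(method: str, params: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request with an auto-incrementing id."""
    message: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def rpc_notification(method: str) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 notification (no id, so no response is expected)."""
    return {"jsonrpc": "2.0", "method": method}


handshake = [
    rpc_request("initialize", {
        "protocolVersion": "2025-03-26",  # example revision; clients negotiate this
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    }),
    rpc_notification("notifications/initialized"),
    rpc_request("tools/list"),
    rpc_request("tools/call", {"name": "list_all_lobstr_crawlers", "arguments": {}}),
]

for message in handshake:
    print(json.dumps(message))
```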
Method B: Via Manual Config File
If you are running a local agent setup, a custom frontend, or Claude Desktop, you can route connections using the official server-sse transport. Add the following to your mcp_config.json:
{
  "mcpServers": {
    "lobstr_truto": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-sse",
        "https://api.truto.one/mcp/YOUR_SECURE_TOKEN_HERE"
      ]
    }
  }
}
Restart your client, and the tools will populate automatically.
Lobstr Hero Tools for ChatGPT
Truto dynamically translates Lobstr's API endpoints into flat, LLM-friendly tools. When the LLM calls a tool, the input parameters are evaluated against both the query and body JSON schemas, automatically routing the arguments to the correct payload location.
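To make the routing idea concrete, here is a minimal sketch of splitting a flat argument object into query and body parts using two JSON Schemas. It illustrates the concept only and is not Truto's implementation; giving the query schema precedence when a name appears in both, and rejecting unknown names, are assumptions made for this example.

```python
from typing import Any


def route_arguments(
    arguments: dict[str, Any],
    query_schema: dict[str, Any],
    body_schema: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split flat tool arguments into (query, body) by schema membership."""
    query_props = query_schema.get("properties", {})
    body_props = body_schema.get("properties", {})
    query: dict[str, Any] = {}
    body: dict[str, Any] = {}
    for name, value in arguments.items():
        if name in query_props:      # assumption: query wins on a name clash
            query[name] = value
        elif name in body_props:
            body[name] = value
        else:
            raise ValueError(f"unknown argument: {name}")
    return query, body
```

The caller, here the LLM, never has to know which part of the HTTP request a parameter belongs to; it passes one flat object and the schemas decide.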
Here are the highest-leverage tools exposed to your agent.
Discover Available Crawlers
Tool Name: list_all_lobstr_crawlers
Before the LLM can extract data, it needs to know what extraction engines are available on the platform. This tool lists all available crawlers, returning critical data like ID, name, premium status, and the credit cost per row.
"List all available crawlers on Lobstr and find the one designed to scrape LinkedIn company profiles. Tell me its exact ID and how many credits it costs per row."
Provision a Scraping Job
Tool Name: create_a_lobstr_squid
A "squid" is Lobstr's terminology for a configured scraping job. This tool creates a new squid. The LLM must supply the target crawler ID and the desired configuration parameters.
"Create a new squid named 'Q3 Enterprise Lead Extraction' using the LinkedIn company crawler ID we just found."
Queue Targets for Extraction
Tool Name: create_a_lobstr_task_upload
Once a squid exists, the LLM must populate it with targets (URLs or search queries). This tool queues tasks for the engine. It returns a duplicate count, allowing the agent to know if it accidentally submitted overlapping targets.
"Upload this list of 50 target company URLs to the 'Q3 Enterprise Lead Extraction' squid. Let me know if any of them were flagged as duplicates by the platform."
Trigger the Data Run
Tool Name: create_a_lobstr_run
This tool executes the actual scraping job. It initiates a run on a specific squid and returns a run ID. Because scraping is asynchronous, the LLM uses this run ID for all subsequent tracking.
"Start a new run for the Q3 Enterprise squid. Give me the run ID so we can monitor its progress."
Monitor Execution Progress
Tool Name: get_single_lobstr_run_stat_by_id
Provides detailed, real-time telemetry on an active run. It returns the completion percentage, total tasks done, duration, and ETA. The LLM can use this to poll the API until the is_done flag returns true.
"Check the status of run ID 847291. Tell me what percentage is complete and what the estimated time remaining is. If it is done, let me know so we can fetch the results."
Retrieve Extracted Data
Tool Name: list_all_lobstr_results
Once the run completes, this tool fetches the scraped data results. The fields in the response vary completely based on the squid's configuration. The LLM digests these raw JSON rows and synthesizes them for the user.
"Fetch the results from our completed Q3 Enterprise run. Format the extracted data into a markdown table showing the Company Name, Employee Count, and Website URL."
Audit Platform Credits
Tool Name: get_single_lobstr_balance_by_id
Extracting data costs credits. This tool allows the agent to audit the current account balance, consumed credits, rollover balance, and check for any unpaid bills before initiating massive runs.
"Check my current Lobstr account balance. Do I have enough available credits to execute a run that will yield approximately 5,000 rows?"
To view the complete schema definitions, required parameters, and the full list of available operations, see the Lobstr integration page.
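The balance check before a large run comes down to simple arithmetic: expected rows times the crawler's credit cost per row, compared against available credits. A hedged sketch follows; the 10% safety margin is an arbitrary illustrative default, not a Lobstr recommendation.

```python
import math


def estimated_cost(rows: int, credits_per_row: float) -> int:
    """Credits needed for a run, rounded up to a whole credit."""
    return math.ceil(rows * credits_per_row)


def can_afford(available_credits: int, rows: int, credits_per_row: float,
               safety_margin_pct: int = 10) -> bool:
    """True if the balance covers the estimate plus a safety margin."""
    cost = estimated_cost(rows, credits_per_row)
    margin = math.ceil(cost * safety_margin_pct / 100)
    return available_credits >= cost + margin
```

For a 5,000-row run at 1 credit per row, the estimate is 5,000 credits and the check requires 5,500 available credits with the default margin.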
Workflows in Action
By chaining these tools together, ChatGPT transitions from a conversational interface into an autonomous data extraction engine.
Scenario 1: Automated Lead Generation Pipeline
An operations manager needs to scrape a list of competitor URLs and summarize the pricing data.
"Find the appropriate Lobstr crawler for extracting generic website text. Create a new squid named 'Competitor Pricing Scrape'. Upload this list of 10 competitor URLs as tasks. Start the run, and check the status. Once the run is complete, fetch the results and build a comparison matrix of their pricing tiers."
Execution Steps:
- ChatGPT calls `list_all_lobstr_crawlers` to identify the generic web scraping engine.
- It calls `create_a_lobstr_squid` using the crawler ID.
- It calls `create_a_lobstr_task_upload` with the provided URLs.
- It calls `create_a_lobstr_run` to initiate the job and captures the Run ID.
- It repeatedly calls `get_single_lobstr_run_stat_by_id` (polling based on its own internal reasoning loop) until `is_done` is true.
- It calls `list_all_lobstr_results` to retrieve the unstructured text, parses the JSON, and generates the final markdown comparison table for the user.
Scenario 2: Credit Monitoring and Run Audits
A data engineering lead notices unexpected billing spikes and asks the AI to investigate.
"Check my Lobstr account balance to see how many credits have been consumed this month. Then, list all recent runs and identify any that used an excessive amount of credits or failed due to errors."
Execution Steps:
- ChatGPT calls `get_single_lobstr_balance_by_id` to retrieve total consumed credits, available credits, and reset time.
- It calls `list_all_lobstr_runs` (across active squids) to pull historical execution data.
- It analyzes the `credit_used` and `status` fields across the returned array.
- The agent reports back to the user with a specific breakdown, identifying a runaway squid that chewed through rollover credits.
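The analysis step in this scenario can be approximated with a small filter over the returned runs. This is a sketch under assumptions: `credit_used` and `status` are the fields named above, but the status values in `FAILED_STATUSES` and the `id` field in the usage note are hypothetical and should be checked against real responses.

```python
from typing import Any

# Hypothetical status values that count as a failed run.
FAILED_STATUSES = {"error", "failed", "aborted"}


def audit_runs(runs: list[dict[str, Any]], credit_threshold: int) -> dict[str, list[dict[str, Any]]]:
    """Separate runs that overspent from runs that failed."""
    def credits(run: dict[str, Any]) -> int:
        # Treat a missing or null credit_used as zero.
        return run.get("credit_used") or 0

    heavy = sorted(
        (run for run in runs if credits(run) > credit_threshold),
        key=credits,
        reverse=True,
    )
    failed = [
        run for run in runs
        if str(run.get("status", "")).lower() in FAILED_STATUSES
    ]
    return {"heavy": heavy, "failed": failed}
```

Given runs with `credit_used` values of 120, 9,800, and 4,000 and a threshold of 1,000, the `heavy` list contains the 9,800-credit run first, followed by the 4,000-credit run.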
sequenceDiagram
participant ChatGPT as ChatGPT
participant MCP as "Truto MCP Server"
participant Lobstr as "Lobstr API"
ChatGPT->>MCP: Call create_a_lobstr_run
MCP->>Lobstr: POST /v1/runs
Lobstr-->>MCP: 200 OK (Run ID)
MCP-->>ChatGPT: Return Run ID
loop Polling Phase
ChatGPT->>MCP: Call get_single_lobstr_run_stat_by_id
MCP->>Lobstr: GET /v1/runs/:id/stats
Lobstr-->>MCP: Return percent_done
MCP-->>ChatGPT: Return status
end
ChatGPT->>MCP: Call list_all_lobstr_results
MCP->>Lobstr: GET /v1/results
Lobstr-->>MCP: Array of scraped rows
MCP-->>ChatGPT: Return structured data
Security and Access Control
Exposing a massive scraping engine to an LLM requires strict governance. Truto's MCP servers provide several layers of access control built directly into the connection URL:
- Method Filtering: Limit the MCP server to specific operation types. By configuring `methods: ["read"]`, you ensure the LLM can audit crawlers and view balances, but cannot accidentally launch a 10,000-credit run (`create`, `update`, `delete` are blocked).
- Tag Filtering: Group tools by functional area. You can restrict the server to only expose tools related to `runs` and `results`, hiding account management and administrative tools.
- API Token Authentication: For elevated security, enable `require_api_token_auth: true`. This forces the client to pass a valid Truto API token in the `Authorization` header, ensuring possession of the URL alone is not enough to execute tools.
- Automatic Expiration: Set an `expires_at` datetime. Background cleanup workers will automatically invalidate the cryptographic token and purge it from distributed storage, making it perfect for temporary contractor access or bounded CI/CD scraping pipelines.
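These controls can be combined when provisioning a server programmatically. The sketch below only builds the JSON payload for the creation request shown earlier in this guide. Placing `require_api_token_auth` inside `config` is an assumption, and the helper name and `ttl_hours` parameter are hypothetical; confirm the field locations against Truto's API reference.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def build_mcp_server_payload(
    name: str,
    methods: list[str],
    ttl_hours: int,
    require_api_token_auth: bool = False,
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Build a request body for a scoped, expiring MCP server."""
    current = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    expires_at = (current + timedelta(hours=ttl_hours)).strftime("%Y-%m-%dT%H:%M:%SZ")
    config: dict[str, Any] = {"methods": list(methods)}
    if require_api_token_auth:
        # Assumed placement of the flag; verify against the API reference.
        config["require_api_token_auth"] = True
    return {"name": name, "config": config, "expires_at": expires_at}


payload = build_mcp_server_payload(
    "Read-only audit agent", ["read"], ttl_hours=24, require_api_token_auth=True
)
print(json.dumps(payload, indent=2))
```

A read-only server that expires after a day is a reasonable default for contractor access, since it bounds both what the agent can do and for how long.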
Strategic Wrap-Up
Connecting Lobstr to ChatGPT using an MCP server transforms your LLM into a highly capable orchestration engine for massive data extraction. Instead of writing custom Python scripts to manage squids, handle polling logic, and parse dynamic JSON arrays, your AI agent can manage the entire asynchronous lifecycle natively.
Truto eliminates the infrastructure burden. You do not have to host a reverse proxy, manage token hashing, write massive JSON schema definitions, or manually map REST endpoints to LLM tools. By connecting your Lobstr account to Truto, you get a fully managed, secure, and production-ready MCP server instantly.
FAQ
- How do I handle Lobstr API rate limits with Truto MCP?
- Truto does not retry or apply backoff logic automatically. Instead, Truto passes the HTTP 429 error directly back to ChatGPT while normalizing the headers (`ratelimit-limit`, `ratelimit-remaining`, `ratelimit-reset`). Your agent or caller must handle the exponential backoff.
- Can ChatGPT create new scraping tasks in Lobstr?
- Yes. Using the `create_a_lobstr_squid` and `create_a_lobstr_task_upload` tools, ChatGPT can define new scraping jobs and upload target URLs or search queries entirely autonomously.
- How does the MCP server authenticate with ChatGPT?
- Truto generates a secure URL containing a cryptographic token. For higher security, you can enable `require_api_token_auth` on the MCP configuration, which forces the client to also provide a valid Truto API token in the headers.