Connect Lobstr to Claude: Manage Crawlers and Automated Results

Learn how to connect Lobstr to Claude using a managed MCP server to orchestrate web scraping squids, tasks, and data exports directly from your AI agent.

Uday Gajavalli · 10 min read
If you need to connect Lobstr to Claude to automate web data extraction, manage scraping crawlers, and orchestrate automated result exports, you need a Model Context Protocol (MCP) server. This server acts as the translation layer between Claude's tool calls and Lobstr's REST APIs. You can either spend weeks building and maintaining this infrastructure yourself, or use a managed integration platform like Truto to dynamically generate a secure, authenticated MCP server URL.

If your team uses ChatGPT, check out our guide on connecting Lobstr to ChatGPT or explore our broader architectural overview on connecting Lobstr to AI Agents.

Giving a Large Language Model (LLM) read and write access to an asynchronous, credit-based execution platform like Lobstr is an engineering challenge. You have to map highly variable crawler input schemas to MCP tool definitions, deal with asynchronous polling logic, and safely handle strict usage limits. Every time Lobstr adds a new crawler or updates an execution state, you have to update your server code, redeploy, and test the integration. This guide breaks down exactly how to use Truto to generate a secure, managed MCP server for Lobstr, connect it natively to Claude, and execute complex scraping workflows using natural language.

The Engineering Reality of the Lobstr API

A custom MCP server is a self-hosted integration layer. While the open MCP standard provides a predictable way for models to discover tools, the reality of implementing it against Lobstr's APIs is complex. Lobstr is not a standard REST CRUD app - it is a job orchestration and execution platform.

If you decide to build a custom MCP server for Lobstr, you own the entire API lifecycle. Here are the specific challenges you will face:

The Asynchronous Execution Hierarchy

Lobstr relies on a strict operational hierarchy: Crawler -> Squid -> Task -> Run -> Result. An LLM cannot simply ask Lobstr to "scrape this URL." It must first identify the right Crawler, instantiate a Squid (the job container), queue Tasks (the target URLs), initiate a Run, and then poll the Run until it completes. Exposing this raw hierarchy to an LLM usually results in the model trying to skip steps - like requesting results before a run is finished. Your MCP server must explicitly define schemas that guide the LLM through this multi-step asynchronous dance.
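
One way to stop a model from skipping steps is to track how far a job has progressed and reject out-of-order calls. The sketch below is a hypothetical guard, not part of Lobstr or Truto; `Stage` and `JobTracker` are illustrative names, and only the stage order comes from the hierarchy described above.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages of a Lobstr job, in the order they must be completed."""
    CRAWLER = 1
    SQUID = 2
    TASK = 3
    RUN = 4
    RESULT = 5


class JobTracker:
    """Hypothetical guard that rejects out-of-order steps."""

    def __init__(self) -> None:
        self.reached = 0  # no stage completed yet

    def advance(self, stage: Stage) -> None:
        # Repeating an earlier stage is allowed (e.g. adding more tasks),
        # but jumping more than one stage ahead is refused.
        if stage > self.reached + 1:
            missing = Stage(self.reached + 1).name
            raise RuntimeError(f"cannot do {stage.name} before {missing}")
        self.reached = max(self.reached, int(stage))
```

A server built this way would call `advance` before forwarding each tool call, so a request for results on a job that has no run yet fails with a message the model can act on.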

Highly Variable Parameter Schemas

Every Lobstr crawler has a unique configuration schema. A LinkedIn profile scraper requires completely different inputs than an Amazon product scraper. The params object in Lobstr API requests is entirely dynamic. Hardcoding an OpenAPI spec for Lobstr is virtually impossible because the schema drifts depending on the specific crawler you use. A resilient MCP server needs to allow the LLM to query get_single_lobstr_crawler_param_by_id dynamically to figure out the required inputs before it attempts to build a task.
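
To illustrate why the schema has to be fetched at runtime, here is a minimal sketch that checks task parameters against a schema retrieved for one crawler. The schema shape (`{field: {"required": bool}}`) and the field names are assumptions made for the example; the actual response from get_single_lobstr_crawler_param_by_id may be structured differently.

```python
def missing_required(schema: dict, params: dict) -> list[str]:
    """Return required field names from `schema` that `params` lacks.

    `schema` uses a hypothetical shape: {field: {"required": bool}}.
    The real response from the crawler-params tool may differ.
    """
    return sorted(
        name
        for name, spec in schema.items()
        if spec.get("required") and name not in params
    )


# Illustrative schema, standing in for a response fetched at runtime.
example_schema = {
    "url": {"required": True},
    "max_results": {"required": False},
}
```

Calling `missing_required(example_schema, {})` reports that `url` is missing, which is the kind of check an agent can run before it tries to create a task.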

Strict Rate Limits and Polling

Because Lobstr runs are asynchronous, clients must poll the API to check execution status, and aggressive polling triggers Lobstr's rate limits.

Truto does not retry, throttle, or apply backoff on rate limit errors. When the upstream Lobstr API returns an HTTP 429 Too Many Requests, Truto passes that error directly to the caller. Truto normalizes the upstream rate limit info into standardized headers (ratelimit-limit, ratelimit-remaining, ratelimit-reset) per the IETF spec. The Claude client or calling agent is fully responsible for intercepting the 429, reading the ratelimit-reset header, and implementing its own backoff and retry logic.
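
Since the caller owns the backoff, a small helper can turn a response into a wait time. This is a sketch under one assumption: that `ratelimit-reset` carries the number of seconds until the limit window resets. The function name, default, and cap are illustrative choices, not Truto behavior.

```python
def seconds_to_wait(status: int, headers: dict, default: float = 5.0,
                    cap: float = 60.0) -> float:
    """Return how long to pause before the next request.

    Assumes `ratelimit-reset` holds a number of seconds until the
    window resets; falls back to `default` if it is absent or invalid.
    """
    if status != 429:
        return 0.0
    # HTTP header names are case-insensitive, so normalize the keys.
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        wait = float(lowered["ratelimit-reset"])
    except (KeyError, TypeError, ValueError):
        wait = default
    # Clamp to a sane range so a bad header cannot stall the agent.
    return min(max(wait, 0.0), cap)
```

The cap is a design choice: it keeps a malformed or very large header value from freezing a polling loop indefinitely.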

Instead of building this orchestration logic from scratch, you can use Truto. Truto exposes Lobstr's endpoints as meticulously documented, ready-to-use MCP tools, handling all the underlying HTTP boilerplate so Claude can focus on reasoning through the scraping lifecycle.

How to Generate a Lobstr MCP Server

Truto dynamically generates MCP tools based on the active API documentation for your Lobstr integration. Tools are generated on the fly during the tools/list JSON-RPC handshake - they are never cached or stale.
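
For readers unfamiliar with the handshake, the sketch below shows what a `tools/list` exchange looks like at the JSON-RPC level. The request shape follows JSON-RPC 2.0 as used by MCP; the sample response is illustrative and trimmed to tool names only, so treat it as an outline of the structure and not as Truto's actual output.

```python
import json


def tools_list_request(request_id: int) -> str:
    """Serialize a JSON-RPC 2.0 `tools/list` request."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def tool_names(response_text: str) -> list[str]:
    """Extract tool names from a `tools/list` response body."""
    body = json.loads(response_text)
    return [tool["name"] for tool in body["result"]["tools"]]


# Illustrative response; a real server returns more fields per tool,
# such as a description and an input schema.
sample = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [{"name": "list_all_lobstr_crawlers"},
                         {"name": "create_a_lobstr_squid"}]},
})
```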

You can generate an MCP server for a connected Lobstr account using either the Truto UI or the API.

Method 1: Via the Truto UI

If you are setting this up for internal team use, the Truto dashboard is the fastest route.

  1. Navigate to the Integrated Accounts page in your Truto dashboard and select your connected Lobstr account.
  2. Click the MCP Servers tab.
  3. Click Create MCP Server.
  4. Configure your server filters (e.g., restrict to read methods or specific tags like runs and squids).
  5. Copy the generated MCP server URL (e.g., https://api.truto.one/mcp/a1b2c3d4...).

Method 2: Via the Truto API

If you are building an AI agent product and need to programmatically provision Lobstr MCP servers for your end-users, use the Truto REST API. The endpoint validates the integration, provisions a secure token backed by a distributed KV store, and schedules any necessary expiration alarms.

Execute a POST request to /integrated-account/:id/mcp:

curl -X POST https://api.truto.one/integrated-account/<lobstr_account_id>/mcp \
  -H "Authorization: Bearer <YOUR_TRUTO_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Lobstr Web Scraping Agent",
    "config": {
      "methods": ["read", "write", "custom"]
    }
  }'

The API returns a fully qualified, authenticated MCP server URL:

{
  "id": "mcp_token_987654",
  "name": "Lobstr Web Scraping Agent",
  "config": { "methods": ["read", "write", "custom"] },
  "expires_at": null,
  "url": "https://api.truto.one/mcp/xyz789..."
}
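
If you provision servers from application code instead of the shell, the same call can be sketched in Python with the standard library. The endpoint, headers, and body fields mirror the curl example above; the function names are illustrative, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_create_mcp_request(account_id: str, api_key: str,
                             methods: list[str]) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an MCP server."""
    body = {"name": "Lobstr Web Scraping Agent", "config": {"methods": methods}}
    return urllib.request.Request(
        url=f"https://api.truto.one/integrated-account/{account_id}/mcp",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def mcp_url_from_response(response_text: str) -> str:
    """Pull the `url` field out of the JSON response."""
    return json.loads(response_text)["url"]
```

To send it, pass the request to `urllib.request.urlopen` and feed the decoded response body to `mcp_url_from_response`.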

Connecting the MCP Server to Claude

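Once you have your Truto MCP URL, you need to register it with Claude. Claude connects to remote MCP servers over HTTP-based transports such as Server-Sent Events (SSE); the stdio transport is used for servers that the client launches locally as a subprocess.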

Method A: Via the Claude UI

If you are using the Claude desktop app or web interface (for Enterprise/Team plans), you can add the connector directly in the UI.

  1. Open Claude and navigate to Settings -> Integrations.
  2. Click Add MCP Server (or Add Custom Connector).
  3. Paste the Truto MCP URL you generated.
  4. Click Add.

Claude will immediately execute an initialize handshake, request the tools/list, and populate its context window with the available Lobstr capabilities.

Method B: Via Manual Config File

If you are running Claude Desktop locally and prefer file-based configuration, you can update your claude_desktop_config.json file. Other MCP clients such as Cursor accept a similar mcpServers entry in their own configuration files.

Because Truto provides a hosted SSE endpoint, you use the official MCP SSE transport module to connect:

{
  "mcpServers": {
    "lobstr_truto": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-sse",
        "https://api.truto.one/mcp/xyz789..."
      ]
    }
  }
}

Restart Claude Desktop. The agent is now wired directly into your Lobstr environment.

Hero Tools for Lobstr Automation

Truto exposes the entirety of the Lobstr API, but providing the LLM with the right context is key. Below are the highest-leverage "hero tools" generated by Truto that enable Claude to orchestrate the full execution lifecycle.

list_all_lobstr_crawlers

Before Claude can scrape anything, it needs to know which crawlers are available on the platform. This tool lists all available Lobstr crawlers, returning their IDs, names, and credit costs per row.

"I need to scrape some LinkedIn profiles. Can you list the available crawlers in my Lobstr account and find one that handles LinkedIn, and tell me how much it costs per row?"

get_single_lobstr_crawler_param_by_id

Because crawler inputs are highly dynamic, Claude must call this tool to fetch the exact schema required for a specific crawler before building a Squid. It returns the configurable input parameters separated into task and squid configuration objects.

"I found the LinkedIn profile scraper crawler. Fetch its parameter schema so we know exactly what input fields and JSON structure it requires before we build the task."

create_a_lobstr_squid

A Squid is the execution container for a scraping job. This tool instantiates a new Squid in Lobstr for a specific crawler ID, preparing it to receive tasks.

"Create a new Lobstr squid named 'Q3 Competitor Tracking' using the LinkedIn crawler ID we just looked up."

create_a_lobstr_task

Tasks represent the actual work - usually target URLs or search queries. This tool allows Claude to batch-upload URLs into the Squid. It returns an array of queued tasks and indicates if any duplicates were skipped.

"Add these five target LinkedIn URLs as tasks to the 'Q3 Competitor Tracking' squid we just created. Make sure they are formatted according to the crawler's parameter schema."

create_a_lobstr_run

Once a Squid is loaded with tasks, this tool triggers the actual scraping engine. It returns a run ID (the run hash) and the initial execution status. Claude needs to hold onto this ID for polling.

"Start a run for the 'Q3 Competitor Tracking' squid. Give me the run ID so we can monitor its progress."

get_single_lobstr_run_stat_by_id

Because scraping takes time, Claude uses this tool to check real-time statistics for an active run. It returns the percentage done, total tasks processed, duration, ETA, and a boolean is_done flag.

"Check the status of the run we just started. If it isn't finished, tell me the ETA and how many tasks have successfully processed so far."

list_all_lobstr_results

Once a run is complete (is_done: true), Claude calls this tool to retrieve the actual scraped data. It returns an array of result rows containing the data payload extracted by the crawler.

"The run is finished. Fetch all the results from the squid and summarize the key findings from the scraped LinkedIn profiles."

For the complete tool inventory, including delivery configurations, webhooks, and account credential management, see the Truto Lobstr Integration Page.

Workflows in Action

When Claude has access to these MCP tools, it stops being a mere chat interface and becomes an autonomous data operations engineer. Here is how Claude handles complex Lobstr workflows in practice.

1. The End-to-End Autonomous Scrape

Marketing and growth teams frequently need to run ad-hoc data enrichment. Instead of logging into a UI, clicking through menus, and manually downloading CSVs, a user can instruct Claude to handle the entire asynchronous pipeline.

"I need to scrape data for these 10 company URLs using the standard domain enrichment crawler. Set up the job in Lobstr, execute it, wait for it to finish, and then give me a table of the output data."

Here is how Claude executes this multi-step orchestration:

  1. Claude calls list_all_lobstr_crawlers to find the ID for the domain enrichment crawler.
  2. Claude calls get_single_lobstr_crawler_param_by_id to understand the exact JSON structure required for the URLs.
  3. Claude calls create_a_lobstr_squid to spin up the execution container.
  4. Claude calls create_a_lobstr_task passing the 10 URLs mapped to the required schema.
  5. Claude calls create_a_lobstr_run to initiate the scrape and extracts the run_hash.
  6. Claude enters a polling loop, calling get_single_lobstr_run_stat_by_id. (If the API returns a 429 rate limit error due to aggressive polling, Claude reads the ratelimit-reset header and backs off).
  7. Once is_done is true, Claude calls list_all_lobstr_results to retrieve the payload.
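  8. Claude formats the raw JSON response into a clean Markdown table for the user.

The sequence diagram below (Mermaid syntax) traces the main calls in this flow; the parameter-schema lookup from step 2 is omitted for brevity.

sequenceDiagram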
    participant User as User
    participant Claude as Claude Desktop
    participant MCP as Truto MCP Server
    participant Lobstr as Lobstr API
    
    User->>Claude: "Scrape these 10 URLs..."
    Claude->>MCP: Call list_all_lobstr_crawlers
    MCP->>Lobstr: GET /crawlers
    Lobstr-->>MCP: [Crawler List]
    MCP-->>Claude: Return Crawler ID
    
    Claude->>MCP: Call create_a_lobstr_squid
    MCP->>Lobstr: POST /squids
    Lobstr-->>MCP: {squid_id}
    MCP-->>Claude: Return Squid ID
    
    Claude->>MCP: Call create_a_lobstr_task
    MCP->>Lobstr: POST /squids/{id}/tasks
    Lobstr-->>MCP: {queued_tasks}
    MCP-->>Claude: Confirm tasks added
    
    Claude->>MCP: Call create_a_lobstr_run
    MCP->>Lobstr: POST /runs
    Lobstr-->>MCP: {run_hash, status: "running"}
    MCP-->>Claude: Return Run ID
    
    loop Polling
        Claude->>MCP: Call get_single_lobstr_run_stat_by_id
        MCP->>Lobstr: GET /runs/{hash}/stats
        Lobstr-->>MCP: {is_done: false, percent: 50}
        MCP-->>Claude: Status update
    end
    
    Claude->>MCP: Call list_all_lobstr_results
    MCP->>Lobstr: GET /squids/{id}/results
    Lobstr-->>MCP: [Scraped Data]
    MCP-->>Claude: Return JSON results
    Claude-->>User: Markdown table of data
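
The polling step in this workflow can be sketched as a small loop. This is an illustration of the pattern, not how Claude implements it internally: `get_stats` is a stand-in for the get_single_lobstr_run_stat_by_id tool call, only the `is_done` field is taken from the tool description above, and the interval and poll limit are arbitrary defaults.

```python
import time
from typing import Callable


def wait_for_run(get_stats: Callable[[str], dict], run_id: str,
                 interval: float = 10.0, max_polls: int = 60,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll `get_stats(run_id)` until it reports `is_done`, then return it.

    `get_stats` is a stand-in for the get_single_lobstr_run_stat_by_id
    tool call; only the `is_done` field is taken from the article.
    """
    for _ in range(max_polls):
        stats = get_stats(run_id)
        if stats.get("is_done"):
            return stats
        sleep(interval)  # fixed pause between polls to limit request volume
    raise TimeoutError(f"run {run_id} not done after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays, and the poll limit ensures a stuck run raises an error instead of looping forever.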

2. Credit Monitoring and Run Abort

Data Ops teams need to ensure that scraping jobs do not quietly drain account credits. Claude can audit active runs, check resource consumption, and kill runaway jobs.

"Check all my active Lobstr squids. If any run has consumed more than 500 credits but is less than 20% done, abort it immediately and tell me the run ID."

Claude processes this operational rule by bridging multiple endpoints:

  1. Claude calls list_all_lobstr_squids to get the user's active configurations.
  2. For each squid, Claude calls list_all_lobstr_runs to find active runs.
  3. Claude calls get_single_lobstr_run_stat_by_id to check the percent_done.
  4. Claude calls get_single_lobstr_run_credit_by_id to check the total_credits consumed.
  5. If Claude finds a run matching the criteria (e.g., 600 credits used, 15% done), it calls create_a_lobstr_run_abort passing the run_hash.
  6. Claude reports back to the user with the aborted run details.
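
The rule Claude applies in these steps boils down to a simple filter. The sketch below assumes each run has already been merged into one dict from the stat and credit responses; the key names `run_hash`, `total_credits`, and `percent_done` follow the fields mentioned above, but the merged shape itself is hypothetical.

```python
def runs_to_abort(runs: list[dict], max_credits: float = 500,
                  min_percent: float = 20) -> list[str]:
    """Return hashes of runs over `max_credits` but under `min_percent` done.

    Each run dict uses hypothetical keys assembled from the stat and
    credit tool responses: run_hash, total_credits, percent_done.
    """
    return [
        run["run_hash"]
        for run in runs
        if run["total_credits"] > max_credits and run["percent_done"] < min_percent
    ]
```

With the example from step 5, a run at 600 credits and 15% done is selected, while a run at 600 credits and 80% done is left alone.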

Security and Access Control

Giving an AI agent access to an execution platform that burns financial credits requires strict guardrails. Truto's MCP architecture provides native access controls at the server level, ensuring the model cannot perform unauthorized actions regardless of the user's prompt.

  • Method Filtering: You can enforce read-only access. By passing config: { methods: ["read"] } during MCP server creation, Truto will strip out all create, update, and delete tools. Claude can check run statuses and list results, but it cannot start new squids or spend credits.
  • Tag Filtering: You can restrict the MCP server to specific functional domains. If you only want the agent to audit account health, you can filter tools to only include those tagged with accounts or billing, hiding the crawler and execution tools completely.
  • Double Authentication: By enabling require_api_token_auth: true, the MCP server URL itself is no longer enough to execute tools. The connecting client (Claude) must pass a valid Truto API token in the header. This prevents unauthorized execution if the MCP URL is leaked in a config file.
  • Auto-Expiring Servers: If you are provisioning an agent for a temporary scraping project, you can set an expires_at timestamp. Truto's underlying durable scheduling system will automatically purge the credentials and invalidate the server at the exact expiration time, leaving zero zombie access points.
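
Several of these controls can be combined in one creation request. The sketch below builds such a body as a hedged illustration: the article names `methods`, `require_api_token_auth`, and `expires_at`, but where each field sits in the request and the timestamp format are assumptions here, so check the Truto API reference before relying on it.

```python
from datetime import datetime, timedelta, timezone


def restricted_server_payload(name: str, hours_valid: int, now: datetime) -> dict:
    """Build a request body for a read-only, token-protected, expiring server.

    Field placement is an assumption: the article names `methods`,
    `require_api_token_auth`, and `expires_at` but not where each sits.
    """
    expires = now + timedelta(hours=hours_valid)
    return {
        "name": name,
        "config": {
            "methods": ["read"],
            "require_api_token_auth": True,
        },
        "expires_at": expires.isoformat(),  # ISO 8601 format is assumed
    }


# Example: an audit-only server that expires 24 hours after a fixed time.
example = restricted_server_payload(
    "Audit agent", 24, datetime(2025, 1, 1, tzinfo=timezone.utc)
)
```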

Moving from Chat to Automation

Connecting Lobstr to Claude via an MCP server transitions your workflows from manual UI operations to intelligent, conversational automation. By utilizing Truto, you bypass the massive engineering overhead of translating Lobstr's asynchronous execution hierarchy and dynamic schemas into reliable LLM tools.

Instead of writing custom polling loops, tracking cursor pagination, and maintaining crawler schemas, your engineering team can focus on what matters: building superior AI agents that extract value from the web. Truto handles the API normalization; Claude handles the logic.

FAQ

How do I connect Lobstr to Claude?
You connect Lobstr to Claude by deploying a Model Context Protocol (MCP) server. Truto generates a managed, authenticated MCP URL for your Lobstr account, which you can paste into Claude Desktop's custom connectors or configure via the claude_desktop_config.json file.
How does Truto handle Lobstr rate limits?
Truto does not retry, throttle, or apply backoff on rate limit errors. When Lobstr returns an HTTP 429, Truto passes the error back to Claude and normalizes the rate limit info into standard headers (ratelimit-limit, ratelimit-remaining, ratelimit-reset). The client AI must handle its own retry logic.
Can Claude wait for a Lobstr run to finish before getting results?
Yes. Because Lobstr runs are asynchronous, Claude uses the get_single_lobstr_run_stat_by_id tool to poll the status of an active run. Once the response reports is_done: true, Claude can call list_all_lobstr_results to fetch the scraped data.