How to Architect a Bidirectional HubSpot Sync (Without Infinite Loops)
A technical guide to architecting two-way HubSpot sync that handles strict API rate limits, prevents infinite webhook loops, and normalizes HubSpot's nested schema without custom code.
Syncing customer data bidirectionally between your application and HubSpot means both systems can create, update, and delete records, with changes propagating in both directions without data loss, duplication, or infinite loops. If you've shipped (or attempted to ship) this, you already know the hard parts aren't the HTTP calls. It's the rate limits that throttle you mid-sync, the webhook events that arrive late and out of order, and the nested property schema that makes field mapping feel like translating between two different languages.
This guide breaks down the exact technical challenges of two-way HubSpot integration, the architectural patterns that prevent the most common failures, and how a declarative, data-driven approach can replace thousands of lines of custom integration code.
The Enterprise Demand for Bidirectional HubSpot Sync
One-way sync — pushing data into HubSpot — is table stakes. The real business value comes from bidirectional sync: your product writes engagement signals, usage metrics, and enrichment data into HubSpot while simultaneously pulling deal updates, lifecycle stage changes, and rep activity back into your app.
Why does this matter so much? Because stale CRM data is a revenue problem, not just an annoyance. Poor data quality costs organizations at least $12.9 million a year on average, according to Gartner research. CRM activity accounts for roughly 18% of rep time, and with up to 25% of that time wasted on bad or disorganized data, a business can lose hundreds of thousands of dollars in productivity annually. In one survey of over 1,250 companies, 44% estimated they lose over 10% in annual revenue due to poor-quality CRM data.
When your product captures a lead score change, a meeting outcome, or a usage milestone and that data doesn't land in HubSpot before the next rep picks up the phone, it's a missed opportunity. And when a deal stage change in HubSpot doesn't make it back into your product in time to trigger an automated workflow, your customer experience suffers silently.
Traditional batch ETL creates a fundamental delay. For fast-moving enterprise sales cycles, a six-hour sync interval is an eternity. You need real-time, event-driven synchronization.
Real-time is not the same as instant. Between CRM event delivery, your internal message queue, retries, and rate-limit backoff, "real-time" usually means seconds, sometimes minutes, and occasionally "we'll reconcile later." If your stakeholders expect a hard SLA of 200ms, reset expectations early.
The "Vampire Record" Problem: Why Two-Way Syncs Create Infinite Loops
The moment both systems can write to the same record, you open the door to infinite loops. The mechanics are deceptively simple:
1. A sales rep updates a contact's phone number in HubSpot.
2. HubSpot fires a `contact.propertyChange` webhook to your app.
3. Your app processes the webhook and updates the corresponding record in your database.
4. Your sync logic detects the internal change and pushes it back to HubSpot.
5. HubSpot detects the update and fires another webhook.
6. Go to step 3. Repeat forever.
These are called "vampire records" — data entries that bounce between systems indefinitely, feeding on your API quota without ever settling. For a deeper treatment of this problem and the loop-prevention patterns that work at scale, see our architect's guide to bi-directional API sync.
```mermaid
sequenceDiagram
    participant Rep as Sales Rep
    participant HS as HubSpot
    participant App as Your App
    participant DB as Your Database
    Rep->>HS: Updates phone number
    HS->>App: Webhook: contact.propertyChange
    App->>DB: Update local record
    DB->>App: Change detected
    App->>HS: PATCH /contacts/{id}
    HS->>App: Webhook: contact.propertyChange
    Note over App,HS: Infinite loop begins
```

Why Naive Fixes Fail
The "just compare timestamps" approach fails in practice. HubSpot's `hs_lastmodifieddate` has second-level granularity, and network latency means your write and the subsequent webhook can land within the same second. You need a combination of strategies:
- Origin tagging: Stamp every outbound write with a source identifier (e.g., a custom property like `_sync_source`). When a webhook arrives, check if the change originated from your system and skip it if so.
- Write receipts: Maintain a short-lived cache of recent writes (record ID + field + value). If an inbound webhook matches a write you just made, suppress it.
- Idempotent upserts: Design every write operation so applying it twice produces the same result. This turns accidental re-delivery from a data corruption risk into a no-op.
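The write-receipt strategy, for instance, fits in a small cache. Here is a minimal in-memory sketch (the class name and TTL are illustrative; a production system would typically back this with Redis or similar so it survives worker restarts):

```typescript
interface WriteReceipt { value: string; writtenAt: number; }

class WriteReceiptCache {
  private receipts = new Map<string, WriteReceipt>();

  constructor(private ttlMs: number = 30_000) {}

  // Record an outbound write we just made to HubSpot.
  record(recordId: string, field: string, value: string, now = Date.now()): void {
    this.receipts.set(`${recordId}:${field}`, { value, writtenAt: now });
  }

  // True if an inbound webhook change matches a recent write of ours (an echo).
  isEcho(recordId: string, field: string, value: string, now = Date.now()): boolean {
    const receipt = this.receipts.get(`${recordId}:${field}`);
    if (!receipt) return false;
    if (now - receipt.writtenAt > this.ttlMs) {
      this.receipts.delete(`${recordId}:${field}`);
      return false;
    }
    return receipt.value === value;
  }
}
```

Keep the TTL short: it only needs to outlive webhook delivery latency, and a long TTL risks suppressing a genuine external change that happens to match a recent write.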
HubSpot's Webhook Delivery Model Makes This Harder
HubSpot's delivery model punishes sloppy handlers. HubSpot can send up to 100 events in a single webhook request, uses a default concurrency limit of 10 in-flight webhook requests per installed account, and retries failed notifications up to 10 times. HubSpot also expects a webhook batch to be acknowledged within 5 seconds. If you do slow work before returning 2xx, you create backlog, retries, and duplicate processing.
That's why the inbound path should be boring: validate the signature, persist the raw payload, respond fast, and let a worker do the expensive work later. As we've covered in our lessons on designing reliable webhooks, HubSpot's current request-validation guidance uses the `X-HubSpot-Signature-v3` header with HMAC SHA-256. Requests older than 5 minutes should be rejected.
```typescript
app.post('/hubspot/webhooks', rawBodyMiddleware, async (req, res) => {
  if (!isValidHubSpotSignatureV3(req)) {
    return res.status(401).end();
  }
  // Persist the raw batch first, then acknowledge immediately.
  const batchId = await webhookStore.append(req.body);
  res.status(204).end();
  // Fan out to the queue after responding; workers do the expensive work.
  for (const event of req.body) {
    await queue.enqueue('hubspot-contact-change', {
      batchId,
      objectId: event.objectId,
      eventType: event.subscriptionType,
      occurredAt: event.occurredAt
    });
  }
});
```

That handler is intentionally boring. The real loop prevention lives in state you control: a dedupe key for inbound events, a fingerprint of the last applied remote state, an outbound write journal, and field ownership rules.
HubSpot fires multiple webhook events for a single business action. When creating a deal, you might receive the creation event plus separate property change notifications for each field. These often arrive out of order. Your deduplication strategy must handle this.
Navigating HubSpot API Rate Limits and Pagination
HubSpot's rate limiting is layered, and the layers interact in ways that will surprise you if you only read the top-level docs.
The Rate Limit Landscape
| App Type | Burst Limit | Daily Limit |
|---|---|---|
| Public OAuth apps | 110 requests / 10 seconds | Varies by customer tier |
| Private apps (Pro/Enterprise) | 190 requests / 10 seconds | 650K–1M requests/day |
| CRM Search API | 5 requests / second (account-wide) | Shared with daily quota |
Here's the part that bites: the CRM Search API burst limit is 5 requests per second, and that cap is shared by everything under one HubSpot account. If your app and two other integrations are all hitting the Search endpoint, you're sharing 5 requests per second across all of them. This is an account-level ceiling, not a per-app one.
Search responses also do not include the standard `X-HubSpot-RateLimit-*` headers, so you need your own pacing logic. Treat search as a separate budget, not as spare capacity from your core CRUD calls.
Why This Matters for Bidirectional Sync
A bidirectional sync has two hot paths that both consume API quota:
- Inbound (HubSpot → your app): Webhooks reduce polling, but you still need to fetch full records after receiving a webhook payload (which only contains the changed property, not the full object). That's at least one GET per webhook event.
- Outbound (your app → HubSpot): Every create, update, or upsert is a write call. Batch endpoints help (up to 100 records per batch call), but you still burn quota.
When you combine these with search calls for deduplication lookups, you can hit the daily or burst limits faster than you'd expect — especially during initial backfill or end-of-quarter pipeline pushes.
Surviving the Limits
Batch aggressively. HubSpot's batch read and batch upsert endpoints handle up to 100 records per call. A sync that naively does record-by-record operations burns 100x more quota than one that batches intelligently. HubSpot also supports up to 200 records per Search API response, so use larger page sizes.
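The chunking itself is trivial bookkeeping; a minimal sketch (the helper name is illustrative):

```typescript
// Split an arbitrary record set into batches of at most `size` records.
// HubSpot's batch read/upsert endpoints accept up to 100 records per call.
function toBatches<T>(records: T[], size = 100): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < records.length; i += size) {
    batches.push(records.slice(i, i + size));
  }
  return batches;
}
```

Syncing 250 contacts then costs 3 write calls instead of 250, which is the difference between finishing a backfill and spending the day throttled.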
Implement exponential backoff with jitter. When you hit a 429, a naive fixed delay causes all throttled workers to wake up simultaneously and immediately hit the limit again. Randomize your retry intervals. And be aware of the escalation: after hitting standard rate limits, if a system makes 10 requests resulting in 429 errors within a second, it gets blocked for one full minute.
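A "full jitter" backoff draws each delay uniformly between zero and the exponential ceiling, so throttled workers never wake in lockstep. This sketch is illustrative: the retry wrapper and its error shape (`err.status`) are assumptions, not a specific client library's API:

```typescript
// Delay drawn uniformly from [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 60_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry wrapper: assumes the thrown error exposes the HTTP status code.
async function withRetry<T>(call: () => Promise<T>, maxAttempts = 6): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err: any) {
      const retryable = err?.status === 429 || err?.status === 423;
      if (!retryable || attempt + 1 >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

The injectable `random` parameter exists so the delay schedule is testable; production code just uses the default.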
Handle 423 Locked responses. During high-volume write bursts, HubSpot can return 423 Locked and recommends waiting at least 2 seconds between requests when you hit that path.
Partition queries for large datasets. HubSpot's Search API enforces a hard cap of 10,000 total results per query, with search request bodies limited to 3,000 characters. If your customer has 50,000 contacts, a paginated query will fail after hitting the ceiling. Break the sync into smaller chunks by querying specific date ranges (e.g., createdate BETWEEN X AND Y).
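One way to sketch that partitioning (helper names are illustrative; the filter shape follows HubSpot's Search API with `GTE`/`LT` operators on `createdate`):

```typescript
interface DateWindow { gteMs: number; ltMs: number; }

// Split a createdate range into fixed-width windows so no single Search
// query can run into the 10,000-result cap.
function partitionRange(startMs: number, endMs: number, windowMs: number): DateWindow[] {
  const windows: DateWindow[] = [];
  for (let gteMs = startMs; gteMs < endMs; gteMs += windowMs) {
    windows.push({ gteMs, ltMs: Math.min(gteMs + windowMs, endMs) });
  }
  return windows;
}

// Build the Search request body for one window.
function searchBodyFor(w: DateWindow) {
  return {
    filterGroups: [{
      filters: [
        { propertyName: "createdate", operator: "GTE", value: String(w.gteMs) },
        { propertyName: "createdate", operator: "LT", value: String(w.ltMs) },
      ],
    }],
    limit: 200, // max page size for Search responses
  };
}
```

If a window still returns close to 10,000 results, halve the window width and retry; the half-open `[gte, lt)` intervals guarantee no record is counted twice across windows.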
Don't rely on search for write verification. Search can lag recently created or updated objects. If you need immediate confirmation of a write, read the object directly rather than searching for it.
Webhook calls made via HubSpot workflows do not count toward the API rate limit. This is a useful escape hatch, but it only applies to workflow-triggered webhooks, not the Webhooks API subscriptions you'd typically use for a bidirectional sync.
For a deeper dive into handling real-time CRM sync at enterprise scale, read our guide to architecting real-time CRM syncs.
Data Mapping: The Hidden Complexity of HubSpot's API
If you've only worked with flat REST APIs, HubSpot's CRM object schema will feel unfamiliar. It's not bad — it's just different in ways that require explicit handling at your integration boundary.
HubSpot's Nested Properties Model
HubSpot doesn't return contact fields at the top level. Everything lives inside a properties object:
```json
{
  "id": "501",
  "properties": {
    "firstname": "Jane",
    "lastname": "Chen",
    "email": "jane@acme.com",
    "hs_additional_emails": "jane.chen@personal.com;jchen@old-company.com",
    "phone": "+1-415-555-0100",
    "mobilephone": "+1-415-555-0101",
    "hs_whatsapp_phone_number": "+1-415-555-0102"
  },
  "createdAt": "2024-01-15T10:30:00Z",
  "updatedAt": "2025-06-20T14:15:00Z"
}
```

Compare this to Salesforce, which returns flat PascalCase fields:
```json
{
  "Id": "003xxx",
  "FirstName": "Jane",
  "LastName": "Chen",
  "Email": "jane@acme.com",
  "Phone": "+1-415-555-0100",
  "MobilePhone": "+1-415-555-0101",
  "CreatedDate": "2024-01-15T10:30:00Z"
}
```

This isn't just a cosmetic difference. It affects every part of your sync pipeline:
- Email parsing: HubSpot stores additional emails as a semicolon-delimited string in `hs_additional_emails`. You need to split and normalize that into an array.
- Phone numbers: HubSpot uses separate properties for `phone`, `mobilephone`, and `hs_whatsapp_phone_number`. Salesforce has six phone fields (`Phone`, `Fax`, `MobilePhone`, `HomePhone`, `OtherPhone`, `AssistantPhone`). Your unified model needs to absorb both.
- Custom fields: In HubSpot, custom fields live alongside standard fields in the `properties` object. In Salesforce, they're distinguished by a `__c` suffix. Identifying which fields are custom requires different logic per CRM.
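The email case alone illustrates the normalization work. A minimal sketch of flattening the primary email plus `hs_additional_emails` into one array (the types and function name are illustrative):

```typescript
interface HubSpotContact {
  id: string;
  properties: Record<string, string | undefined>;
}

interface UnifiedEmail { email: string; is_primary: boolean; }

// Merge the primary email and the semicolon-delimited
// hs_additional_emails string into one normalized array.
function extractEmails(contact: HubSpotContact): UnifiedEmail[] {
  const emails: UnifiedEmail[] = [];
  if (contact.properties.email) {
    emails.push({ email: contact.properties.email, is_primary: true });
  }
  for (const raw of (contact.properties.hs_additional_emails ?? "").split(";")) {
    const email = raw.trim();
    if (email) emails.push({ email, is_primary: false });
  }
  return emails;
}
```

The same pattern applies to phone numbers: one function per multi-value field, each producing the canonical shape your product code consumes.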
The filterGroups Headache
Searching and filtering in HubSpot is a world apart from standard query parameters. You can't just pass `?email=jane@acme.com`. You must issue a POST request to the search endpoint with a nested `filterGroups` payload:
```json
{
  "filterGroups": [
    {
      "filters": [
        { "propertyName": "email", "operator": "EQ", "value": "jane@acme.com" },
        { "propertyName": "lastname", "operator": "CONTAINS_TOKEN", "value": "Chen" }
      ]
    }
  ]
}
```

Salesforce achieves the same thing with SOQL:
```sql
SELECT Id, FirstName, LastName FROM Contact
WHERE Email = 'jane@acme.com' AND LastName LIKE '%Chen%'
```

Neither is inherently better, but they're completely incompatible. If you're building a product that syncs with both, you need separate query-construction logic for each. Multiply that by every CRM your customers ask for, and you start to see why integration teams drown in vendor-specific code. For a comprehensive look at why this is the hardest problem in SaaS integrations, see our deep dive on schema normalization.
HubSpot-Specific Gotchas That Will Corrupt Your Data
Beyond the schema structure, HubSpot has behavioral quirks that will silently wreck your sync if you're not careful:
- Lifecycle stage only moves forward. You cannot set `lifecyclestage` backward unless you clear the existing value first. Blind last-write-wins logic will silently fail or corrupt pipeline state.
- Batch upsert by email has limitations. HubSpot supports batch upsert using `email` as the `idProperty`, but partial upserts are not supported in that mode. If you control a stable external identifier, model it as a custom unique property and upsert against that instead.
- Dynamic endpoint routing. The same logical operation can route to completely different API endpoints depending on parameters. Listing contacts might hit `/crm/v3/objects/contacts` (basic list), `/crm/v3/objects/contacts/search` (when filters are present), or `/contacts/v1/lists/{listId}/contacts/all` (when querying a specific view). Your integration needs to inspect query parameters and route dynamically — another area where hardcoded, per-integration code creates a maintenance burden that scales linearly.
Do not make `lifecyclestage` a blind last-write-wins field. HubSpot only lets you move it forward unless you clear it first. If your app also computes lifecycle state, you need explicit ownership or a conflict resolution workflow.
Use a custom unique identifier if you can. HubSpot allows batch upsert by email, but partial upserts by email are not supported. A customer ID or external contact ID gives you cleaner semantics and fewer nasty surprises.
How to Architect a Loop-Free Bidirectional Sync
Given everything above — rate limits, infinite loops, schema mismatches, dynamic routing — here's an architecture that actually works.
The Core Pattern: Event Bus + Origin Tagging + Declarative Mapping
```mermaid
flowchart LR
    subgraph Inbound["Inbound Path"]
        WH[HubSpot Webhook] --> Q1[Event Queue]
        Q1 --> OC{Origin Check}
        OC -->|External change| MAP1[Response Mapping]
        OC -->|Echo from us| SKIP[Skip/Discard]
        MAP1 --> DB[(Your Database)]
    end
    subgraph Outbound["Outbound Path"]
        DB --> CDC[Change Detection]
        CDC --> OC2{Origin Check}
        OC2 -->|Internal change| MAP2[Query/Body Mapping]
        OC2 -->|Echo from HubSpot| SKIP2[Skip/Discard]
        MAP2 --> BATCH[Batch + Rate Limit]
        BATCH --> HS[HubSpot API]
    end
```

Key components:
1. Event queue. Never process webhooks synchronously. The receiver should validate auth, do minimal schema checks, persist the raw payload, and return 2xx quickly. Transformations and downstream calls happen asynchronously. This prevents timeouts and keeps HubSpot's retry logic from over-firing.
2. Origin check. Every record in your database carries a `last_sync_source` field. When processing an inbound webhook, compare the changed fields against your last outbound write. If they match, it's an echo — discard it.
3. Field ownership rules. Not every field should be bidirectional. A good default:
- HubSpot-owned: owner, lifecycle stage, lead status
- App-owned: product plan, usage metrics, health score
- Conditionally bidirectional: name, phone, job title, company association
Once ownership is set, use compare-before-write on every outbound change. Fetch the current remote values, compute a fingerprint over the normalized fields, and skip the write if nothing material changed. That single habit cuts loopbacks and saves rate-limit budget.
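A workable fingerprint for flat records is a sorted-key serialization plus a hash. This sketch handles shallow records only; nested objects would need recursive key sorting:

```typescript
import { createHash } from "node:crypto";

// Key order must not affect the hash, so serialize with sorted keys.
// Feed in only the fields under sync ownership: volatile fields
// (timestamps, counters you don't sync) would cause spurious writes.
function stableHash(record: Record<string, unknown>): string {
  const entries = Object.keys(record).sort().map((key) => [key, record[key]]);
  return createHash("sha256").update(JSON.stringify(entries)).digest("hex");
}
```

Two logically identical records now always produce the same hash regardless of property order, which is exactly what compare-before-write needs.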
4. Rate-limited outbound queue. A token bucket rate limiter sits in front of all outbound HubSpot calls, shared across every sync operation for that account. This prevents burst violations even when multiple sync jobs run concurrently.
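A token bucket is a few lines of state. This sketch is illustrative; for a public OAuth app the parameters would be a capacity of 110 and a refill rate of 110 tokens per 10 seconds:

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  // capacity: max burst size; refillPerMs: tokens added per millisecond.
  constructor(private capacity: number, private refillPerMs: number, now = Date.now()) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Take one token if available; callers that get `false` must wait.
  tryTake(now = Date.now()): boolean {
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

The important architectural point is sharing: one bucket per HubSpot account, consulted by every sync job, not one bucket per worker.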
5. Reconciliation cron. Webhooks are fire-and-forget. If your server is down, HubSpot retries a few times, then drops the payload. Implement a periodic reconciliation job that fetches all records where `lastmodifieddate > last_sync_timestamp`. This ensures dropped webhooks don't result in permanently out-of-sync data. For stronger replay and offset control at higher scale, HubSpot's newer V4 Webhooks Journal API supports journal files, offsets, and up to 3 days of historical changes.
Compare-Before-Write in Practice
Your sync journal should track more than local ID and remote ID. In practice you want: last applied inbound fingerprint, last outbound fingerprint, source system of the last accepted write, cursor or checkpoint position for backfills, per-field ownership policy, and raw vendor request IDs for debugging.
Here's what the inbound path looks like:
```typescript
async function applyInboundContactChange(event: HubSpotEvent) {
  // Dedupe: HubSpot delivery is at-least-once, so retries are routine.
  const dedupeKey = `${event.subscriptionType}:${event.objectId}:${event.occurredAt}`;
  if (await dedupeStore.seen(dedupeKey)) return;

  // Webhook payloads only carry the changed property; fetch the full record.
  const remote = await hubspot.getContact(event.objectId, CONTACT_PROPERTIES);
  const unified = normalizeHubSpotContact(remote);
  const fingerprint = stableHash(unified);

  // Echo suppression: skip if this exact state was already applied.
  const prior = await syncJournal.getByRemoteId(remote.id);
  if (prior?.lastAppliedFingerprint === fingerprint) return;

  await appContacts.upsert(unified);
  await syncJournal.save({
    remoteId: remote.id,
    localId: unified.id,
    lastAppliedFingerprint: fingerprint,
    lastSource: 'hubspot'
  });
}
```

The outbound path is the mirror image. Map the app record into HubSpot's shape, compare it to the last known remote state, write only the delta, and store the outbound fingerprint. When the echo webhook arrives, your worker sees a matching fingerprint and stops. That's the core pattern for preventing infinite loops without hardcoding vendor-specific branch logic everywhere.
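The "write only the delta" step reduces to a pure comparison. A minimal sketch (the function name and record shapes are illustrative):

```typescript
// Return only the properties whose desired value differs from the last
// known remote state; an empty delta means skip the PATCH entirely.
function changedFields(
  remote: Record<string, string | undefined>,
  desired: Record<string, string>,
): Record<string, string> {
  const delta: Record<string, string> = {};
  for (const [field, value] of Object.entries(desired)) {
    if (remote[field] !== value) delta[field] = value;
  }
  return delta;
}
```

An empty delta should short-circuit before any API call or journal write, so a no-op change never generates an echo webhook in the first place.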
Separate real-time sync from repair sync. Real-time handles fresh changes through webhooks and small writes. Repair sync sweeps updatedAt windows, replays failures, and fixes drift that every production system accumulates eventually. If you skip the repair path, you're not building sync — you're building a demo.
The "Data, Not Code" Principle
The biggest architectural mistake teams make is writing custom code for each integration. HubSpot sync becomes one module. Salesforce sync becomes another. Pipedrive a third. Each with its own mapping logic, pagination handling, and error recovery.
The better pattern is a generic execution pipeline where every integration difference is captured in configuration data:
| Concern | Code-per-integration | Data-driven |
|---|---|---|
| Field mapping | Custom transformer function | JSONata/template expression |
| Query construction | Custom query builder | Declarative query mapping |
| Endpoint routing | if/else branches | Expression-based resource resolution |
| Pagination | Custom cursor handling | Declarative pagination config |
| Error handling | Per-API error parsing | Normalized error mapping |
With a data-driven approach, adding a new CRM means adding configuration rows, not code. The execution engine stays the same.
How Truto's Architecture Handles HubSpot Without Custom Code
This is where Truto's approach becomes relevant — not as a silver bullet, but as a concrete implementation of the "data, not code" pattern described above.
Truto's unified API handles HubSpot (and Salesforce, and Pipedrive, and dozens of other CRMs) through a single generic execution pipeline. Every difference between HubSpot and other CRMs is captured in declarative mapping configurations — not in integration-specific code branches.
What This Looks Like in Practice
A caller makes a single request:
GET /unified/crm/contacts?integrated_account_id=abc123&limit=10
The caller doesn't know or care whether abc123 is a HubSpot account or a Salesforce account. Under the hood, Truto's pipeline reads the mapping configuration for the connected CRM and handles everything: constructing filterGroups for HubSpot, building SOQL for Salesforce, routing to the correct endpoint, extracting and normalizing fields, and producing a unified response.
Endpoint routing is handled declaratively:
```yaml
resource:
  resources:
    - contacts          # Default: list all contacts
    - contacts-search   # When filter params are present
  expression: >
    $firstNonEmpty(rawQuery.first_name, rawQuery.last_name) ? 'contacts-search'
    : 'contacts'
```

Query translation from your unified filters into HubSpot's `filterGroups` arrays happens through data configuration, not custom code:
```yaml
request_body_mapping: >-
  rawQuery.{
    "filterGroups": $firstNonEmpty(first_name, last_name, email_addresses)
      ? [{
          "filters": [
            first_name ? { "propertyName": "firstname", "operator": "CONTAINS_TOKEN", "value": first_name },
            email_addresses ? { "propertyName": "email", "operator": "IN",
              "values": [$firstNonEmpty(email_addresses.email, email_addresses)] }
          ]
        }],
    "query": search_term
  }
```

And response normalization uses JSONata to flatten HubSpot's nested structure into a canonical model:
```jsonata
{
  "id": response.id,
  "first_name": response.properties.firstname,
  "last_name": response.properties.lastname,
  "email_addresses": [
    response.properties.email
      ? { "email": response.properties.email, "is_primary": true },
    response.properties.hs_additional_emails
      ? response.properties.hs_additional_emails.$split(";").{ "email": $ }
  ],
  "phone_numbers": [
    response.properties.phone
      ? { "number": response.properties.phone, "type": "phone" },
    response.properties.mobilephone
      ? { "number": response.properties.mobilephone, "type": "mobile" }
  ],
  "created_at": response.createdAt,
  "updated_at": response.updatedAt
}
```

The equivalent mapping for Salesforce is a different JSONata expression, but the engine that evaluates it is identical. Zero branching on integration name. Zero integration-specific code. The `remote_data` field in Truto's response preserves the original HubSpot response, so you never lose access to provider-specific data when you need it.
| HubSpot Challenge | How Truto Handles It |
|---|---|
| Nested `properties` object | JSONata response mapping extracts and flattens fields |
| `hs_additional_emails` as semicolon string | Mapping expression splits and normalizes to array |
| `filterGroups` search syntax | Declarative query/body mapping constructs the correct payload |
| Dynamic endpoint routing | Expression-based resource resolution picks the right endpoint |
| Rate limiting | Built-in rate limiter respects per-account burst and daily limits |
| Webhook echoes and retries | Generic sync journal and dedupe logic |
For more on how this zero-code architecture works under the hood, see Look Ma, No Code! Why Truto's Zero-Code Architecture Wins.
Honest Trade-offs
A unified API is not the right answer for every scenario:
- Deeply custom HubSpot workflows: If you need to orchestrate multi-step sequences using HubSpot-specific features (enrollment triggers, custom code actions), a unified API won't replace that. You'll need direct HubSpot API access.
- Real-time latency under 500ms: A unified API adds a translation layer. If your use case demands sub-second latency for every request, the overhead matters. For most sync workloads — where "real-time" means seconds to minutes — it's a non-issue.
- Enterprise escape hatches: Enterprise customers will ask for HubSpot-only objects, unusual custom properties, or one-off workflows that don't fit a common model cleanly. A good platform needs a pass-through or proxy path for that long tail. If a vendor tells you the common model covers everything, be skeptical. Real integrations are messier than the brochure.
The strongest use case for a unified API is when you need to support multiple CRMs simultaneously and your team's time is better spent on product features than on per-integration code. As we discussed in our guide to building native CRM integrations without draining engineering, a team of two engineers can ship HubSpot, Salesforce, and Pipedrive integrations in a week instead of a quarter.
What to Ship This Week
If you're staring down a bidirectional HubSpot integration, ship in this order:
1. Define your canonical model and field ownership matrix. Decide which fields are HubSpot-owned, app-owned, and conditionally bidirectional before you write any sync code.
2. Implement webhook ingestion with signature validation, queueing, and dedupe. Subscribe to `contact.propertyChange` and `deal.propertyChange` events. Validate `X-HubSpot-Signature-v3`. Acknowledge fast, process later.
3. Build the outbound write path with batching and compare-before-write. Use HubSpot's batch upsert endpoints. A sync that processes 1,000 contacts should make ~10 API calls, not 1,000.
4. Add a cursor-based backfill and reconciliation job. Run it every 15 minutes as a safety net to catch anything webhooks miss. Store the `after` cursor and page forward in checkpointed windows.
5. Abstract your mapping layer early. Even if you only support HubSpot today, write your field mappings as data (JSON config, JSONata expressions, or similar). When product asks you to add Salesforce next quarter, you'll add configuration instead of rewriting code.
6. Evaluate a unified API if you're supporting 3+ CRMs. The engineering cost of maintaining custom sync code for each CRM scales linearly. A unified API like Truto flattens that curve by handling vendor-specific translation in a shared execution pipeline.
That sequence is boring on purpose. Fancy workflow automation can wait. First make the sync trustworthy.
The difference between a HubSpot integration that works and one that works at scale is the architecture underneath it. Get the loop prevention, rate limit handling, and mapping layer right, and everything else follows.
FAQ
- What are HubSpot's current API rate limits?
- HubSpot OAuth public apps are limited to 110 requests every 10 seconds per account. Private apps on Professional or Enterprise plans get 190 requests per 10 seconds and up to 1 million requests per day. The CRM Search API has a separate, stricter limit of 5 requests per second shared at the account level across all integrations. Search responses also lack standard rate-limit headers, so you need your own pacing logic.
- How do you prevent infinite loops in a bidirectional HubSpot sync?
- Use origin tagging to stamp every outbound write with a source identifier. When a webhook arrives, check whether the change originated from your system and skip it if so. Combine this with a compare-before-write fingerprint (hashing incoming payloads against stored state) and idempotent upserts to eliminate echo writes without hardcoding vendor-specific logic.
- Can HubSpot webhooks replace polling completely?
- No. Webhooks are the fast path, not the only correctness path. HubSpot batches up to 100 events per request, retries failed notifications up to 10 times, and delivery is at-least-once. You should always pair webhooks with a periodic reconciliation job to catch dropped or missed events and fix data drift.
- Why is HubSpot's data mapping harder than other CRMs?
- HubSpot nests all fields inside a properties object, uses semicolon-delimited strings for multi-value fields like additional emails, requires filterGroups arrays for search queries, and restricts lifecycle stage to forward-only movement. Other CRMs like Salesforce use flat PascalCase fields and SQL-like query languages, making the two schemas completely incompatible.
- When does a unified API make sense for CRM sync?
- When you need the same sync behavior across multiple CRMs, want a common customer model for your product code, and your team's time is better spent on features than per-integration maintenance. If HubSpot is the only CRM you will ever support and you need deep HubSpot-only features everywhere, a custom build can still be rational.