What is the Best Way to Normalize Data Models Across Different CRMs?
Learn the best way to normalize CRM data models across Salesforce, HubSpot, and more. A practical guide to declarative mappings, custom field handling, and unified API architecture.
CRM data normalization is the practice of translating disparate data structures from platforms like Salesforce, HubSpot, Pipedrive, and Close into a single, canonical schema your application can rely on. If you're a product manager evaluating how to ship integrations across multiple CRMs without drowning your engineering team, the short answer is: treat integration behavior as data configuration, not hardcoded logic. This article breaks down exactly why that matters and how to do it.
Why CRM Data Normalization is a Nightmare
The problem starts with a deceptively simple question: "What is a Contact?"
In HubSpot, a Contact is a flat object. Data lives in a properties bag — properties.firstname, properties.lastname, properties.email. Timestamps are tracked via properties.hs_lastmodifieddate. HubSpot doesn't even have a separate Lead object — it uses a Lifecycle Stage property on the Contact to track progression from lead to customer.
Salesforce takes a fundamentally different approach. A Contact is a PascalCase object (FirstName, LastName, Email) that sits inside a sprawling web of standard and custom object relationships. Salesforce does have a separate Lead object, and when a Lead converts to a Contact, the original Lead record is destroyed. Filtering uses SOQL (a SQL dialect), modification tracking is via LastModifiedDate, and custom fields are identified by a __c suffix.
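To make the contrast concrete, here is what the same person might look like coming off each API. The shapes follow each vendor's documented conventions, while the values and the custom field are invented for illustration:

```typescript
// Illustrative payloads only: shapes follow each vendor's conventions,
// values are made up.
const hubspotContact = {
  id: "51",
  properties: {
    firstname: "Ada",
    lastname: "Lovelace",
    email: "ada@example.com",
    hs_lastmodifieddate: "2024-05-01T12:00:00Z",
    lifecyclestage: "customer", // HubSpot tracks lead -> customer here
  },
};

const salesforceContact = {
  Id: "0031t00000abcDE",
  FirstName: "Ada",
  LastName: "Lovelace",
  Email: "ada@example.com",
  LastModifiedDate: "2024-05-01T12:00:00.000+0000",
  Lead_Score__c: 92, // custom field, flagged by the __c suffix
};

// Same person, two incompatible access paths:
const hubspotName = hubspotContact.properties.firstname;
const salesforceName = salesforceContact.FirstName;
```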
These aren't cosmetic differences. They represent fundamentally different opinions about how to model customer relationships. And every CRM in the market has its own opinion.
The financial stakes are real. Gartner estimates poor data quality costs the average enterprise $12.9 to $15 million annually. Research suggests duplication rates between 10-30% are normal for companies without active data quality programs, and integrations that don't normalize and deduplicate correctly are a primary driver of that problem. Plauti's analysis of 12 billion Salesforce records found 45% were duplicates across organizations, and that rate jumps to 80% for records created via API integrations.
Meanwhile, the integration surface area keeps expanding. Companies used an average of 106 SaaS applications in 2024, with large enterprises running up to 131 different apps. Your product must integrate with an increasingly fragmented software landscape, and your CRM — supposed to be your system of record — corrodes when integration data flows in without proper normalization.
Understanding why schema normalization is the hardest problem in SaaS integrations is the first step. The second is architecting a system that actually solves it.
The Traditional Approaches (And Why They Break)
When faced with schema normalization, engineering teams usually default to one of two flawed architectures.
Point-to-Point Integrations: The Spaghetti Factory
The most common first attempt is building direct integrations. Your team writes a Salesforce adapter, then a HubSpot adapter, then Pipedrive, then Zoho. Each has its own code path:
// This is how most teams start. It doesn't end well.
if (provider === 'hubspot') {
  contact.firstName = data.properties.firstname;
  contact.email = data.properties.email;
} else if (provider === 'salesforce') {
  contact.firstName = data.FirstName;
  contact.email = data.Email;
} else if (provider === 'pipedrive') {
  contact.firstName = data.first_name;
  contact.email = data.email[0].value;
}

This works for two CRMs. By the fifth, you're maintaining a growing tangle of conditional logic, each branch with its own pagination strategy, auth flow, rate-limit handling, and undocumented edge cases. Every time a vendor deprecates an endpoint, alters a rate limit, or changes a pagination cursor, you have to write code, review it, and deploy it.
The cost isn't trivial. Adding third-party integrations and custom features to a SaaS platform typically pushes development costs into the $50,000 to $150,000+ range per integration when you account for initial build, testing, and ongoing maintenance. You are essentially paying top-tier engineers to read terrible vendor documentation.
Rigid Unified APIs: The Lowest-Common-Denominator Trap
To escape the maintenance burden of custom code, teams turn to standard unified APIs. The promise is alluring: write to one common data model, and the provider handles the translation.
The problem: most of these platforms implement what amounts to a rigid key-value mapping layer. They map vendor_field_A to unified_field_B and call it done. This works for the 20% of fields that are standard across CRMs. But what about the other 80%?
Every serious Salesforce deployment has custom fields (Lead_Score__c, Region__c, Contract_Value__c). Every HubSpot instance has custom properties. A unified API that strips away custom fields to maintain a clean common model is losing the data your customers care about most.
This is what we call the implementation gap — the distance between what a unified API's common model covers and what your customers actually need. If closing that gap requires filing support tickets and waiting for the provider to update their mappings, you've traded one bottleneck (building integrations) for another (waiting on a vendor). Your customers complain that the integration is missing critical data, and you end up building custom point-to-point integrations anyway.
What is the Best Way to Normalize Data Models Across Different CRMs?
The best approach is a declarative, configuration-driven architecture where:
- A canonical schema defines what your application sees (the unified interface)
- Per-integration mappings define how each CRM's native format translates to and from that schema (the transformation layer)
- A generic execution engine processes both without any integration-specific code
This means adding a new CRM integration or modifying how a field maps is a data operation — not a code deployment. This approach decouples the integration logic from your application runtime, allowing you to standardize the core entities while maintaining the flexibility to handle custom fields dynamically. Here's how to implement each piece.
1. Standardize the Core CRM Trinity
Before writing a single mapping, you need a well-designed canonical data model. For CRMs, this centers on what we call the Core Trinity: Accounts, Contacts, and Opportunities.
| Unified Entity | Salesforce | HubSpot | Pipedrive | Close |
|---|---|---|---|---|
| Account | Account | Company | Organization | Lead (organization-level) |
| Contact | Contact (+ Lead) | Contact | Person | Contact |
| Opportunity | Opportunity | Deal | Deal | Opportunity |
Your canonical schema should model the relationships between these entities, not just the fields:
- Account: The primary container for customer data (the company).
- Contact: The people who work at that Account.
- Opportunity: The active deals being negotiated with that Account.
- Lead: Unqualified prospects that exist independently at the top of the funnel.
When qualified, a Lead is typically converted into a Contact, an Account, and an Opportunity. Then comes the pipeline model: an Opportunity belongs to a specific Pipeline, and its status is defined by a Stage within that pipeline. Users generate Engagements, Notes, and Tasks that are appended to the relevant entity to create a historical timeline.
erDiagram
Account ||--o{ Contact : employs
Account ||--o{ Opportunity : has
Opportunity }o--|| Pipeline : belongs_to
Opportunity }o--|| Stage : current_status
Lead ||--o| Contact : converts_to
Lead ||--o| Account : converts_to
Lead ||--o| Opportunity : converts_to
User ||--o{ Engagement : creates
User ||--o{ Note : creates
User ||--o{ Task : creates

Getting this relational model right matters more than getting every field mapped from day one. If your canonical schema has sound entity relationships, you can always add fields later. If the relationships are wrong, every downstream consumer breaks. By standardizing this relationship graph, your application only needs to understand one data shape.
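Sketched as TypeScript interfaces, the relationship graph might look like this (field and type names are illustrative, not a prescribed standard):

```typescript
// A sketch of the canonical CRM schema. Entity and field names are
// illustrative; the point is that relationships are modeled explicitly.
interface Account {
  id: string;
  name: string;
}

interface Contact {
  id: string;
  accountId: string | null; // a Contact belongs to an Account
  firstName: string;
  lastName: string;
  emailAddresses: { email: string; isPrimary?: boolean }[];
}

interface Opportunity {
  id: string;
  accountId: string;  // an Opportunity belongs to an Account...
  pipelineId: string; // ...sits in a Pipeline...
  stageId: string;    // ...and has a current Stage within that pipeline
  amount?: number;
}

interface Lead {
  id: string;
  status: "open" | "qualified" | "converted";
  // Set on conversion, mirroring the Lead -> Contact/Account/Opportunity split
  convertedContactId?: string;
  convertedAccountId?: string;
  convertedOpportunityId?: string;
}

const acme: Account = { id: "acc_1", name: "Acme" };
const ada: Contact = {
  id: "con_1",
  accountId: acme.id,
  firstName: "Ada",
  lastName: "Lovelace",
  emailAddresses: [{ email: "ada@example.com", isPrimary: true }],
};
```

Because every relationship is an explicit foreign key on the unified side, downstream consumers never need to know that HubSpot expresses the same link as an association API call while Salesforce expresses it as an AccountId field.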
2. Handle Custom Fields and Objects Dynamically
Standardizing the core trinity is the easy part. The real challenge is handling the long tail of custom fields — and it's where most normalization strategies fail.
Salesforce identifies custom fields by the __c suffix. HubSpot stores them as additional keys in the properties object. Pipedrive uses numeric custom field IDs. You need a normalization strategy that handles all three patterns without losing data.
The worst thing you can do is ignore them. Custom fields contain the business-specific data your customers configured their CRM around — deal scoring, contract types, region codes, compliance flags. Dropping them means your integration is only half-useful. Learning how to handle custom fields and custom objects in Salesforce via API highlights why this flexibility is mandatory.
The solution is a multi-layered override system that lets mappings be customized at different levels without requiring your engineers to write custom code:
- Platform level — the default mapping that works for 80% of use cases.
- Environment level — overrides for specific customer deployments (e.g., a customer's Salesforce instance with a non-standard field layout).
- Account level — overrides for individual connected accounts (e.g., one customer's `Contract_Value__c` field maps to a unified `deal_value` field).
flowchart TD
A["Platform Base Mapping<br>(default for all customers)"] --> B["Environment Override<br>(customer-specific config)"]
B --> C["Account Override<br>(per-connected-account config)"]
C --> D["Final Resolved Mapping<br>(deep-merged result)"]

Each level deep-merges on top of the previous one. At runtime, the engine resolves the final mapping by combining all three layers:
// How overrides are applied at runtime
private mergeIntegrationMappingConfigs(
base: IntegrationMappingMethod,
override?: IntegrationMappingMethod
): IntegrationMappingMethod {
if (!override) return base;
return deepmerge(base, override, { arrayMerge: overwriteMerge });
}

This means a customer can add their own custom field mappings to the unified response without anyone modifying source code. No engineering ticket. No deployment.
Always store your mapping overrides as JSON objects in your database. This allows you to hot-swap configurations without restarting your application servers.
This is a design pattern that pays for itself fast — especially when you consider how wildly different custom field implementations are across CRM platforms.
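The override resolution can be sketched end to end. Here, `deepMerge` is a minimal hand-rolled stand-in for a library like deepmerge with arrays overwritten rather than concatenated, and the mapping shapes are invented for illustration:

```typescript
type Mapping = { [key: string]: any };

// Minimal deep merge: objects merge recursively; arrays and scalars are
// overwritten by the higher-priority layer (mirroring an overwriteMerge).
function deepMerge(base: Mapping, override: Mapping): Mapping {
  const isObj = (x: any) =>
    x !== null && typeof x === "object" && !Array.isArray(x);
  const out: Mapping = { ...base };
  for (const [k, v] of Object.entries(override)) {
    out[k] = isObj(out[k]) && isObj(v) ? deepMerge(out[k], v) : v;
  }
  return out;
}

// Platform default: standard fields only.
const platformMapping: Mapping = {
  fields: { first_name: "FirstName", last_name: "LastName" },
};
// Account-level override: one customer's custom field.
const accountOverride: Mapping = {
  fields: { deal_value: "Contract_Value__c" },
};

const resolved = deepMerge(platformMapping, accountOverride);
// resolved.fields now carries first_name, last_name, AND deal_value
```

Because the override only mentions the keys it changes, the platform defaults keep flowing through untouched; deleting the override row in the database reverts the customer to stock behavior.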
3. Abstract Away Pagination, Auth, and Rate Limits
True data normalization isn't just about JSON field names. It's about normalizing the behavior of every API your system talks to.
Consider pagination. HubSpot's contacts API uses cursor-based pagination with an after parameter. Salesforce uses a completely different cursor format returned in query results. Pipedrive uses offset-based pagination with a start parameter. Your application should not care. It should send a simple limit and cursor parameter to the unified layer, and the integration configuration should translate that into the provider's specific pagination strategy.
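One way to sketch that translation is a small per-provider pagination config consumed by one generic function. The config shape is an assumption for illustration; the `after` and `start` parameter names are the vendors' documented ones:

```typescript
// Hypothetical per-provider pagination config, stored as data.
type PaginationConfig =
  | { style: "cursor"; cursorParam: string }  // e.g. HubSpot's `after`
  | { style: "offset"; offsetParam: string }; // e.g. Pipedrive's `start`

// Translate the unified { limit, cursor } request into provider params.
function buildPageParams(
  config: PaginationConfig,
  unified: { limit: number; cursor?: string }
): Record<string, string> {
  const params: Record<string, string> = { limit: String(unified.limit) };
  if (unified.cursor) {
    if (config.style === "cursor") {
      params[config.cursorParam] = unified.cursor;
    } else {
      params[config.offsetParam] = unified.cursor; // cursor encodes the offset
    }
  }
  return params;
}

// The same unified request, two provider-specific query strings:
const hubspotParams = buildPageParams(
  { style: "cursor", cursorParam: "after" },
  { limit: 50, cursor: "p51" }
); // { limit: "50", after: "p51" }

const pipedriveParams = buildPageParams(
  { style: "offset", offsetParam: "start" },
  { limit: 50, cursor: "50" }
); // { limit: "50", start: "50" }
```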
The same goes for authentication. Salesforce uses OAuth 2.0 with a specific token-refresh dance and instance-URL-based routing. HubSpot uses OAuth 2.0 with a different token format. Some CRMs still use API keys. Whether an API uses OAuth2 Bearer tokens, Basic Auth, or custom headers, the credential injection should happen at the proxy layer, entirely abstracted from the unified request. You should never be writing token refresh logic inside your data mapping functions.
And then there are rate limits — the silent integration killer. HubSpot enforces both per-second and daily rate limits. Salesforce has API call limits tied to your license tier. A unified API must detect 429 Too Many Requests responses, respect the Retry-After headers (which are formatted differently by every vendor), and handle the backoff transparently.
| API Behavior | Salesforce | HubSpot | Pipedrive |
|---|---|---|---|
| Pagination | SOQL cursor | Cursor (`after` param) | Offset (`start` param) |
| Auth | OAuth 2.0 (instance URL) | OAuth 2.0 (bearer) | API key or OAuth 2.0 |
| Rate Limits | License-tier based | Per-second + daily | Per-second |
| Filtering | SOQL WHERE clause | `filterGroups` array | Query params |
| Timestamps | `LastModifiedDate` | `properties.hs_lastmodifieddate` | `update_time` |
A well-designed normalization layer captures all of these behavioral differences as configuration data — not as if/else branches in your codebase. Each integration's config describes its pagination strategy, auth scheme, and rate-limit detection rules. The runtime engine reads this config and executes the appropriate strategy generically.
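As a sketch of that transparent handling, a config-driven 429 policy might look like this. The `Retry-After` header is standard HTTP; the config shape and default values are invented for illustration:

```typescript
// Hypothetical rate-limit config, stored per integration as data.
interface RateLimitConfig {
  retryAfterHeader: string; // header carrying the vendor's wait hint, if any
  defaultBackoffMs: number; // fallback when the vendor sends no hint
  maxRetries: number;
}

// Decide whether to back off after a response, and for how long.
// Returns a delay in ms, or null when no retry should happen.
function backoffMs(
  config: RateLimitConfig,
  status: number,
  headers: Record<string, string>,
  attempt: number
): number | null {
  if (status !== 429 || attempt >= config.maxRetries) return null;
  const hint = headers[config.retryAfterHeader.toLowerCase()];
  // Retry-After is commonly seconds; otherwise use exponential backoff.
  if (hint && !Number.isNaN(Number(hint))) return Number(hint) * 1000;
  return config.defaultBackoffMs * 2 ** attempt;
}

const cfg: RateLimitConfig = {
  retryAfterHeader: "Retry-After",
  defaultBackoffMs: 1000,
  maxRetries: 5,
};

const withHint = backoffMs(cfg, 429, { "retry-after": "2" }, 0); // 2000: hint wins
const noHint = backoffMs(cfg, 429, {}, 2);                       // 4000: 1000 * 2^2
const okResponse = backoffMs(cfg, 200, {}, 0);                   // null: nothing to do
```

Because the policy lives in config, tightening one vendor's backoff after an incident is a row update, not a hotfix deploy.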
4. Normalize Inbound Events (Unified Webhooks)
Normalization applies to inbound data just as much as outbound requests. When a third-party service fires a webhook, it hits your endpoint with a proprietary payload. Salesforce sends an outbound message; HubSpot sends a specific JSON structure; Pipedrive sends yet another format.
You must verify the webhook's authenticity (signature validation, JWT verification) and transform the raw payload into a standardized event format. A record:created event for a CRM contact should look identical whether it came from Salesforce or Pipedrive.
This requires a unified webhook receiver that:
- Evaluates expressions against incoming payloads to extract the event type and entity reference
- Enriches the event data by optionally fetching the full resource from the CRM
- Delivers the normalized event to your application in a consistent format
Without this layer, your application ends up with the same spaghetti problem on the inbound side — a tangle of provider-specific webhook handlers, each parsing a different payload format. This is how you build reliable real-time CRM syncs for enterprise.
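The receiver's transform step can be sketched as a per-provider extractor table feeding one normalizer. In the real system the extractors would be declarative expressions rather than functions, and the payload shapes below are simplified stand-ins for the vendors' actual formats:

```typescript
interface UnifiedEvent {
  type: string;       // e.g. "record:created"
  entity: string;     // e.g. "contact"
  externalId: string; // provider-side record id
}

// Per-provider extraction rules. Payload shapes are illustrative
// simplifications, not exact vendor webhook formats.
const extractors: Record<string, (payload: any) => UnifiedEvent> = {
  hubspot: (p) => ({
    type: p.subscriptionType === "contact.creation" ? "record:created" : "record:updated",
    entity: "contact",
    externalId: String(p.objectId),
  }),
  pipedrive: (p) => ({
    type: p.meta.action === "added" ? "record:created" : "record:updated",
    entity: "contact",
    externalId: String(p.current.id),
  }),
};

function normalizeWebhook(provider: string, payload: any): UnifiedEvent {
  const extract = extractors[provider];
  if (!extract) throw new Error(`no extractor for ${provider}`);
  return extract(payload);
}

// Two proprietary payloads, one identical unified event:
const fromHubspot = normalizeWebhook("hubspot", {
  subscriptionType: "contact.creation",
  objectId: 42,
});
const fromPipedrive = normalizeWebhook("pipedrive", {
  meta: { action: "added" },
  current: { id: 42 },
});
```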
Code vs. Configuration: The Zero-Integration-Code Architecture
Let's make this concrete. The architectural difference between code-per-integration and configuration-per-integration isn't incremental — it's categorical.
In a code-driven approach, adding a new CRM means writing new handler functions, adding database columns or schemas, sprinkling conditional branches into shared logic, writing integration-specific tests, and going through a full deploy cycle.
In a configuration-driven approach, adding a new CRM means:
- A JSON config describing how to talk to the API (base URL, endpoints, auth scheme, pagination strategy)
- Declarative mapping expressions that transform between the unified schema and the native format
Both are stored as data. No code changes. No deployment.
The mapping layer uses JSONata — a functional query and transformation language for JSON — as the universal transformation engine. There is no if (provider === 'hubspot') anywhere in the runtime. Integration-specific behavior is defined entirely as data using JSONata expressions.
Here's what declarative response mappings look like for a CRM contact across two radically different API shapes:
# Salesforce: flat PascalCase fields
response_mapping: |-
response.{
"id": Id,
"first_name": FirstName,
"last_name": LastName,
"email_addresses": [{ "email": Email }],
"phone_numbers": $filter([
{ "number": Phone, "type": "phone" },
{ "number": MobilePhone, "type": "mobile" }
], function($v) { $v.number }),
"created_at": CreatedDate,
"updated_at": LastModifiedDate,
"custom_fields": $sift($, function($v, $k) { $k ~> /__c$/i and $boolean($v) })
}

# HubSpot: nested properties bag
response_mapping: |-
{
"id": response.id,
"first_name": response.properties.firstname,
"last_name": response.properties.lastname,
"email_addresses": [
response.properties.email
? { "email": response.properties.email, "is_primary": true }
],
"phone_numbers": [
response.properties.phone
? { "number": response.properties.phone, "type": "phone" },
response.properties.mobilephone
? { "number": response.properties.mobilephone, "type": "mobile" }
],
"created_at": response.createdAt,
"updated_at": response.updatedAt
}

Two completely different API response shapes. Two declarative expressions stored as data. The exact same runtime code processing both. Notice how the Salesforce mapping dynamically extracts custom fields by matching the __c suffix pattern — no hardcoded field list required.
The query mapping layer works the same way. When a unified request comes in to search for contacts, the engine loads the integration mapping from the database. For HubSpot, the query mapping translates unified filter parameters into HubSpot's filterGroups search syntax:
request_body_mapping: >-
rawQuery.{
"filterGroups": $firstNonEmpty(first_name, last_name, email_addresses)
? [{
"filters":[
first_name ? { "propertyName": "firstname", "operator": "CONTAINS_TOKEN", "value": first_name },
last_name ? { "propertyName": "lastname", "operator": "CONTAINS_TOKEN", "value": last_name }
]
}],
"query": search_term
}

For Salesforce, the exact same unified request is translated into a SOQL WHERE clause:
query_mapping: >-
(
$whereClause := query
? $convertQueryToSql(
query,
["created_at", "updated_at", "email_addresses"],
{
"created_at": "CreatedDate",
"updated_at": "LastModifiedDate",
"email_addresses": "Email"
}
);
{
"q": query.search_term
? "FIND {" & query.search_term & "} RETURNING Contact(Id, FirstName)",
"where": $whereClause ? "WHERE " & $whereClause
}
)

The execution engine simply evaluates the JSONata expression against the incoming request or response. It doesn't know what the expression does. It just executes it.
Why does this matter for PMs? When a customer requests a new CRM integration, the question shifts from "How many sprints will this take?" to "How long does it take to write the config?" In many cases, that's days instead of weeks — and zero risk of breaking existing integrations.
This means adding a new CRM integration is a data operation, not a code operation. You add a JSON config describing the API and JSONata mapping expressions for the unified resources. The same unified API engine that handles 100 integrations today will handle the 101st without a single line of code being changed, compiled, or deployed.
Let's be honest about the trade-offs, though. A configuration-driven system requires upfront investment in building the generic execution engine. The mapping expressions have their own learning curve — JSONata isn't something most engineers know on day one. And edge cases in vendor APIs (undocumented response formats, inconsistent error codes, pagination bugs) still require investigation time, even if the fix is a config change rather than a code change.
The payoff comes at scale. When every integration flows through the same pipeline, bug fixes and improvements apply universally. Improved pagination logic benefits all 100+ integrations at once. An error-handling improvement works for every CRM simultaneously. The maintenance burden grows with the number of unique API patterns, not the number of integrations.
Stop Hardcoding Your CRM Integrations
If you're currently maintaining a pile of if (provider === 'salesforce') branches, or evaluating how to build the CRM integrations your B2B sales team actually asks for, here's the strategic takeaway:
- Design your canonical schema first — get the entity relationships right (Accounts, Contacts, Opportunities, Pipelines, Stages) before obsessing over individual field mappings.
- Build for custom fields from day one — a normalization layer that drops custom data is only solving the easy half of the problem.
- Normalize API behavior, not just data shapes — pagination, auth, rate limiting, webhooks, and error handling all need abstraction.
- Prefer configuration over code — every if/else branch you write for a specific CRM is a maintenance liability that grows linearly with your integration count.
- Implement a multi-level override system — so customers can customize mappings for their specific CRM setup without filing engineering tickets.
Data normalization should be handled by a configuration-driven unified API. By treating integration behavior as data rather than code, you decouple your application from the chaotic reality of third-party APIs. Your engineering team should be building your core product, not reading terrible API documentation.
FAQ
- What is CRM data normalization?
- CRM data normalization is the process of translating disparate data models from different CRM platforms (Salesforce, HubSpot, Pipedrive, etc.) into a single, canonical JSON schema. It abstracts away provider-specific endpoints, field naming conventions, and API behaviors so developers can interact with one standard data model.
- What are the biggest differences between Salesforce and HubSpot data models?
- Salesforce uses PascalCase fields, a separate Lead/Contact object split, SOQL for filtering, and a `__c` suffix for custom fields. HubSpot uses a flat `properties` object, no separate Lead entity (using a Lifecycle Stage property instead), `filterGroups` for search, and stores custom fields alongside standard properties.
- Why do rigid unified APIs fail with custom CRM fields?
- Rigid unified APIs rely on static 1-to-1 key-value mapping that only covers standard fields. When a customer uses custom fields or objects unique to their CRM instance, the rigid abstraction drops that data, forcing engineers to build custom point-to-point integrations anyway.
- How do you handle custom fields when normalizing CRM data?
- Use a multi-level override system that allows mappings to be customized at platform, environment, and individual account levels. Each level deep-merges on top of the previous one, so customers can add their own custom field mappings without requiring code changes or deployments.
- How does JSONata help with API integrations?
- JSONata is a functional query and transformation language for JSON that acts as a universal transformation engine. It lets you map complex JSON responses and queries declaratively without writing hardcoded execution logic, making integration behavior a data operation rather than a code deployment.