Salesforce Migration Retry Logic, Idempotency & Dead-Letter Queue Patterns

In any large-scale data migration, partial failures are not edge cases; they are expected events. Network timeouts, governor limit exhaustion, validation rule rejections, and downstream system unavailability all contribute to scenarios where individual records or entire batches fail to process. A migration framework without a hardened retry-and-recovery mechanism will inevitably produce data gaps that are expensive and time-consuming to detect and remediate.

This article walks through four patterns that, together, form the resilience layer of a migration: a failure taxonomy that tells you what you’re dealing with, exponential back-off with jitter for retrying transient errors, External ID upsert for idempotency, and the Dead-Letter Queue plus checkpoint-and-resume combination that makes sure nothing gets silently lost.

The Anatomy of Migration Failures

Failure Category	Root Cause	Detection Signal	Recovery Strategy
HTTP 400 (Bad Request)	Invalid field value, missing required field	failedResults CSV: sf__Error column	Data cleansing + re-submit corrected rows
HTTP 401 (Unauthorized)	Expired or revoked access token	401 response on any API call	Re-authenticate via OAuth, retry the request
HTTP 403 (Forbidden)	Insufficient field/object permissions	403 response body: errorCode	Adjust integration user profile/permissions
HTTP 429 (Too Many Requests)	API limit exhausted	429 + Retry-After header	Honour Retry-After, implement back-off
HTTP 503 (Service Unavailable)	Concurrent request limit exceeded	503 response body	Exponential back-off, reduce concurrency
Partial Job Success	Some records failed within a job	numberRecordsFailed > 0	Extract failedResults, fix, re-submit as new job
Duplicate Records	Insert used without an idempotency key	Duplicate alerts in the org	Use External ID upsert instead of insert
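
The table above can be turned into a small lookup that the pipeline consults before deciding what to do with a failed call. Here is a minimal sketch of that idea. The function name `classifyFailure` and the action labels are illustrative assumptions, not part of any Salesforce SDK, and a 401 is treated as retryable only on the assumption that the caller refreshes the token first.

```javascript
// Hypothetical helper: maps an HTTP status to the recovery strategy listed in
// the failure table. The action names are labels invented for this sketch.
const RECOVERY_BY_STATUS = {
  400: { action: 'FIX_DATA',        retryable: false },
  401: { action: 'REAUTHENTICATE',  retryable: true  }, // retry after a token refresh
  403: { action: 'FIX_PERMISSIONS', retryable: false },
  429: { action: 'BACK_OFF',        retryable: true  },
  503: { action: 'BACK_OFF',        retryable: true  },
};

function classifyFailure(status) {
  // Anything the table does not cover is surfaced for a human to look at.
  return RECOVERY_BY_STATUS[status] ?? { action: 'INVESTIGATE', retryable: false };
}

console.log(classifyFailure(429)); // back off, then retry
console.log(classifyFailure(400)); // fix the data, do not blindly retry
```

Keeping the mapping in one place means the retry engine and the DLQ both make the same retry-or-park decision for a given status.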

Exponential Back-off with Jitter

The cornerstone of any resilient retry mechanism is exponential back-off with jitter. Pure exponential back-off (where the wait time doubles with each attempt) can cause thundering herd problems when many parallel jobs retry simultaneously. Adding randomised jitter distributes retry attempts across time, reducing the likelihood that all retrying clients hit the server at the same time.

				
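// Exponential Back-off with Full Jitter: Production Implementation
class RetryEngine {
  constructor(options = {}) {
    this.maxAttempts     = options.maxAttempts  ?? 7;
    this.baseDelayMs     = options.baseDelayMs  ?? 1000;   // 1s base
    this.maxDelayMs      = options.maxDelayMs   ?? 120000; // 2-minute ceiling
    this.retryableStatus = new Set([408, 429, 500, 502, 503, 504]);
  }

  // Full-jitter delay: random(0, min(cap, base * 2^attempt))
  computeDelay(attempt) {
    const exponential = this.baseDelayMs * Math.pow(2, attempt);
    const capped = Math.min(this.maxDelayMs, exponential);
    return Math.floor(Math.random() * capped);
  }

  // Retry-After is either delta-seconds or an HTTP date; returns ms, or null if unusable
  parseRetryAfter(value) {
    if (value == null || value === '') return null;
    const seconds = Number(value);
    if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
    const dateMs = Date.parse(value);
    return Number.isNaN(dateMs) ? null : Math.max(0, dateMs - Date.now());
  }

  async execute(fn, label = 'operation') {
    let lastError;
    for (let attempt = 0; attempt < this.maxAttempts; attempt++) {
      try {
        return await fn();
      } catch (err) {
        lastError = err;
        // Assumes an axios-style error shape (err.response.status, err.response.headers)
        const status = err.response?.status;
        // Errors with no status (e.g. network timeouts) fall through and are retried
        if (status && !this.retryableStatus.has(status)) throw err; // Non-retryable
        if (attempt === this.maxAttempts - 1) break;

        // Prefer the server's Retry-After hint; otherwise use the jittered delay
        const retryAfterMs = this.parseRetryAfter(err.response?.headers?.['retry-after']);
        const waitMs = retryAfterMs ?? this.computeDelay(attempt);
        console.warn(`[Retry] ${label} attempt ${attempt + 1} failed (${status ?? 'no status'}).`,
                     `Retrying in ${waitMs}ms...`);
        await new Promise(r => setTimeout(r, waitMs));
      }
    }
    // Attach the last underlying error so callers can see why the retries ran out
    throw new Error(`[Retry] ${label} exhausted all ${this.maxAttempts} attempts.`,
                    { cause: lastError });
  }
}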
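
It helps to see what the full-jitter formula produces with the defaults used in the listing. The sketch below computes only the upper bound of each random window, `min(cap, base * 2^attempt)`. The helper name `jitterWindows` is an assumption made for illustration.

```javascript
// Upper bound of the full-jitter window for each retry wait:
// min(maxDelayMs, baseDelayMs * 2^attempt). Defaults mirror the listing above.
function jitterWindows(baseDelayMs = 1000, maxDelayMs = 120000, maxAttempts = 7) {
  const windows = [];
  // The final attempt is never followed by a wait, so it gets no window.
  for (let attempt = 0; attempt < maxAttempts - 1; attempt++) {
    windows.push(Math.min(maxDelayMs, baseDelayMs * 2 ** attempt));
  }
  return windows;
}

const windows = jitterWindows();
console.log(windows);                             // windows of 1s, 2s, 4s, 8s, 16s, 32s
console.log(windows.reduce((a, b) => a + b, 0));  // worst-case total wait in ms: 63000
```

With seven attempts and a one-second base, the two-minute ceiling is never reached, and the worst-case cumulative wait is 63 seconds unless a Retry-After header asks for longer. Each actual delay is drawn uniformly from zero up to the window, so the expected wait is about half of that.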

Idempotency: Preventing Duplicate Records via External ID Upsert

Idempotency guarantees that repeating the same operation multiple times produces the same outcome as performing it once. In Salesforce data migration, the primary mechanism for achieving it is the upsert operation keyed on an External ID field. An External ID is a custom field on a Salesforce object that is flagged as an external ID in the org schema, which also makes it indexed. The field should additionally be marked Unique, because an upsert that matches more than one existing record fails instead of picking one. With that in place, Salesforce can match an incoming record to an existing one and update it rather than creating a duplicate.

				
					Upsert Idempotency Flow: External ID Matching Logic

  CSV Row: External_ID__c=EXT-00123, Name='Acme Corp', Industry='Technology'
                       |
                       v
       +----------------------------------+
       |   Salesforce Bulk API 2.0        |
       |   Operation: upsert              |
       |   externalIdFieldName:           |
       |       External_ID__c             |
       +---------------+------------------+
                       |
      Does a record with External_ID__c = 'EXT-00123' exist?
                       |
          +------------+-----------+
          |  YES                   |  NO
          v                        v
    UPDATE existing           INSERT new record
    record in-place           with External_ID__c
    (no duplicate created)    = 'EXT-00123'
          |                        |
          +------------+-----------+
                       v
       sf__Id, sf__Created in successfulResults.csv
       (sf__Created = true  --> new insert performed)
       (sf__Created = false --> existing record updated)
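
As a rough sketch of the flow in the diagram, the snippet below builds the request body for creating a Bulk API 2.0 ingest job in upsert mode and tallies inserts against updates from the success results. Both function names are assumptions made for illustration, the rows are assumed to be already parsed from CSV into objects, and the API version in the endpoint is deliberately left as a placeholder.

```javascript
// Request body for creating a Bulk API 2.0 ingest job in upsert mode.
// It would be POSTed to /services/data/vXX.X/jobs/ingest (version placeholder).
function buildUpsertJobRequest(objectName, externalIdFieldName) {
  return {
    object: objectName,
    operation: 'upsert',
    externalIdFieldName,
    contentType: 'CSV',
    lineEnding: 'LF',
  };
}

// Hypothetical helper: counts inserts vs updates in parsed successfulResults rows.
// CSV values arrive as strings, so sf__Created is compared against 'true'.
function summariseUpsertResults(rows) {
  const created = rows.filter(r => r.sf__Created === 'true').length;
  return { created, updated: rows.length - created };
}

console.log(buildUpsertJobRequest('Account', 'External_ID__c'));
console.log(summariseUpsertResults([
  { sf__Id: '001A', sf__Created: 'true',  External_ID__c: 'EXT-00123' },
  { sf__Id: '001B', sf__Created: 'false', External_ID__c: 'EXT-00124' },
]));
```

A handy idempotency check falls out of this: re-submit a file that already loaded cleanly, and the summary should report zero created records.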

Dead-Letter Queue Pattern for Failed Records

Records that fail repeatedly despite retry attempts must be captured in a Dead-Letter Queue (DLQ), a persistent store of unprocessed records along with their failure reasons. The DLQ serves as both a recovery mechanism and an audit artefact. Operations teams can inspect DLQ contents to identify systemic data quality issues, perform manual corrections, and re-inject corrected records into the migration pipeline.

				
// Dead-Letter Queue Implementation
const fs   = require('fs');
const path = require('path');

class DeadLetterQueue {
  constructor(dlqPath) {
    this.dlqPath = dlqPath;
    // Create the DLQ directory on first use
    if (!fs.existsSync(dlqPath)) fs.mkdirSync(dlqPath, { recursive: true });
  }

  // failedRecords: rows parsed from the job's failedResults CSV
  async enqueue(jobId, failedRecords) {
    // Colons and dots are replaced so the timestamp is safe in a filename
    const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
    const filename  = `dlq_${jobId}_${timestamp}.json`;
    const entry = {
      enqueuedAt  : new Date().toISOString(),
      sourceJobId : jobId,
      recordCount : failedRecords.length,
      records     : failedRecords.map(r => ({
        externalId : r['External_ID__c'],
        errorCode  : r['sf__Error'],   // error text reported by Salesforce for this row
        rawRow     : r,                // kept whole so the row can be corrected and re-submitted
        retryCount : 0,
        status     : 'PENDING_REVIEW',
      })),
    };
    fs.writeFileSync(path.join(this.dlqPath, filename), JSON.stringify(entry, null, 2));
    console.log(`[DLQ] Enqueued ${failedRecords.length} records to ${filename}`);
  }
}
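
The re-injection step mentioned above could look something like the sketch below. It assumes an operator has edited `rawRow` in place and changed `status` to `'CORRECTED'`, a label invented for this example, and it reads the `dlq_*.json` files written by the queue. The function name `collectCorrectedRows` is likewise hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical re-injection helper: gathers rows that an operator has marked
// 'CORRECTED' (a status label assumed for this sketch) across all DLQ files.
function collectCorrectedRows(dlqPath) {
  const rows = [];
  for (const name of fs.readdirSync(dlqPath)) {
    if (!name.startsWith('dlq_') || !name.endsWith('.json')) continue;
    const entry = JSON.parse(fs.readFileSync(path.join(dlqPath, name), 'utf8'));
    for (const record of entry.records) {
      if (record.status !== 'CORRECTED') continue;
      // Drop the sf__* result columns so only source fields are re-submitted.
      const clean = Object.fromEntries(
        Object.entries(record.rawRow).filter(([key]) => !key.startsWith('sf__'))
      );
      rows.push(clean);
    }
  }
  return rows;
}
```

The returned rows can be serialised back to CSV and submitted as a fresh upsert job. Because the job is keyed on the External ID, re-submitting a row that had in fact succeeded earlier updates it rather than duplicating it.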

Checkpoint and Resume Pattern

For migrations spanning multiple hours, a checkpoint-and-resume pattern is essential. The migration engine should periodically persist its progress state so that a crash or planned maintenance window does not require restarting from the beginning. This pattern, combined with the DLQ, ensures that no record is ever silently lost.

				
// Migration Checkpoint Manager
const fs = require('fs');

class CheckpointManager {
  constructor(checkpointFile) {
    this.file  = checkpointFile;
    this.state = this._load();
  }

  _load() {
    try { return JSON.parse(fs.readFileSync(this.file, 'utf8')); }
    catch (err) {
      // Only a missing file means a fresh start; a corrupt checkpoint should fail loudly
      if (err.code !== 'ENOENT') throw err;
      return { completedChunks: [], lastUpdated: null };
    }
  }

  // Record a finished chunk and flush the state to disk immediately
  markChunkComplete(chunkIndex, jobId, recordCount) {
    this.state.completedChunks.push({ chunkIndex, jobId, recordCount,
      completedAt: new Date().toISOString() });
    this._persist();
  }

  isChunkComplete(chunkIndex) {
    return this.state.completedChunks.some(c => c.chunkIndex === chunkIndex);
  }

  _persist() {
    this.state.lastUpdated = new Date().toISOString();
    fs.writeFileSync(this.file, JSON.stringify(this.state, null, 2));
  }
}

// Usage in migration loop (runs inside an async function; `chunks` and
// `processChunk` are defined elsewhere, and the file path is only an example):
const checkpoint = new CheckpointManager('./migration-checkpoint.json');
for (let i = 0; i < chunks.length; i++) {
  if (checkpoint.isChunkComplete(i)) {
    console.log(`[Skip] Chunk ${i} already completed: resuming from next`);
    continue;
  }
  const job = await processChunk(chunks[i]);
  checkpoint.markChunkComplete(i, job.id, chunks[i].length);
}
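
One hardening step worth considering: a crash in the middle of writing the checkpoint file can leave it truncated, which defeats the purpose of having it. A common remedy is to write to a temporary file and rename it over the real one. The sketch below shows that idea in isolation. The function name `persistAtomically` and the `.tmp` suffix are assumptions made for illustration.

```javascript
const fs = require('fs');

// Write-then-rename: the checkpoint is replaced in a single step, so a crash
// mid-write leaves the previous checkpoint intact rather than a truncated file.
function persistAtomically(file, state) {
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(state, null, 2));
  // rename is atomic on POSIX systems when tmp and file share a filesystem
  fs.renameSync(tmp, file);
}
```

Swapping this in for the direct write inside `_persist()` keeps the rest of the checkpoint manager unchanged.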

How These Four Patterns Fit Together

It’s worth being explicit about why all four of these need to exist together, because each one only covers part of the failure surface:

The failure taxonomy tells you whether a failure is worth retrying at all.
Exponential back-off with jitter handles the retryable failures without overwhelming the API.
External ID upsert makes every retry safe — no duplicates, regardless of what partially succeeded before.
The DLQ and checkpoint-and-resume combination ensures that records that can’t be fixed automatically are never lost, and that a process failure doesn’t mean starting over.

Remove any one of these, and a gap opens. Retries without idempotency create duplicates. Idempotency without a DLQ means permanently failing records vanish without anyone noticing. A DLQ without checkpointing means a crash mid-migration still forces a full restart. They’re a set, not a menu.
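
To make the interplay concrete, here is a rough sketch of a loop that wires the pieces together. Everything about its shape is an assumption: `runMigration` is a hypothetical name, `processChunk` is assumed to perform an External ID upsert and resolve to `{ id, failedRecords }`, and `retry`, `dlq` and `checkpoint` are assumed to expose the same methods as the classes shown earlier.

```javascript
// Hypothetical orchestration loop. The collaborators are passed in so the
// ordering of the four patterns is visible in one place.
async function runMigration({ chunks, processChunk, retry, dlq, checkpoint }) {
  for (let i = 0; i < chunks.length; i++) {
    // Checkpoint: skip work that finished before a crash or restart.
    if (checkpoint.isChunkComplete(i)) continue;

    // Back-off: transient failures are retried. This is safe only because
    // processChunk is assumed to upsert by External ID.
    const job = await retry.execute(() => processChunk(chunks[i]), `chunk ${i}`);

    // DLQ: park permanently failing rows before recording progress, so a
    // crash between the two steps cannot lose them.
    if (job.failedRecords.length > 0) {
      await dlq.enqueue(job.id, job.failedRecords);
    }
    checkpoint.markChunkComplete(i, job.id, chunks[i].length);
  }
}
```

The ordering is the design choice that matters here: failures go to the DLQ first and the checkpoint is written last. If the process dies in between, the chunk is simply re-run on resume, and the upsert absorbs the repeat.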

What’s Next

With resilience covered, the next article in this series turns to a different concern entirely: ensuring the migration itself meets the regulatory and governance requirements that enterprise Salesforce orgs operate under. That means GDPR and CCPA data handling obligations, where Salesforce Shield’s Platform Encryption and Event Monitoring fit into a migration project, and how to set up the audit trail that proves to auditors, not just to your own team, that the migration was handled correctly.

Continue Reading: Salesforce Data Migration Mastery Series

The resilience patterns in this article assume the rest of the migration pipeline is already in place. If you’re building that pipeline from scratch or want to see how the pieces connect, here’s where each part fits:

Part	Article	How it connects to this one
1	Salesforce Data Migration: Org Readiness Assessment & What Most Teams Get Wrong	Where External ID fields get created, the foundation that the upsert idempotency pattern in this article depends on
2	OAuth 2.0, Named Credentials & Connected Apps: Building a Secure Salesforce Migration Architecture	Why a properly configured auth stack means 401 errors rarely need special handling in your retry engine
3	Salesforce Bulk API 2.0: A Complete Developer Guide to High-Volume Data Migration	The job lifecycle and parallelisation strategy that the RetryEngine and CheckpointManager in this article wrap around
4	Salesforce REST API Integration Patterns for Data Migration: Composite API, SObject Tree & SOQL	The SOQL validation queries that catch anything a DLQ entry didn't get to before go-live

Kiran Sreeram Prathi

Sr. Salesforce Developer – kiransreeram8@live.com

I’m Kiran Sreeram Prathi, a Salesforce Developer dedicated to building scalable, intelligent, and user-focused CRM solutions. Over the past five years, I’ve delivered Salesforce implementations across healthcare, finance, and service industries—focusing on both technical precision and user experience. My expertise spans Lightning Web Components (LWC), Apex, OmniStudio, and Experience Cloud, along with CI/CD automation using GitHub Actions and integrations with platforms such as DocuSign, Conga, and Zpaper. I take pride in transforming complex workflows into seamless digital journeys and implementing clean DevOps strategies that reduce downtime and accelerate delivery. Recognized by organizations like Novartis, WILCO, and Deloitte, I enjoy solving problems that make Salesforce work smarter and scale better. I’m always open to connecting with professionals who are passionate about process transformation, architecture design, and continuous innovation in the Salesforce ecosystem.
