The Salesforce Bulk API 2.0 is the definitive mechanism for high-volume data migration. Introduced as a modernisation of the original Bulk API 1.0, version 2.0 streamlines the job lifecycle, eliminates the batch subdivision model, and supports a significantly higher throughput ceiling: up to 150 million records per rolling 24-hour period for orgs with appropriate licences.

This guide walks through everything you need to build a production-ready Bulk API 2.0 migration pipeline, from job creation through CSV streaming, polling, and result retrieval to running multiple object migrations in parallel. If you’re following this series from the start, Article 1 covered org readiness, and Article 2 covered OAuth 2.0 authentication architecture. This is where the data actually moves.

Why Bulk API 2.0 — Not REST, Not the Old Bulk API

The three Salesforce APIs that come up most often for data migration work quite differently under pressure:

API            Max Records        Asynchronous   Best Use Case
-------------  -----------------  -------------  --------------------------------------
REST API       200 (Composite)    No             Low-volume, real-time operations
Bulk API 1.0   10,000/batch       Yes            Large volume, complex batch management
Bulk API 2.0   150M/24 hours      Yes            Large volume, simplified lifecycle

The decision point is clean: if you’re loading more than 200 records at a time, Bulk API 2.0 is the right tool.

What changed from version 1.0 to 2.0 matters more than the version number itself. Bulk API 1.0 requires you to break your data into individual batches, manage each batch separately, and poll them one by one. Bulk API 2.0 removes all of that. You submit a single CSV per job, Salesforce handles the internal batching, and you poll the job as a unit. The lifecycle is simpler, the throughput ceiling is dramatically higher, and when things go wrong, the failure model is much easier to reason about.

Bulk API 2.0 Job Lifecycle

				
					Bulk API 2.0 Complete Job Lifecycle

  CLIENT APPLICATION                      SALESFORCE BULK API 2.0
  ------------------                      -----------------------

  [1] POST /jobs/ingest
      { object, operation, lineEnding } ----------------------->
                                        <-----------------------
                                          { id, state: 'Open' }

  [2] PUT /jobs/ingest/{jobId}/batches   (Upload CSV data)
      Content-Type: text/csv            ----------------------->
      [CSV payload: max 150 MB/job]     <-----------------------
                                          201 Created

  [3] PATCH /jobs/ingest/{jobId}
      { state: 'UploadComplete' }       ----------------------->
                                        <-----------------------
                                          { state: 'UploadComplete' }

  [4] GET /jobs/ingest/{jobId}   (Poll every 30-120 seconds)
                                        ----------------------->
                                        <-----------------------
                                          { state: 'InProgress'
                                            OR   'JobComplete'
                                            OR   'Failed' }

  [5] GET /jobs/ingest/{jobId}/successfulResults
      GET /jobs/ingest/{jobId}/failedResults
      GET /jobs/ingest/{jobId}/unprocessedrecords
                                        ----------------------->
                                        <-----------------------
                                          (CSV files with results)

  [6] DELETE /jobs/ingest/{jobId}   (Cleanup: optional but recommended)


				
			

Implementing the Full Bulk API 2.0 Lifecycle: External Client Application

Step 1: Create the Ingest Job

				
// Node.js: Bulk API 2.0 Job Creation
const axios = require('axios'); // HTTP client used by every snippet in this guide

async function createBulkJob(token, object, operation) {
  // operation: 'insert' | 'update' | 'upsert' | 'delete' | 'hardDelete'
  const payload = {
    object         : object,        // e.g., 'Account', 'Contact', 'Opportunity__c'
    operation      : operation,
    lineEnding     : 'LF',          // 'LF' = Unix, 'CRLF' = Windows
    columnDelimiter: 'COMMA',
    contentType    : 'CSV',
    // For upsert: specify the external ID field
    ...(operation === 'upsert' && { externalIdFieldName: 'External_ID__c' }),
  };

  const res = await axios.post(
    `${token.instanceUrl}/services/data/v59.0/jobs/ingest`,
    payload,
    { headers: { Authorization: `Bearer ${token.accessToken}`,
                 'Content-Type': 'application/json' } }
  );
  console.log(`[Bulk] Job created: ${res.data.id}, state: ${res.data.state}`);
  return res.data; // { id, state, object, operation, ... }
}
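
The function above hard-codes the external ID field. As a minimal sketch, assuming you would rather pass that field in and fail fast when an upsert job is requested without one, the payload can be built by a small helper. The names `buildJobPayload` and `ALLOWED_OPERATIONS` are hypothetical and not part of any Salesforce SDK; the body fields are the same ones used in the job-creation call above.

```javascript
// Operations listed in the job-creation comment above.
const ALLOWED_OPERATIONS = ['insert', 'update', 'upsert', 'delete', 'hardDelete'];

// Hypothetical helper: builds the job-creation body and refuses an upsert
// that does not name an external ID field.
function buildJobPayload(object, operation, externalIdFieldName) {
  if (!ALLOWED_OPERATIONS.includes(operation)) {
    throw new Error(`Unsupported operation: ${operation}`);
  }
  if (operation === 'upsert' && !externalIdFieldName) {
    throw new Error('upsert requires externalIdFieldName');
  }
  const payload = {
    object,
    operation,
    lineEnding: 'LF',
    columnDelimiter: 'COMMA',
    contentType: 'CSV',
  };
  // Only upsert jobs carry the external ID field.
  if (operation === 'upsert') payload.externalIdFieldName = externalIdFieldName;
  return payload;
}
```

Calling `buildJobPayload('Account', 'upsert', 'External_ID__c')` would produce the same body as the function above, while `buildJobPayload('Account', 'upsert')` throws before any request is sent.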

				
			

Step 2: Upload CSV Data via Streaming

For very large datasets (millions of rows), upload the CSV data as a stream rather than loading the entire dataset into memory. Bulk API 2.0 accepts the CSV body in a single PUT request per job, and Salesforce caps that upload at 150 MB per job, so larger source data has to be split across multiple jobs anyway. Splitting also lets you parallelise the load.

				
					// Node.js: Stream CSV upload to Bulk API 2.0
const fs   = require('fs');
const path = require('path');

async function uploadCsvData(token, jobId, csvFilePath) {
  const fileSize = fs.statSync(csvFilePath).size;
  const stream   = fs.createReadStream(csvFilePath, { highWaterMark: 64 * 1024 });

  console.log(`[Bulk] Uploading ${(fileSize/1e6).toFixed(1)} MB to job ${jobId}`);

  const res = await axios.put(
    `${token.instanceUrl}/services/data/v59.0/jobs/ingest/${jobId}/batches`,
    stream,
    {
      headers: {
        Authorization  : `Bearer ${token.accessToken}`,
        'Content-Type' : 'text/csv',
        'Accept'       : 'application/json',
      },
      maxBodyLength   : Infinity,  // Disable axios body size limit
      maxContentLength: Infinity,
    }
  );
  if (res.status !== 201 && res.status !== 204) {
    throw new Error(`[Bulk] Upload failed: HTTP ${res.status}`);
  }
  console.log(`[Bulk] Upload complete for job ${jobId}`);
}

				
			

Step 3: Close the Job and Poll for Completion

				
					async function closeJobAndPoll(token, jobId, pollIntervalMs = 30000) {
  // Mark upload as complete: triggers Salesforce processing
  await axios.patch(
    `${token.instanceUrl}/services/data/v59.0/jobs/ingest/${jobId}`,
    { state: 'UploadComplete' },
    { headers: { Authorization: `Bearer ${token.accessToken}`,
                 'Content-Type': 'application/json' } }
  );
  console.log(`[Bulk] Job ${jobId} marked UploadComplete: awaiting processing`);

  // Polling loop with structured logging
  while (true) {
    await sleep(pollIntervalMs);
    const status = await getJobStatus(token, jobId);
    console.log(
      `[Poll] Job ${jobId}`,
      `| state: ${status.state}`,
      `| processed: ${status.numberRecordsProcessed}`,
      `| failed: ${status.numberRecordsFailed}`
    );
    if (status.state === 'JobComplete' || status.state === 'Failed') return status;
    if (status.state === 'Aborted') throw new Error(`Job ${jobId} was aborted`);
  }
}

async function getJobStatus(token, jobId) {
  const res = await axios.get(
    `${token.instanceUrl}/services/data/v59.0/jobs/ingest/${jobId}`,
    { headers: { Authorization: `Bearer ${token.accessToken}` } }
  );
  return res.data;
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
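
The loop above polls at a fixed interval and never gives up on its own. As a rough sketch, assuming you want the interval to grow from 30 seconds towards two minutes and a hard stop after a maximum wait, two small helpers are enough. The names `pollDelayMs` and `hasTimedOut`, the growth factor, and the four-hour budget are illustrative assumptions, not Salesforce-defined values.

```javascript
// Hypothetical helper: delay before poll number `attempt` (0-based).
// Grows by `factor` per attempt and is capped at maxMs.
function pollDelayMs(attempt, { minMs = 30000, maxMs = 120000, factor = 1.5 } = {}) {
  const delay = minMs * Math.pow(factor, attempt);
  return Math.min(maxMs, Math.round(delay));
}

// Hypothetical helper: true once polling has used up its time budget.
function hasTimedOut(startedAtMs, nowMs, maxWaitMs = 4 * 60 * 60 * 1000) {
  return nowMs - startedAtMs >= maxWaitMs;
}
```

Inside the polling loop, this would replace the fixed `sleep(pollIntervalMs)` with `sleep(pollDelayMs(attempt++))`, and a `hasTimedOut(startedAt, Date.now())` check would throw instead of looping forever.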

				
			

Step 4: Retrieve Results and Triage Failures

				
					async function retrieveJobResults(token, jobId, outputDir) {
  const resultTypes = ['successfulResults', 'failedResults', 'unprocessedrecords'];
  const results = {};

  for (const type of resultTypes) {
    const res = await axios.get(
      `${token.instanceUrl}/services/data/v59.0/jobs/ingest/${jobId}/${type}`,
      {
        headers     : { Authorization: `Bearer ${token.accessToken}`, Accept: 'text/csv' },
        responseType: 'stream',
      }
    );
    const filePath = path.join(outputDir, `${jobId}_${type}.csv`);
    const writer   = fs.createWriteStream(filePath);
    res.data.pipe(writer);
    await new Promise((resolve, reject) => {
      writer.on('finish', resolve);
      writer.on('error', reject);
      res.data.on('error', reject); // a failed download should reject, not hang the loop
    });
    results[type] = filePath;
    console.log(`[Results] ${type}: saved to ${filePath}`);
  }
  return results;
}
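
Saving the files is only half of the triage. As a minimal sketch, assuming the failed-results CSV includes an `sf__Error` column whose value begins with an error code followed by a colon, the failures can be grouped by code to show which problem to fix first. `splitCsvLine` and `summariseFailures` are hypothetical helpers. The parser is deliberately small: it handles quoted fields but not line breaks inside a field, and it reads the whole file as a string, so it suits failure files of modest size.

```javascript
// Minimal CSV line splitter: handles double-quoted fields and "" escapes,
// but assumes no field contains an embedded line break.
function splitCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { current += '"'; i++; }
      else if (ch === '"') { inQuotes = false; }
      else { current += ch; }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Counts failed rows per error code (the text before the first colon in sf__Error).
function summariseFailures(csvText) {
  const lines = csvText.split(/\r?\n/).filter(l => l.length > 0);
  if (lines.length === 0) return {};
  const errorIdx = splitCsvLine(lines[0]).indexOf('sf__Error');
  if (errorIdx === -1) throw new Error('sf__Error column not found');
  const counts = {};
  for (const line of lines.slice(1)) {
    const error = splitCsvLine(line)[errorIdx] || 'UNKNOWN';
    const code = error.split(':')[0].trim() || 'UNKNOWN';
    counts[code] = (counts[code] || 0) + 1;
  }
  return counts;
}
```

Feeding it the contents of the saved `failedResults` file returns an object keyed by error code, which is usually enough to tell a data-quality problem from a configuration problem.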

				
			

Streaming Large CSVs — The Right Way to Handle Millions of Rows

Here’s a scenario that plays out more often than it should. A developer writes a migration script that reads an entire CSV into memory, builds a string, and passes it to the API. It works in testing with 5,000 records. It kills the Node process at 800,000 records on migration night.

The fix is straightforward. In Node.js, createReadStream() reads the file in small chunks and pipes it to the HTTP request without ever holding the full dataset in memory:

				
					const stream = fs.createReadStream(csvFilePath, { highWaterMark: 64 * 1024 });

await axios.put(uploadUrl, stream, {
  headers: {
    'Content-Type': 'text/csv',
    'Accept': 'application/json'
  },
  maxBodyLength: Infinity,
  maxContentLength: Infinity
});
				
			

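The maxBodyLength: Infinity setting is easy to overlook. Without it, Axios can enforce a default request-body cap (10 MB in older releases, inherited from the follow-redirects library) and reject the upload with a “Request body larger than maxBodyLength limit” error once the stream passes that size. Setting it explicitly keeps the behaviour the same across Axios versions.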

Keeping each job to around 150,000 records is a practical guideline, not a hard API limit. Breaking your source data across multiple jobs also gives you more granular control: if job 3 of 8 fails, you fix and re-run job 3 rather than restarting the whole migration.
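
Splitting a source file into job-sized pieces can be sketched as a generator, assuming one record per line (no line breaks inside quoted fields) and a header on the first line. `chunkCsvLines` is a hypothetical helper, and its default of 150,000 rows simply mirrors the guideline above.

```javascript
// Hypothetical helper: yields CSV chunks of at most maxRecords data rows,
// each prefixed with the header row so every chunk is a valid job payload.
// Assumes one record per line (no embedded line breaks in quoted fields).
function* chunkCsvLines(lines, maxRecords = 150000) {
  let header = null;
  let rows = [];
  for (const line of lines) {
    if (header === null) { header = line; continue; } // first line is the header
    if (line === '') continue;                        // skip blank lines
    rows.push(line);
    if (rows.length === maxRecords) {
      yield [header, ...rows].join('\n') + '\n';
      rows = [];
    }
  }
  // Emit whatever is left over as the final, smaller chunk.
  if (rows.length > 0) yield [header, ...rows].join('\n') + '\n';
}
```

Here `lines` is any synchronous iterable of strings. For a file too large to hold in memory, the same logic would need an async generator over a line reader; that variant is not shown.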

Parallelisation Strategy

To maximise throughput without exceeding concurrency limits, the migration orchestrator should parallelise at the object level: running Account, Contact, and Lead migrations concurrently where there are no dependency constraints, while serialising within an object family to respect parent-child insertion ordering (Account before Contact, Contact before Case).

				
					// Parallel migration with dependency wave ordering
const migrationPlan = {
  wave1: ['Account', 'Lead', 'Campaign'],              // No parent dependencies
  wave2: ['Contact', 'Opportunity', 'CampaignMember'], // Depend on wave1
  wave3: ['OpportunityLineItem', 'Case', 'Task'],      // Depend on wave2
  wave4: ['CaseComment', 'ContentVersion', 'Note'],    // Depend on wave3
};

async function runMigration(plan, token, options) {
  for (const [wave, objects] of Object.entries(plan)) {
    console.log(`\n=== Starting ${wave}: [${objects.join(', ')}] ===`);
    await Promise.all(objects.map(obj => migrateObject(obj, token, options)));
    console.log(`=== ${wave} complete ===\n`);
    await validateWaveCompletion(wave, objects, token);
  }
}
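
The plan above is written by hand. As a sketch, assuming you keep a map from each object to the parents it depends on, the waves can be derived instead, so a newly added object lands in the right wave automatically. `buildWaves` and the shape of the `parents` map are hypothetical.

```javascript
// Hypothetical helper: groups objects into waves so that every object
// appears in a later wave than all of its parents.
// `parents` maps each object name to the list of objects it depends on.
function buildWaves(parents) {
  const remaining = new Set(Object.keys(parents));
  const done = new Set();
  const waves = [];
  while (remaining.size > 0) {
    // An object is ready once all of its parents are in an earlier wave.
    const ready = [...remaining].filter(obj =>
      parents[obj].every(p => done.has(p))
    );
    if (ready.length === 0) {
      throw new Error(
        `Circular or unknown dependency among: ${[...remaining].join(', ')}`
      );
    }
    for (const obj of ready) remaining.delete(obj);
    for (const obj of ready) done.add(obj);
    waves.push(ready);
  }
  return waves;
}
```

Each inner array can be run with `Promise.all`, exactly as `runMigration` does for the hand-written plan. A parent that is missing from the map is reported as an error rather than silently skipped.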

				
			

Three Mistakes That Will Ruin Your Bulk API Migration

These aren’t hypothetical scenarios. Each one turns up regularly on the Trailblazer Community.

  1. Using insert instead of upsert. If your pipeline fails at record 70,000 and you restart from the beginning, insert creates 70,000 duplicate records. upsert with a populated External ID field overwrites them cleanly. This is the single most impactful configuration decision in a migration project, and it’s a one-line change.
  2. Polling every one to five seconds. Processing happens server-side. Your polling frequency has zero effect on how fast Salesforce processes the records. Every unnecessary status call counts against your org’s daily API request limit — and on a large migration, that matters. Thirty seconds is a reasonable poll interval. Two minutes is fine for large jobs.
  3. Ignoring the failedResults CSV. “The job finished” is not the same as “the migration succeeded.” If your script checks for JobComplete and marks the job done without inspecting the failure file, you have no idea how many records actually landed in Salesforce. Store the failures, investigate them, correct them, and resubmit them. Article 4 in this series covers the retry logic and dead-letter queue pattern in detail.
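
The third mistake can be guarded against in code. A minimal sketch, assuming the job status object carries the `state`, `numberRecordsProcessed`, and `numberRecordsFailed` fields logged by the polling loop earlier; `assessJobOutcome` and its three outcome labels are hypothetical.

```javascript
// Hypothetical classifier over the job info returned by the status endpoint.
// A job only counts as a success if it completed AND nothing failed.
function assessJobOutcome(status, expectedRecords) {
  const processed = Number(status.numberRecordsProcessed || 0);
  const failed = Number(status.numberRecordsFailed || 0);
  if (status.state !== 'JobComplete') {
    return { outcome: 'failed', reason: `job ended in state ${status.state}` };
  }
  if (failed > 0) {
    return { outcome: 'partial', reason: `${failed} of ${processed} records failed` };
  }
  // Optional cross-check against the number of rows that were submitted.
  if (expectedRecords !== undefined && processed !== expectedRecords) {
    return {
      outcome: 'partial',
      reason: `processed ${processed}, expected ${expectedRecords}`,
    };
  }
  return { outcome: 'success', reason: `${processed} records processed` };
}
```

Anything other than `success` should stop the pipeline from marking the object as migrated until the failure file has been reviewed.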

Quick Reference — Key API Endpoints

Operation             Method  Endpoint                                                 Notes
--------------------  ------  -------------------------------------------------------  --------------------------------
Token (Client Creds)  POST    /services/oauth2/token                                   grant_type=client_credentials
Token (JWT Bearer)    POST    /services/oauth2/token                                   grant_type=urn:ietf:...jwt-bearer
Object Describe       GET     /services/data/v59.0/sobjects/{Object}/describe/         Returns field metadata
Create Bulk Job       POST    /services/data/v59.0/jobs/ingest                         Body: JSON job config
Upload CSV Data       PUT     /services/data/v59.0/jobs/ingest/{id}/batches            Body: text/csv stream
Close Job             PATCH   /services/data/v59.0/jobs/ingest/{id}                    Body: {state: UploadComplete}
Poll Job Status       GET     /services/data/v59.0/jobs/ingest/{id}                    Returns state + counts
Get Success Results   GET     /services/data/v59.0/jobs/ingest/{id}/successfulResults  Returns CSV
Get Failed Results    GET     /services/data/v59.0/jobs/ingest/{id}/failedResults      Returns CSV
Composite Request     POST    /services/data/v59.0/composite                           Up to 25 sub-requests
SObject Tree          POST    /services/data/v59.0/composite/tree/{Object}             Up to 200 records
SOQL Query            GET     /services/data/v59.0/query/?q={SOQL}                     URL-encode the SOQL string
Check API Limits      GET     /services/data/v59.0/limits/                             Monitor consumption in real time

Official reference: Salesforce Bulk API 2.0 Developer Guide

What Happens When It Goes Wrong at 2 AM?

Getting data into Salesforce is one problem. Knowing what to do when 12,000 records fail mid-job, at 2 AM, with go-live in four hours, is a completely different one.

The next article in this series, Retry Logic, Idempotency & Dead-Letter Queues: How to Bulletproof Your Salesforce Migration, covers exactly that: exponential back-off with jitter, checkpoint-and-resume patterns, idempotency keys, and how to store and reprocess failed records without creating duplicates or losing data.

If you’re just starting this project, go back to Article 1 — Org Readiness Assessment before writing a line of migration code. The pre-migration work is where these projects succeed or fail, and it’s almost always skipped.

Kiran Sreeram Prathi
Sr. Salesforce Developer  kiransreeram8@live.com

I’m Kiran Sreeram Prathi, a Salesforce Developer dedicated to building scalable, intelligent, and user-focused CRM solutions. Over the past five years, I’ve delivered Salesforce implementations across healthcare, finance, and service industries—focusing on both technical precision and user experience. My expertise spans Lightning Web Components (LWC), Apex, OmniStudio, and Experience Cloud, along with CI/CD automation using GitHub Actions and integrations with platforms such as DocuSign, Conga, and Zpaper. I take pride in transforming complex workflows into seamless digital journeys and implementing clean DevOps strategies that reduce downtime and accelerate delivery. Recognized by organizations like Novartis, WILCO, and Deloitte, I enjoy solving problems that make Salesforce work smarter and scale better. I’m always open to connecting with professionals who are passionate about process transformation, architecture design, and continuous innovation in the Salesforce ecosystem.
