The Salesforce Bulk API 2.0 is the definitive mechanism for high-volume data migration. Introduced as a modernisation of the original Bulk API 1.0, version 2.0 streamlines the job lifecycle, eliminates the batch subdivision model, and supports a significantly higher throughput ceiling: up to 150 million records per rolling 24-hour period for orgs with appropriate licences.
This guide walks through everything you need to build a production-ready Bulk API 2.0 migration pipeline: job creation, CSV streaming, polling, result retrieval, and running multiple object migrations in parallel. If you’re following this series from the start, Article 1 covered org readiness, and Article 2 covered OAuth 2.0 authentication architecture. This is where the data actually moves.
Why Bulk API 2.0 — Not REST, Not the Old Bulk API
The three Salesforce APIs that come up most often for data migration work quite differently under pressure:
| API | Max Records | Asynchronous | Best Use Case |
|---|---|---|---|
| REST API | 200 (Composite) | No | Low-volume, real-time operations |
| Bulk API 1.0 | 10,000/batch | Yes | Large volume, complex batch management |
| Bulk API 2.0 | 150M/24 hours | Yes | Large volume, simplified lifecycle |
The decision point is clean: if you’re loading more than 200 records at a time, Bulk API 2.0 is the right tool.
What changed from version 1.0 to 2.0 matters more than the version number itself. Bulk API 1.0 requires you to break your data into individual batches, manage each batch separately, and poll them one by one. Bulk API 2.0 removes all of that. You submit a single CSV per job, Salesforce handles the internal batching, and you poll the job as a unit. The lifecycle is simpler, the throughput ceiling is dramatically higher, and when things go wrong, the failure model is much easier to reason about.
Bulk API 2.0 Job Lifecycle
Bulk API 2.0 Complete Job Lifecycle
CLIENT APPLICATION SALESFORCE BULK API 2.0
------------------ -----------------------
[1] POST /jobs/ingest
{ object, operation, lineEnding } ----------------------->
<-----------------------
{ id, state: 'Open' }
[2] PUT /jobs/ingest/{jobId}/batches (Upload CSV data)
Content-Type: text/csv ----------------------->
[CSV payload: up to 150 MB per job] <-----------------------
201 Created
[3] PATCH /jobs/ingest/{jobId}
{ state: 'UploadComplete' } ----------------------->
<-----------------------
{ state: 'UploadComplete' }
[4] GET /jobs/ingest/{jobId} (Poll every 30-120 seconds)
----------------------->
<-----------------------
{ state: 'InProgress'
OR 'JobComplete'
OR 'Failed' }
[5] GET /jobs/ingest/{jobId}/successfulResults
GET /jobs/ingest/{jobId}/failedResults
GET /jobs/ingest/{jobId}/unprocessedrecords
----------------------->
<-----------------------
(CSV files with results)
[6] DELETE /jobs/ingest/{jobId} (Cleanup: optional but recommended)
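The lifecycle above can be read as a small state machine. The following is a minimal sketch of that reading: the state names are the ones shown in the diagram (plus Aborted, which the polling code later in this guide also handles), while the action labels and the function names `nextAction` and `isTerminal` are hypothetical and exist only for illustration.

```javascript
// Sketch: map a Bulk API 2.0 ingest job state to the client's next step.
// The action labels are hypothetical names, not Salesforce API values.
const NEXT_ACTION = new Map([
  ['Open', 'upload-csv'],            // steps 2 and 3: upload, then mark UploadComplete
  ['UploadComplete', 'poll'],        // queued, waiting for Salesforce to start
  ['InProgress', 'poll'],            // Salesforce is processing the records
  ['JobComplete', 'fetch-results'],  // step 5: download the result CSVs
  ['Failed', 'inspect-error'],       // job-level failure
  ['Aborted', 'stop'],               // job was cancelled
]);

function nextAction(state) {
  if (!NEXT_ACTION.has(state)) {
    throw new Error(`Unknown job state: ${state}`);
  }
  return NEXT_ACTION.get(state);
}

// A state is terminal when further polling cannot change it.
function isTerminal(state) {
  return ['JobComplete', 'Failed', 'Aborted'].includes(state);
}
```

Keeping the mapping in one place means the polling loop only has to ask whether a state is terminal, rather than repeating string comparisons.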
Implementing the Full Bulk API 2.0 Lifecycle: External Client Application
Step 1: Create the Ingest Job
// Node.js: Bulk API 2.0 Job Creation
const axios = require('axios');
async function createBulkJob(token, object, operation) {
// operation: 'insert' | 'update' | 'upsert' | 'delete' | 'hardDelete'
const payload = {
object : object, // e.g., 'Account', 'Contact', 'Opportunity__c'
operation : operation,
lineEnding : 'LF', // 'LF' = Unix, 'CRLF' = Windows
columnDelimiter: 'COMMA',
contentType : 'CSV',
// For upsert: specify the external ID field
...(operation === 'upsert' && { externalIdFieldName: 'External_ID__c' }),
};
const res = await axios.post(
`${token.instanceUrl}/services/data/v59.0/jobs/ingest`,
payload,
{ headers: { Authorization: `Bearer ${token.accessToken}`,
'Content-Type': 'application/json' } }
);
console.log(`[Bulk] Job created: ${res.data.id}, state: ${res.data.state}`);
return res.data; // { id, state, object, operation, ... }
}
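The job-creation function above hard-codes the external ID field and accepts any operation string. As a hedged variation, the payload can be built and validated in a separate pure function before any HTTP call is made. The name `buildJobPayload` and its third parameter are assumptions introduced here; the field names and operation values are the ones used in Step 1.

```javascript
// Sketch: build and validate the ingest job payload before calling the API.
// buildJobPayload is a hypothetical helper, not part of any Salesforce SDK.
const VALID_OPERATIONS = ['insert', 'update', 'upsert', 'delete', 'hardDelete'];

function buildJobPayload(object, operation, externalIdFieldName) {
  if (!VALID_OPERATIONS.includes(operation)) {
    throw new Error(`Unsupported operation: ${operation}`);
  }
  if (operation === 'upsert' && !externalIdFieldName) {
    throw new Error('upsert requires an external ID field name');
  }
  const payload = {
    object,
    operation,
    lineEnding: 'LF',
    columnDelimiter: 'COMMA',
    contentType: 'CSV',
  };
  // Only upsert jobs carry the external ID field.
  if (operation === 'upsert') {
    payload.externalIdFieldName = externalIdFieldName;
  }
  return payload;
}

// Usage: pass the result as the request body in createBulkJob.
const contactUpsert = buildJobPayload('Contact', 'upsert', 'External_ID__c');
```

Failing fast here is cheaper than discovering a misconfigured job after the CSV upload has already run.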
Step 2: Upload CSV Data via Streaming
For very large datasets (millions of rows), upload the CSV data as a stream rather than loading the entire dataset into memory. Salesforce accepts the CSV body in a single PUT request per job, and that upload is capped at 150 MB, so larger source files have to be split across multiple jobs, which also lets you parallelise the load.
// Node.js: Stream CSV upload to Bulk API 2.0
const fs = require('fs');
const path = require('path');
async function uploadCsvData(token, jobId, csvFilePath) {
const fileSize = fs.statSync(csvFilePath).size;
const stream = fs.createReadStream(csvFilePath, { highWaterMark: 64 * 1024 });
console.log(`[Bulk] Uploading ${(fileSize/1e6).toFixed(1)} MB to job ${jobId}`);
const res = await axios.put(
`${token.instanceUrl}/services/data/v59.0/jobs/ingest/${jobId}/batches`,
stream,
{
headers: {
Authorization : `Bearer ${token.accessToken}`,
'Content-Type' : 'text/csv',
'Accept' : 'application/json',
},
maxBodyLength : Infinity, // Disable axios body size limit
maxContentLength: Infinity,
}
);
if (res.status !== 201 && res.status !== 204) {
throw new Error(`[Bulk] Upload failed: HTTP ${res.status}`);
}
console.log(`[Bulk] Upload complete for job ${jobId}`);
}
Step 3: Close the Job and Poll for Completion
async function closeJobAndPoll(token, jobId, pollIntervalMs = 30000) {
// Mark upload as complete: triggers Salesforce processing
await axios.patch(
`${token.instanceUrl}/services/data/v59.0/jobs/ingest/${jobId}`,
{ state: 'UploadComplete' },
{ headers: { Authorization: `Bearer ${token.accessToken}`,
'Content-Type': 'application/json' } }
);
console.log(`[Bulk] Job ${jobId} marked UploadComplete: awaiting processing`);
// Polling loop with structured logging
while (true) {
await sleep(pollIntervalMs);
const status = await getJobStatus(token, jobId);
console.log(
`[Poll] Job ${jobId}`,
`| state: ${status.state}`,
`| processed: ${status.numberRecordsProcessed}`,
`| failed: ${status.numberRecordsFailed}`
);
if (status.state === 'JobComplete' || status.state === 'Failed') return status;
if (status.state === 'Aborted') throw new Error(`Job ${jobId} was aborted`);
}
}
async function getJobStatus(token, jobId) {
const res = await axios.get(
`${token.instanceUrl}/services/data/v59.0/jobs/ingest/${jobId}`,
{ headers: { Authorization: `Bearer ${token.accessToken}` } }
);
return res.data;
}
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
Step 4: Retrieve Results and Triage Failures
async function retrieveJobResults(token, jobId, outputDir) {
const resultTypes = ['successfulResults', 'failedResults', 'unprocessedrecords'];
const results = {};
for (const type of resultTypes) {
const res = await axios.get(
`${token.instanceUrl}/services/data/v59.0/jobs/ingest/${jobId}/${type}`,
{
headers : { Authorization: `Bearer ${token.accessToken}`, Accept: 'text/csv' },
responseType: 'stream',
}
);
const filePath = path.join(outputDir, `${jobId}_${type}.csv`);
const writer = fs.createWriteStream(filePath);
res.data.pipe(writer);
await new Promise((resolve, reject) => {
writer.on('finish', resolve);
writer.on('error', reject);
});
results[type] = filePath;
console.log(`[Results] ${type}: saved to ${filePath}`);
}
return results;
}
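Once the failedResults file is on disk, the first triage step is usually to see which errors dominate. The sketch below assumes the CSV has already been parsed into row objects by a CSV library of your choice (the parsing step is not shown), and that each row carries the `sf__Error` column that Salesforce adds to failed results. Error messages typically begin with a status code followed by a colon, so the sketch groups on that prefix; `summarizeFailures` is a hypothetical name.

```javascript
// Sketch: count failed rows per error code, most frequent first.
// `rows` is an array of objects parsed from the failedResults CSV.
function summarizeFailures(rows) {
  const counts = new Map();
  for (const row of rows) {
    const message = row.sf__Error || '(no error message)';
    // Group on the text before the first colon, e.g. REQUIRED_FIELD_MISSING.
    const code = message.split(':')[0].trim();
    counts.set(code, (counts.get(code) || 0) + 1);
  }
  // Sort descending by count so the biggest problem is listed first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Usage with three illustrative rows (made-up data, not real job output).
const summary = summarizeFailures([
  { sf__Id: '', sf__Error: 'REQUIRED_FIELD_MISSING:Required fields are missing: [LastName]' },
  { sf__Id: '', sf__Error: 'DUPLICATE_VALUE:duplicate value found' },
  { sf__Id: '', sf__Error: 'REQUIRED_FIELD_MISSING:Required fields are missing: [LastName]' },
]);
// summary is [['REQUIRED_FIELD_MISSING', 2], ['DUPLICATE_VALUE', 1]]
```

A summary like this tells you whether you are looking at one systematic mapping problem or many unrelated data-quality issues.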
Streaming Large CSVs — The Right Way to Handle Millions of Rows
Here’s a scenario that plays out more often than it should. A developer writes a migration script that reads an entire CSV into memory, builds a string, and passes it to the API. It works in testing with 5,000 records. It kills the Node process at 800,000 records on migration night.
The fix is straightforward. In Node.js, createReadStream() reads the file in small chunks and pipes it to the HTTP request without ever holding the full dataset in memory:
const stream = fs.createReadStream(csvFilePath, { highWaterMark: 64 * 1024 });
await axios.put(uploadUrl, stream, {
headers: {
'Content-Type': 'text/csv',
'Accept': 'application/json'
},
maxBodyLength: Infinity,
maxContentLength: Infinity
});
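The maxBodyLength: Infinity setting is easy to overlook. Without it, depending on the Axios version, a default request-body limit can apply to the upload (older releases inherit a 10 MB cap from the redirect-handling layer), and a large CSV is rejected with a body-size error partway through the request instead of being submitted.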
A ceiling of roughly 150,000 records per job is a practical guideline, not a hard API limit; the hard constraint is upload size, which Salesforce caps at 150 MB per job. Breaking your source data across multiple jobs also gives you more granular control: if job 3 of 8 fails, you fix and re-run job 3, and you don’t restart the whole migration.
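The split into jobs can be planned up front from the record count alone. This is a minimal sketch under the 150,000-records-per-job guideline mentioned above; `planJobs` and the shape of the returned objects are assumptions for illustration, and row indices are zero-based with an exclusive end.

```javascript
// Sketch: divide a source file of N data rows into per-job row ranges.
function planJobs(totalRecords, maxPerJob = 150000) {
  if (!Number.isInteger(totalRecords) || totalRecords < 0) {
    throw new Error('totalRecords must be a non-negative integer');
  }
  if (!Number.isInteger(maxPerJob) || maxPerJob <= 0) {
    throw new Error('maxPerJob must be a positive integer');
  }
  const jobs = [];
  for (let start = 0; start < totalRecords; start += maxPerJob) {
    const end = Math.min(start + maxPerJob, totalRecords);
    jobs.push({ job: jobs.length + 1, startRow: start, endRow: end, records: end - start });
  }
  return jobs;
}

// Usage: 1.1 million rows need 8 jobs; the last one holds the 50,000-row remainder.
const plan = planJobs(1100000);
// plan[2] is { job: 3, startRow: 300000, endRow: 450000, records: 150000 }
```

Recording the row range for each job is what makes "re-run job 3" possible: the range identifies exactly which slice of the source file to resubmit.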
Parallelisation Strategy
To maximise throughput without exceeding concurrency limits, the migration orchestrator should parallelise at the object level: running Account, Contact, and Lead migrations concurrently where there are no dependency constraints, while serialising within an object family to respect parent-child insertion ordering (Account before Contact, Contact before Case).
// Parallel migration with dependency wave ordering
const migrationPlan = {
wave1: ['Account', 'Lead', 'Campaign'], // No parent dependencies
wave2: ['Contact', 'Opportunity', 'CampaignMember'], // Depend on wave1
wave3: ['OpportunityLineItem', 'Case', 'Task'], // Depend on wave2
wave4: ['CaseComment', 'ContentVersion', 'Note'], // Depend on wave3
};
async function runMigration(plan, token, options) {
for (const [wave, objects] of Object.entries(plan)) {
console.log(`\n=== Starting ${wave}: [${objects.join(', ')}] ===`);
await Promise.all(objects.map(obj => migrateObject(obj, token, options)));
console.log(`=== ${wave} complete ===\n`);
await validateWaveCompletion(wave, objects, token);
}
}
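The wave plan above is written by hand. As a hedged alternative, the waves can be derived from a parent-dependency map, so that adding an object does not require re-sorting the plan manually. The sketch below uses a simple layered topological sort; `buildWaves` and the sample dependency map are hypothetical and should be replaced with the real lookup relationships in your org.

```javascript
// Sketch: group objects into waves so every parent loads in an earlier wave.
// dependsOn maps each object to the list of objects it needs loaded first.
function buildWaves(dependsOn) {
  const remaining = new Map(
    Object.entries(dependsOn).map(([obj, parents]) => [obj, new Set(parents)])
  );
  const done = new Set();
  const waves = [];
  while (remaining.size > 0) {
    // An object is ready when all of its parents finished in earlier waves.
    const ready = [...remaining.keys()].filter(obj =>
      [...remaining.get(obj)].every(parent => done.has(parent))
    );
    if (ready.length === 0) {
      throw new Error(`Circular or missing dependency among: ${[...remaining.keys()].join(', ')}`);
    }
    for (const obj of ready) remaining.delete(obj);
    for (const obj of ready) done.add(obj);
    waves.push(ready);
  }
  return waves;
}

// Usage with an illustrative dependency map.
const waves = buildWaves({
  Account: [],
  Lead: [],
  Contact: ['Account'],
  Opportunity: ['Account'],
  Case: ['Contact'],
  OpportunityLineItem: ['Opportunity'],
});
// waves is [['Account', 'Lead'], ['Contact', 'Opportunity'], ['Case', 'OpportunityLineItem']]
```

The result can be fed to a runner like runMigration by iterating over the array of waves instead of a hand-written plan object.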
Three Mistakes That Will Ruin Your Bulk API Migration
These aren’t hypothetical scenarios. Each one turns up regularly on the Trailblazer Community.
- Using insert instead of upsert. If your pipeline fails at record 70,000 and you restart from the beginning, insert creates 70,000 duplicate records. upsert with a populated External ID field overwrites them cleanly. This is the single most impactful configuration decision in a migration project, and it’s a one-line change.
- Polling every one to five seconds. Processing happens server-side. Your polling frequency has zero effect on how fast Salesforce processes the records. Every unnecessary status call counts against your org’s daily API request limit, and on a large migration, that matters. Thirty seconds is a reasonable poll interval. Two minutes is fine for large jobs.
- Ignoring the failedResults CSV. “The job finished” is not the same as “the migration succeeded.” If your script checks for JobComplete and marks the job done without inspecting the failure file, you have no idea how many records actually landed in Salesforce. Store the failures, investigate them, correct them, and resubmit them. Article 4 in this series covers the retry logic and dead-letter queue pattern in detail.
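The cost of over-polling is easy to put a number on. The arithmetic below is a small sketch; the two-hour job duration and the count of eight jobs are illustrative assumptions, not measurements, and `pollCalls` is a hypothetical helper.

```javascript
// Sketch: status calls consumed by polling one job for its whole duration.
function pollCalls(jobDurationSeconds, pollIntervalSeconds) {
  if (pollIntervalSeconds <= 0) {
    throw new Error('pollIntervalSeconds must be positive');
  }
  return Math.ceil(jobDurationSeconds / pollIntervalSeconds);
}

const twoHours = 2 * 60 * 60; // 7200 seconds, an assumed job duration

const everyTwoSeconds = pollCalls(twoHours, 2);     // 3600 calls
const everyThirtySeconds = pollCalls(twoHours, 30); // 240 calls
const everyTwoMinutes = pollCalls(twoHours, 120);   // 60 calls

// Across eight jobs of that length, the two-second interval spends
// 8 * 3600 = 28800 API calls on status checks alone.
const eightJobsAtTwoSeconds = 8 * everyTwoSeconds;
```

None of those extra calls make the job finish sooner; they only reduce the API budget left for the rest of the migration.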
Quick Reference — Key API Endpoints
| Operation | Method | Endpoint | Notes |
|---|---|---|---|
| Token (Client Creds) | POST | /services/oauth2/token | grant_type=client_credentials |
| Token (JWT Bearer) | POST | /services/oauth2/token | grant_type=urn:ietf:...jwt-bearer |
| Object Describe | GET | /services/data/v59.0/sobjects/{Object}/describe/ | Returns field metadata |
| Create Bulk Job | POST | /services/data/v59.0/jobs/ingest | Body: JSON job config |
| Upload CSV Data | PUT | /services/data/v59.0/jobs/ingest/{id}/batches | Body: text/csv stream |
| Close Job | PATCH | /services/data/v59.0/jobs/ingest/{id} | Body: {state: UploadComplete} |
| Poll Job Status | GET | /services/data/v59.0/jobs/ingest/{id} | Returns state + counts |
| Get Success Results | GET | /services/data/v59.0/jobs/ingest/{id}/successfulResults | Returns CSV |
| Get Failed Results | GET | /services/data/v59.0/jobs/ingest/{id}/failedResults | Returns CSV |
| Composite Request | POST | /services/data/v59.0/composite | Up to 25 sub-requests |
| SObject Tree | POST | /services/data/v59.0/composite/tree/{Object} | Up to 200 records |
| SOQL Query | GET | /services/data/v59.0/query/?q={SOQL} | URL-encode the SOQL string |
| Check API Limits | GET | /services/data/v59.0/limits/ | Monitor consumption in real time |
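For the limits endpoint in the last row, the JSON response includes a DailyApiRequests entry with Max and Remaining counts. A minimal sketch of turning that into a usage figure follows; `dailyApiUsage` is a hypothetical helper, and the sample numbers are made up for illustration.

```javascript
// Sketch: summarise daily API consumption from a /limits/ response body.
function dailyApiUsage(limits) {
  const { Max, Remaining } = limits.DailyApiRequests;
  const used = Max - Remaining;
  // Percentage rounded to one decimal place.
  const percentUsed = Math.round((used / Max) * 1000) / 10;
  return { used, remaining: Remaining, percentUsed };
}

// Usage with an illustrative response fragment.
const usage = dailyApiUsage({ DailyApiRequests: { Max: 100000, Remaining: 87500 } });
// usage is { used: 12500, remaining: 87500, percentUsed: 12.5 }
```

Checking this before each wave gives an early warning if polling or retries are consuming more of the daily allowance than planned.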
Official reference: Salesforce Bulk API 2.0 Developer Guide
What Happens When It Goes Wrong at 2 AM?
Getting data into Salesforce is one problem. Knowing what to do when 12,000 records fail mid-job, at 2 AM, with go-live in four hours, is a completely different one.
The next article in this series, Retry Logic, Idempotency & Dead-Letter Queues: How to Bulletproof Your Salesforce Migration, covers exactly that: exponential back-off with jitter, checkpoint-and-resume patterns, idempotency keys, and how to store and reprocess failed records without creating duplicates or losing data.
If you’re just starting this project, go back to Article 1 — Org Readiness Assessment before writing a line of migration code. The pre-migration work is where these projects succeed or fail, and it’s almost always skipped.

Kiran Sreeram Prathi
I’m Kiran Sreeram Prathi, a Salesforce Developer dedicated to building scalable, intelligent, and user-focused CRM solutions. Over the past five years, I’ve delivered Salesforce implementations across healthcare, finance, and service industries—focusing on both technical precision and user experience. My expertise spans Lightning Web Components (LWC), Apex, OmniStudio, and Experience Cloud, along with CI/CD automation using GitHub Actions and integrations with platforms such as DocuSign, Conga, and Zpaper. I take pride in transforming complex workflows into seamless digital journeys and implementing clean DevOps strategies that reduce downtime and accelerate delivery. Recognized by organizations like Novartis, WILCO, and Deloitte, I enjoy solving problems that make Salesforce work smarter and scale better. I’m always open to connecting with professionals who are passionate about process transformation, architecture design, and continuous innovation in the Salesforce ecosystem.

