Sample use case - Daily Customer Master file
Note: This is Part 2 of our Data Cloud Bulk Ingestion API piece, which digs more into the technical details of the implementation. Check out our planning & prep stage here in Part 1.
At a high level, we're taking a Customer Master File in .csv format and making an API call that transports the csv into Data Cloud, where it lands in a Data Lake Object. Once mapped into the Data Model, this data can help enrich and inform the Unified Profile, and it ensures that Data Cloud has the latest and greatest Customer Master info available.
From csv to Data Cloud
- What does it look like to get data into Data Cloud via the API?
- What are some considerations before we start to bring data into Data Cloud via this pattern?

Getting our source data (and getting it right)
We'll need to understand more about the csv file, including field details, file frequency, and any supporting elements. We also need to understand how it's going to fit into Data Cloud.
Does the source data fit into what Data Cloud can accept?
Review these Data Cloud requirements for Bulk Ingestion (https://developer.salesforce.com/docs/atlas.en-us.c360a_api.meta/c360a_api/c360a_api_bulk_ingestion.htm)
- Empty field values are set to null
- Files must be UTF-8
- Bulk API only supports comma field delimiters (sorry | or tab)
- Data must not exceed 150MB (you can split a larger file into 150MB increments and batch up to 100 files; see the splitting sketch after this list)
- Update records are full replace
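That 150MB ceiling is worth planning for up front. Here's a minimal sketch of one way to split a large csv into size-bounded chunks while repeating the header row in each part; the file names and the headroom threshold are assumptions for illustration, not anything Data Cloud prescribes.

```python
import csv

MAX_BYTES = 140 * 1024 * 1024  # leave headroom under the 150MB per-file limit

def split_csv(path, prefix="customer_master_part"):
    """Split a csv into chunks under MAX_BYTES, repeating the header row in each chunk."""
    parts = []
    with open(path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        out, writer, written, part_num = None, None, 0, 0
        for row in reader:
            if out is None or written >= MAX_BYTES:
                if out:
                    out.close()
                part_num += 1
                name = f"{prefix}{part_num}.csv"
                out = open(name, "w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)   # every chunk gets the header row
                written = 0
                parts.append(name)
            writer.writerow(row)
            written += sum(len(v) for v in row) + len(row)  # rough size estimate
        if out:
            out.close()
    return parts

print(split_csv("customer_master.csv"))
```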
Sample Steps
Jump to the following steps for more detail within our sample Proof of Concept
Sample Data (sample csv) (sample schema)
Ingestion API Schema, focus on the Data
Create a Private Key & Self-Signed Digital Certificate
Obtain Access Token with JWT assertion
Making Connections
Where to begin?
It's critical to understand the data model and mapping process, and getting the data there is just as important. The following activities can be evaluated prior to implementation:
1. Creating a private key and self-signed digital certificate for authentication purposes. We'll need a Private Key created and available
2. Creating a Connected App in Salesforce, including setting up OAuth scopes, callback URL, and IP relaxation policies. We'll need to have a Connected App created in Salesforce config
3. Using JWT tokens to authenticate API calls and obtain a new access token for Data Cloud. We'll need to generate a JWT & exchange this Token to proceed with our Job
4. Making a Data Cloud API call using the new access token obtained from the previous step.
5. Creating a Bulk API job for Data Cloud and uploading the job.
6. Closing the job and reviewing its status.
The guide provides step-by-step instructions and screenshots to help you complete each task successfully. By following this guide, you should be able to create a connected app for Salesforce and use it to perform a Batch API POST to Data Cloud.
It's more than just a simple file upload
Keep reading for how to create a connected app for Salesforce and use it to perform a Batch API POST to Data Cloud.
This guide covers the following topics; dig into the Salesforce documentation to learn more about each:
- Source data aligned to Data Cloud ingest capabilities (.csv file): understand what we're working with and design ahead of time
- We'll need a Private Key created and available (Step 1 above)
- We'll need a Connected App created in Salesforce config (Step 2)
- We'll need to generate a JWT & exchange this token to proceed with our Job (Step 3)
- Align source data to Data Cloud structure
- Data Lake Object Creation
- Data Stream Creation
- Connect API Creation & Schema Alignment
- Create Job > Upload Job > Close Job
- Review & QA
Key players we'll need:
The Data - confirm the source data and alignment. What will this inform for Segmentation & Identity Resolution? What DMOs will we map this into? It's helpful to develop a zoomed-out view of what the scene will look like once the dust has settled.
The Team - we're going to need Security/Systems support. They may be needed for Private Key generation, Token Creation, Permission Assignments, or even Connected App setup, depending on your organization's roles and responsibilities.
Resources:
Auth Keys & Certs: https://developer.salesforce.com/docs/atlas.en-us.242.0.sfdx_dev.meta/sfdx_dev/sfdx_dev_auth_key_and_cert.htm
Access Tokens for Data Cloud: https://developer.salesforce.com/docs/atlas.en-us.c360a_api.meta/c360a_api/c360a_getting_started_with_cdp.htm
Alignment is Key
Configuration - how will this fit into existing business systems and processes? Your development and implementation team will need to become familiar with the "hops" in between the token exchange to make a Data Cloud Direct API call vs a Connect API call. Tools such as Postman and jwt.io are essential. Access to Connected App credentials and Salesforce Admin configuration is required.
Getting the plumbing connected
Independent activities can happen before we even have the Customer Master file. Let's make sure the plumbing works.
Standalone activities:
- Key Creation
- Connected App Setup
  - Requirements: Key Creation
- Salesforce API Testing
  - Requirements: Connected App, JWT Token creation, Postman, Sample data
Sample Action Item: Create a simple proof of concept using a sample csv file, and a test Data Lake Object & supporting Data Stream
Confirm the ask
Review the customer master layout that we have from Business. Will this work as-is? What might need to be tweaked here in order to be successful within Data Cloud?
Think about it: missing values, duplicate entries, and data inconsistencies can impact overall insights and confidence in the implementation.
For example: Full Name should really be FirstName and LastName.
Could we address this with a formula field at the DLO level?
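A formula field is one option; another is to reshape the file before it ever reaches Data Cloud. Here's a minimal pre-processing sketch; the column names and the naive split-on-last-space rule are assumptions for illustration, not part of the actual customer master layout.

```python
import csv

def split_full_name(in_path, out_path):
    """Add first_name/last_name columns derived from full_name (naive last-space split)."""
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        fields = list(reader.fieldnames) + ["first_name", "last_name"]
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        for row in reader:
            first, _, last = row.get("full_name", "").rpartition(" ")
            row["first_name"], row["last_name"] = first, last
            writer.writerow(row)

split_full_name("customer_master_raw.csv", "customer_master.csv")
```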
Align the Customer Master & model for future in Data Cloud:
Schema
Review the Schema that will be required
I've uploaded a sample schema for you to download & start with: sample_data_cloud_schema.yaml
DLO
Design what the Data Lake Object will look like that will house the records from Customer Master.
Jump to our section here to compare the csv and the DLO.
Data Stream
Design what the Data Stream will look like.
Frequency & timing: Are there any timing implications around downstream activities depending on when the data arrives that we need to be aware of?
Connect
Create the API Connection in Data Cloud Setup
We'll walk through the steps from uploading your Schema through to validation.
The big picture - the Data Model & Unified Profile. What DMOs will the DLO ultimately tie into to support success efforts?
A Proof of Concept Test
What data are we collecting?
Let's simulate bringing a customer master file into Data Cloud. Even if we don't have customer data yet, we can simulate the data and steps we may encounter during implementation.
Create test data
Testing things out - use mock data & prepare for Data Cloud
In this example, I'll use mock data to test out our flow - mockaroo.com is a service that will work quite nicely for generating sample data that we can use for our PoC:

Sample Fields
| Field Name | Mockaroo Type | Future Data Cloud Type | Notes |
| --- | --- | --- | --- |
| cid | New Number | Number | Primary Key |
| full_name | Full Name | Text | |
| first_name | First Name | Text | |
| last_name | Last Name | Text | |
| email_address | Email Address | Text | |
| address1 | Street Address | Text | |
| address2 | Address Line 2 | Text | |
| city | City | Text | |
| state | State (abbrev) | Text | |
| zip | Postal Code | Text | |
| DateAdded | Datetime | Datetime | |
| record_updated | Datetime | Datetime | This will be our Record Modified Field in Data Cloud |
Our headers:
cid,fullname,firstname,lastname,emailaddress,address1,address2,city,state,zip,DateAdded,DT_Added
Preview output, then save to .csv to prepare for Bulk API load
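If you'd rather script the sample file than export it from Mockaroo, here's a minimal sketch using the Faker package (an assumption, as is the 1,000-row count) to emit rows matching the header line above:

```python
import csv
from faker import Faker  # assumed available: pip install Faker

fake = Faker()
HEADERS = ["cid", "fullname", "firstname", "lastname", "emailaddress",
           "address1", "address2", "city", "state", "zip", "DateAdded", "DT_Added"]

with open("customer_master.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(HEADERS)
    for cid in range(1, 1001):  # 1,000 mock customers
        first, last = fake.first_name(), fake.last_name()
        writer.writerow([
            cid,
            f"{first} {last}",
            first,
            last,
            fake.email(),
            fake.street_address(),
            fake.secondary_address(),
            fake.city(),
            fake.state_abbr(),
            fake.zipcode(),
            fake.iso8601(),  # DateAdded
            fake.iso8601(),  # DT_Added / record modified
        ])
```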
How is this going to map into the Data Model? Think about what we're collecting and how it will need to be configured in Data Cloud.

Create the DLO (Data Lake Object)
Align the fields to what we're bringing in. We don't need to perform Data Mapping yet.
| Source Name | Data Stream Field API Name | Data Stream Data Type | Data Lake Object Field API Name | Notes |
| --- | --- | --- | --- | --- |
| DataSource__c (auto added) | | Text | DataSource__c (auto added) | |
| DataSourceObject__c (auto added) | | Text | DataSourceObject__c (auto added) | |
| address1 | address1__c | Text | address1__c | |
| address2 | address2__c | Text | address2__c | |
| cid | cid__c | Number | cid__c | Used as Primary Key |
| city | city__c | Text | city__c | |
| dateadded | dateadded__c | Date | dateadded__c | |
| dt_added | dt_added__c | DateTime | dt_added__c | Record Modified Field |
| email_address | email_address__c | Text | email_address__c | |
| first_name | first_name__c | Text | first_name__c | |
| full_name | full_name__c | Text | full_name__c | |
| last_name | last_name__c | Text | last_name__c | |
| state | state__c | Text | state__c | |
| zip | zip__c | Text | zip__c | |
| cdp_sys_SourceVersion__c (auto added) | | | | |
| KQ_cid__c (auto added) | | Key Qualifier | | "cid" Key Qualifier |

Configure the Ingestion API
Getting there: Navigate to Data Cloud Setup > Ingestion API
Connector Name - we'll need to reference this when making our Postman API call. This will be the sourceName used in the call.


Ingestion API Schema - focus on the Data, not the Code
Getting there: The existing schema can be downloaded or updated via Data Cloud Setup > Ingestion API > Connector. The Download Schema and Update Schema options are available on the Schema detail page.

Start here to download a sample schema if needed:
https://help.salesforce.com/s/articleView?id=sf.c360_a_ingestion_api_schema_req.htm&type=5
Use the sample schema, and align to what you'll need for your use case. Note the distinction between date and date-time.
The schema simply describes the what. The YAML schema is a blueprint for your data: it describes the data that we'll be uploading using the Ingestion API, which in our case comes from the customer_master.csv file.
We need to tell Data Cloud what to expect. The structure of what we'll be sending is defined within an OpenAPI schema, in .yaml format.
Within the schema that we upload, we want to specify data types that accurately represent what we're sending, aligned to how Data Cloud can accept it. The schema helps inform Data Cloud on what to expect.
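To make that concrete, here's a minimal sketch that builds an OpenAPI-style object definition in Python and dumps it to a .yaml file. The object and field names mirror our sample (including the date vs. date-time distinction), but the exact shape should be validated against the downloadable sample schema linked above rather than taken from this sketch.

```python
import yaml  # assumed available: pip install pyyaml

# Hypothetical object name; it must match what your Ingestion API connector expects.
schema = {
    "openapi": "3.0.3",
    "components": {
        "schemas": {
            "customer_master": {
                "type": "object",
                "properties": {
                    "cid": {"type": "number"},
                    "full_name": {"type": "string"},
                    "first_name": {"type": "string"},
                    "last_name": {"type": "string"},
                    "email_address": {"type": "string"},
                    "address1": {"type": "string"},
                    "address2": {"type": "string"},
                    "city": {"type": "string"},
                    "state": {"type": "string"},
                    "zip": {"type": "string"},
                    "dateadded": {"type": "string", "format": "date"},
                    "dt_added": {"type": "string", "format": "date-time"},
                },
            }
        }
    },
}

with open("sample_data_cloud_schema.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(schema, f, sort_keys=False)
```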



Save your schema file to .yaml, then Upload your schema:
Preview the schema to ensure that your fields have been appropriately identified:


Create the Data Stream
Data Streams > New Data Stream > Ingestion API
Select the API Connector you created in the previous step, and confirm the Object Name to be populated:

Configure the Category (Profile, Engagement, etc.), Primary Key, and Record Modified fields

Confirm Data Stream & Data Space Filtering details
Confirm your object & data space filtering if applicable. Click Deploy.

You can now verify your Data Stream details:

OK: we have a Data Stream created, a DLO, and a destination.
We now need to get the bridge built between our systems and Data Cloud, and that'll be a few hops of a Connected App, tokens and API calls.
Here's where it gets a little wild.
Connected What?
On the Salesforce config side, we'll create a Connected App, a Salesforce feature that has Security, Access and Flexibility at its core.
What is a Connected App? A way to access Salesforce and ensure:
Simplified User Management - Connected apps use tokens for authentication, eliminating the need to manage additional user credentials within Salesforce. The steps we'll take next help get this aligned.
Enhanced Security - Leverage OAuth framework, allowing granular permissions. This minimizes risk of unauthorized access.
Improved Monitoring & Auditing - A clear audit trail can be established via API usage. Track which Connected Apps are making calls. This can help quite a bit while testing out the integration, and helps ensure compliance is aligned with data security regulations.
Reference:
Connected App terminology: https://help.salesforce.com/s/articleView?id=sf.remoteaccess_terminology.htm&type=5
Create a private key & self signed digital certificate
We'll create a key and details that we exchange with Salesforce to ensure that we are who we say we are. We'll use this detail as part of the JWT that gets passed to Salesforce, which we then exchange for our Data Cloud access token in order to perform our Batch API POST.
Follow these steps. The output should be a server.key and a server.crt file that we'll use in our next activities.
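The linked steps use OpenSSL from the command line. If you'd rather script it, here's a minimal sketch using Python's cryptography package (an assumption, as are the file names, common name, and one-year validity); either route just needs to end with a server.key and a server.crt.

```python
from datetime import datetime, timedelta, timezone

from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Generate a 2048-bit RSA private key.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Build a self-signed certificate (subject == issuer), valid for one year.
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "datacloud-poc")])
now = datetime.now(timezone.utc)
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + timedelta(days=365))
    .sign(key, hashes.SHA256())
)

# Write out the same two artifacts the OpenSSL steps produce.
with open("server.key", "wb") as f:
    f.write(key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.TraditionalOpenSSL,
        encryption_algorithm=serialization.NoEncryption(),
    ))
with open("server.crt", "wb") as f:
    f.write(cert.public_bytes(serialization.Encoding.PEM))
```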
Create Connected App
We'll create this in Data Cloud. Ensure the following:
Use digital signatures. Upload server.crt from previous step.
Ensure the selected OAuth Scopes align with what you're trying to accomplish.
Update Callback URL. In our example, I'm using the Postman callback URL https://oauth.pstmn.io/v1/callback
Permitted Users - Change to Admin approved users are pre-authorized
IP Relaxation - Change to Relax IP Restrictions
Refresh Token Policy - Change to "Refresh Token is Valid until revoked"
Click Manage Profiles and add your Profile to the Connected App.
Note your Connected App Client ID - you'll need that in the next step.
Need to get back to your Connected App credentials? Navigate to Setup > App Manager > Manage or View

Create a JWT
Wait - why not the Client Credentials flow? It's doable, but with JWT we're not tying the Connected App to a User. The JWT Bearer Flow can be more scalable, more secure, and a bit more complex. The JWT contains more context than the standard client_id & client_secret exchange.
JWTs can be generated and cached ahead of time - which may be beneficial as things grow.
JWTs help add a layer of trust and data security, rather than relying solely on being one-of-a-kind tokens. Our token will contain unique data, including the Client ID for the Connected App, the user, the login URL, and an expiration date. This information is signed using the RS256 algorithm and is shared in a later API call.
JWTs are signed using cryptographic methods. The signature verifies the sender and ensures the content hasn't been tampered with in transmission, which is what gives the token its data-integrity guarantees.
We'll use www.jwt.io to create a JWT that will be used in later calls (a scripted alternative follows the steps below):
JWT resources: https://help.salesforce.com/s/articleView?id=sf.jwt_access_tokens.htm&type=5
Steps to create a JWT for Salesforce
1. Navigate to jwt.io
2. Update the Algorithm setting (RS256) & Payload detail:
   - iss: Connected App Client ID
   - sub: user (the pre-authorized Salesforce username)
   - exp: expiration time as a Unix timestamp (see unixtimestamp.com)
3. Under Verify Signature, supply your server.key as the private key and clear out the Public Key area.
4. On the left-hand side (ignore "Invalid Signature"), copy the encoded detail. Use this newly encoded token as the assertion in your next step.
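If you'd rather not paste keys into a browser, the same assertion can be generated locally. Here's a minimal sketch with the PyJWT package (an assumption); the client ID, username, and audience values are placeholders you'd swap for your own:

```python
import time

import jwt  # assumed available: pip install pyjwt[crypto]

with open("server.key", "r", encoding="utf-8") as f:
    private_key = f.read()

claims = {
    "iss": "<Connected App Client ID>",        # consumer key from the Connected App
    "sub": "integration.user@example.com",     # hypothetical pre-authorized username
    "aud": "https://login.salesforce.com",     # login URL referenced in the payload
    "exp": int(time.time()) + 300,             # short-lived expiration (5 minutes)
}

# Sign the assertion with the private key created earlier (RS256).
assertion = jwt.encode(claims, private_key, algorithm="RS256")
print(assertion)
```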
SteveTechArc's JWT Bearer Flow video is an excellent resource

JWT Bearer Token Flow:
In Postman, we'll make the following POST with our assertion details in order to obtain a new access token that will be used to get to Data Cloud (a scripted equivalent follows the parameter list).
POST to https://login.salesforce.com/services/oauth2/token
Content-Type: x-www-form-urlencoded
grant_type: urn:ietf:params:oauth:grant-type:jwt-bearer
assertion: JWT from previous step
client_id: Connected App Client ID
client_secret: Connected App Client Secret
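Here's a minimal sketch of the same exchange using Python's requests package (an assumption); it mirrors the Postman form fields above, with the Connected App values as placeholders. The sample response below shows what comes back.

```python
import requests  # assumed available: pip install requests

form = {
    "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
    "assertion": "<encoded JWT from the previous step>",
    "client_id": "<Connected App Client ID>",
    "client_secret": "<Connected App Client Secret>",
}

# requests encodes dicts passed via data= as x-www-form-urlencoded.
resp = requests.post("https://login.salesforce.com/services/oauth2/token", data=form)
resp.raise_for_status()
core_token = resp.json()["access_token"]
instance_url = resp.json()["instance_url"]
print(instance_url)
```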

`{"accesstoken":"00DDn000003oeek!AQEAwwD4Dxh4uaOsqMNODfxZ1OFYG1gVuaMaQOri0mwXA0QqM6xtcUofclFWHlbX4NxtI7cVTGPPV6cO2WXO1QBDBhmLeClO","scope":"cdpqueryapi cdpingestapi api cdpapi","instanceurl":"https://yoururlhere.my.salesforce.com","id":"https://login.salesforce.com/id/00DDn000003qeekMAA/005Dn000002qXwsIAE","token_type":"Bearer"}`
Exchange for a Data Cloud Token
Reference reminder: https://developer.salesforce.com/blogs/2023/07/load-data-programmatically-with-the-ingestion-api
POST: https://yoururlhere.my.salesforce.com/services/a360/token
We're now going to take the access_token that we received back from Salesforce (from our JWT Bearer Token Flow) and use it as the subject_token in this exchange:
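Here's a minimal sketch of that exchange. The grant_type and subject_token_type values follow the Ingestion API blog post linked above; treat them as something to verify against current docs rather than as definitive.

```python
import requests

instance_url = "https://yoururlhere.my.salesforce.com"     # from the previous response
core_token = "<access_token from the JWT bearer flow>"

form = {
    "grant_type": "urn:salesforce:grant-type:external:cdp",
    "subject_token": core_token,
    "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
}

resp = requests.post(f"{instance_url}/services/a360/token", data=form)
resp.raise_for_status()
dc_token = resp.json()["access_token"]      # Data Cloud access token
dc_instance = resp.json()["instance_url"]   # tenant-specific endpoint for Data Cloud calls
print(dc_instance)
```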

Make the Data Cloud API call with new token
Note: You'll want to reference your Tenant Specific Endpoint for these calls. This was also returned as the "instance_url" in the above call.
Creating a Bulk API job for Data Cloud
Creating the job with Postman:
https://yoururlherec360a.salesforce.com/api/v1/ingest/jobs?limit=50&offset=&orderby=&state=
object = the object name for the API connector, which is found under Data Cloud Setup > Configuration > Ingestion API
sourceName = Source API Name
Operation = upsert
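Here's a minimal sketch of the create-job call. The object and sourceName values are the sample values from this PoC, and the endpoint is the tenant-specific /api/v1/ingest/jobs path shown above.

```python
import requests

dc_instance = "https://yoururlherec360a.salesforce.com"   # tenant-specific endpoint
dc_token = "<Data Cloud access token from the exchange step>"
headers = {"Authorization": f"Bearer {dc_token}", "Content-Type": "application/json"}

job_spec = {
    "object": "bb_customer_master",   # object name on the Ingestion API connector
    "sourceName": "BB_dev2",          # Source API Name of the connector
    "operation": "upsert",
}

resp = requests.post(f"{dc_instance}/api/v1/ingest/jobs", json=job_spec, headers=headers)
resp.raise_for_status()
job_id = resp.json()["id"]
print(job_id)
```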

Upload the Bulk API job for Data Cloud
Use the ID from Create Job in your Upload Job URL:
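A minimal sketch of the upload, assuming the /batches sub-path and text/csv content type from the Ingestion API reference linked earlier; the placeholder values carry over from the create-job sketch.

```python
import requests

dc_instance = "https://yoururlherec360a.salesforce.com"   # tenant-specific endpoint
dc_token = "<Data Cloud access token>"
job_id = "<id returned by the create-job call>"

# Send the csv payload as the request body.
with open("customer_master.csv", "rb") as f:
    resp = requests.put(
        f"{dc_instance}/api/v1/ingest/jobs/{job_id}/batches",
        data=f.read(),
        headers={"Authorization": f"Bearer {dc_token}", "Content-Type": "text/csv"},
    )
resp.raise_for_status()
```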

Close the Bulk API job for Data Cloud
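Closing the job tells Data Cloud the upload is finished and it can begin processing. A minimal sketch, assuming the state change is a PATCH on the job resource (placeholders carry over from the sketches above):

```python
import requests

dc_instance = "https://yoururlherec360a.salesforce.com"   # tenant-specific endpoint
dc_token = "<Data Cloud access token>"
job_id = "<id returned by the create-job call>"

resp = requests.patch(
    f"{dc_instance}/api/v1/ingest/jobs/{job_id}",
    json={"state": "UploadComplete"},
    headers={"Authorization": f"Bearer {dc_token}"},
)
resp.raise_for_status()
print(resp.json().get("state"))  # expect "UploadComplete", as in the sample response below
```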
`{
"object": "bb_customer_master",
"id": "558418e8-8cf4-493c-82ac-359f2273ca07",
"operation": "upsert",
"sourceName": "BB_dev2",
"createdById": "005Dn000002oXwsIAE",
"createdDate": "2024-03-15T14:51:44.317369Z",
"systemModstamp": "",
**"state": "UploadComplete",**
"contentType": "CSV",
"apiVersion": "v1"
}`
Review the Bulk API job for Data Cloud
Get Job Info call

state changes from InProgress to JobComplete
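Here's a minimal polling sketch for the Get Job Info call; it hits the same job resource as above, and the 15-second interval is arbitrary.

```python
import time

import requests

dc_instance = "https://yoururlherec360a.salesforce.com"   # tenant-specific endpoint
dc_token = "<Data Cloud access token>"
job_id = "<id returned by the create-job call>"

while True:
    resp = requests.get(
        f"{dc_instance}/api/v1/ingest/jobs/{job_id}",
        headers={"Authorization": f"Bearer {dc_token}"},
    )
    resp.raise_for_status()
    state = resp.json()["state"]
    print(state)
    if state not in ("UploadComplete", "InProgress"):
        break  # e.g. JobComplete, or a failure state
    time.sleep(15)  # arbitrary polling interval
```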

Validate in Data Cloud

We're not done quite yet!
While we might have data in Data Cloud, our job isn't complete. Our next steps may include mapping the DLO into the Data Model so the data can actually be used for things like Identity Resolution or Segmentation.