Data is only as valuable as our ability to use it.
Note: This is Part 1 of our Data Cloud Ingestion API article, which digs into the considerations before implementation when using the Bulk pattern.
Before we bring data into Data Cloud via the Ingestion API, it's critical to evaluate the data's format and quality, assess the impact on existing systems, and determine the appropriate integration approach. Let's ensure we're setting ourselves up for success.
By setting up Data Cloud for success on the front end, the Ingestion API can become a key tool for getting data modeled, segmented and activated quickly.
Both the Streaming and Bulk patterns of the Ingestion API are great ways to ingest large amounts of data efficiently. This two-part article offers considerations and a sample proof of concept, walking through the Bulk pattern specifically.
What can success look like with the Ingestion API, a key tool in our Data Cloud playbook?
Let's dive deeper into using the Ingestion API, specifically the Bulk pattern (review this Salesforce walkthrough), one way to help get our data onto the field (Modeled, Segmented & Activated) quickly and efficiently.
For our Quarterbacks/Product Owners/Success team:
What does it look like to get data into Data Cloud via the API? The Ingestion API is one way to get the right data into Data Cloud, aligned to the Data Model.
How do I determine if Bulk is the way to go? Who do I need to engage with to ensure success? What team members are crucial for me to get marching down the field quickly and efficiently?
What are some considerations before we start to bring data into Data Cloud via this pattern?
Sample use case - Daily Customer Master file
The Business stores customer master data in a mainframe system. An existing business process outputs a daily csv file at 5:00am, which is already available to other systems as customer_master.csv from a shared location.
While there may be multiple paths to getting data into Data Cloud, the Business prefers to use existing systems & processes where possible.
They'd like to use the Salesforce Ingestion API to get this data into Data Cloud on a daily basis, as this pattern fits their existing processes quite nicely.
As data gets onboarded into Data Cloud, new capabilities become unlocked with respect to the Unified Profile and the 360 view of the customer. This will then inform further Identity Resolution and Segmentation efforts with the Unified Profile in mind.
- What does it look like to get data into Data Cloud via the API?
- What are some considerations before we start to bring data into Data Cloud via this pattern?

The data we'll be working with
New insights from improved data quality drive growth.
Before we even start looking at the data we'll be working with, let's level set.
Data Cloud isn't necessarily designed as a way to standardize or fix your bad data. It's just better to clean it up ahead of time.
In an ideal state, data remediation should happen upstream at the source.
Plan for Improvement
Features within Data Cloud such as Data Explorer help to highlight potential data hygiene issues that may need to be addressed.
Unifying disorganized data within Data Cloud just might reveal new data points - these are opportunities to enrich your customer profiles!
Fix your data at the source: we have an opportunity!
Continuous Improvement is the way
It's helpful to schedule data cleanup tasks alongside your daily Data Cloud tasks. Leverage Data Cloud access capabilities within tools such as DBeaver or Data Explorer to review & improve. Keep track of your cleanup efforts - it's a success metric!

From csv to Data Cloud
At a zoomed-out view, we're taking a Customer Master file in .csv form and performing an API call that transports it into Data Cloud, where it resides in a Data Lake Object.
This data from the csv file, when mapped into the Data Model, helps enrich & inform the Unified Profile and ensures that Data Cloud has the latest & greatest Customer Master info available.
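To make that zoomed-out view concrete, here's a minimal Python sketch of the Bulk pattern: create a job, upload the csv, then close the job so Data Cloud can process it into the DLO. The endpoint paths, payload fields, and names like customer_master and Customer_Master_API are illustrative assumptions; verify them against the Ingestion API documentation and your own Data Stream configuration.

```python
import requests

# Assumptions: `dc_token` is a Data Cloud access token (see the token exchange
# later in this article) and `dc_tenant_url` is the tenant-specific endpoint
# returned with it. Object and source names below are hypothetical.
dc_tenant_url = "https://<your-tenant-endpoint>"   # placeholder
dc_token = "<data-cloud-access-token>"             # placeholder
headers = {"Authorization": f"Bearer {dc_token}", "Content-Type": "application/json"}

# 1. Create a bulk ingestion job for the target object on the Ingestion API connector.
job = requests.post(
    f"{dc_tenant_url}/api/v1/ingest/jobs",
    headers=headers,
    json={
        "object": "customer_master",          # object defined in the Data Stream schema (assumed name)
        "sourceName": "Customer_Master_API",  # Ingestion API connector name (assumed)
        "operation": "upsert",
    },
).json()
job_id = job["id"]

# 2. Upload the daily csv as a batch (the Bulk pattern accepts csv payloads).
with open("customer_master.csv", "rb") as f:
    requests.put(
        f"{dc_tenant_url}/api/v1/ingest/jobs/{job_id}/batches",
        headers={"Authorization": f"Bearer {dc_token}", "Content-Type": "text/csv"},
        data=f,
    )

# 3. Close the job so Data Cloud queues it for processing into the Data Lake Object.
requests.patch(
    f"{dc_tenant_url}/api/v1/ingest/jobs/{job_id}",
    headers=headers,
    json={"state": "UploadComplete"},
)
```

Error handling and retries are omitted here; a scheduled job wrapping these three calls is all the daily 5:00am file really needs.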
Peeling back the layers
How Much Data Are We Talking About?
Think about: site visit files, purchase history - the amount of data ingested on a daily basis could build quickly.
Action Item:
Identify data sources and estimate the amount of data each will generate daily, weekly, monthly. Think beyond the first file.
Can We Trust the Data?
Think about: Missing values, duplicate entries, and data inconsistencies can impact overall insights and confidence in the implementation.
Action Item:
Implement ongoing data cleaning and validation processes at the source, before feeding data to Data Cloud.
This takes time.
Prioritize the cleanup efforts and track leading indicators of improvements.
How Fast is the Data Coming In & What is the Expectation?
Think about: Realtime site activity, Streaming API calls, 3rd party sensor inputs - all of these may require quick turnaround from a processing & availability perspective.
Action Item:
Evaluate whether realtime or batch ingestion is needed for different data sources within Data Cloud.
Explore options like the Streaming API or pre-built connectivity for realtime data ingestion, as sketched below.
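For contrast with the Bulk pattern, a streaming ingestion call pushes small JSON payloads as events happen rather than a large file on a schedule. A rough sketch follows; the endpoint shape and the source/object names (Customer_Master_API, site_visit) are assumptions to confirm against the Ingestion API documentation.

```python
import requests

# Assumed tenant endpoint and Data Cloud token, as in the bulk example.
dc_tenant_url = "https://<your-tenant-endpoint>"
dc_token = "<data-cloud-access-token>"

# Streaming pattern: push individual records as JSON to the source/object path.
# Path shape and field names are assumptions - confirm against your Data Stream.
resp = requests.post(
    f"{dc_tenant_url}/api/v1/ingest/sources/Customer_Master_API/site_visit",
    headers={"Authorization": f"Bearer {dc_token}", "Content-Type": "application/json"},
    json={"data": [{"visit_id": "V-1001", "customer_id": "C-42", "visited_at": "2024-05-01T09:30:00Z"}]},
)
print(resp.status_code)
```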
What Kind of Data Are We Dealing With?
Think about: Alignment on the data needs to occur before it can be processed by Data Cloud.
Data needs to be normalized before it can be mapped within Data Cloud. We're aligning unstructured data to a normalized structure.
It's important to gain an understanding of the field-level data. This will help determine how source data needs to be transformed, if necessary, so that it can be successfully mapped into the Data Cloud data model.
Action Item:
Work towards data schema creation and definition in advance.
This is a great opportunity to partner with the Business to better understand the why behind the data, as well as to educate them on Salesforce Data Cloud features worth knowing about, such as the YAML specification that defines a Data Stream schema.
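As one hypothetical example of that YAML specification, the Ingestion API connector expects an OpenAPI-style schema describing each object and its fields. The sketch below writes one possible shape for our customer_master object to a file; treat the field names and the exact structure as assumptions to validate against the Data Cloud setup screens and documentation before uploading to your connector.

```python
from pathlib import Path

# Hypothetical OpenAPI-style schema for the customer_master object.
# Field names, types, and overall shape are assumptions - confirm against
# the Ingestion API schema documentation.
schema_yaml = """
openapi: 3.0.3
components:
  schemas:
    customer_master:
      type: object
      properties:
        customer_id:
          type: string
        first_name:
          type: string
        last_name:
          type: string
        email:
          type: string
        last_updated:
          type: string
          format: date-time
"""

Path("customer_master_schema.yaml").write_text(schema_yaml.strip() + "\n")
```

Agreeing on this schema with the Business early keeps the csv, the Data Stream, and the downstream Data Model mappings in sync.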
Action Item:
Prioritize data sources that help unlock more actionable insights & help enhance the Customer 360 profile.
Data Cloud can help unlock the value hidden in your data.
Understanding the Customer's Vision
Think about: Ingested data may help to inform and update existing behaviors, preferences and buying journeys. Think about how existing Calculated Insights (CI) might benefit.
What Might This Look Like Later?
Data Cloud operates on a consumption-based pricing structure. With that in mind, it's critical that we better understand the data that we're working with. What new customer insights might we gain?
Setting the stage
Configuration
How will this fit into existing business systems and processes?
Your development and implementation team will need to become familiar with the "hops" in the token exchange required to make a Data Cloud Direct API call vs. a Connect API call.
Tools such as Postman and jwt.io are essential.
Access to Connected App credentials and Salesforce Admin configuration is required.
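Those "hops" can be scripted once the Connected App exists. Here's a minimal sketch, assuming the JWT bearer flow and the Data Cloud token exchange; the endpoints, parameter names, and the PyJWT dependency are assumptions to verify against the Salesforce authentication docs (the same steps you'd exercise in Postman), and server.key is the private key generated in the next section's sketch.

```python
import time
import jwt        # PyJWT, signs the assertion with the Connected App's private key
import requests

# Hop 1: build a signed JWT assertion for the Connected App (values are placeholders).
with open("server.key") as f:
    private_key = f.read()

assertion = jwt.encode(
    {
        "iss": "<connected-app-consumer-key>",
        "sub": "<integration-user@example.com>",
        "aud": "https://login.salesforce.com",
        "exp": int(time.time()) + 300,
    },
    private_key,
    algorithm="RS256",
)

# Hop 2: exchange the assertion for a core Salesforce access token.
core = requests.post(
    "https://login.salesforce.com/services/oauth2/token",
    data={
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": assertion,
    },
).json()

# Hop 3: trade the core token for a Data Cloud token plus the tenant-specific endpoint.
# Endpoint path and parameter names are assumptions - confirm in the Data Cloud docs.
dc = requests.post(
    f"{core['instance_url']}/services/a360/token",
    data={
        "grant_type": "urn:salesforce:grant-type:external:cdp",
        "subject_token": core["access_token"],
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
    },
).json()

dc_token = dc["access_token"]
dc_tenant_url = f"https://{dc['instance_url']}"   # used by the ingestion calls in this article
print("Data Cloud token acquired for", dc_tenant_url)
```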
Getting the plumbing connected
Independent activities can happen before we even have the Customer Master file.
Let's make sure we can get the plumbing going first.
We can start today by looking into:
Key Creation
What does it look like to create a private/public key pair and use it within a Connected App?
For more context, check out these other Connected App Use Cases from Salesforce.
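As one illustration of this step (an alternative to the usual openssl commands), here's a sketch using Python's cryptography package to generate a private key and a self-signed certificate that can be uploaded to the Connected App; the file names, common name, and validity period are assumptions.

```python
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa

# Private key that will sign our JWT assertions.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Self-signed certificate to upload under "Use digital signatures" on the Connected App.
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "datacloud-ingest-poc")])
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(datetime.datetime.utcnow())
    .not_valid_after(datetime.datetime.utcnow() + datetime.timedelta(days=365))
    .sign(key, hashes.SHA256())
)

with open("server.key", "wb") as f:
    f.write(key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    ))
with open("server.crt", "wb") as f:
    f.write(cert.public_bytes(serialization.Encoding.PEM))
```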
Connected App Setup
Review Salesforce Connected App methodology and related API connectivity on its way to the Data Cloud API.
Check out this helpful Trailhead on Connected Apps
Salesforce API Testing & Config
Become familiar with: Connected App, JWT Token creation, Postman, Sample data. Here's a great guide.
Sample Data Validation
Sample Action Item: Create a simple proof of concept using a sample csv file, a test Data Lake Object, and a supporting Data Stream.
Will this work as-is?
Consider:
Missing values, duplicate entries, and data inconsistencies can impact overall insights and confidence in the implementation.
For our sample data (customer_master.csv), we'll want to address Full Name in order to align it to our Data Model, or we can use FirstName and LastName instead.
Could we address this with a formula field at the DLO level? Align the Customer Master and the model now for the future in Data Cloud. Always consider what might need to be refined or adjusted here in order to be successful within Data Cloud.
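Here's a tiny sketch of that kind of pre-flight cleanup, assuming hypothetical column names Full Name, Customer Id, and Email in customer_master.csv: split the full name, drop exact duplicates, and flag missing values before the file ever reaches the Bulk job.

```python
import csv

seen = set()
clean_rows, issues = [], []

with open("customer_master.csv", newline="") as src:
    for row in csv.DictReader(src):
        key = (row.get("Customer Id", "").strip(), row.get("Email", "").strip().lower())
        if key in seen:                      # duplicate entries
            issues.append(("duplicate", row))
            continue
        seen.add(key)

        full_name = row.get("Full Name", "").strip()
        if not full_name or not key[0]:      # missing values
            issues.append(("missing", row))
            continue

        # Naive split on the first space - real names need more care.
        first, _, last = full_name.partition(" ")
        clean_rows.append({"CustomerId": key[0], "FirstName": first, "LastName": last, "Email": key[1]})

with open("customer_master_clean.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=["CustomerId", "FirstName", "LastName", "Email"])
    writer.writeheader()
    writer.writerows(clean_rows)

print(f"{len(clean_rows)} clean rows, {len(issues)} rows flagged for review")
```

Tracking the flagged rows over time is one of those cleanup success metrics worth reporting back to the Business.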
Getting our source data (and getting it right)
From the Business side of things, we'll need to understand more around the csv file, including field details, file frequency, as well as understanding any supporting elements.
Key players we'll need:
The Data - confirm source data and align. What will this inform for Segmentation & Identity Resolution? What DMOs will we map this into? It's helpful to develop a zoomed-out view of what the scene will look like once the dust settles.
The Team - We're going to need Security/Systems support. They may be needed for Private Key generation, Token Creation, Permission Assignments, or even Connected App setup, depending on your organization's roles and responsibilities.
Resources
Auth Keys & Certs:
Access Tokens for Data Cloud:
Getting things connected & our sample use case
It's time to get our hands dirty. We'll walk through the entire use case of getting data into Data Cloud!
We'll be digging into Salesforce Admin Configuration to get started. In Setup, we'll create a Connected App, a core Salesforce feature designed around Security, Access, and Flexibility.
What is a Connected App? A way to access Salesforce and ensure:
Simplified User Management - Connected Apps use tokens for authentication, eliminating the need to manage additional user credentials within Salesforce. The steps we'll take next help get this aligned.
Enhanced Security - Leverages the OAuth framework, allowing granular permissions. This minimizes the risk of unauthorized access.
Improved Monitoring & Auditing - A clear audit trail can be established via API usage. Track which Connected Apps are making calls. This can help quite a bit while testing out the integration, and helps ensure compliance with data security regulations.