Sample use case - Daily Customer Master file

Note: This is Part 2 of our Data Cloud Bulk Ingestion API piece, which digs more into the technical details of the implementation. Check out our planning & prep stage here in Part 1.

At a zoomed-out view, we're taking a Customer Master File (.csv) and performing an API call that transports the csv file into Data Cloud, where it resides in a Data Lake Object. When mapped into the Data Model, this data helps enrich & inform the Unified Profile, and ensures that Data Cloud has the latest & greatest Customer Master info available.

From csv to Data Cloud

  • What does it look like to get data into Data Cloud via the API?
  • What are some considerations before we start to bring data into Data Cloud via this pattern?

Getting our source data (and getting it right)

We'll need to understand more about the csv file, including field details, file frequency, and any supporting elements, as well as how it's going to fit into Data Cloud.

Does the source data fit into what Data Cloud can accept?

Review these Data Cloud requirements for Bulk Ingestion (https://developer.salesforce.com/docs/atlas.en-us.c360a_api.meta/c360a_api/c360a_api_bulk_ingestion.htm)

  • Empty field values are set to null
  • Files must be UTF-8
  • The Bulk API supports only comma field delimiters (sorry, pipe and tab)
  • Data must not exceed 150MB per file (you can split a larger file into chunks of up to 150MB and batch up to 100 files; see the sketch after this list)
  • Updates fully replace the existing record
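If your Customer Master runs past the size limit, here's a rough splitting sketch in Python (file names are hypothetical, and the per-row byte estimate ignores csv quoting, so leave yourself headroom):

```python
import csv

MAX_BYTES = 140 * 1024 * 1024  # stay safely under the 150MB per-file limit

def split_csv(src_path: str, out_prefix: str) -> list[str]:
    """Split a large UTF-8 csv into parts under MAX_BYTES, repeating the header row."""
    parts, out, writer, written = [], None, None, 0
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        for row in reader:
            # rough per-row byte estimate (ignores any quoting added by csv.writer)
            row_bytes = len((",".join(row) + "\r\n").encode("utf-8"))
            if out is None or written + row_bytes > MAX_BYTES:
                if out:
                    out.close()
                parts.append(f"{out_prefix}_{len(parts) + 1}.csv")
                out = open(parts[-1], "w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                written = 0
            writer.writerow(row)
            written += row_bytes
    if out:
        out.close()
    return parts

# e.g. split_csv("customer_master.csv", "customer_master_part")
```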

Making Connections

Where to begin?

It's critical to understand the data model and mapping process, but getting the data there is just as important. The following activities can be evaluated prior to implementation:

1. Creating a private key and self-signed digital certificate for authentication purposes. We'll need a Private Key created and available

2. Creating a Connected App in Salesforce, including setting up OAuth scopes, callback URL, and IP relaxation policies. We'll need to have a Connected App created in Salesforce config

3. Using JWT tokens to authenticate API calls and obtain a new access token for Data Cloud. We'll need to generate a JWT & exchange this Token to proceed with our Job

4. Making a Data Cloud API call using the new access token obtained from the previous step.

5. Creating a Bulk API job for Data Cloud and uploading the job.

6. Closing the job and reviewing its status.

The guide provides step-by-step instructions and screenshots to help you complete each task successfully. By following this guide, you should be able to create a Connected App for Salesforce and use it to perform a Batch API POST to Data Cloud.

It's more than just a simple file upload

Keep reading for how to create a connected app for Salesforce and use it to perform a Batch API POST to Data Cloud.

This guide covers the following topics (dig into the Salesforce documentation to learn more on each):

  • Source data aligned to Data Cloud ingest capabilities (.csv file): understand what we're working with and design ahead of time
  • A Private Key created and available (Step 1 above)
  • A Connected App created in Salesforce config (Step 2)
  • A JWT generated & exchanged for our token to proceed with our Job
  • Align source data to Data Cloud structure
  • Data Lake Object Creation
  • Data Stream Creation
  • Connect API Creation & Schema Alignment
  • Create Job > Upload Job > Close Job
  • Review & QA

Key players we'll need:

The Data - confirm the source data and align it. What will this inform for Segmentation & Identity Resolution? What DMOs will we map this into? It's helpful to develop a zoomed-out view of what the scene will look like once the dust has settled.

The Team - we're going to need security/systems support. They'll potentially be needed for Private Key generation, Token Creation, Permission Assignments, or even Connected App setup, depending on your organization's roles and responsibilities.

Alignment is Key

Configuration - how will this fit into existing business systems and processes? Your development and implementation team will need to become familiar with the "hops" in the token exchange needed to make a Data Cloud Direct API call vs. a Connect API call. Tools such as Postman and jwt.io are essential, and access to Connected App credentials and Salesforce Admin configuration is required.

Getting the plumbing connected

Independent activities can happen before we even have the Customer Master file. Let's make sure the plumbing works.

Standalone activities:

  • Key Creation
  • Connected App Setup
    • Requirements: Key Creation
  • Salesforce API Testing
    • Requirements: Connected App, JWT Token creation, Postman, Sample data

Sample Action Item: Create a simple proof of concept using a sample csv file, and a test Data Lake Object & supporting Data Stream

Confirm the ask

Review the customer master layout that we have from Business. Will this work as-is? What might need to be tweaked here in order to be successful within Data Cloud?

Think about: missing values, duplicate entries, and data inconsistencies can impact overall insights and confidence in the implementation.

Full Name should be FirstName and LastName

Could we address this with a formula field at the DLO level?
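Possibly. If a formula field doesn't pan out, another option is to split the name before upload. Here's a minimal pre-processing sketch (naive, and only a starting point for real-world names):

```python
def split_full_name(full_name: str) -> tuple[str, str]:
    """Naive split: the first token becomes FirstName, the rest becomes LastName.
    Real data (suffixes, multi-part surnames, single names) needs more care."""
    parts = full_name.strip().split()
    if not parts:
        return "", ""
    return parts[0], " ".join(parts[1:])

# split_full_name("Jane van der Berg") -> ("Jane", "van der Berg")
```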

Align the Customer Master with how it will be modeled in Data Cloud:

Schema

Review the Schema that will be required

I've uploaded a sample schema for you to download & start with: sample_data_cloud_schema.yaml

DLO

Design what the Data Lake Object will look like that will house the records from Customer Master.

Jump to our section below to compare the csv and the DLO.

Data Stream

Design what the Data Stream will look like.

Frequency & timing: Are there any timing implications around downstream activities depending on when the data arrives that we need to be aware of?

Connect

Create the API Connection in Data Cloud Setup

We'll walk through the steps from uploading your Schema through to validation.

The big picture - the Data Model & Unified Profile. What DMOs will the DLO ultimately tie into to support success efforts?

A Proof of Concept Test

What data are we collecting?

Let's simulate bringing a customer master file into Data Cloud. Even if we don't have customer data yet, we can mock up the data and walk through the steps we'll encounter during implementation.

Create test data

Testing things out - use mock data & prepare for Data Cloud

In this example, I'll use mock data to test out our flow - mockaroo.com is a service that will work quite nicely for generating sample data that we can use for our PoC:

Using Mockaroo to create sample test data for our proof-of-concept

Sample Fields

| Field Name | Mockaroo Type | Future Data Cloud Type | Notes |
| --- | --- | --- | --- |
| cid | New Number | Number | Primary Key |
| full_name | Full Name | text | |
| first_name | First Name | text | |
| last_name | Last Name | text | |
| email_address | Email Address | text | |
| address1 | Street Address | text | |
| address2 | Address Line 2 | text | |
| city | City | text | |
| state | State (abbrev) | text | |
| zip | Postal Code | text | |
| DateAdded | Datetime | datetime | |
| record_updated | Datetime | datetime | This will be our Record Modified Field in Data Cloud |

Our headers:

cid,fullname,firstname,lastname,emailaddress,address1,address2,city,state,zip,DateAdded,DT_Added

Preview output, then save to .csv to prepare for Bulk API load

How is this going to map into the Data Model? Think about what we're collecting and how it will need to be configured in Data Cloud

CSV output of test records created above

Create the DLO (Data Lake Object)

Align the fields to what we're bringing in. We don't need to perform Data Mapping yet.

| Source Name | Data Stream Field API Name | Data Stream Data Type | Data Lake Object Field API Name | Notes |
| --- | --- | --- | --- | --- |
| DataSource__c (Auto added) | | Text | DataSource__c | Auto Added |
| DataSourceObject__c (Auto added) | | Text | DataSourceObject__c | Auto Added |
| address1 | address1__c | Text | address1__c | |
| address2 | address2__c | Text | address2__c | |
| cid | cid__c | Number | cid__c | Used as Primary Key |
| city | city__c | Text | city__c | |
| dateadded | dateadded__c | Date | dateadded__c | |
| dt_added | dt_added__c | DateTime | dt_added__c | Record Modified Field |
| email_address | email_address__c | Text | email_address__c | |
| first_name__c | first_name__c | Text | first_name__c | |
| full_name__c | full_name__c | Text | full_name__c | |
| last_name__c | last_name__c | Text | last_name__c | |
| state__c | state__c | Text | state__c | |
| zip__c | zip__c | Text | zip__c | |
| cdp_sys_SourceVersion__c (Auto added) | | | | |
| KQ_cid__c (Auto added) | | | | Key Qualifier "cid" |

Confirm Data Lake Mapping if not created already

Configure the Ingestion API

Getting there: Navigate to Data Cloud Setup > Ingestion API

Connector Name - we'll need to reference this when making our Postman API call. This will be the sourceName used in the call.

Data Cloud Setup > Ingestion API
View / Update Schema for API call

Ingestion API Schema - focus on the Data, not the Code

Getting there: The existing schema can be downloaded or updated via Data Cloud Setup > Ingestion API > Connector. The Download Schema and Update Schema options are available on the Schema detail page.

Review existing Schema

Start here to download a sample schema if needed:

https://help.salesforce.com/s/articleView?id=sf.c360_a_ingestion_api_schema_req.htm&type=5

Use the sample schema, and align to what you'll need for your use case. Note the distinction between date and date-time.

The schema simply describes the what.

The YAML schema is a blueprint for your data.

It describes the data that we'll be uploading using the Ingestion API. For us, the data will come from the customer_master.csv file.

We need to tell Data Cloud what to expect. The structure of what we'll be sending to Data Cloud is defined within the OpenAPI Schema, in .yaml format.

Within the schema that we'll upload, we want to specify the data types that accurately represent what we're uploading to Data Cloud, aligned to how Data Cloud can accept it.

The Schema helps inform Data Cloud on what to expect

Sample schema in .yaml format
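If you'd rather generate the file than hand-edit it, here's a minimal sketch that writes a schema along those lines for our mock fields. The object name customer_master and the output filename are assumptions; double-check the layout against the schema requirements doc linked above.

```python
import yaml  # pip install pyyaml

# OpenAPI-style schema describing the mock customer_master fields.
# Dates ride along as strings with a "date" or "date-time" format.
schema = {
    "openapi": "3.0.3",
    "components": {
        "schemas": {
            "customer_master": {
                "type": "object",
                "properties": {
                    "cid": {"type": "number"},
                    "full_name": {"type": "string"},
                    "first_name": {"type": "string"},
                    "last_name": {"type": "string"},
                    "email_address": {"type": "string"},
                    "address1": {"type": "string"},
                    "address2": {"type": "string"},
                    "city": {"type": "string"},
                    "state": {"type": "string"},
                    "zip": {"type": "string"},
                    "dateadded": {"type": "string", "format": "date"},
                    "dt_added": {"type": "string", "format": "date-time"},
                },
            }
        }
    },
}

# Write the schema out as .yaml, ready to upload in Data Cloud Setup.
with open("customer_master_schema.yaml", "w") as f:
    yaml.safe_dump(schema, f, sort_keys=False)
```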
Save your schema file to .yaml, then Upload your schema:

Preview the schema to ensure that your fields have been appropriately identified:

Preview Data Cloud Schema
Ingestion API is accessible from Data Cloud Setup

Create the Data Stream

Data Streams > New Data Stream > Ingestion API

Select your API Connector you created in the previous step, and confirm the Object Name to be populated:

Select API Connector from previous step

Configure Category (Profile, Engagement..), Primary Key and Record Modified Fields

Configure Details around Category, Primary Key, Record Modified Fields

Confirm Data Stream & Data Space Filtering details

Confirm your object & data space filtering if applicable. Click Deploy.

Confirm Object & Data Space Filtering (if applicable)

You can now verify your Data Stream details:

Verify Data Stream details

Ok. We have a Data Stream created, we have a DLO and a destination.

We now need to get the bridge built between our systems and Data Cloud, and that'll be a few hops of a Connected App, tokens and API calls.

Here's where it gets a little wild.

Connected What?

On the Salesforce config side, we'll create a Connected App, a Salesforce feature that has Security, Access and Flexibility at its core.

What is a Connected App? A way to access Salesforce and ensure:

Simplified User Management - Connected apps use tokens for authentication, eliminating the need to manage additional user credentials within Salesforce. The steps we'll take next help get this set up.

Enhanced Security - Leverage OAuth framework, allowing granular permissions. This minimizes risk of unauthorized access.

Improved Monitoring & Auditing - A clear audit trail can be established via API usage. Track which Connected Apps are making calls. This can help quite a bit while testing out the integration, and helps ensure compliance is aligned with data security regulations.

Reference:

Connected App terminology: https://help.salesforce.com/s/articleView?id=sf.remoteaccess_terminology.htm&type=5

Create a private key & self signed digital certificate

We'll create a key and certificate details that we exchange with Salesforce to prove that we are who we say we are. We'll also use this as part of the JWT that gets passed to Salesforce and exchanged for our Data Cloud access token, so we can perform our Batch API POST.

Follow these steps. The output should be a server.key and a server.crt file that we'll use in our next activities.

Reference: https://developer.salesforce.com/docs/atlas.en-us.242.0.sfdx_dev.meta/sfdx_dev/sfdx_dev_auth_key_and_cert.htm

Create Connected App

We'll create this in Data Cloud. Ensure the following:

Use digital signatures. Upload server.crt from previous step.

Ensure the selected OAuth Scopes align with what you're trying to accomplish.

Update Callback URL. In our example, I'm using the Postman callback URL https://oauth.pstmn.io/v1/callback

Permitted Users - Change to Admin approved users are pre-authorized

IP Relaxation - Change to Relax IP Restrictions

Refresh Token Policy - Change to "Refresh Token is Valid until revoked"

Click Manage Profiles and add your Profile to the Connected App.

Note your Connected App Client ID - you'll need that in the next step.

Need to get back to your Connected App credentials? Navigate to Setup > App Manager > Manage or View

Create a JWT

Wait - why not the Client Credentials flow? It's doable, but with JWT we're not tying the Connected App to a user. The JWT Bearer Flow can be more scalable, more secure, and a bit more complex. The JWT contains more context than just the standard client_id & client_secret exchange.

JWTs can be generated and cached ahead of time - which may be beneficial as things grow.

JWTs help add a layer of trust and data security, rather than relying solely on being one-of-a-kind tokens. Our token will contain unique data, including the Client ID for our Connected App, the user, the login URL, and an expiration date. This information is signed using the RS256 algorithm and is shared in a future API call.

JWTs are signed using cryptographic methods. The signature verifies the sender and ensures the content hasn't been tampered with in transit, which gives us data integrity on top of uniqueness.

We'll use www.jwt.io to create a JWT Token that will be used in future calls:

Reference: https://help.salesforce.com/s/articleView?id=sf.jwt_access_tokens.htm&type=5

JWT resources:

Steps to create a JWT for Salesforce

1. Navigate to jwt.io

2. Update the Algorithm settings & Payload detail:

  • iss: Connected App Client ID

  • sub: the Salesforce username

  • exp: the expiration timestamp (a Unix timestamp, e.g. from unixtimezone.com)

3. Under Verify Signature, clear out the Public Key area and provide your private key (server.key) so the token is signed with RS256.

4. On the left-hand side (ignore "Invalid Signature"), copy the encoded detail. Use this newly encoded token as the assertion in your next step:
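If you'd rather script this step than use jwt.io, here's a minimal sketch with the PyJWT library. The username is hypothetical, and note that Salesforce's JWT Bearer Flow also expects an aud claim, which jwt.io lets you add in the Payload:

```python
import time
import jwt  # PyJWT, with the cryptography package installed for RS256

# Sign with the private key generated earlier
with open("server.key") as f:
    private_key = f.read()

claims = {
    "iss": "<connected app client id>",     # Consumer Key from the Connected App
    "sub": "integration.user@example.com",  # pre-authorized username (hypothetical)
    "aud": "https://login.salesforce.com",  # use test.salesforce.com for sandboxes
    "exp": int(time.time()) + 300,          # short-lived expiry
}

assertion = jwt.encode(claims, private_key, algorithm="RS256")
print(assertion)  # this encoded token is the assertion for the next step
```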

SteveTechArc's JWT Bearer Flow video is an excellent resource

Use jwt.io to Encode JWT that will be used in next step

JWT Bearer Token Flow:

In Postman, we'll make the following POST with our assertion details, in order to obtain a new access token that will be used to get to Data Cloud.

POST to https://login.salesforce.com/services/oauth2/token

Content-Type: application/x-www-form-urlencoded

grant_type: urn:ietf:params:oauth:grant-type:jwt-bearer

assertion: JWT from previous step

client_id: Connected App Client ID

client_secret: Connected App Client Secret

Obtain access token we will use in future call
`{"accesstoken":"00DDn000003oeek!AQEAwwD4Dxh4uaOsqMNODfxZ1OFYG1gVuaMaQOri0mwXA0QqM6xtcUofclFWHlbX4NxtI7cVTGPPV6cO2WXO1QBDBhmLeClO","scope":"cdpqueryapi cdpingestapi api cdpapi","instanceurl":"https://yoururlhere.my.salesforce.com","id":"https://login.salesforce.com/id/00DDn000003qeekMAA/005Dn000002qXwsIAE","token_type":"Bearer"}`

Exchange for a Data Cloud Token

Reference reminder: https://developer.salesforce.com/blogs/2023/07/load-data-programmatically-with-the-ingestion-api

POST: https://yoururlhere.my.salesforce.com/services/a360/token

We're now going to take the access_token that we received back from Salesforce in the JWT Bearer Token Flow and pass it as the subject_token:
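A sketch of that exchange with requests. The parameter names follow the Ingestion API blog post referenced above; core_token and instance_url come from the previous step:

```python
import requests

dc_resp = requests.post(
    f"{instance_url}/services/a360/token",
    data={
        "grant_type": "urn:salesforce:grant-type:external:cdp",
        "subject_token": core_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
    },
)
dc_resp.raise_for_status()
dc = dc_resp.json()
dc_token = dc["access_token"]  # Data Cloud token for the ingest calls
tenant = dc["instance_url"]    # tenant-specific endpoint; prepend https:// if returned bare
```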

Make the Data Cloud API call with new token

Note: You'll want to reference your Tenant Specific Endpoint for these calls. This was also returned as the "instance_url" in the above call.

Creating a Bulk API job for Data Cloud

Creating the job with Postman:

https://yoururlhere.c360a.salesforce.com/api/v1/ingest/jobs?limit=50&offset=&orderby=&state=

object = the object name for the API connector, which is found under Data Cloud Setup > Configuration > Ingestion API

sourceName = Source API Name

Operation = upsert

Create initial job from Postman using /ingest/
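A sketch of the create-job call with requests, using the object and sourceName values shown in the sample response further down (tenant and dc_token come from the Data Cloud token exchange):

```python
import requests

job_resp = requests.post(
    f"https://{tenant}/api/v1/ingest/jobs",
    headers={"Authorization": f"Bearer {dc_token}", "Content-Type": "application/json"},
    json={
        "object": "bb_customer_master",  # object name under Data Cloud Setup > Ingestion API
        "sourceName": "BB_dev2",         # Source API Name of the connector
        "operation": "upsert",
    },
)
job_resp.raise_for_status()
job_id = job_resp.json()["id"]  # needed for the upload and close calls
```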

Upload the Bulk API job for Data Cloud

Use the ID from Create Job in your Upload Job URL:

Upload the csv data to the /ingest/ job
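A sketch of the upload with requests, continuing with tenant, dc_token, and job_id from the earlier sketches (the batches path and text/csv content type follow the Ingestion API reference; the filename is ours):

```python
import requests

with open("customer_master.csv", "rb") as f:
    upload_resp = requests.put(
        f"https://{tenant}/api/v1/ingest/jobs/{job_id}/batches",
        headers={"Authorization": f"Bearer {dc_token}", "Content-Type": "text/csv"},
        data=f,  # stream the csv body as-is
    )
upload_resp.raise_for_status()  # a 2xx response means the batch was accepted
```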

Close the Bulk API job for Data Cloud

`{
"object": "bb_customer_master",
"id": "558418e8-8cf4-493c-82ac-359f2273ca07",
"operation": "upsert",
"sourceName": "BB_dev2",
"createdById": "005Dn000002oXwsIAE",
"createdDate": "2024-03-15T14:51:44.317369Z",
"systemModstamp": "",
"state": "UploadComplete",
"contentType": "CSV",
"apiVersion": "v1"
}`
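A sketch of the close call that produces a response like the one above, flipping the job state to UploadComplete:

```python
import requests

close_resp = requests.patch(
    f"https://{tenant}/api/v1/ingest/jobs/{job_id}",
    headers={"Authorization": f"Bearer {dc_token}", "Content-Type": "application/json"},
    json={"state": "UploadComplete"},
)
close_resp.raise_for_status()
print(close_resp.json()["state"])  # "UploadComplete"
```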

Review the Bulk API job for Data Cloud

Get Job Info call

State can be "InProgress" for some time

state changes from InProgress to JobComplete

State is now "JobComplete", indicating our records should be accessible from Data Cloud

Validate in Data Cloud

Records successfully came in as expected during our 7:03 AM refresh

We're not done quite yet!

While we might have data in Data Cloud, our job isn't quite complete yet. Our next steps may include mapping the DLO into the Data Model so the data can actually be put to use for things like Identity Resolution or Segmentation.