Implementation: this is my chance to get things right the first time.
This article dives into strategies for addressing potential risks during a Data Cloud implementation, specifically Data Stream ingestion from a Marketing Cloud data source (a Data Extension), to ensure a smooth flow of data between Salesforce Data Cloud and Marketing Cloud.
The goal? A fully harmonized, common data model that delivers accurate, reliable, and actionable data in time for informed decision-making and business success.
How do we ensure that our implementation sets our teams up for success, and avoids future concerns like: "Why isn't the data from today's file showing up in my morning Segment like I expected?"
Following the flow of data into Data Cloud
How can we better visualize Data Cloud & Marketing Cloud working together? Let's look at a common scenario:
Our use case: We'll use Marketing Cloud & Data Cloud together to support a scheduled email send.
Marketing Cloud Scheduled Email Send: Our email send needs to go out from Marketing Cloud at 1:00pm EST.
Audience: The recipients should be from today's published Segment originating in Data Cloud that gets Activated to Marketing Cloud.
We'll need to ensure that we have the attributes required for the email send as part of the Activation (set in Data Cloud).
Reference
Salesforce resources that can be helpful when navigating between Marketing Cloud & Data Cloud
Create a Data Stream from a Data Extension
Data Stream Schedule - Marketing Cloud
Data Extension Extract Mode - Considerations
Data Cloud Decoded - Mapping Data Streams
Source data:
The underlying data that determines the audience comes from a vendor file that already gets dropped to the Marketing Cloud SFTP and resides in SFMC as a Data Extension.
This DE gets prepped & mapped into a Data Stream to Data Cloud, where it's ultimately available for Segmentation for our Marketing Cloud send (our Activation target). Data Streams leveraging the Marketing Cloud connector run either hourly or daily.

Floating down a stream...
Let's look at the flow of data in a different way:
Visualize yourself as a tiny file of data, floating down a stream. You're drifting along a Data Stream to a destination (a big lake).
If our only vantage point were Data Cloud (the destination), we'd have an incomplete picture. We can see that there's a mapped Data Stream coming in, but we don't have context as to what that data consists of, how often it arrives, what time it arrives, or what we should be expecting.
If we're going to harmonize, we need to better understand the Data Stream here - we need more context.
What drives this Data Stream? How often does it run? What downstream processes rely upon it? We'll want to identify in Data Cloud when the Data Stream runs (down to the minute).
What Segments & Activations rely upon it in Data Cloud? We'll need to visualize the Data Model to better understand how Segments and Activations consume this data.
But before we get there, the data needs to originate somewhere, right?
Where does the Data Stream originate?
Digging into Marketing Cloud
In this example, the file is dropped first in Marketing Cloud, specifically as a Source File. Our current assumption is that the source file is dropped on the Marketing Cloud SFTP for ingestion into a Data Extension.
To confirm this, we'll need to dig a bit deeper into Marketing Cloud. We'll want to spend time with our Marketing Cloud hat on to understand the relevant Import Activities, Data Extensions, and related Automations and SQL Activities that may need to be designed to support this Data Stream.
Let's also assume in this scenario that our Source File is a daily .txt file with customer data that's useful for our Email Send. We'll need this detail in Data Cloud to ensure the customer shows up in our Daily published Segment when we expect it to.

Starting in Marketing Cloud with the Source File
When importing our .txt file into Marketing Cloud, the first hop is a result DE. This DE (IN_DataExtension) should contain records that match our .txt file, provided there aren't any errors.
Reference Data Extensions
Now, for our use case, we might need to enhance and supplement the source data before it gets to Data Cloud. Perhaps we need to look at a reference Data Extension to pull the SubscriberKey, Preferences, etc.
In Marketing Cloud, we'll often perform this via an Automation Studio SQL Activity, placing the results in a final Data Extension, as sketched below. This final DE is the one we'll map as a Data Stream into Data Cloud.
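Here's a minimal sketch of what that SQL Activity might look like. The DE and field names (IN_DataExtension from the import above, a REF_Subscribers reference DE, and a final OUT_DataStream_Source DE) are illustrative assumptions, not a prescribed naming convention:

```sql
/* Automation Studio SQL Activity
   Target DE: OUT_DataStream_Source (hypothetical), Data Action: Overwrite.
   Joins today's imported records to a reference DE to pull SubscriberKey
   and preference detail before the final DE is mapped as a Data Stream. */
SELECT
    i.EmailAddress,
    i.CustomerId,
    i.OfferCode,
    r.SubscriberKey,
    r.EmailPreference,
    GETDATE() AS ProcessedDate  /* server-side timestamp; handy for validation later */
FROM IN_DataExtension i
INNER JOIN REF_Subscribers r
    ON r.EmailAddress = i.EmailAddress
```

Note the INNER JOIN drops records with no match in the reference DE; a LEFT JOIN would keep them, so choose based on whether unmatched records should still flow to Data Cloud.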

The Data Stream flows
Let's zoom out again and switch gears back to Data Cloud. Recall that this Data Stream eventually flows into a Data Lake Object (DLO).
The Lake and Cloud interact (evaporation anyone?)
We're data, floating along a Data Stream into Data Cloud. Once harmonized and unified within Data Cloud, the latest data is accessible for use in Segmentation.
So how do I know the Data Stream is flowing?
Timing Considerations
How can the overall timing of the Data Stream impact my audience send?
Data Cloud checks for files to process hourly, but how many minutes after the hour? Can we change this afterwards?
Fun fact: The minute of the hour at which the Data Stream is created determines the minute of the hour at which all of its future runs will occur.
Has the data "flowed" to its destination in Data Cloud in time for it to become part of a published Segment?
Let's visit a real world scenario:
In Marketing Cloud, we have an email that needs to go out at 1:00pm EST. It's using a published segment from Data Cloud.
Example: If I create a new Data Stream at 11:04am, all future hourly runs will occur at 4 minutes past the hour (12:04, 1:04, 2:04, and so on). Data Cloud will pick up updates from our Data Extension on this hourly cadence. We'll see shortly why this is a very important detail in our implementation.
In Data Cloud, we currently have a scheduled Segment Activation publish time of 12:00 pm EST.
How long will it take to get to Marketing Cloud? Can we confidently say that the email will go out at 1:00pm EST without issue?

Callouts:
Data Cloud Segment publish time: 12:00 pm
Marketing Cloud Scheduled Send Time: 1:00pm
Opportunities: Identify how long it takes for your source file to arrive in Data Cloud. Test the time it takes for your Segment to get Activated to Marketing Cloud. Does this meet the needs of your use case?
Recommendations: If you have a time-sensitive use case, inspect your Data Stream timing and compare it with the source DE in Marketing Cloud.
Test!
Timestamps in the source file can be your friend.
We wouldn't want today's email going out to yesterday's audience. What failsafe methods can we introduce here?
Using Marketing Cloud features in Segment Validation
Once we bring our relevant Segment data back into Marketing Cloud, we can validate a bit further. In the Shared Data Extension, a default timestamp field can be helpful for determining which Segment audience got published, and when. This can be addressed by adding a new field with a default timestamp value to the Segment DE. Note that this must be done after the first successful publish, since that's when the DE is created.
In Marketing Cloud, we could use this date, along with other detail included from Data Cloud, in our validation efforts prior to the email send. This could be evaluated with SQL and then referenced by a Verification Activity in Automation Studio: if the Segment didn't publish, we stop the automation from proceeding & send an alert, as sketched below.
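As a sketch of that validation, assume the Activation created a Shared DE named Segment_DailyAudience with the PublishedDate timestamp field we added above (both names hypothetical). A SQL Activity can populate a check DE that a Verification Activity then evaluates, halting the automation and sending a notification when the check DE contains zero records:

```sql
/* Automation Studio SQL Activity
   Target DE: CHK_TodaysSegment (hypothetical), Data Action: Overwrite.
   Returns rows only when today's Segment publish actually landed.
   A Verification Activity can then stop the automation & alert
   when CHK_TodaysSegment ends up with 0 records. */
SELECT
    s.SubscriberKey,
    s.EmailAddress,
    s.PublishedDate
FROM Ent.Segment_DailyAudience s  /* Ent. prefix reaches a Shared DE from a child BU */
WHERE CONVERT(date, s.PublishedDate) = CONVERT(date, GETDATE())
```

Keep in mind that GETDATE() in Marketing Cloud SQL returns server time (Central Time), so align the date comparison with the timezone your timestamp field uses.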
How could we fix the data stream to adjust the time (minutes)?
We can't edit the time of the Data Stream after initial creation.
To fix: We'll have to unlink any relationships from the DMO side and then create a net-new Data Stream, remembering that the minute (:mm) of the timestamp at which you create it determines all future hourly runs. The new Data Stream then gets linked back into the Data Model.
Data Flow Timing (a recap of our example):
Source system to Marketing Cloud: 11:48 am (would miss the Unification process and be too late for the scheduled Segment publish time)
or
Source system to Marketing Cloud: 10:08 am:
Marketing Cloud to Data Cloud via DFU: 10:34 am
Data Stream records arrive in Data Cloud: 10:46 am
Unification completes at 11:45 am (unification completion time is not exact)
Segment publish time: 12:00 pm
Segment processes into Marketing Cloud (or other destination): 12:35 pm
In Summary
Quality Planning
Document your steps, and design your Data Flow with the Delivery (who needs this, when is it needed) in mind. Work backwards from the destination - will the data we need arrive in time? Have we tested scenarios for when a Data Stream doesn't run?
Data Quality & Vision
Get the team comfortable! Getting data aligned between Marketing Cloud & Data Cloud is a team effort. When the team is aligned, everyone understands the importance of accurate, reliable and actionable data. Moving forward, this gives everyone more confidence in our steps and processes.
Data Quality Monitoring
Implement data quality controls and monitoring mechanisms (think Flows) to regularly assess & monitor the accuracy, consistency, and completeness of data stored in Salesforce Data Cloud.
Automated tools such as Flow Builder can be a lifesaver for identifying issues in real time.
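On the Marketing Cloud side of the house, a scheduled SQL Activity can serve as a lightweight monitor over the DE feeding our Data Stream. A minimal sketch, reusing the hypothetical OUT_DataStream_Source DE and field names from earlier:

```sql
/* Automation Studio SQL Activity
   Target DE: MON_DataQualityIssues (hypothetical), Data Action: Overwrite.
   Flags records that would degrade downstream Segment quality:
   missing keys, blank emails, or stale rows left over from a prior day.
   Pair with a Verification Activity to alert when issues are found. */
SELECT
    o.CustomerId,
    o.EmailAddress,
    o.SubscriberKey,
    o.ProcessedDate,
    CASE
        WHEN o.SubscriberKey IS NULL THEN 'Missing SubscriberKey'
        WHEN o.EmailAddress IS NULL OR o.EmailAddress = '' THEN 'Missing EmailAddress'
        ELSE 'Stale record (ProcessedDate is before today)'
    END AS IssueType
FROM OUT_DataStream_Source o
WHERE o.SubscriberKey IS NULL
   OR o.EmailAddress IS NULL OR o.EmailAddress = ''
   OR CONVERT(date, o.ProcessedDate) < CONVERT(date, GETDATE())
```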
Lessons Learned
Continuously improve processes and procedures based on feedback, lessons learned from the implementation, and changing business requirements.
Continuous improvement to drive enhancements in data quality is the pathway to success within Salesforce Data Cloud & Marketing Cloud!