Configuring Automatic Data Export in Adobe Experience Platform



Adobe Experience Platform (AEP) offers a wide range of native source connectors for bringing data into the platform, as well as a number of native destination connectors for publishing data to various marketing destinations via the Real-Time Customer Data Platform (RTCDP). However, it is not uncommon to need data extraction outside of the specific marketing activation use cases that RTCDP covers.


Multiple ways to export data from AEP


First, let’s take a look at the platform architecture and enumerate the different ways to extract data from the platform.


- Real-Time Customer Data Platform – the specific marketing activation use case, for example sending a profile to Facebook or Google
- Profile lookup – using a known identity, look up a single profile
- Query Service – query the data lake using the PostgreSQL client; commonly used to connect ad hoc analysis or BI reporting tools
- Data Access API – bulk export of data from the data lake


As you can see there are a lot of options for data export, and at first glance it can be hard to figure out which method or combination of methods is right. I’ve put together a quick decision tree diagram that shows how we’ll decide which method fits a particular use case.

Hopefully that provides some context. For the rest of the article we’re going to dive deeper into one of the most powerful yet complex methods, the Data Access API, and put together a proof of concept. Adobe has a great tutorial on how to use the Data Access API to retrieve data from your data lake. In that tutorial, data is retrieved by polling the Data Access API, which works perfectly well but falls short of a fully automated data pipeline. An ideal data pipeline uses a push-pull mechanism to retrieve new data as it is loaded. We can accomplish this by combining the Data Access API with another feature, a webhook. An Adobe I/O webhook allows us to subscribe to events in Experience Cloud, in this case Experience Platform events, and then use our own process to react to those events.

"graphic"

We’re going to walk through building a proof of concept that combines a webhook with the Data Access API to create an automated data export pipeline from AEP to Google Cloud Platform (GCP). Note, this tutorial uses GCP because I personally wanted to learn more about it, but the same approach should translate to the other major cloud providers.

Creating an Automated Export Process

1. Create a Project in the Adobe Developer Console

First, we need to go to Adobe’s Developer Console. If you don’t have developer access to AEP yet, contact your Adobe administrator to grant it. If you already have a project, feel free to skip ahead.

2. Create API Project
"graphic"

3. Webhook Event


For this, you will need two components added to your project: “Events” (for the webhook) and “API” (for the Data Access API). Let’s start by adding the webhook event to the project.

Your new project will be created with an automatically generated name. First give it a meaningful name, then select “+ Add to Project” and choose “Events”.

This will open the “Add Event” overlay. Next, select “Experience Platform” and then “Platform Notifications.”

"graphic"

On the next screen, you have several different events to subscribe to – for our purposes here select “Data Ingestion Notification”, which will let us know about new data in the AEP data lake.

"graphic"

The next screen asks for the webhook URL. It is optional, but recommended, to set up a temporary endpoint via webhook.site so that you can inspect a typical webhook payload; this article from Adobe has a good tutorial on setting that up. If you would rather wait until the actual webhook is up and running, just enter a dummy URL here and save.

4. Add Experience Platform API


Now, add the AEP API to the project. Start by clicking “+ Add to Project” and this time select “API”. This is necessary because you need an API project with credentials to access the Data Access API.

On the pop-up, select Adobe Experience Platform and then check “Experience Platform API”.

"graphic"

The next few screens will ask you to either choose an existing key or upload a new one, and then to assign this API to the appropriate product profile. Select the appropriate options for your situation and press “Save” at the end of the workflow. If you decide to generate credentials, be sure to store them in a safe place, as we will need them later.
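As a preview of how those credentials will be used later, here is a minimal Python sketch of exchanging them for an access token, assuming the project uses the JWT (service account) credential type; the file path, variable names, and metascope are assumptions for illustration rather than code from the repo:

# Minimal sketch: exchange Adobe Developer Console service-account (JWT)
# credentials for an access token. The private key path and the metascope
# below are assumptions -- adjust them to match your own project.
import time
import jwt        # pip install pyjwt[crypto]
import requests

IMS_HOST = "https://ims-na1.adobelogin.com"

def get_access_token(client_id, client_secret, org_id, tech_account_id,
                     private_key_path="credentials/private.key"):
    with open(private_key_path) as key_file:
        private_key = key_file.read()
    # Claims required for Adobe's JWT exchange; ent_dataservices_sdk is the
    # metascope typically granted for Experience Platform APIs.
    claims = {
        "exp": int(time.time()) + 300,
        "iss": org_id,
        "sub": tech_account_id,
        "aud": f"{IMS_HOST}/c/{client_id}",
        f"{IMS_HOST}/s/ent_dataservices_sdk": True,
    }
    signed_jwt = jwt.encode(claims, private_key, algorithm="RS256")
    resp = requests.post(
        f"{IMS_HOST}/ims/exchange/jwt",
        data={"client_id": client_id,
              "client_secret": client_secret,
              "jwt_token": signed_jwt},
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

We will lean on a helper like this later when calling the Data Access API from the download function.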

5. Proof of Concept Solution Architecture

Here is what we are going to use for this PoC in Google Cloud Platform (GCP). It starts with a Google Cloud Function that hosts the webhook endpoint; this function listens for requests from the Adobe I/O event subscription and, for each request, writes the payload to a BigQuery table and then publishes the Adobe batch ID to a Pub/Sub topic.

A second Cloud Function subscribes to the Pub/Sub topic, retrieves the data from AEP, and writes it to a Google Cloud Storage bucket.

"graphic"

This proof of concept is written in Python, as it is my language of choice, and you can find all the code for this post on GitHub. I have also put all the GCP command-line (CLI) commands to create the GCP resources in the respective readme files on GitHub.

One other note: for this PoC I chose to use the new Gen2 Cloud Functions, which at the time of writing are still in beta. If you prefer Gen1 functions, remove "beta" and "--gen2" from the CLI commands. This article from Google has a good explainer on the differences between the two generations.

With that out of the way, let’s get started on the actual proof of concept!

To get started, let’s take a look at a sample event subscription payload –

{
  "event_id": "336ea0cb-c179-412c-b355-64a01189bf0a",
  "event": {
    "xdm:ingestionId": "01GB3ANK6ZA1C0Y13NY39VBNXN",
    "xdm:customerIngestionId": "…",
    "xdm:completed": 1661190748771,
    "xdm:datasetId": "6303b525863a561c075703c3",
    "xdm:eventCode": "ing_load_success",
    "xdm:sandboxName": "dev"
  },
  "recipient_client_id": "…"
}

The most interesting piece of information here is event.xdm:ingestionId, as it appears to be the AEP batch ID. The payload also carries xdm:sandboxName and xdm:datasetId, both of which will be useful for retrieving data from the data lake. You can find Adobe’s documentation on the data ingestion notification payload here.
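To make that concrete, here is a tiny Python sketch of pulling those fields out of the notification body inside an HTTP handler (the variable names are just illustrative):

# Sketch: pick out the useful fields from a Data Ingestion Notification.
payload = request.get_json()              # the JSON body posted by Adobe I/O
event = payload.get("event", {})
batch_id = event.get("xdm:ingestionId")   # the AEP batch id
dataset_id = event.get("xdm:datasetId")
sandbox_name = event.get("xdm:sandboxName")
event_code = event.get("xdm:eventCode")   # e.g. "ing_load_success"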

[Optional] Create BigQuery table

This is optional, but as someone who has worked with data systems for many years, I can say that a simple log table recording what has been processed can really save you later. In this case we are just making some light transformations and storing the payload in BigQuery.

bq mk --table mydataset.event_log schema.json

*Note* You can find the schema.json file in the webhook folder in the Github repo.

6. Webhook Function

First, a quick prerequisite: create a new Pub/Sub topic that the function will publish to –

gcloud pubsub topics create aep-webhook

With that in place, clone the code from GitHub, navigate to the webhook sub-directory, and then deploy it as a Cloud Function:

gcloud beta functions deploy aep-webhook-test \
  --gen2 \
  --runtime python39 \
  --trigger-http \
  --entry-point webhook \
  --allow-unauthenticated \
  --source . \
  --set-env-vars BQ_DATASET=webhook,BQ_TABLE=event_log,PUBSUB_TOPIC=aep-webhook

Once the deployment is complete, jump into the GCP console, navigate to Cloud Functions, and you should see your new function, aep-webhook-test, deployed. Copy its URL.

"graphic"

Then go back to the Adobe Developer Console and enter this URL as your webhook URL.

"graphic"

You should see an immediate request to the new webhook function containing a challenge parameter. If everything is set up correctly, the function will respond with the challenge value and the Adobe Console will show the event registration’s status as “Active”. If not, a good place to start is the Debug Tracing tab, which will show you the exact request Adobe sent and the response it received.
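For reference, the challenge handling itself only takes a few lines. Below is a rough sketch of what the webhook entry point might look like, trimmed down rather than copied from the repo; the environment variable, table, and column names are assumptions for illustration:

# Rough sketch of the webhook entry point (HTTP-triggered Cloud Function).
# Environment variable, table, and column names are illustrative only.
import json
import os
from google.cloud import bigquery, pubsub_v1

bq_client = bigquery.Client()
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(os.environ["GCP_PROJECT"],
                                  os.environ["PUBSUB_TOPIC"])

def webhook(request):
    # Adobe I/O verifies the endpoint by sending a "challenge" query
    # parameter, which we simply echo back.
    challenge = request.args.get("challenge")
    if challenge:
        return challenge
    payload = request.get_json(silent=True) or {}
    event = payload.get("event", {})
    # Log the raw payload to the BigQuery event_log table.
    table_id = f'{os.environ["BQ_DATASET"]}.{os.environ["BQ_TABLE"]}'
    bq_client.insert_rows_json(table_id, [{"payload": json.dumps(payload)}])
    # Publish the event so the download function can pick up the batch id.
    publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
    return "ok"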

7. Data Processing Function

With the webhook function turned on, let’s go ahead and deploy the data processing function.

Let’s start by creating the storage bucket to land the data in –

gsutil mb gs://[yourname]-webhook-poc

If you cloned the code from GitHub, change directory to subscribe-download-data, create a credentials folder, and drop in the credentials that were created earlier in the Adobe Developer Console. Note: this is for the PoC only; for an actual production pipeline it is recommended to store the credentials in a key management system (KMS).

gcloud beta functions deploy aep-pubsub-function-test \
  --gen2 \
  --runtime python39 \
  --trigger-topic aep-webhook \
  --entry-point subscribe \
  --source . \
  --memory=512MB \
  --timeout=540 \
  --set-env-vars GCS_STORAGE_BUCKET=[yourname]-webhook-poc

If everything runs correctly, after a few minutes you should see the function appear in your GCP Cloud Functions.
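To give a sense of what this function does under the hood, here is a condensed sketch of the subscribe-and-download logic: it decodes the Pub/Sub message, lists the files for the batch via the Data Access API, and writes each one to the bucket. The get_access_token() helper stands in for the token exchange sketched earlier, and the environment variable names are assumptions rather than the exact code from the repo:

# Condensed sketch of the Pub/Sub-triggered download function.
# get_access_token() is assumed to implement the token exchange shown earlier;
# environment variable names are illustrative only.
import base64
import json
import os
import functions_framework
import requests
from google.cloud import storage

PLATFORM_HOST = "https://platform.adobe.io"

@functions_framework.cloud_event
def subscribe(cloud_event):
    # The Pub/Sub payload is the event dict published by the webhook function.
    event = json.loads(base64.b64decode(cloud_event.data["message"]["data"]))
    batch_id = event["xdm:ingestionId"]
    sandbox = event.get("xdm:sandboxName", "prod")
    headers = {
        "Authorization": f"Bearer {get_access_token()}",
        "x-api-key": os.environ["ADOBE_CLIENT_ID"],
        "x-gw-ims-org-id": os.environ["ADOBE_ORG_ID"],
        "x-sandbox-name": sandbox,
    }
    bucket = storage.Client().bucket(os.environ["GCS_STORAGE_BUCKET"])
    # List the dataset files that make up this batch.
    batch_files = requests.get(
        f"{PLATFORM_HOST}/data/foundation/export/batches/{batch_id}/files",
        headers=headers,
    ).json().get("data", [])
    for dataset_file in batch_files:
        file_id = dataset_file["dataSetFileId"]
        # Each dataset file can contain one or more physical files.
        entries = requests.get(
            f"{PLATFORM_HOST}/data/foundation/export/files/{file_id}",
            headers=headers,
        ).json().get("data", [])
        for entry in entries:
            content = requests.get(
                f"{PLATFORM_HOST}/data/foundation/export/files/{file_id}",
                headers=headers,
                params={"path": entry["name"]},
            ).content
            # Land the file under sandbox/batch_id/ in the bucket.
            bucket.blob(f"{sandbox}/{batch_id}/{entry['name']}").upload_from_string(content)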

"graphic"
Depending on how busy your AEP environment is, it may take a few minutes to a few hours for the data to start showing up in the storage bucket.

"graphic"

You’ll notice that the files are somewhat cryptically named parquet files. This is the native format in which data is stored inside the AEP data lake.
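If you download one of these files and want to peek inside it locally, a couple of lines of Python will do it (assuming pandas and pyarrow are installed; the file name below is just a stand-in for one of the cryptic names in your bucket):

# Quick local inspection of an exported parquet file.
import pandas as pd

df = pd.read_parquet("downloaded-batch-file.parquet")  # stand-in file name
print(df.shape)      # number of rows and columns
print(df.dtypes)     # column names and types
print(df.head())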


And with that, we have a simple pipeline that will automatically download and store .parquet files that are created in the AEP data lake. Obviously, we’ve just scratched the surface of what’s possible with a combination of event registration (webhooks) and data access APIs. Some of the thoughts I had in mind while working through this process were –

- Land the files in a sub-folder per sandbox in the GCS bucket.
- Use the API to look up the name of the dataset associated with each parquet file and give the file a more user-friendly name (a rough sketch of that lookup follows below).
- Send the data and notifications to different destinations.
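For that dataset-naming idea, the dataset ID from the notification can be resolved into a human-readable name; a hedged sketch using the Catalog Service API, reusing the same auth headers as the download function, might look like this:

# Sketch: resolve a dataset id into its display name via the Catalog Service API.
# Reuses the same Authorization / x-api-key / x-gw-ims-org-id / x-sandbox-name
# headers built in the download function.
import requests

def get_dataset_name(dataset_id, headers):
    resp = requests.get(
        f"https://platform.adobe.io/data/foundation/catalog/dataSets/{dataset_id}",
        headers=headers,
    )
    resp.raise_for_status()
    # Catalog responses are keyed by object id; each entry carries a "name".
    return resp.json()[dataset_id]["name"]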

Exporting data out of AEP opens up many use cases and activations, and as this demonstration shows, it can be accomplished by following a few clearly outlined steps. I hope this tutorial was instructive and easy to follow, and perhaps inspires some new use cases for data activation!




