How to Integrate Salesforce to Snowflake: 3 Easy Methods
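LIKE.TG was founded in 2020 and is headquartered in Malaysia. It is the first comprehensive brand to bring together internet products from around the world and provide one-stop software product solutions. Sole official website: www.like.tg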
Salesforce is a widely used CRM system and one of the core source systems to integrate when building a data warehouse or an analytics platform. Snowflake is a Software-as-a-Service (SaaS) offering that provides a ready-to-use cloud data warehouse, with connectivity options to connect most reporting suites through JDBC or its provided libraries.
This article uses Salesforce APIs, UNIX command-line tools, and Snowflake’s web client to set up data ingestion from Salesforce to Snowflake. It focuses on high data volumes and performance, and the same steps can be used to load millions of records from Salesforce to Snowflake.
What is Salesforce
Salesforce is a leading cloud-based CRM platform. As a Platform as a Service (PaaS), Salesforce is known for its CRM applications for Sales, Marketing, Service, Community, Analytics, and more, and it is highly scalable and flexible. Because Salesforce holds CRM data, including sales data, it is one of the most important sources for data ingestion into analytical tools or databases such as Snowflake.
What is Snowflake
Snowflake is a fully relational ANSI SQL data warehouse provided as Software-as-a-Service (SaaS). It offers a ready-to-use cloud data warehouse that requires little to no management or administration. It uses cloud-based persistent storage for data and virtual compute instances for processing.
Key features of Snowflake include Time Travel, Fail-Safe, Web-based GUI client for administration and querying, SnowSQL, and an extensive set of connectors or drivers for major programming languages.
Methods to move data from Salesforce to Snowflake
- Method 1: Easily Move Data from Salesforce to Snowflake using LIKE.TG
- Method 2: Move Data From Salesforce to Snowflake using Bulk API
- Method 3: Load Data from Salesforce to Snowflake using Snowflake Output Connection (Beta)
Method 1: Easily Move Data from Salesforce to Snowflake using LIKE.TG
LIKE.TG is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.
It is that simple. While you relax, LIKE.TG will take care of fetching the data from data sources like Salesforce, etc., and sending it to your destination warehouse for free.
Here are the steps involved in moving the data from Salesforce to Snowflake:
Step 1: Configure your Salesforce Source
- Authenticate and configure your Salesforce data source.
In the Configure Salesforce as Source Page, you can enter details such as your pipeline name, authorized user account, etc.
In the Historical Sync Duration field, enter how far back you want to ingest existing data from the Source. By default, it ingests data from the past 3 months. You can also select All Available Data to ingest everything in your Salesforce account since January 01, 1970.
Step 2: Configure Snowflake Destination
- Configure the Snowflake destination by providing the details like Destination Name, Account Name, Account Region, Database User, Database Password, Database Schema, and Database Name to move data from Salesforce to Snowflake.
In addition to this, LIKE.TG lets you bring data from 150+ data sources (40+ free sources), such as cloud apps, databases, and SDKs.
LIKE.TG will now take care of all the heavy lifting to move data from Salesforce to Snowflake. Here are some of the benefits of LIKE.TG:
- In-built Transformations – Format your data on the fly with LIKE.TG’s preload transformations, using either the drag-and-drop interface or our Python interface. Generate analysis-ready data in your warehouse using LIKE.TG’s Postload Transformation.
- Near Real-Time Replication – Get near real-time replication for all database sources with log-based replication. For SaaS applications, near real-time replication is subject to API limits.
- Auto-Schema Management – Correcting an improper schema after the data is loaded into your warehouse is challenging. LIKE.TG automatically maps the source schema to the destination warehouse so that you don’t face the pain of schema errors.
- Transparent Pricing – Say goodbye to complex and hidden pricing models. LIKE.TG’s Transparent Pricing brings complete visibility to your ELT spend. Choose a plan based on your business needs, and stay in control with spend alerts and configurable credit limits for unforeseen spikes in data flow.
- Security – Get peace of mind with end-to-end encryption and compliance with major security standards and regulations, including HIPAA, GDPR, and SOC 2.
Method 2: Move Data From Salesforce to Snowflake using Bulk API
What are Salesforce Data APIs?
Since we are loading data from Salesforce to Snowflake, the first step is extracting data out of Salesforce. Salesforce provides several general-purpose APIs that can be used to access its data:
- REST API
- SOAP API
- Bulk API
- Streaming API
Along with these, Salesforce provides various special-purpose APIs, such as the Apex API, Chatter API, and Metadata API, which are beyond the scope of this post.
The general-purpose APIs differ mainly in whether they are synchronous or asynchronous:
Synchronous API: A synchronous request blocks the application/client until the operation is completed and a response is received. The REST and SOAP APIs work this way.
Asynchronous API: An asynchronous request doesn’t block the application/client making it. Salesforce processes the submitted jobs/batches in the background, which makes this API type suitable for processing or querying large amounts of data. The Bulk API works this way.
Understanding the difference between Salesforce APIs is important, as depending on the use case we can choose the best of the available options for loading data from Salesforce to Snowflake.
API access is enabled by default in Salesforce Enterprise Edition. If it is not available to you, you can create a Developer Edition account and obtain the security token required for API access. In this post, we will use Bulk API to access and load the data from Salesforce to Snowflake.
The process flow for querying Salesforce data using Bulk API is: log in, create a job, add a batch containing the query, poll the batch status, retrieve the results, and close the job.
Each of these steps is explained in detail below, with commands for a Unix-based machine.
Step 1: Log in to Salesforce API
Bulk API doesn’t provide a login operation, so we use the SOAP API to log in.
Save the XML below as login.xml, and replace username and password with your Salesforce username and password. The password value must be your account password concatenated with your security token.
<?xml version="1.0" encoding="utf-8" ?>
<env:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
<env:Body>
<n1:login xmlns:n1="urn:partner.soap.sforce.com">
<n1:username>username</n1:username>
<n1:password>password</n1:password>
</n1:login>
</env:Body>
</env:Envelope>
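A hand-edited login.xml breaks if the password or security token contains XML special characters such as & or <. As a hedged alternative, the minimal Python sketch below writes the same envelope with those values escaped. The function name build_login_xml is hypothetical, and the sketch assumes the password-plus-token convention described above.

```python
# Sketch: write login.xml with username and password+token XML-escaped.
# build_login_xml is a hypothetical helper; the envelope matches login.xml above.
from xml.sax.saxutils import escape

LOGIN_TEMPLATE = """<?xml version="1.0" encoding="utf-8" ?>
<env:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Body>
    <n1:login xmlns:n1="urn:partner.soap.sforce.com">
      <n1:username>{username}</n1:username>
      <n1:password>{password}</n1:password>
    </n1:login>
  </env:Body>
</env:Envelope>
"""


def build_login_xml(username: str, password: str, security_token: str) -> str:
    """Return the SOAP login body with &, < and > escaped."""
    return LOGIN_TEMPLATE.format(
        username=escape(username),
        # Salesforce expects the account password immediately followed by the token
        password=escape(password + security_token),
    )
```

To produce the file, write the returned string to login.xml (for example with `open("login.xml", "w", encoding="utf-8")`) and then run the curl command below unchanged.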
Using a Terminal, execute the following command:
curl <URL> -H "Content-Type: text/xml; charset=UTF-8" -H "SOAPAction: login" -d @login.xml > login_response.xml
If the command succeeds, it returns a loginResponse XML containing <sessionId> and <serverUrl>, which are used in subsequent API calls to download data.
login_response.xml will look as shown below:
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns="urn:partner.soap.sforce.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<loginResponse>
<result>
<metadataServerUrl><URL></metadataServerUrl>
<passwordExpired>false</passwordExpired>
<sandbox>false</sandbox>
<serverUrl><URL></serverUrl>
<sessionId>00Dj00001234ABCD5!AQcAQBgaabcded12XS7C6i3FNE0TMf6EBwOasndsT4O</sessionId>
<userId>0010a00000ABCDefgh</userId>
<userInfo>
<currencySymbol>$</currencySymbol>
<organizationId>00XYZABCDEF123</organizationId>
<organizationName>ABCDEFGH</organizationName>
<sessionSecondsValid>43200</sessionSecondsValid>
<userDefaultCurrencyIsoCode xsi:nil="true"/>
<userEmail>user@organization</userEmail>
<userFullName>USERNAME</userFullName>
<userLanguage>en_US</userLanguage>
<userName>user@organization</userName>
<userTimeZone>America/Los_Angeles</userTimeZone>
</userInfo>
</result>
</loginResponse>
</soapenv:Body>
</soapenv:Envelope>
From this XML, we need to initialize three variables: serverUrl, sessionId, and instance. The first two are available directly in the response XML; instance is the first part of the hostname in serverUrl.
The shell script snippet given below can extract these three variables from the login_response.xml file:
# each xmllint call must stay on one line; a line break would end the command
sessionId=$(xmllint --xpath "/*[name()='soapenv:Envelope']/*[name()='soapenv:Body']/*[name()='loginResponse']/*[name()='result']/*[name()='sessionId']/text()" login_response.xml)
serverUrl=$(xmllint --xpath "/*[name()='soapenv:Envelope']/*[name()='soapenv:Body']/*[name()='loginResponse']/*[name()='result']/*[name()='serverUrl']/text()" login_response.xml)
# instance = hostname before ".salesforce.com" (bash pattern substitution, then strip the scheme)
instance=$(echo "${serverUrl/.salesforce.com*/}" | sed 's|https://||')
sessionId = 00Dj00001234ABCD5!AQcAQBgaabcded12XS7C6i3FNE0TMf6EBwOasndsT4O
serverUrl = <URL>
instance = organization
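If xmllint is not available, or you prefer namespace-aware parsing over name() tricks, the same three values can be extracted with Python’s standard library. This is a sketch that assumes login_response.xml has the shape shown above. parse_login_response is a hypothetical helper name, and it applies the same "hostname before .salesforce.com" rule as the shell snippet.

```python
# Sketch: read sessionId, serverUrl and instance from login_response.xml
# with the standard library instead of xmllint/sed.
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {
    "soapenv": "http://schemas.xmlsoap.org/soap/envelope/",
    "sf": "urn:partner.soap.sforce.com",
}


def parse_login_response(xml_text) -> dict:
    """Return sessionId, serverUrl and instance from a SOAP loginResponse."""
    root = ET.fromstring(xml_text)  # root is the soapenv:Envelope element
    result = root.find("soapenv:Body/sf:loginResponse/sf:result", NS)
    if result is None:
        # a failed login returns a SOAP fault instead of loginResponse
        raise ValueError("no loginResponse/result element; login probably failed")
    session_id = result.findtext("sf:sessionId", namespaces=NS)
    server_url = result.findtext("sf:serverUrl", namespaces=NS)
    # same rule as the shell snippet: hostname part before ".salesforce.com"
    host = urlparse(server_url or "").hostname or ""
    instance = host.split(".salesforce.com")[0]
    return {"sessionId": session_id, "serverUrl": server_url, "instance": instance}
```

For a hypothetical serverUrl of https://na1.salesforce.com/services/Soap/u/41.0/00XYZ, instance comes out as na1. Raising an error on a missing result avoids silently running the later steps with empty variables.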
Step 2: Create a Job
Save the XML below as job_account.xml. It creates a job that downloads Account object data from Salesforce in JSON format. Edit the <object> value to download a different object, or the <contentType> value to change the format (for example, to CSV or XML). We are using JSON here.
job_account.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jobInfo
xmlns="http://www.force.com/2009/06/asyncapi/dataload">
<operation>query</operation>
<object>Account</object>
<concurrencyMode>Parallel</concurrencyMode>
<contentType>JSON</contentType>
</jobInfo>
Execute the command below to create the job. From the XML response received (account_job_response.xml), we will extract the jobId variable.
curl -s -H "X-SFDC-Session: ${sessionId}" \
  -H "Content-Type: application/xml; charset=UTF-8" \
  -d @job_account.xml \
  "https://${instance}.salesforce.com/services/async/41.0/job" > account_job_response.xml
# no spaces around "=" in a shell assignment
jobId=$(xmllint --xpath "/*[name()='jobInfo']/*[name()='id']/text()" account_job_response.xml)
account_job_response.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
<id>1200a000001aABCD1</id>
<operation>query</operation>
<object>Account</object>
<createdById>00580000003KrL0AAK</createdById>
<createdDate>2018-05-22T06:09:45.000Z</createdDate>
<systemModstamp>2018-05-22T06:09:45.000Z</systemModstamp>
<state>Open</state>
<concurrencyMode>Parallel</concurrencyMode>
<contentType>JSON</contentType>
<numberBatchesQueued>0</numberBatchesQueued>
<numberBatchesInProgress>0</numberBatchesInProgress>
<numberBatchesCompleted>0</numberBatchesCompleted>
<numberBatchesFailed>0</numberBatchesFailed>
<numberBatchesTotal>0</numberBatchesTotal>
<numberRecordsProcessed>0</numberRecordsProcessed>
<numberRetries>0</numberRetries>
<apiVersion>41.0</apiVersion>
<numberRecordsFailed>0</numberRecordsFailed>
<totalProcessingTime>0</totalProcessingTime>
<apiActiveProcessingTime>0</apiActiveProcessingTime>
<apexProcessingTime>0</apexProcessingTime>
</jobInfo>
jobId = 1200a000001aABCD1
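Editing job_account.xml by hand for each object gets repetitive. The sketch below assumes the same jobInfo layout and namespace shown above; it generates the request XML for any object and content type, and reads jobId back from the response with a namespace-aware lookup. build_job_xml and parse_job_id are hypothetical names, and the allowed content types are limited to the JSON, CSV, and XML options this article mentions.

```python
# Sketch: generate a query jobInfo request for any object and read jobId back.
# build_job_xml / parse_job_id are hypothetical helpers, not Salesforce APIs.
import xml.etree.ElementTree as ET

ASYNC_NS = "http://www.force.com/2009/06/asyncapi/dataload"


def build_job_xml(sobject: str, content_type: str = "JSON") -> str:
    """Return a request body shaped like job_account.xml."""
    if content_type not in {"JSON", "CSV", "XML"}:
        raise ValueError(f"unsupported contentType: {content_type}")
    job = ET.Element("jobInfo", xmlns=ASYNC_NS)  # default namespace on the root
    for tag, text in (
        ("operation", "query"),
        ("object", sobject),
        ("concurrencyMode", "Parallel"),
        ("contentType", content_type),
    ):
        ET.SubElement(job, tag).text = text
    body = ET.tostring(job, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def parse_job_id(response_xml: str) -> str:
    """Extract <id> from a jobInfo response (its elements live in ASYNC_NS)."""
    root = ET.fromstring(response_xml)
    job_id = root.findtext(f"{{{ASYNC_NS}}}id")
    if not job_id:
        raise ValueError("no <id> element in jobInfo response")
    return job_id
```

For example, writing build_job_xml("Contact", "CSV") to job_contact.xml would give a CSV query job for Contact; the rest of the flow stays the same.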
Step 3: Add a Batch to the Job
The next step is to add a batch to the job created in the previous step. A batch contains the SOQL query used to get the data from Salesforce (SFDC). After submitting the batch, we will extract the batchId from the JSON response received.
query='select ID,NAME,PARENTID,PHONE,ACCOUNT_STATUS from ACCOUNT'
curl -s -d "${query}" -H "X-SFDC-Session: ${sessionId}" \
  -H "Content-Type: application/json; charset=UTF-8" \
  "https://${instance}.salesforce.com/services/async/41.0/job/${jobId}/batch" |
  python -m json.tool > account_batch_response.json
# match the literal "id": key (quotes included) and strip spaces, commas and quotes
batchId=$(grep '"id":' account_batch_response.json | awk -F':' '{print $2}' | tr -d ' ,"')
account_batch_response.json:
{
"apexProcessingTime": 0,
"apiActiveProcessingTime": 0,
"createdDate": "2018-11-30T06:52:22.000+0000",
"id": "1230a00000A1zABCDE",
"jobId": "1200a000001aABCD1",
"numberRecordsFailed": 0,
"numberRecordsProcessed": 0,
"state": "Queued",
"stateMessage": null,
"systemModstamp": "2018-11-30T06:52:22.000+0000",
"totalProcessingTime": 0
}
batchId = 1230a00000A1zABCDE
Step 4: Check The Batch Status
As Bulk API is an Asynchronous API, the batch will be run at the Salesforce end and the state will be changed to Completed or Failed once the results are ready to download. We need to repeatedly check for the batch status until the status changes either to Completed or Failed.
status=""
# keep polling until the batch state is Completed or Failed
while [ "$status" != "Completed" ] && [ "$status" != "Failed" ]
do
  sleep 10  # check status every 10 seconds
  curl -s -H "X-SFDC-Session: ${sessionId}" \
    "https://${instance}.salesforce.com/services/async/41.0/job/${jobId}/batch/${batchId}" |
    python -m json.tool > account_batchstatus_response.json
  status=$(grep -i '"state":' account_batchstatus_response.json | awk -F':' '{print $2}' | tr -d ' ,"')
done
account_batchstatus_response.json:
{
"apexProcessingTime": 0,
"apiActiveProcessingTime": 0,
"createdDate": "2018-11-30T06:52:22.000+0000",
"id": "7510a00000J6zNEAAZ",
"jobId": "7500a00000Igq5YAAR",
"numberRecordsFailed": 0,
"numberRecordsProcessed": 33917,
"state": "Completed",
"stateMessage": null,
"systemModstamp": "2018-11-30T06:52:53.000+0000",
"totalProcessingTime": 0
}
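The grep/awk parsing and the polling loop can also be written with Python’s json module, which does not depend on json.tool’s line layout. In this sketch, fetch_status is a hypothetical callable you provide (for example, one that runs the status curl command) that returns the batch-status JSON text. Besides Completed and Failed, it also stops on Not Processed, another batch state documented for the Bulk API, and it gives up after a configurable timeout instead of looping forever.

```python
# Sketch of Steps 3-4 with the json module instead of grep/awk.
# fetch_status is a hypothetical callable returning the batch-status JSON text.
import json
import time
from typing import Callable

TERMINAL_STATES = {"Completed", "Failed", "Not Processed"}


def parse_batch_id(batch_response: str) -> str:
    """Read the batch id from the add-batch JSON response."""
    return json.loads(batch_response)["id"]


def wait_for_batch(fetch_status: Callable[[], str],
                   poll_seconds: float = 10,
                   timeout_seconds: float = 3600,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the batch reaches a terminal state or the timeout expires."""
    waited = 0.0
    while True:
        info = json.loads(fetch_status())
        if info["state"] in TERMINAL_STATES:
            return info
        if waited >= timeout_seconds:
            raise TimeoutError(f"batch still {info['state']} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing sleep as a parameter keeps the loop testable without real waiting; in normal use, the default time.sleep applies.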
Step 5: Retrieve the Results
Once the state is updated to Completed, we can download the result dataset which will be in JSON format. The code snippet given below will extract the resultId from the JSON response and then will download the data using the resultId.
if [ "$status" == "Completed" ]; then
  curl -s -H "X-SFDC-Session: ${sessionId}" \
    "https://${instance}.salesforce.com/services/async/41.0/job/${jobId}/batch/${batchId}/result" |
    python -m json.tool > account_result_response.json
  # the result list is a JSON array of IDs; strip spaces, commas and quotes
  resultId=$(grep '"' account_result_response.json | tr -d ' ,"')
  curl -s -H "X-SFDC-Session: ${sessionId}" \
    "https://${instance}.salesforce.com/services/async/41.0/job/${jobId}/batch/${batchId}/result/${resultId}" > account.json
fi
account_result_response.json:
[
"7110x000008jb3a"
]
resultId = 7110x000008jb3a
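The snippet above assumes the result list holds a single ID. Results larger than 1 GB are split into several parts, though, so the list can contain more than one ID. Here is a small sketch, assuming the same API version (41.0) and URL layout used throughout this article, that turns the result list into one download URL per part; result_urls is a hypothetical helper.

```python
# Sketch: one download URL per result ID returned by the .../result call.
# result_urls is a hypothetical helper; API version and URL layout follow this article.
import json

API_VERSION = "41.0"


def result_urls(instance: str, job_id: str, batch_id: str,
                result_list_json: str) -> list[str]:
    """Map the JSON array of result IDs to full result-download URLs."""
    result_ids = json.loads(result_list_json)
    base = (f"https://{instance}.salesforce.com/services/async/{API_VERSION}"
            f"/job/{job_id}/batch/{batch_id}/result")
    return [f"{base}/{rid}" for rid in result_ids]
```

Each URL is then fetched with the same X-SFDC-Session header, and each part should be saved to its own file before uploading them to the stage.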
Step 6: Close the Job
Once the results have been retrieved, we can close the Job. Save below XML as close-job.xml.
<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
<state>Closed</state>
</jobInfo>
Use the code given below to close the job, by suffixing the jobId to the close-job request URL.
# close-job.xml is XML, so send it as application/xml
curl -s -H "X-SFDC-Session: ${sessionId}" \
  -H "Content-Type: application/xml; charset=UTF-8" \
  -d @close-job.xml \
  "https://${instance}.salesforce.com/services/async/41.0/job/${jobId}"
After running all the above steps, account.json will be in the current working directory. It contains the Account data downloaded from Salesforce in JSON format, which we will load into Snowflake in the next steps.
Downloaded data file:
$ cat ./account.json
[ {
"attributes" : {
"type" : "Account",
"url" : "/services/data/v41.0/sobjects/Account/2x234abcdedg5j"
},
"Id": "2x234abcdedg5j",
"Name": "Some User",
"ParentId": "2x234abcdedgha",
"Phone": 124567890,
"Account_Status": "Active"
}, {
"attributes" : {
"type" : "Account",
"url" : "/services/data/v41.0/sobjects/Account/1x234abcdedg5j"
},
"Id": "1x234abcdedg5j",
"Name": "Some OtherUser",
"ParentId": "1x234abcdedgha",
"Phone": null,
"Account_Status": "Active"
} ]
Step 7: Loading Data from Salesforce to Snowflake
Now that we have the JSON file downloaded from Salesforce, we can use it to load the data into a Snowflake table. The extracted file must first be uploaded to a Snowflake internal stage or to an external stage, such as a Microsoft Azure or AWS S3 location. Then we can load the Snowflake table from that stage.
Step 8: Creating a Snowflake Stage
A stage in Snowflake is a location where data files are stored and which Snowflake can access. The stage name can then be used to reference the files in queries or to load a table.
To create a new stage, follow these steps:
- Log in to the Snowflake web client UI.
- Select the desired database from the Databases tab.
- Click the Stages tab.
- Click Create and select the desired location (Internal, Azure, or S3).
- Click Next.
- Fill in the form that appears in the next window with the stage name, the Snowflake schema for the stage, the bucket URL, and the access keys required to reach the stage location, such as AWS keys for an S3 bucket.
- Click Finish.
Step 9: Creating Snowflake File Format
Once the stage is created, the file location is set. The next step is to create a file format in Snowflake. The File Formats tab can be used to create a named file format, which can then be referenced when bulk loading data into Snowflake.
As we have JSON format for the extracted Salesforce file, we will create the file format to read a JSON file.
Steps to create File Format:
- Log in to the Snowflake web client UI.
- Select the Databases tab.
- Click the File Formats tab.
- Click Create.
This will open a new window where we can mention the file format properties.
We have selected JSON as the type and Format (the schema that stores all our file formats) as the schema. We have also selected the Strip Outer Array option, which is required to remove the outer array (the square brackets that enclose the entire JSON document) that Salesforce adds to the JSON file.
The file format can also be created using SQL in Snowflake. Grants must also be given so that other roles can use the format or stage we have created.
-- field_delimiter / record_delimiter are CSV format options, so they are omitted for JSON
create or replace file format format.JSON_STRIP_OUTER
  type = 'json'
  STRIP_OUTER_ARRAY = TRUE;
grant USAGE on FILE FORMAT FORMAT.JSON_STRIP_OUTER to role developer_role;
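To see what STRIP_OUTER_ARRAY changes, the Python sketch below mimics its effect locally on a file shaped like account.json: the single top-level array becomes one record per element, which is what each $1 value then holds. It is only an illustration, not how Snowflake implements the option. The hypothetical to_ndjson helper additionally shows the alternative of rewriting the file as newline-delimited JSON before upload, in which case the option is not needed.

```python
# Local illustration only: mimic STRIP_OUTER_ARRAY = TRUE on account.json,
# turning one top-level array into one record per element.
import json


def strip_outer_array(text: str) -> list[dict]:
    data = json.loads(text)
    if not isinstance(data, list):
        return [data]  # nothing to strip: the whole document is one record
    return data


def to_ndjson(records: list[dict]) -> str:
    # one JSON object per line; each line is read as a separate row
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)
```

Without the option (and without NDJSON), the whole array would arrive as a single row holding every account.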
Step 10: Loading Salesforce JSON Data to Snowflake Table
Now that we have created the required Stage and File Format of Snowflake, we can use them to bulk load the generated Salesforce JSON file and load data into Snowflake.
The advantage of JSON type in Snowflake:
Snowflake can treat semi-structured data such as JSON or XML as a schemaless object and directly query or parse the required fields without first loading them into a staging table.
Step 11: Parsing JSON File in Snowflake
Using the PARSE_JSON function, we can interpret the JSON in Snowflake and write a query like the one below to turn the JSON file into a tabular format. PARSE_JSON returns a VARIANT, so each extracted field is also a VARIANT; explicit casts (such as ::string or ::int) are needed to get typed columns.
SELECT
  parse_json($1):Id::string             AS id,
  parse_json($1):Name::string           AS name,
  parse_json($1):ParentId::string       AS parent_id,
  parse_json($1):Phone::int             AS phone,
  parse_json($1):Account_Status::string AS account_status
FROM @STAGE.salesforce_stage/account.json
  (file_format => 'format.JSON_STRIP_OUTER') t;
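Before running the query in Snowflake, it can help to preview locally what it should return. This sketch applies the same field picks and the ::int cast on Phone to the records in account.json using plain Python. It only approximates Snowflake’s casting rules (a JSON null becomes NULL, shown here as None), and project_accounts is a hypothetical name.

```python
# Local preview (not Snowflake itself): apply the query's field picks and
# casts to the records in account.json.
import json


def project_accounts(text: str) -> list[tuple]:
    rows = []
    for rec in json.loads(text):  # iterate the top-level array, one record each
        phone = rec.get("Phone")
        rows.append((
            rec.get("Id"),
            rec.get("Name"),
            rec.get("ParentId"),
            int(phone) if phone is not None else None,  # mirrors ::int; null stays NULL
            rec.get("Account_Status"),
        ))
    return rows
```

One design point this surfaces: Salesforce Phone fields hold text, often formatted with parentheses or dashes, and casting such a value to an integer fails in Snowflake just as int() fails here. Casting Phone with ::string is usually the safer choice.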
We will create a table in Snowflake and use the above query to insert data into it, running these queries from Snowflake’s web client UI. First, upload account.json to the S3 location behind the stage. Then create the target table and populate it with an INSERT ... SELECT built from the query above. Finally, query the target table to check the inserted data.
Hurray!! You have successfully loaded data from Salesforce to Snowflake.
Limitations of Loading Data from Salesforce to Snowflake using Bulk API
- A single result file can be at most 1 GB; query results larger than 1 GB are split into multiple result files that must be retrieved separately.
- Bulk API queries don’t support the following in SOQL: COUNT, ROLLUP, SUM, GROUP BY CUBE, OFFSET, and nested SOQL queries.
- Bulk API doesn’t support base64 data type fields.
Method 3: Load Data from Salesforce to Snowflake using Snowflake Output Connection (Beta)
In June 2020, Snowflake and Salesforce launched a native integration that lets customers move data from Salesforce to Snowflake, where it can be analyzed using Salesforce’s Einstein Analytics or Tableau. The integration is available in open beta for Einstein Analytics customers.
Steps for Salesforce to Snowflake Integration
- Enable the Snowflake Output Connector
- Create the Output Connection
- Configure the Connection Settings
Limitations of Loading Data from Salesforce to Snowflake using Snowflake Output Connection (Beta):
- Snowflake Output Connection (Beta) is not a full ETL solution. It extracts and loads data but lacks the capacity for complex transformations.
- It has limited scalability as there are limitations on the amount of data that can be transferred per object per hour. So, using Snowflake Output Connection as Salesforce to Snowflake connector is not very efficient.
Use Cases of Salesforce to Snowflake Integration
- Real-Time Forecasting: Connecting Salesforce to Snowflake lets a business predict end-of-month, end-of-quarter, or end-of-year results, which supports better decision-making. For example, you can combine opportunity data from Salesforce with ERP and finance data in Snowflake.
- Performance Analytics: After you import data from Salesforce to Snowflake, you can analyze your marketing campaign’s performance. You can analyze conversion rates by merging click data from Salesforce with the finance data in Snowflake.
- AI and Machine Learning: Businesses can use the combined data to identify which customers are likely to purchase specific products, for example by combining Salesforce objects such as website visits with POS and product-category data in Snowflake.
Conclusion
This blog has covered all the steps required to extract data using Bulk API to move data from Salesforce to Snowflake. Additionally, an easier alternative using LIKE.TG has also been discussed to load data from Salesforce to Snowflake.
Want to take LIKE.TG for a spin? Sign Up for a 14-day free trial and experience the feature-rich LIKE.TG suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.
Do leave a comment on your experience of replicating data from Salesforce to Snowflake and let us know what worked for you.
This article is republished from public internet and edited by the LIKE.TG editorial department. If there is any infringement, please contact our official customer service for proper handling.