Loading Data from Oracle to Redshift: 2 Easy Methods
Is your Oracle server getting too slow for analytical queries? Do you think you are paying too much to increase the storage capacity or compute power of your Oracle instance? Or are you looking to join and combine data from multiple databases seamlessly? Whatever the case may be, Amazon Redshift offers solutions to the above problems, which makes a strong case for moving your data from an Oracle database to an Amazon Redshift cluster.
This article covers the basic idea behind the two architectures and the detailed steps you need to follow to migrate data from Oracle to Redshift. It also covers why you should consider implementing an ETL solution such as LIKE.TG Data to make the migration smooth and efficient.
Overview of Oracle and Amazon Redshift
Oracle is a proprietary, multi-model relational database system used for Data Warehousing and Online Transaction Processing (OLTP). Its most recent versions also include features found in cloud-based solutions such as Amazon Redshift, including columnar storage and on-cloud deployment.
Amazon Redshift is a PostgreSQL-based, scalable, fully managed, on-cloud database optimized for Online Analytical Processing (OLAP) and data warehousing. One can get started very quickly in just two steps:
- Launch a Redshift cluster via simple API calls or through the AWS Management Console.
- Connect the local SQL client to this Redshift instance.
Redshift's architecture offers many advantages for analytical workloads. Moving data from Oracle to Redshift is therefore a sound step towards upgrading your data analytics infrastructure.
Method 1: Custom ETL Scripts to Load Data from Oracle to Redshift
Hand-code ETL scripts and configure jobs to move Oracle data to Redshift.
Method 2: Setting Up Oracle to Redshift Integration using LIKE.TG Data
LIKE.TG Data, a No-code Data Pipeline, provides you a fully automated platform to set up Oracle to Redshift Integration for free. It is a hassle-free solution to directly connect Oracle to Redshift when you don’t have technical expertise in this field.
Methods to Load Data from Oracle to Redshift
There are 2 main methods of loading data from Oracle to Redshift:
- Method 1: Custom ETL Scripts to Load Data from Oracle to Redshift
- Method 2: Setting Up Oracle to Redshift Integration using LIKE.TG Data
Let’s walk through these methods one by one.
Method 1: Custom ETL Scripts to Load Data from Oracle to Redshift
This is a straightforward way to move data from Oracle to Amazon Redshift. The method involves 3 major steps:
- Step 1: Exporting Data from an Oracle Table via Spool
- Step 2: Copying a Flat File onto an AWS S3 Bucket
- Step 3: Creating an Empty Table and Loading Data from the AWS S3 Bucket
These steps are illustrated in technical detail via an example in the following section.
Step 1: Exporting Data from an Oracle Table via Spool
One of the most common ways to export Oracle data onto a flat-file is using the Spool command. Here’s an example of how to do it –
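-- Suppress headers, row counts and page breaks so the file contains only data
SET HEADING OFF
SET FEEDBACK OFF
SET PAGESIZE 0
SET LINESIZE 32767
SET TRIMSPOOL ON
SPOOL c:\oracle\org\emp.csv
-- Build one comma-separated line per employee
SELECT employeeno || ',' ||
       employeename || ',' ||
       job || ',' ||
       manager || ',' ||
       TO_CHAR(hiredate,'YYYY-MM-DD') || ',' ||
       salary
FROM employee
ORDER BY employeeno;
SPOOL OFF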
The above code exports all records in the employee table into the emp.csv file under the org folder. The CSV file can then be compressed (using “$ gzip emp.csv”, which produces emp.csv.gz) before moving it to the AWS S3 Bucket.
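The export-and-compress step can also be sketched in Python. This is a minimal illustration rather than the article's method: the rows are hypothetical sample data standing in for the query result, and the standard `csv` module is used because it quotes fields that contain commas, which plain string concatenation does not.

```python
import csv
import gzip
import io
from datetime import date

# Hypothetical rows standing in for the result of the SELECT on employee.
ROWS = [
    (7369, "SMITH", "CLERK", "FORD", date(1980, 12, 17), 800),
    (7499, "ALLEN", "SALESMAN, SENIOR", "BLAKE", date(1981, 2, 20), 1600),
]


def rows_to_gzipped_csv(rows):
    """Serialize rows as CSV text and return the gzip-compressed bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for employeeno, name, job, manager, hiredate, salary in rows:
        # Dates are written as YYYY-MM-DD, matching the TO_CHAR format.
        writer.writerow([employeeno, name, job, manager, hiredate.isoformat(), salary])
    return gzip.compress(buffer.getvalue().encode("utf-8"))


def read_gzipped_csv(payload):
    """Decompress the payload and parse it back into lists of strings."""
    text = gzip.decompress(payload).decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


payload = rows_to_gzipped_csv(ROWS)
parsed = read_gzipped_csv(payload)
print(parsed[1][2])  # the comma inside the job title survives the round trip
```

Note that a file with quoted fields like this one would need to be loaded with COPY's CSV option rather than a bare delimiter.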
Step 2: Copying a Flat File onto an AWS S3 Bucket
AWS provides S3 Buckets to store files that can be loaded into an Amazon Redshift instance using the COPY command. To upload a local file to an AWS S3 Bucket, you can run the ‘cp’ command of the AWS Command Line Interface. Here’s how you would do it for the compressed file from Step 1:
aws s3 cp /oracle/org/emp.csv.gz s3://org/empl/emp.csv.gz
However, if you’d prefer the Graphical User Interface (GUI) way, you could go to your AWS S3 console at https://console.aws.amazon.com/s3/home and upload your compressed “emp.csv.gz” file into the desired Amazon S3 Bucket.
Step 3: Creating an Empty Table and Loading Data from the AWS S3 Bucket
Before running the COPY command, an empty table must be created in the database to absorb the “emp.csv” file now available on the Amazon S3 Bucket.
The employee table on Redshift can be created using the following code:
SET SEARCH_PATH TO PUBLIC; -- selecting the schema
CREATE TABLE EMPLOYEE (
    employeeno INTEGER NOT NULL,
    employeename VARCHAR,
    job VARCHAR,
    manager VARCHAR,
    hiredate DATE,
    salary INTEGER
)
-- Table attributes go after the column list, not inside it
DISTKEY(hiredate)
SORTKEY(employeeno);
The flat file copied over to S3 can be loaded into the above table using the following:
SET SEARCH_PATH TO PUBLIC;
COPY EMPLOYEE
FROM 's3://org/empl/emp.csv.gz'
CREDENTIALS 'aws_access_key_id=MY_ACCESS_KEY;aws_secret_access_key=MY_SECRET_KEY'
DELIMITER ','
GZIP;
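When the same COPY has to be issued for many tables, the statement text can be generated rather than typed by hand. The sketch below is one possible way to do that; the function name `build_copy_statement` is hypothetical, it assumes comma-delimited files and key-based credentials as in the example above, and it uses plain string formatting, so it should only be fed trusted values.

```python
def build_copy_statement(table, s3_uri, access_key, secret_key, gzip=True):
    """Return a Redshift COPY statement for a comma-delimited file on S3."""
    credentials = (
        f"aws_access_key_id={access_key};aws_secret_access_key={secret_key}"
    )
    parts = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",
        f"CREDENTIALS '{credentials}'",
        "DELIMITER ','",
    ]
    if gzip:
        # Only add GZIP when the file on S3 is actually gzip-compressed.
        parts.append("GZIP")
    return "\n".join(parts) + ";"


# Placeholder values taken from the example above.
statement = build_copy_statement(
    "EMPLOYEE", "s3://org/empl/emp.csv.gz", "MY_ACCESS_KEY", "MY_SECRET_KEY"
)
print(statement)
```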
Once the above steps are complete, the remaining task is to make the load from Oracle to Redshift incremental, as described next.
Incremental Load from Oracle to Redshift
The above is an example to demonstrate the process of moving data from Oracle to Redshift. In reality, this would typically be performed every day, in a scheduled and automated fashion, on an entire database consisting of tens or hundreds of tables. Here is how this is done.
- Step 1: Iterative Exporting of Tables
- Step 2: Copying CSV Files to AWS S3
- Step 3: Importing AWS S3 Data into Redshift
Step 1: Iterative Exporting of Tables
The following script goes through each table in the current schema one by one and prints the SPOOL commands that export each table into a separate CSV file named *name of the table*_s3.csv. Save its output as a script and run it to perform the export.
SET SERVEROUTPUT ON
begin
  for item in (select table_name from user_tables)
  loop
    dbms_output.put_line('spool '||item.table_name||'_s3.csv');
    dbms_output.put_line('select * from '||item.table_name||';');
    dbms_output.put_line('spool off');
  end loop;
end;
/
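The same generation logic can be sketched outside the database. In the Python version below, the list of table names is passed in by the caller; `DEPARTMENT` is a hypothetical table used only for illustration, and the function name is an assumption, not part of any Oracle tooling.

```python
def build_spool_script(table_names):
    """Return SQL*Plus commands that spool each table to <table>_s3.csv."""
    lines = []
    for name in table_names:
        lines.append(f"spool {name}_s3.csv")
        lines.append(f"select * from {name};")
        lines.append("spool off")
    return "\n".join(lines)


# EMPLOYEE comes from the example above; DEPARTMENT is hypothetical.
script = build_spool_script(["EMPLOYEE", "DEPARTMENT"])
print(script)
```

The printed text can be saved to a file and run in SQL*Plus in the same way as the output of the PL/SQL block.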
Step 2: Copying CSV Files to AWS S3
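The exported .csv files can be uploaded to an S3 bucket using the following command. Note that the --grants option shown here makes the uploaded files readable by everyone; omit it unless public access is intended.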
aws s3 cp <your directory path> s3://<your bucket name> --grants read=uri=http://acs.amazonaws.com/groups/global/AllUsers --recursive
Step 3: Importing AWS S3 Data into Redshift
As mentioned before, this process is typically run every 24 hours on a large volume of data. Hence, you must ensure that no data is lost and no data is duplicated. Duplicate data is a particular concern when copying data over to Redshift because Redshift doesn’t enforce Primary Key constraints.
You could drop all the data in your Redshift instance and reload the entire Oracle Database on every run. However, this risks data loss and is also inefficient and computationally intensive. A more efficient way to perform the data loads while ensuring data consistency is to:
- Copy the AWS S3 flat file data into a temp table: This is achieved by running the ‘COPY’ command the same way as explained in “Step 3” before.
- Compare the temp table data (the incoming .csv data) with the existing data in the target table: See the section Data Loads: SCD Type 1 and Type 2
- Resolve any data inconsistency issues: See the section Data Loads: SCD Type 1 and Type 2
- Remove the affected data from the Parent Table and copy the new, cleaned-up data from the Temp Table. Run the following commands:
begin;
-- the condition depends on what data is available in the temp table
delete from employee where <condition>;
insert into employee select * from emp_temp_table;
end;
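The delete-then-insert pattern can be tried locally before running it against a warehouse. The sketch below uses Python's built-in `sqlite3` as a stand-in for Redshift, which is an assumption made purely so the example runs anywhere. The sample rows are hypothetical, and the delete condition chosen here (matching `employeeno`) is just one possible choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Redshift here
conn.executescript(
    """
    CREATE TABLE employee (employeeno INTEGER, employeename TEXT, salary INTEGER);
    CREATE TABLE emp_temp_table (employeeno INTEGER, employeename TEXT, salary INTEGER);
    INSERT INTO employee VALUES (1, 'SMITH', 800), (2, 'ALLEN', 1600);
    -- Staged data: one changed row (2) and one new row (3).
    INSERT INTO emp_temp_table VALUES (2, 'ALLEN', 1700), (3, 'WARD', 1250);
    """
)

# "with conn" runs both statements in one transaction: it commits on
# success and rolls back if either statement raises an exception.
with conn:
    conn.execute(
        "DELETE FROM employee "
        "WHERE employeeno IN (SELECT employeeno FROM emp_temp_table)"
    )
    conn.execute("INSERT INTO employee SELECT * FROM emp_temp_table")

rows = conn.execute(
    "SELECT employeeno, employeename, salary FROM employee ORDER BY employeeno"
).fetchall()
print(rows)
```

Running both statements in a single transaction means readers of the target table never see the state in which the old rows are gone but the new ones have not yet arrived.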
Data Loads – SCD Type 1 and Type 2
Generally, while comparing the existing table data with the new stream of data (S3 bucket data, in this case) one or both of the following methods is used to complete the data load:
Type 1 or Upsert: A new record is either inserted or updated. The update happens only when the primary key of the incoming record matches with the primary key of an existing record. Here is an example:
Existing Record:
Incoming Record:
Final Table (After Upsert):
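The Type 1 behaviour can be illustrated with a small Python sketch. The records, the choice of `employeeno` as the key, and the function name `upsert` are all hypothetical and used only to show the rule: a matching key overwrites the existing record, while a new key is inserted.

```python
def upsert(existing, incoming):
    """Type 1: overwrite records whose key matches, insert the rest."""
    merged = dict(existing)  # copy so the caller's data is not mutated
    for key, record in incoming.items():
        merged[key] = record  # update on key match, insert otherwise
    return merged


# Hypothetical records keyed by employeeno.
existing = {7369: {"employeename": "SMITH", "salary": 800}}
incoming = {
    7369: {"employeename": "SMITH", "salary": 950},   # key matches: update
    7499: {"employeename": "ALLEN", "salary": 1600},  # new key: insert
}
final = upsert(existing, incoming)
print(final)
```

After the upsert, no trace of the old salary remains, which is the defining property of Type 1.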
Type 2 or Maintain History: In this scenario, if the primary key of the incoming record matches the primary key of an existing record, the existing record is end-dated or flagged to reflect that it is a past record, and the incoming record is added as the current one. Here is the Type 2 result for the above example:
Existing Record:
Incoming Record:
Final Table (After Type 2):
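A Type 2 load can be sketched along the same lines. The `start_date` and `end_date` columns, the sample records, and the function name `apply_type2` are assumptions introduced for illustration; real implementations may use a flag column instead of dates.

```python
from datetime import date


def apply_type2(history, incoming, load_date):
    """Type 2: end-date the current version of a matching key, then append."""
    result = [dict(row) for row in history]  # copy so the input is not mutated
    for new in incoming:
        for row in result:
            is_current = row["end_date"] is None
            if row["employeeno"] == new["employeeno"] and is_current:
                row["end_date"] = load_date  # close the old version
        # The incoming record becomes the new current version.
        result.append({**new, "start_date": load_date, "end_date": None})
    return result


# Hypothetical history with one current record for employee 7369.
history = [
    {"employeeno": 7369, "salary": 800,
     "start_date": date(2020, 1, 1), "end_date": None},
]
incoming = [{"employeeno": 7369, "salary": 950}]
final = apply_type2(history, incoming, date(2021, 6, 1))
for row in final:
    print(row)
```

Unlike the upsert, both versions of the record are kept, and the row with an empty `end_date` is the current one.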
Limitations of Using Custom ETL Scripts to Load Data from Oracle to Redshift
Although a Custom Script (or more likely a combination of Custom Scripts) written to execute the above steps will work, it will be tedious to ensure the smooth functioning of such a system due to the following reasons:
- Many different kinds of steps need to be executed in a dependent fashion without failure.
- The incremental load is especially difficult to code and execute in such a way as to ensure there is no data loss and/or data inconsistencies. Doing a full load every time puts a lot of load on the Oracle database.
- As mentioned, this is typically done once every day. Lately, however, people want to look at more real-time data. Hence, this will have to be executed a lot more frequently than once in 24 hours. That is going to test the robustness and thoroughness of your solution a lot more.
Method 2: Setting Up Oracle to Redshift Integration using LIKE.TG Data
LIKE.TG Data, a No-code Data Pipeline, helps you load data from any data source, such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services, and simplifies the ETL process. It supports 100+ data sources, including Oracle, and setup is a 3-step process: select the data source, provide valid credentials, and choose the destination. LIKE.TG loads the data onto the desired Data Warehouse, enriches it, and transforms it into an analysis-ready form without you writing a single line of code.
Its completely automated pipeline delivers data in real time, without any loss, from source to destination. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner with zero data loss, and it supports different forms of data. The solutions provided are consistent and work with different Business Intelligence (BI) tools as well.
Check out why LIKE.TG is the Best:
- Secure: LIKE.TG has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Schema Management: LIKE.TG takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Minimal Learning: LIKE.TG, with its simple and interactive UI, is easy for new customers to work with and perform operations on.
- LIKE.TG Is Built To Scale: As the number of sources and the volume of your data grows, LIKE.TG scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: LIKE.TG allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The LIKE.TG team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: LIKE.TG allows you to monitor the data flow and check where your data is at a particular point in time.
Conclusion
LIKE.TG has an intuitive user interface that lets even not-so-technical people easily tweak the parameters of the data load settings. This comes in handy once you have everything up and running.
With LIKE.TG, you can achieve seamless and accurate data replication from Oracle to Redshift. With its fault-tolerant architecture, LIKE.TG ensures that no data is lost while loading. This empowers you to focus on the right projects instead of worrying about data availability.
Want to take LIKE.TG for a spin? Sign Up for a 14-day free trial and experience the feature-rich LIKE.TG suite first hand.
Share your experience of loading data from Oracle to Redshift in the comments section below!
This article is republished from public internet and edited by the LIKE.TG editorial department. If there is any infringement, please contact our official customer service for proper handling.