Replicating data from MySQL to BigQuery: 2 Easy Methods

营销拓客

2024-08-14 08:28:55

LIKE.TG 成立于2020年，总部位于马来西亚，是首家汇集全球互联网产品，提供一站式软件产品解决方案的综合性品牌。唯一官方网站：www.like.tg

With the BigQuery MySQL Connector, users can perform data analysis on MySQL data stored in BigQuery without the need for complex data migration processes. With MySQL BigQuery integration, organizations can leverage the scalability and power of BigQuery for handling large datasets stored in MySQL.

Migrate MySQL to BigQuery can be a complex undertaking, necessitating thorough testing and validation to minimize downtime and ensure a smooth transition. This blog will provide 2 easy methods to connect MySQL to BigQuery in real time. The first method uses LIKE.TG ’s automated Data Pipeline to set up this connection while the second method involves writing custom ETL Scripts to perform this data transfer from MySQL to BigQuery. Read along and decide which method suits you the best!

Methods to Connect MySQL to BigQuery

Following are the 2 methods using which you can set up your MySQL to BigQuery integration:

Method 1: Using LIKE.TG Data to Connect MySQL to BigQuery
Method 2: Manual ETL Process to Connect MySQL to BigQuery

Method 1: Using LIKE.TG Data to Connect MySQL to BigQuery

LIKE.TG is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

Get Started with LIKE.TG for Free

With a ready-to-use Data Integration Platform, LIKE.TG , you can easily move data from MySQL to BigQuery with just 2 simple steps. This does not need you to write any code and will provide you with an error-free, fully managed setup to move data in minutes.

Step 1: Connect and configure your MySQL database.

Click PIPELINES in the Navigation Bar.
Click + CREATE in the Pipelines List View.
In the Select Source Type page, select the MySQL as your source.
In the Configure your MySQL Source page, specify the connection settings for your MySQL Source.

Step 2: Choose BigQuery as your Destination

Click DESTINATIONS in the Navigation Bar.
Click + CREATE in the Destinations List View.
In Add Destination page select Google BigQuery as the Destination type.
In the Configure your Google BigQuery Warehouse page, specify the following details:

It is that simple. While you relax, LIKE.TG will fetch the data and send it to your destination Warehouse.

Instead of building a lot of these custom connections, ourselves, LIKE.TG Data has been really flexible in helping us meet them where they are.
– Josh Kennedy, Head of Data and Business Systems

In addition to this, LIKE.TG lets you bring data from a wide array of sources – Cloud Apps, Databases, SDKs, and more. You can check out the complete list of available integrations.

Method 2: Manual ETL Process to Connect MySQL to BigQuery

The manual method of connecting MySQL to BigQuery involves writing custom ETL scripts to set up this data transfer process. This method can be implemented in 2 different forms:

Full Dump and Load
Incremental Dump and Load

1. Full Dump and Load

This approach is relatively simple, where complete data from the source MySQL table is extracted and migrated to BigQuery. If the target table already exists, drop it and create a new table ( Or delete complete data and insert newly extracted data).

Full Dump and Load is the only option for the first-time load even if the incremental load approach is used for recurring loads. The full load approach can be followed for relatively small tables even for further recurring loads. You can also check out MySQL to Redshift integration.

The high-level steps to be followed to replicate MySQL to BigQuery are:

Step 1: Extract Data from MySQL
Step 2: Clean and Transform the Data
Step 3: Upload to Google Cloud Storage(GCS)
Step 4: Upload to the BigQuery Table from GCS

Let’s take a detailed look at each step to migrate sqlite to mariadb.

Step 1: Extract Data from MySQL

There are 2 popular ways to extract data from MySQL – using mysqldump and using SQL query.

Extract data using mysqldump

Mysqldump is a client utility coming with Mysql installation. It is mainly used to create a logical backup of a database or table. Here, is how it can be used to extract one table:

mysqldump -u <db_username> -h <db_host> -p db_name table_name > table_name.sql

Here output file table_name.sql will be in the form of insert statements like

INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);

This output has to be converted into a CSV file. You have to write a small script to perform this. Here is a well-accepted python library doing the same – mysqldump_to_csv.py

Alternatively, you can create a CSV file using the below command. However, this option works only when mysqldump is run on the same machine as the mysqld server which is not the case normally.

mysqldump -u [username] -p -t -T/path/to/directory [database] --fields-terminated-by=,

Extract Data using SQL query

MySQL client utility can be used to run SQL commands and redirect output to file.

mysql -B -u user database_name -h mysql_host   -e "select * from table_name;" > 
table_name_data_raw.txt

Further, it can be piped with text editing utilities like sed or awk to clean and format data.

Example:

mysql -B -u user database_name -h mysql_host -e "select * from table_name;" |   
sed "s/'/'/;s/t/","/g;s/^/"/;s/$/"/;s/n//g"  > table_name_data.csv

Step 2: Clean and Transform the Data

Apart from transforming data for business logic, there are some basic things to keep in mind:

BigQuery expects CSV data to be UTF-8 encoded.
BigQuery does not enforce Primary Key and unique key constraints. ETL process has to take care of that.
Column types are slightly different. Most of the types have either equivalent or convertible types. Here is a list of common data types.

Fortunately, the default date format in MySQL is the same, YYYY-MM-DD. Hence, while taking mysqldump there is no need to do any specific changes for this. If you are using a string field to store date and want to convert to date while moving to BigQuery you can use STR_TO_DATE function.
DATE value must be dash(-) separated and in the form YYYY-MM-DD (year-month-day). You can visit their official page to know more about BigQuery data types.

Syntax : 
STR_TO_DATE(str,format)
Example : 
SELECT STR_TO_DATE('31,12,1999','%d,%m,%Y');
Result : 
1999-12-31

The hh:mm: ss (hour-minute-second) portion of the timestamp must use a colon (:) separator.
Make sure text columns are quoted if it can potentially have delimiter characters.

Step 3: Upload to Google Cloud Storage(GCS)

Gsutil is a command-line tool for manipulating objects in GCS. It can be used to upload files from different locations to your GCS bucket.

To copy a file to GCS:

gsutil cp table_name_data.csv  gs://my-bucket/path/to/folder/

To copy an entire folder:

gsutil cp -r dir gs://my-bucket/path/to/parent/

If the files are present in S3, the same command can be used to transfer to GCS.

gsutil cp -R s3://bucketname/source/path gs://bucketname/destination/path

Storage Transfer Service

Storage Transfer Service from Google cloud is another option to upload files to GCS from S3 or other online data sources like HTTP/HTTPS location. Destination or sink is always a Cloud Storage bucket. It can also be used to transfer data from one GCS bucket to another.

This service is extremely handy when comes to data movement to GCS with support for:

Schedule one-time or recurring data transfer.
Delete existing objects in the destination if no corresponding source object is present.
Deletion of source object after transferring.
Periodic synchronization between source and sink with advanced filters based on file creation dates, file name, etc.

Upload from Web Console

If you are uploading from your local machine, web console UI can also be used to upload files to GCS. Here are the steps to upload a file to GCS with screenshots.

Login to your GCP account. In the left bar, click Storage and go to Browser.

2. Select the GCS bucket you want to upload the file. Here the bucket we are using is test-data-LIKE.TG . Click on the bucket.

3. On the bucket details page below, click the upload files button and select file from your system.

4. Wait till the upload is completed. Now, the uploaded file will be listed in the bucket:

Step 4: Upload to the BigQuery Table from GCS

You can use the bq command to interact with BigQuery. It is extremely convenient to upload data to the table from GCS.Use the bq load command, and specify CSV as the source_format.

The general syntax of bq load:

bq --location=[LOCATION] load --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE] [SCHEMA]

[LOCATION] is your location. This is optional.
[FORMAT] is CSV.
[DATASET] is an existing dataset.
[TABLE] is the name of the table into which you’re loading data.
[PATH_TO_SOURCE] is a fully-qualified Cloud Storage URI.
[SCHEMA] is a valid schema. The schema can be a local JSON file or inline.– autodetect flag also can be used instead of supplying a schema definition.

There are a bunch of options specific to CSV data load :

To see full list options visit Bigquery documentation on loading data cloud storage CSV, visit here.

Following are some example commands to load data:

Specify schema using a JSON file:

bq --location=US load --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv ./myschema.json

If you want schema auto-detected from the file:

bq --location=US load --autodetect --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv

If you are writing to the existing table, BigQuery provides three options – Write if empty, Append to the table, Overwrite table. Also, it is possible to add new fields to the table while uploading data. Let us see each with an example.

To overwrite the existing table:

bq --location=US load --autodetect --replace --source_format=CSV mydataset.mytable 
gs://mybucket/mydata.csv

To append to an existing table:

bq --location=US load --autodetect --noreplace --source_format=CSV mydataset.mytable 
gs://mybucket/mydata.csv ./myschema.json

To add a new field to the table. Here new schema file with an extra field is given :

bq --location=asia-northeast1 load --noreplace --schema_update_option=ALLOW_FIELD_ADDITION 
--source_format=CSV mydataset.mytable gs://mybucket/mydata.csv ./myschema.json

2. Incremental Dump and Load

In certain use cases, loading data once from MySQL to BigQuery will not be enough. There might be use cases where once initial data is extracted from the source, we need to keep the target table in sync with the source. For a small table doing a full data dump every time might be feasible but if the volume data is higher, we should think of a delta approach.

The following steps are used in the Incremental approach to connect MySQL to Bigquery:

Step 1: Extract Data from MySQL
Step 2: Update Target Table in BigQuery

Step 1: Extract Data from MySQL

For incremental data extraction from MySQL use SQL with proper predicates and write output to file. mysqldump cannot be used here as it always extracts full data.

Eg: Extracting rows based on the updated_timestamp column and converting to CSV.

mysql -B -u user database_name -h mysql_host  -e "select * from table_name where
updated_timestamp < now() and  updated_timestamp >'#max_updated_ts_in_last_run#'"| 
sed "s/'/'/;s/t/","/g;s/^/"/;s/$/"/;s/n//g"  > table_name_data.csv

Note: In case of any hard delete happened in the source table, it will not be reflected in the target table.

Step 2: Update Target Table in BigQuery

First, upload the data into a staging table to upsert newly extracted data to the BigQuery table. This will be a full load. Please refer full data load section above. Let’s call it delta_table. Now there are two approaches to load data to the final table:

Update the values existing records in the final table and insert new rows from the delta table which are not in the final table.

UPDATE data_set.final_table t
SET t.value = s.value
FROM  data_set.delta_table s
WHERE t.id = s.id;

INSERT data_set.final_table (id, value)
SELECT id, value
FROM  data_set.delta_table
WHERE NOT id IN (SELECT id FROM data_set.final_table);

2. Delete rows from the final table which are present in the delta table. Then insert all rows from the delta table to the final table.

DELETE data_set.final_table f
WHERE f.id IN (SELECT id from data_set.delta_table);

INSERT data_set.final_table (id, value)
SELECT id, value
FROM  data_set.delta_table;

Disadvantages of Manually Loading Data

Manually loading data from MySQL to BigQuery presents several drawbacks:

Cumbersome Process: While custom code suits one-time data movements, frequent updates become burdensome manually, leading to inefficiency and bulkiness.
Data Consistency Issues: BigQuery lacks guaranteed data consistency for external sources, potentially causing unexpected behavior during query execution amidst data changes.
Location Constraint: The data set’s location must align with the Cloud Storage Bucket’s region or multi-region, restricting flexibility in data storage.
Limitation with CSV Format: CSV files cannot accommodate nested or repeated data due to format constraints, limiting data representation possibilities.
File Compression Limitation: Mixing compressed and uncompressed files in the same load job using CSV format is not feasible, adding complexity to data loading tasks.
File Size Restriction: The maximum size for a gzip file in CSV format is capped at 4 GB, potentially limiting the handling of large datasets efficiently.

What Can Be Migrated From MySQL To BigQuery?

Since the 1980s, MySQL has been the most widely used open-source relational database management system (RDBMS), with businesses of all kinds using it today.

MySQL is fundamentally a relational database. It is renowned for its dependability and speedy performance and is used to arrange and query data in systems of rows and columns.

Both MySQL and BigQuery use tables to store their data. When you migrate a table from MySQL to BigQuery, it is stored as a standard, or managed, table.

Both MySQL and BigQuery employ SQL, but they accept distinct data types, therefore you’ll need to convert MySQL data types to BigQuery equivalents. Depending on the data pipeline you utilize, there are several options for dealing with this.

Once in BigQuery, the table is encrypted and kept in Google’s warehouse. Users may execute complicated queries or accomplish any BigQuery-enabled job.

The Advantages of Connecting MySQL To BigQuery

BigQuery is intended for efficient and speedy analytics, and it does so without compromising operational workloads, which you will most likely continue to manage in MySQL.
It improves workflows and establishes a single source of truth. Switching between platforms can be difficult and time-consuming for analysts. Updating BigQuery with MySQL ensures that both data storage systems are aligned around the same source of truth and that other platforms, whether operational or analytical, are constantly bringing in the right data.
BigQuery increases data security. By replicating data from MySQL to BigQuery, customers avoid the requirement to provide rights to other data engineers on operational systems.
BigQuery handles Online Analytical Processing (OLAP), whereas MySQL is designed for Online Transaction Processing (OLTP). Because it is a cost-effective, serverless, and multi-cloud data warehouse, BigQuery can deliver deeper data insights and aid in the conversion of large data into useful insights.

Conclusion

The article listed 2 methods to set up your BigQuery MySQL integration. The first method relies on LIKE.TG ’s automated Data Pipeline to transfer data, while the second method requires you to write custom scripts to perform ETL processes from MySQL to BigQuery.

Complex analytics on data requires moving data to Data Warehouses like BigQuery. It takes multiple steps to extract data, clean it and upload it. It requires real effort to ensure there is no data loss at each stage of the process, whether it happens due to data anomalies or type mismatches.

Visit our Website to Explore LIKE.TG

Want to take LIKE.TG for a spin? Sign Up for a 14-day free trial and experience the feature-rich LIKE.TG suite first hand. Check out LIKE.TG pricing to choose the best plan for your organization.

Share your understanding of connecting MySQL to BigQuery in the comments section below!

LIKE.TG 专注全球社交流量推广，致力于为全球出海企业提供有关的私域营销获客、国际电商、全球客服、金融支持等最新资讯和实用工具。免费领取【WhatsApp、LINE、Telegram、Twitter、ZALO】等云控系统试用；点击【联系客服】，或关注【LIKE.TG出海指南频道】、【LIKE.TG生态链-全球资源互联社区】了解更多最新资讯

本文由LIKE.TG编辑部转载自互联网并编辑，如有侵权影响，请联系官方客服，将为您妥善处理。

This article is republished from public internet and edited by the LIKE.TG editorial department. If there is any infringement, please contact our official customer service for proper handling.

客服坐席客服系统坐席多开效率工具

相关产品推荐