How-to

Data Warehouse Migration Challenges and How to Meet Them

942

Of your peers have already read this article.

7:30 Minutes

The most insightful time you'll spend today!

If your company is thinking of migrating to a modern data warehouse from a legacy environment and you are asking how you can minimize migration risks, how much will it cost, how you'll migrate your data to the target data warehouse, and how quickly you will see equal or better performance, read on.

In the last blog post, we discussed why legacy data warehouses are not cutting it any more and why organizations are moving their data warehouses to cloud.

At GCP, we often hear that customers feel that migration is an uphill battle because the migration strategy was not deliberately considered. 

Migrating to a modern data warehouse from a legacy environment can require a massive up-front investment in time and resources. There’s a lot to think about before and during the process, so your organization has to take a strategic approach to streamline the process.

At Google Cloud, we work with enterprises shifting data to our BigQuery data warehouse, and we’ve helped companies of all kinds successfully migrate to cloud. Here are some of the questions we frequently hear around migrating a data warehouse to the cloud:

  • How do we minimize any migration risks or security challenges?
  • How much will it cost?
  • How do we migrate our data to the target data warehouse?
  • How quickly will we see equal or better performance?

These are big, important questions to ask—and have answered—when you’re starting your migration. Let’s take them in order.

How do we minimize any migration risks or security challenges?
It’s easy to consider an on-premises data warehouse secure because, well, it’s on-site and you can manage its data protection. But if scaling up an on-prem data warehouse is difficult, so is securing it as your business scales. 

We’ve built in multiple features to secure BigQuery. For enterprise users, Cloud Identity and Access Management (Cloud IAM) is key to setting appropriate role-based user access to data.

You can also take advantage of SQL’s security views within BigQuery. And all BigQuery data is encrypted at rest and in transit.

You can add the protection of customer-managed encryption keys to establish even stronger security measures. Using virtual private cloud (VPC) security controls can secure your migration path, since it helps reduce data exfiltration risks. 

How much will it cost?
The cost of a cloud data warehouse has a different structure from what you’re likely used to with a legacy data warehouse. An on-prem system like Teradata may depend on your IT team paying every three years for the hardware, then paying for licenses for users who need to access the system. Capacity increases come at an additional cost outside of that hardware budget.

With cloud, you’ve got a lot more options for cost and scale. Instead of a fixed set of costs, you’re now working on a price-utility gradient, where if you want to get more out of your data warehouse, you can spend more to do so immediately, or vice versa. While cloud data warehouses help reduce or eliminate capital and fixed costs, they are not all the same.

You’ll find varying levels of simplicity and cost savings across vendors, so it’s important to check out the operational costs of each data warehouse in relation to its performance. 

With a cloud data warehouse like BigQuery, TCO becomes an important metric for customers when they’ve migrated to BigQuery (check out ESG’s report on that), and Google Cloud’s flexibility makes it easy to optimize costs.

How do we migrate all of our data to the target data warehouse?
This question encompasses both migrating your extract, transform, load (ETL) jobs and SAS/BI application workloads to the target data warehouse, as well as migrating all your queries, stored procedures, and other extract, load, transform (ELT) jobs.

Actually getting all of a company’s data into the cloud can seem daunting at the outset of the migration journey. We know that most businesses have a lot of siloed data. That might be multiple data lakes set up over the years for various teams, or systems acquired through acquisition that handle just one or two crucial applications. You may be moving data from an on-prem or cloud data warehouse to BigQuery and type systems or representations don’t match up.

One big step you can take to prepare for a successful migration is to do some workload and use case discovery.

That might involve auditing which use cases exist today and whether those use cases are part of a bigger workload, as well as identifying which datasets, tables, and schemas underpin each use case.

Use cases will vary by industry and by job role. So, for example, a retail pricing analyst may want to analyze past product price changes to calculate future pricing. Use cases may include the need to ingest data from a transactional database, transforming data into a single time series per product, storing the results in a data warehouse table, and more. 

After the preparation and discovery phase, you should assess the current state of your legacy environment to plan for your migration. This includes cataloging and prioritizing your use cases, auditing data to decide what will be moved and what won’t, and evaluating data formats across your organization to decide what you’ll need to convert or rewrite.

Once that’s decided, choose your ingest and pipeline methods. All of these tasks take both technology and people management, and require some organizational consensus on what success will look like once the migration is complete. 

How quickly will we see equal or better performance?
Managing a legacy data warehouse isn’t usually synonymous with speed. Performance often comes at the cost of capacity, so users can’t do the analysis they need till other queries have finished running.

Reporting and other analytics functions may take hours or days, which is especially true for running large reports with a lot of data, like an end-of-quarter sales calculation. As the amount of data and number of users rapidly grows, performance begins to melt down and organizations often face disruptive outages.

However, with a modern cloud data warehouse like BigQuery, compute and storage are decoupled, so you can scale immediately without facing capital infrastructure constraints. 

BigQuery helps you modernize because it uses a familiar SQL interface, so users can run queries in seconds and share insights right away. Home Depot is an example of a customer that migrated their warehouse and reduced eight-hour workloads to five minutes. 

Moving to cloud may seem daunting, especially when you’re migrating an entrenched legacy system. But it brings the benefits of adopting technology that lets the business grow, rather than simply adopting a tool. It’s likely you’ve already seen that the business demand exists. Now it’s time to stop standing in the way of that demand and instead make way for growth.

More Relevant Stories for Your Company

Blog

Making Streaming Analytics Simpler and More Cost-Effective With Cloud Dataflow

Streaming analytics helps businesses to understand their customers in real time and adjust their offerings and actions to better serve customer needs. It’s an important part of modern data analytics, and can open up possibilities for faster decisions. For streaming analytics projects to be successful, the tools have to be

How-to

What is Dataflow?

What is Dataflow, and how can you use it for your data processing needs? In this episode of Google Cloud Drawing Board, Priyanka Vergadia walks you through Dataflow, a serverless system for processing and enriching data, supporting both streaming and batch models. Here's what's inside: 0:00 - 0:14 Video snapshot0:15

How-to

How Modern is Your Data Warehouse? Find Out With This Test

As more and more businesses turn to advanced data analytics to help them make smarter decisions, run real-time analytics, and improve business operations, an increasing number are modernizing their data warehouses to make it all possible. For many businesses, knowing how to modernize means understanding where their data warehouse sits

Podcast

Why it’s Easier Than Ever for Developers to Break Into Machine Learning and Data Science

This week we talk about machine learning, its best use cases, and how developers can break into machine learning and data science. Dale Markowitz, Google Developer Advocate, talks about natural language processing as well, explaining that it’s basically the intersection of machine learning and text processing. It can be used

SHOW MORE STORIES