Learn how to transition from an on-premises data warehouse to BigQuery on Google Cloud starting from a schema and data transfer overview, to data governance and data pipelines, and finally to reporting and analysis, and performance optimization.
Over the past few decades, organizations have mastered the science of data warehousing. They have increasingly applied descriptive analytics to large quantities of stored data, gaining insight into their core business operations. Conventional Business Intelligence (BI), which focuses on querying, reporting, and Online Analytical Processing, might have been a differentiating factor in the past, either making or breaking a company, but it’s no longer sufficient.
Today, not only do organizations need to understand past events using descriptive analytics, they need predictive analytics, which often uses machine learning (ML) to extract data patterns and make probabilistic claims about the future. The ultimate goal is to develop prescriptive analytics that combine lessons from the past with predictions about the future to automatically guide real-time actions.
Traditional data warehouse practices capture raw data from various sources, which are often Online Transactional Processing (OLTP) systems. Then, a subset of data is extracted in batches, transformed based on a defined schema, and loaded into the data warehouse. Because traditional data warehouses capture a subset of data in batches and store data based on rigid schemas, they are unsuitable for handling real-time analysis or responding to spontaneous queries. Google designed BigQuery in part in response to these inherent limitations.
Innovative ideas are often slowed by the size and complexity of the IT organization that implements and maintains these traditional data warehouses. It can take years and substantial investment to build a scalable, highly available, and secure data warehouse architecture. BigQuery offers sophisticated software as a service (SaaS) technology that can be used for serverless data warehouse operations. This lets you focus on advancing your core business while delegating infrastructure maintenance and platform development to Google Cloud.
BigQuery offers access to structured data storage, processing, and analytics that’s scalable, flexible, and cost effective. These characteristics are essential when your data volumes are growing exponentially—to make storage and processing resources available as needed, as well as to get value from that data. Furthermore, for organizations that are just starting with big data analytics and machine learning, and that want to avoid the potential complexities of on-premises big data systems, BigQuery offers a pay-as-you-go way to experiment with managed services.
With BigQuery, you can find answers to previously intractable problems, apply machine learning to discover emerging data patterns, and test new hypotheses. As a result, you have timely insight into how your business is performing, which enables you to modify processes for better results. In addition, the end user’s experience is often enriched with relevant insights gleaned from big data analysis, as we explain later in this series.
The migration framework
Undertaking a migration can be a complex and lengthy endeavor. Therefore, we recommend adhering to a framework to organize and structure the migration work in phases:
- Prepare and discover: Prepare for your migration with workload and use case discovery.
- Assess and plan: Assess and prioritize use cases, define measures of success, and plan your migration.
- Execute: Iterate the following steps for each use case:
- Migrate (offload): Migrate only your data, schema, and downstream business applications.
- Migrate (full): Alternatively, migrate the use case fully end-to-end. The same as Migrate (offload), with the addition of the upstream data pipelines.
- Verify and validate: Test and validate the migration to assess return on investment.
The following diagram illustrates the recommended framework and shows how the different phases are connected:
For a deeper understanding, read Migrating data warehouses to BigQuery: Introduction and overview