Google Cloud Professional Architect Certification Notes

For the past several months, I have been taking Google Cloud courses on Coursera, aiming to get certified by the end of the third quarter of 2019. I am happy to announce that I took the exam and completed the certification on 9/13. Before that, I spent many hours on video content and Google Cloud documentation. It is amazing how thoughtfully Google has built its infrastructure services to address each aspect of application development and hardware allocation, despite being a late entrant to the cloud market. In summary, my journey was as follows:

  1. Enrolled in Google Kickstart programs offered in collaboration with Coursera
  2. Completed challenge quests in Qwiklabs
  3. Took five practice exams on Udemy (hat tip to Nizam Guntakal for recommending it)
  4. Reviewed notes from a fellow architect

Overall, I enjoyed the journey!

 See my certification. 

Google Cloud Platform: Reference architecture for Data Warehouse

It has been a great journey learning and understanding Google Cloud Platform, also known as GCP. Among the top cloud providers, Google seems to have nailed cloud technology very well. As I explored the services, I got some proofs of concept working and defined some reference architectures. One of these reference architectures is for the data warehouse. Our use case is an analytics dashboard and reporting platform for internal and external users. The solution requires three standard serverless services from Google Cloud Platform:

  1. Cloud Dataflow – fully managed service from Google for stream and batch processing and for enriching the data ingested into various storage options in Google Cloud
  2. BigQuery – serverless, highly scalable cloud data warehouse with a built-in in-memory BI Engine and machine learning capabilities
  3. Data Studio – serverless, highly scalable BI tool with a flexible suite of data analytics tools

In this case, let us assume there are four sources: Sources 1, 2, and 3 reside within the US, while Source 4 resides outside the US and requires some data separation. Cloud Dataflow, powered by Apache Beam, can stream or batch ingest the data from the sources. We can develop a pipeline for one source and reuse it for the other sources. Dataflow pipelines can be written in Java or Python. Once we have the data, the industry practice is to keep a data lake in BigQuery that stores the raw data for in-depth analytics or for running machine learning algorithms. From there, dimensional modeling may or may not be required, depending on the nature of the end output. For clarity, BigQuery is shown in both the data lake and the data warehouse, but the two structures may reside as one. Both Google Data Studio, which is now called Google Cloud BI Solution, and Tableau are visualization solutions. Either could be extended to support the goals of the organization. This provides a high-level overview of a data warehousing reference architecture on Google Cloud Platform.
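
As a rough illustration of "develop a pipeline for one source and reuse it for the others", here is a minimal Python sketch of the routing rule only. It does not use the Apache Beam SDK; it just shows how a single rule could decide which raw (data lake) table each source lands in while keeping the non-US source separated. The project name `my-project`, the dataset names `raw_us` and `raw_non_us`, and the source names are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One upstream system feeding the warehouse (names are hypothetical)."""
    name: str
    in_us: bool


# Hypothetical dataset names: one raw (data lake) dataset per region, so
# data from outside the US stays separated from US data.
US_RAW_DATASET = "raw_us"
NON_US_RAW_DATASET = "raw_non_us"


def raw_table_for(source: Source, project: str = "my-project") -> str:
    """Return a "project:dataset.table" spec for the source's raw table."""
    dataset = US_RAW_DATASET if source.in_us else NON_US_RAW_DATASET
    return f"{project}:{dataset}.{source.name}"


def plan_ingestion(sources):
    """Apply the same routing rule to every source and return name -> table."""
    return {s.name: raw_table_for(s) for s in sources}


# The four sources from the scenario: three inside the US, one outside.
SOURCES = [
    Source("source_1", in_us=True),
    Source("source_2", in_us=True),
    Source("source_3", in_us=True),
    Source("source_4", in_us=False),
]

if __name__ == "__main__":
    for name, table in plan_ingestion(SOURCES).items():
        print(f"{name} -> {table}")
```

In an actual Dataflow job, each resulting table spec would most likely be handed to the pipeline's BigQuery write step, so the pipeline code stays identical across sources and only this small configuration changes.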