What is our primary use case?
We load data into BigQuery. When a BigQuery table is refreshed, Dataflow jobs trigger automatically. These jobs apply business rules to trim and massage the data, then load it into the final table.
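To give a rough idea of what the trimming and massaging step can look like, here is a minimal Python sketch. The field names (`customer_id`, `country`) and the two rules are hypothetical placeholders for illustration, not our actual business rules.

```python
from typing import Iterable, List, Optional


def clean_row(row: dict) -> Optional[dict]:
    """Trim string fields and apply sample rules; return None to drop the row."""
    # Strip leading/trailing whitespace from every string value.
    cleaned = {
        key: value.strip() if isinstance(value, str) else value
        for key, value in row.items()
    }
    # Hypothetical business rule: rows without a customer_id are dropped.
    if not cleaned.get("customer_id"):
        return None
    # Hypothetical normalization: store country codes in upper case.
    country = cleaned.get("country")
    if isinstance(country, str):
        cleaned["country"] = country.upper()
    return cleaned


def clean_rows(rows: Iterable[dict]) -> List[dict]:
    """Apply clean_row to each row and keep only the rows that pass."""
    result = []
    for row in rows:
        cleaned = clean_row(row)
        if cleaned is not None:
            result.append(cleaned)
    return result
```

In an Apache Beam pipeline running on Dataflow, a function like `clean_row` would typically be applied with `beam.Map`, followed by a `beam.Filter` that drops the `None` results, between the BigQuery read and the write to the final table.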
What is most valuable?
The integration within Google Cloud Platform is very good. We have compiled graphs that show everything in a pictorial format, and all the necessary coding is done in Python files kept in GitHub. We create YAML and JSON files for pre-commits, and the best part is that everything runs within Google Cloud Platform. It sends an email to the end user if any job fails.
What needs improvement?
I am not sure, as we have built only one job, and it runs daily. Everything else is managed using BigQuery schedulers and Talend. Occasionally, however, a huge volume of data causes a failure due to array size. This might be an issue with the Talend component JAR files.
For how long have I used the solution?
I built a solution using Google Cloud Dataflow, initially because queries that were too long would not work in BigQuery schedulers. We built the Dataflow job at that time and have been using it this year, in 2023. It is still in progress.
What was my experience with deployment of the solution?
It is built in. If you have Google Cloud Platform, you just need to activate it. We didn't do any installation; the platform team provided the access. It works like a plug-in.
What do I think about the stability of the solution?
I can rate it eight out of ten. The job we built has not failed once over six to seven months.
What do I think about the scalability of the solution?
I rate it nine out of ten. As a team lead, I'm responsible for handling five to six applications, but Google Cloud Dataflow seems to handle our use case effectively.
How are customer service and support?
The customer service and support are very good. Whenever we have issues, we can consult with Google.
Which solution did I use previously and why did I switch?
Previously, we used SAP BODS, an ETL tool, and we migrated directly to Google Cloud Platform, where we utilize BigQuery and schedule queries.
How was the initial setup?
It is inbuilt within Google Cloud Platform, and no specific setup was required on our side. We coordinated with the platform team for access.
What about the implementation team?
We did not require an external team; our internal team managed it with the platform team for access.
What's my experience with pricing, setup cost, and licensing?
Pricing is normal. It is part of a package we receive from Google, and they are not charging us too much.
What other advice do I have?
Based on my experience, I rate Google Cloud Dataflow seven out of ten. We have had a good experience with Google overall.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google