Redshift is AWS's data warehouse solution. We have structured datasets, and we don't load all the Amplitude data into Redshift. At first we loaded it via Hudl, a data integration partner, but later it was loaded directly by an integration. Then we run dbt against Redshift: our data models live in dbt, and we run our data analytics against the data warehouse.
Service accounts are used in both Amazon Redshift and Google Cloud. For example, I could create a service account for my desktop to access Redshift, or a service account shared by multiple users. In BigQuery, creating a service account is very simple, and you get full control over its access, so you can limit what the service account can do. This prevents accidental exposure or deletion of data. Only the features you allow are available to the account, which is very handy.
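As a rough illustration of limiting what a service account can do, the sketch below builds the SQL statements that would give one user read-only access to a single schema. The user name `svc_reporting`, the schema `analytics`, and the helper `read_only_grants` are hypothetical and not part of our setup. The statements would still have to be run by an administrator through whatever SQL client is in use.

```python
def read_only_grants(user: str, schema: str) -> list[str]:
    """Build GRANT statements giving `user` read-only access to `schema`.

    Both names are placeholders; creating the user and handling its
    password are deliberately left out of this sketch.
    """
    for name in (user, schema):
        # Accept only plain identifiers so the names cannot smuggle in SQL.
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        f"GRANT USAGE ON SCHEMA {schema} TO {user};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};",
    ]


if __name__ == "__main__":
    for statement in read_only_grants("svc_reporting", "analytics"):
        print(statement)
```

Because the account is never granted INSERT, UPDATE, or DELETE, it cannot remove or change data by accident, which is the kind of limit described above.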
Redshift's SQL is based on Postgres syntax, with some syntactic differences. It's handy, and there are no blockers when writing queries. The market is competitive, but the price is very reasonable. I was always aware of what I would pay, and if I reserved servers, I knew what it would cost. We had no alternative when choosing a solution. We had to use the server-based version on AWS, but it had limited features. A few features were lacking; for example, we couldn't access it from the API. We had our own nodes, which were provided by Amazon. It was a minimal setup, with only two services running.
Performance was good and predictable. We only reached its limits when a complex dbt model was running. In a one-node setup, not all of the storage was available on the server. For example, on a machine with 72 gigabytes of storage, only four were available in a single-node setup. I added another node with 64 GB. With two servers, all of the storage was available. When you run these complex queries, they not only need compute but also temporarily eat up storage. I couldn't use a single server because temporary tables ate up the storage. BigQuery’s authentication is straightforward. Besides that, Redshift is doing what it's expected to do. There are no major problems.
It would be good to see Redshift as a serverless offering. The proposition may be unclear, and at the time there were certain limitations with the pay-as-you-go offering. However, a serverless offering with more flexible on-demand pricing would be good to see. Redshift is not expensive, but I always have to buy a new server if I need more computing power than I have. Setting up a new server is easy, but it would be better if I could scale my Redshift cluster up or down as needed; today that still requires manual control. For example, my analyst team may be working on a job that requires a lot of computing power and is only needed for this month, this week, or even just today. The cluster should scale up and down automatically for such a job, but that is not yet fully developed.
I have been using Amazon Redshift for one and a half years.
We've had some cases where queries would get stuck, and we'd be waiting on them for ages. I don't have the transparency to see what other queries are already running or whether we're running out of some kind of resource. There weren't many major problems, but sometimes we'd get these annoying issues, especially when running complex queries.
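One possible way to get some of that visibility is to look at Redshift's `STV_RECENTS` system table, which lists recently submitted queries and reports their duration in microseconds. The sketch below is only an assumption about how this could be wired up: the helper `long_running`, the five-minute threshold, and the sample rows are hypothetical, and whether other users' queries are visible depends on the privileges of the account running the query.

```python
from typing import Iterable, List, NamedTuple, Tuple

# Query against the STV_RECENTS system table; run it with any SQL client.
RUNNING_QUERIES_SQL = """
SELECT pid, user_name, duration, query
FROM stv_recents
WHERE status = 'Running'
ORDER BY duration DESC;
"""


class RunningQuery(NamedTuple):
    pid: int
    user_name: str
    seconds: float
    query: str


def long_running(
    rows: Iterable[Tuple[int, str, int, str]],
    threshold_seconds: float = 300.0,
) -> List[RunningQuery]:
    """Return the rows whose duration is at or above the threshold."""
    flagged = []
    for pid, user_name, duration_us, query in rows:
        seconds = duration_us / 1_000_000  # microseconds to seconds
        if seconds >= threshold_seconds:
            # Text columns may come back space-padded, so strip them.
            flagged.append(
                RunningQuery(pid, user_name.strip(), seconds, query.strip())
            )
    return flagged


if __name__ == "__main__":
    # Hypothetical rows, shaped like the result of RUNNING_QUERIES_SQL.
    sample_rows = [
        (101, "analyst   ", 900_000_000, "SELECT count(*) FROM events   "),
        (102, "dbt", 5_000_000, "SELECT 1"),
    ]
    for q in long_running(sample_rows):
        print(f"pid={q.pid} user={q.user_name} running for {q.seconds:.0f}s")
```

The filtering is kept separate from the database call so it can be tried on sample rows without a cluster.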
Setting up new servers is quick and easy, but an automatic solution, such as scaling on a threshold, would be ideal. This feature may already be available, but I'm not sure. We have three users using this solution. I rate the solution’s scalability a seven out of ten.
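To make the idea of threshold-based scaling concrete, here is a minimal sketch of the decision rule only. It does not call any AWS API; the function name `target_node_count`, the CPU thresholds, and the node limits are all hypothetical values chosen for illustration.

```python
def target_node_count(
    current_nodes: int,
    cpu_percent: float,
    scale_up_at: float = 80.0,
    scale_down_at: float = 30.0,
    min_nodes: int = 1,
    max_nodes: int = 8,
) -> int:
    """Suggest a node count from one CPU utilization reading.

    Adds a node above `scale_up_at`, removes one below `scale_down_at`,
    and otherwise leaves the cluster size unchanged.
    """
    if cpu_percent >= scale_up_at:
        return min(current_nodes + 1, max_nodes)
    if cpu_percent <= scale_down_at:
        return max(current_nodes - 1, min_nodes)
    return current_nodes


if __name__ == "__main__":
    for load in (95.0, 50.0, 10.0):
        print(f"cpu={load}% -> {target_node_count(2, load)} nodes")
```

A real setup would also want to average the metric over time so that a single spike does not trigger a resize.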
Amazon Redshift support is not always available, so it can be challenging to reach them. You have to buy support time and schedule it with them. We rarely need technical help, but it is not there when we do need it.
The initial setup wasn't very complex.
The solution has very competitive pricing. It can be expensive at first, when you are building your setup. It also takes some time to size the cluster to the amount of data. It can be cheaper than running your own server, but to get storage you have to buy a specific size of compute power. Initially, it was more expensive than BigQuery's pay-as-you-go pricing, but it got cheaper later: the more data you have, the cheaper it becomes in relative terms. It depends on the use case. In AWS, you must invest in understanding the setup, such as what kind of servers you need. Then you can set up your own, which can be very cheap. Redshift can be engineering-focused to set up, which is not ideal. Azure and Google Cloud are more efficient for data analysts who are not data engineers. But Redshift can be effective once you get used to it and set up a process. If you are utilizing on-demand resources, Redshift is the only vendor offering a dedicated service.
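The point that a fixed-size cluster starts out more expensive than pay-as-you-go and becomes cheaper with volume can be shown with a small break-even calculation. This is a sketch with made-up numbers: the monthly cluster cost and the price per terabyte scanned below are hypothetical, not actual AWS or Google prices.

```python
def break_even_tb(fixed_monthly_cost: float, on_demand_price_per_tb: float) -> float:
    """Terabytes scanned per month at which both options cost the same."""
    if on_demand_price_per_tb <= 0:
        raise ValueError("on-demand price must be positive")
    return fixed_monthly_cost / on_demand_price_per_tb


def cheaper_option(
    tb_scanned: float, fixed_monthly_cost: float, on_demand_price_per_tb: float
) -> str:
    """Name the cheaper option for a given monthly scan volume."""
    on_demand_cost = tb_scanned * on_demand_price_per_tb
    return "fixed" if fixed_monthly_cost < on_demand_cost else "on-demand"


if __name__ == "__main__":
    fixed, per_tb = 500.0, 5.0  # hypothetical prices
    print("break-even at", break_even_tb(fixed, per_tb), "TB per month")
    for volume in (50.0, 200.0):
        print(volume, "TB ->", cheaper_option(volume, fixed, per_tb))
```

Below the break-even volume the pay-as-you-go model wins, and above it the fixed cluster does, which matches the experience described above.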
From time to time, the solution needs to be restarted for maintenance. I recommend BigQuery over Amazon Redshift. I don't have experience with Snowflake, but it's said to be more feature-rich than BigQuery or Redshift. I was happier using BigQuery. Redshift does what it's expected to do, but you have to invest in learning the setup. Overall, I rate the solution a seven out of ten.