What is our primary use case?
Amazon Athena is mostly used for querying data. In teams with analysts and project managers, there are many AWS services helping to understand data, such as Redshift and Amazon Athena. The main use case is to understand, visualize, analyze, and query the data on the lake.
A significant use case for Amazon Athena is its ability to query unstructured data. For example, any data kept in S3 can be read directly and queried.
We have used Amazon Athena together with AWS Glue, with some of our tables stored as Iceberg. Iceberg is currently the de facto standard table format that Amazon Athena supports. We keep the Glue Catalog on top of those tables and query them with Amazon Athena.
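As a rough illustration of that setup, the sketch below builds the request parameters for submitting a query against a Glue-cataloged table from Python. The database, table, and bucket names are hypothetical and not taken from our environment. The keys follow the parameter names of boto3's `start_query_execution` call, and `AwsDataCatalog` and `primary` are the default catalog and workgroup names in Athena.

```python
from typing import Any, Dict


def build_athena_request(
    sql: str,
    database: str,
    output_location: str,
    catalog: str = "AwsDataCatalog",  # default name of the Glue Data Catalog in Athena
    workgroup: str = "primary",       # default Athena workgroup
) -> Dict[str, Any]:
    """Build keyword arguments for the Athena StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Catalog": catalog, "Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
        "WorkGroup": workgroup,
    }


# Hypothetical names, for illustration only.
request = build_athena_request(
    sql="SELECT order_id, amount FROM orders_iceberg "
        "WHERE order_date = DATE '2024-01-01'",
    database="analytics",
    output_location="s3://example-athena-results/",
)
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("athena").start_query_execution(**request)`, which returns a `QueryExecutionId` that can be polled with `get_query_execution`.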
Amazon Athena's ability to query structured and unstructured data has been beneficial. For a startup, a database such as Redshift is challenging to procure and maintain: you have to run jobs and hire engineers to look after it. To remove these hassles, a startup can put its data in the cloud and query it with Amazon Athena, which reduces operational overhead significantly. If your data scale is not extensive, querying is very cost-effective because Amazon Athena's pricing is approximately $5 per TB of data scanned.
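To make the pricing concrete, here is a minimal sketch of the per-query arithmetic. The $5 per TB rate is the approximate figure quoted above. The decimal definition of a terabyte and the 10 MB per-query minimum are assumptions that should be checked against the current AWS pricing page for your region.

```python
def estimate_query_cost(
    bytes_scanned: int,
    usd_per_tb: float = 5.0,            # approximate rate quoted in this review
    min_bytes: int = 10 * 1000**2,      # assumed 10 MB minimum charge per query
    bytes_per_tb: int = 1000**4,        # assumption: decimal terabyte
) -> float:
    """Estimate the cost in USD of one query from the bytes it scans."""
    billable = max(bytes_scanned, min_bytes)
    return billable / bytes_per_tb * usd_per_tb


# A query that scans 200 GB of data.
cost_200_gb = estimate_query_cost(200 * 1000**3)
print(f"200 GB scanned: about ${cost_200_gb:.2f}")
```

At this rate a query scanning 200 GB costs about one dollar, which is why small and medium datasets are cheap to explore.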
What is most valuable?
Amazon Athena is very compatible with data lake concepts. You can put all your data in S3 in a variety of data formats. In the last few years, many open data formats have emerged. Amazon Athena supports formats such as Parquet, CSV, and Iceberg natively. It also supports Apache Hudi (pronounced "hoodie", originally created at Uber) and Avro. It has extensive support for different types of data structures.
Amazon Athena is cost-effective and performs efficiently. Being serverless means you don't need to provision or manage any compute resources. You don't have to plan capacity around how long a query will take, which makes it quite scalable.
What needs improvement?
Amazon Athena is based on Trino, an open-source distributed SQL query engine. When a company wants to run ETL on Amazon Athena, they cannot do it easily. For instance, if you want to delete rows by primary key or perform CRUD operations with Step Functions to automate processes, these operations are not straightforward in Amazon Athena.
Transaction support is one of the biggest missing features. If you are running multiple statements, such as a delete followed by an insert, and something goes wrong during insertion, the deletion should be reverted, but that doesn't happen. We have to implement workarounds, whereas these capabilities are available in Redshift.
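One possible shape for such a workaround is a manual compensation step: copy the rows that are about to be deleted into a backup table, and re-insert them if the later insert fails. The sketch below is only an illustration of that idea, not a description of our actual implementation. The `run_query` callable and all table names are hypothetical, the backup table is assumed to be empty beforehand, and each individual statement is assumed to either fully succeed or fully fail.

```python
from typing import Callable


def replace_rows(
    run_query: Callable[[str], None],  # hypothetical helper that runs one SQL statement
    table: str,
    staging: str,
    backup: str,
    predicate: str,
) -> None:
    """Delete-then-insert with a manual undo, since there is no multi-statement rollback."""
    # 1. Keep a copy of the rows that are about to be deleted.
    run_query(f"INSERT INTO {backup} SELECT * FROM {table} WHERE {predicate}")
    # 2. Delete the old rows.
    run_query(f"DELETE FROM {table} WHERE {predicate}")
    try:
        # 3. Insert the replacement rows from the staging table.
        run_query(f"INSERT INTO {table} SELECT * FROM {staging}")
    except Exception:
        # 4. Compensate: put the deleted rows back, then surface the error.
        run_query(f"INSERT INTO {table} SELECT * FROM {backup}")
        raise
```

The compensation step can itself fail, so this narrows the window for data loss without giving the guarantees of a real transaction.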
While Amazon Athena has notebook support where analysts can write their work, scheduling these notebooks is not user-friendly. If an analyst wants to schedule a notebook to trigger at a specific time, they need a developer's assistance.
There should be unified access management in Amazon Athena, which is not readily available. Though they have Lake Formation and other features, there isn't one place to manage access. For example, restricting access to specific columns for particular users requires alternative approaches.
The service is only available on-demand from AWS, not through the Marketplace.
For how long have I used the solution?
I have been working with Amazon Athena for approximately five years, including usage in my previous organization.
How are customer service and support?
I contacted Amazon support regarding Amazon Athena long ago. After using it for so long, I have discovered that I know more than some support staff. I had an issue with transactions about a year ago, and I found that their support staff sometimes lacks extensive experience. While they have knowledge, I have had the opportunity to work with the infrastructure hands-on. The support people AWS provides often don't have much context, though senior staff members might have more expertise.
Which other solutions did I evaluate?
I have evaluated various solutions, including Databricks, Snowflake, and Google BigQuery. For any startup operating in one cloud, it is better to remain in that ecosystem than to manage multiple clouds, which complicates the infrastructure. Since we were in AWS, we chose to use their available services instead of moving outside, following the industry trend of consolidation.
What other advice do I have?
We tried Amazon QuickSight with Amazon Athena for visualization, but we are now using Superset on top of Amazon Athena instead.
The ecosystem is straightforward and excellent to get started with. If organizations don't want to spend time managing databases and prefer to focus on their business so they can move quickly, Amazon Athena is a good solution.
Amazon Athena can become costly if analysts lack knowledge of partition keys and of how to query tables efficiently. Since the pricing is approximately $5 per TB scanned, large queries on big datasets can result in substantial bills. There aren't many resources available to help optimize queries in Amazon Athena, with only a few blogs available. They should provide more information about query optimization.
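The effect of filtering on a partition key can be sketched with simple arithmetic. The example below assumes a hypothetical table partitioned by day with 50 GB per partition. It compares a query with no partition filter against one restricted to seven days, using the approximate $5 per TB rate and a decimal terabyte as assumptions.

```python
from typing import Dict, Optional, Set

USD_PER_TB = 5.0        # approximate rate quoted in this review
BYTES_PER_TB = 1000**4  # assumption: decimal terabyte


def scanned_bytes(partition_sizes: Dict[str, int], wanted: Optional[Set[str]] = None) -> int:
    """Bytes a query must read; wanted=None models a query with no partition filter."""
    if wanted is None:
        return sum(partition_sizes.values())
    return sum(size for key, size in partition_sizes.items() if key in wanted)


def cost_usd(num_bytes: int) -> float:
    """Convert scanned bytes to an approximate cost in USD."""
    return num_bytes / BYTES_PER_TB * USD_PER_TB


# Hypothetical table: 365 daily partitions of 50 GB each.
partitions = {f"day={d:03d}": 50 * 1000**3 for d in range(1, 366)}
last_week = {f"day={d:03d}" for d in range(359, 366)}

full_scan_cost = cost_usd(scanned_bytes(partitions))
pruned_cost = cost_usd(scanned_bytes(partitions, last_week))
print(f"no partition filter: ${full_scan_cost:.2f}, last 7 days only: ${pruned_cost:.2f}")
```

Under these assumptions the unfiltered query scans 18.25 TB (about $91), while the filtered one scans 350 GB (about $1.75), so a missing partition filter on a frequently run query adds up quickly.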
I rate Amazon Athena an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)