Athena
#aws #analytics #athena #federated #query
Athena is the serverless query service to analyze data store in Amazon S3
Specifications
- Support SQL
- Data format: CSV, JSON, Parquet, ORC, Arvo
- Commonly used with Quicksight for integration for reporting
Use cases
- Reporting, analyze logs, business intelligence
- Analyze data in S3 by using SQL language
Performance Optimization
- Use columnar data for cost-savings (less-scan): Use Glue for conversion to Apache Parquet or ORC.
- Compress data for small retrievals
- Partition data: Example: s3://athena/flight/2019/10/10 to analyze data on day/month or year.
- Use larger file (>128MB) to minimize overhead
Athena Federated Query
- SQL query for analyzing on any datasource (AWS or on-premise) by using Data Source Connector that runs on Lambda function.
Athena Federated Query Diagram
![[Drawing 2023-03-03 16.49.52.excalidraw|600]]