Glue
#aws #analytics #glue #emr
Glue is serverless service for extract, transform, and load (ETL) service. Use for preparing and transforming data for analytics.
Integrate with Data warehouse
![[Drawing 2023-03-03 17.49.11.excalidraw|600]]
Integrate with Athena for analytics
![[Drawing 2023-03-03 17.51.59.excalidraw|600]]
Glue datalog
It is Glue feature, catalog of datasets. By using crawler to collect table metadata of datasets and set it to datalog for data discovery ![[Drawing 2023-03-03 17.56.12_0.excalidraw|650]]
Glue features
- Glue Job Bookmarks: Prevent process old data.
- Elastic Views: (Materialized View), leverage a ‘virtual’ table to combine and replicate data across multiple datasources using SQL
- DataBrew: Normalization and clean data using pre-built transformation.
- Studio: New GUI
- Glue Streaming ETL: Tranform data, integrated with Kafka, Kinesis data stream, MSK.
Glue and EMR
- Glue can combined with EMR for big data processing leveraging Spark. We can use Glue for load and transform data then put it into the S3 as results.