AWS S3 Tables is a relatively new service, introduced at AWS re:Invent 2024, designed to simplify structured data access directly from Amazon S3. This review shares early impressions and practical insights based on initial hands-on use in FinOps scenarios.
Overview
AWS S3 Tables bridges Amazon S3’s object storage and structured data querying. It simplifies work traditionally handled by AWS Glue and manual ETL workflows, making data quicker and easier to access. Under the hood, AWS uses the Apache Iceberg table format, which layers table metadata over data files (typically Parquet) and aims to offer significant improvements in data management and query performance over querying plain Parquet files directly.
This review is based on my hands-on experience with AWS S3 Tables, primarily from a FinOps perspective: analyzing Cost and Usage Reports, AWS Compute Optimizer data, and Trusted Advisor recommendations.
Technical Evaluation
AWS S3 Tables notably streamlines turning raw data into queryable tables. For users accustomed to using AWS Lambda for API data extraction and storing JSON or CSV files in S3, S3 Tables simplifies integration by removing the need for Glue Crawlers and manual table creation. Compared with plain Parquet files, Iceberg tables add data organization capabilities, including schema evolution, time-travel querying, and improved transactional operations.
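To make the time-travel and schema-evolution features more concrete, here is a minimal sketch that builds the corresponding Athena SQL strings. It assumes Athena's Iceberg syntax (`FOR TIMESTAMP AS OF` and `ALTER TABLE ... ADD COLUMNS`) applies to your S3 Tables setup. The namespace `finops`, the table `cur_daily`, and the helper names are hypothetical and do not come from my actual environment.

```python
import re
from datetime import datetime, timezone

# Only plain lowercase names are accepted, so the strings are safe to interpolate.
_NAME = re.compile(r"[a-z][a-z0-9_]*")
# Small illustrative subset of column types; extend as needed.
_SIMPLE_TYPES = {"string", "int", "bigint", "double", "boolean", "date", "timestamp"}


def qualified_name(namespace: str, table: str) -> str:
    """Return 'namespace.table' after validating both parts."""
    for part in (namespace, table):
        if not _NAME.fullmatch(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"{namespace}.{table}"


def time_travel_query(namespace: str, table: str, as_of: datetime, limit: int = 100) -> str:
    """Build a query that reads the table as it existed at `as_of`."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    ts = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"SELECT * FROM {qualified_name(namespace, table)} "
        f"FOR TIMESTAMP AS OF TIMESTAMP '{ts} UTC' LIMIT {int(limit)}"
    )


def add_column_ddl(namespace: str, table: str, column: str, column_type: str) -> str:
    """Build a schema-evolution statement that adds one column."""
    if not _NAME.fullmatch(column):
        raise ValueError(f"unsupported column name: {column!r}")
    if column_type not in _SIMPLE_TYPES:
        raise ValueError(f"unsupported column type: {column_type!r}")
    return (
        f"ALTER TABLE {qualified_name(namespace, table)} "
        f"ADD COLUMNS ({column} {column_type})"
    )


if __name__ == "__main__":
    snapshot_time = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
    print(time_travel_query("finops", "cur_daily", snapshot_time))
    print(add_column_ddl("finops", "cur_daily", "cost_center", "string"))
```

The helpers only produce SQL text; whether a given statement is accepted depends on the engine and on the permissions granted to the caller.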
AWS claims significant performance benefits for S3 Tables, citing up to 3x faster query throughput and up to 10x higher transactions per second compared to self-managed Iceberg tables stored in general purpose S3 buckets. If those figures hold in practice, S3 Tables is especially appealing for projects that need efficient querying and frequent updates.
Pros
- Simplifies data querying and reduces ETL complexity.
- Leverages familiar SQL querying (via Athena).
- Potentially reduces operational costs and simplifies FinOps management.
- Serverless and scalable integration with existing AWS services.
- Built on Apache Iceberg, enabling enhanced schema management, time-travel queries, and transactional capabilities.
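As an illustration of the Athena point above, the following sketch submits a SQL query with boto3's `start_query_execution`. The catalog name format `s3tablescatalog/<table-bucket>` is an assumption about how the table bucket appears in Athena once the analytics integration is enabled; the bucket, namespace, and table names are hypothetical, and the workgroup is assumed to have a query result location configured.

```python
def build_athena_request(sql: str, catalog: str, namespace: str, workgroup: str = "primary") -> dict:
    """Assemble keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Catalog": catalog, "Database": namespace},
        "WorkGroup": workgroup,
    }


def run_query(sql: str, catalog: str, namespace: str, workgroup: str = "primary") -> str:
    """Submit the query and return its execution ID (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3 installed

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        **build_athena_request(sql, catalog, namespace, workgroup)
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical names; replace with your own table bucket, namespace, and table.
    request = build_athena_request(
        sql="SELECT line_item_product_code, SUM(line_item_unblended_cost) AS cost "
            "FROM cur_daily GROUP BY line_item_product_code",
        catalog="s3tablescatalog/my-finops-table-bucket",
        namespace="finops",
    )
    print(request)
```

The query runs asynchronously, so a real workflow would poll `get_query_execution` with the returned ID before reading results.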
Cons
- Costs may become unpredictable with frequent access or complex querying patterns.
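- Not a substitute for a transactional database: Iceberg commits are atomic at the table-snapshot level, which suits batch updates but may fall short for real-time, low-latency data applications.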
- At the time of writing, limited direct support in infrastructure automation tools such as CloudFormation and AWS CDK.
- I ran into difficulties getting AWS Step Functions to execute Athena SQL queries against S3 Tables, likely because of access management complexities with AWS Lake Formation.
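The cost concern in the first point above can be made more tangible with a rough estimate of Athena scan charges per query pattern. This is a sketch under stated assumptions: the per-TB rate, the 10 MB per-query minimum, and the binary definition of a terabyte are illustrative defaults to check against current pricing for your Region, and the query profiles are made up. It also ignores S3 Tables storage, request, and maintenance charges.

```python
from dataclasses import dataclass

# Illustrative defaults, not quoted prices: verify against current Athena pricing.
PRICE_PER_TB_USD = 5.00
MIN_BYTES_PER_QUERY = 10 * 1024**2  # assumed minimum billable scan per query
BYTES_PER_TB = 1024**4              # assumed binary terabyte


@dataclass(frozen=True)
class QueryProfile:
    name: str
    bytes_scanned: int   # average bytes scanned per run
    runs_per_month: int


def monthly_scan_cost(profile: QueryProfile, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the monthly scan charge in USD for one recurring query."""
    billable_bytes = max(profile.bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable_bytes / BYTES_PER_TB * price_per_tb * profile.runs_per_month


if __name__ == "__main__":
    # Hypothetical workloads: a daily full scan versus a frequent dashboard query.
    profiles = [
        QueryProfile("daily_cur_rollup", bytes_scanned=200 * 1024**3, runs_per_month=30),
        QueryProfile("dashboard_refresh", bytes_scanned=5 * 1024**3, runs_per_month=30 * 24 * 4),
    ]
    for profile in profiles:
        print(f"{profile.name}: ${monthly_scan_cost(profile):.2f} per month")
```

Even with small per-query scans, a high refresh frequency can dominate the bill, which is why access patterns matter more than raw data volume here.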
Summary
AWS S3 Tables is beneficial for teams seeking simplified structured data management without the traditional complexity of Glue-based workflows. While AWS’s performance claims are promising, users should remain cautious about potential cost unpredictability and access management intricacies, and should confirm that the service fits their project’s transactional requirements and infrastructure management tooling.