Summary
Cherre can ingest data from an Amazon S3 bucket. The bucket can live in either the client’s AWS environment or Cherre’s. This guide outlines the steps required to give Cherre’s ingest access to the data in S3.
The implementation relies on standard AWS IAM roles and S3 permissions. The two delivery options are outlined below:
Cherre S3 Bucket
The partner or client uploads data to a bucket owned and managed by Cherre
Cherre handles all aspects of access and permission management
Organized bucket structures (e.g., using prefixes or specific paths) within Cherre-owned buckets are mutually agreed upon
The partner or client shares the ARN of their own AWS user, and Cherre grants that user permission to assume a role that can access the Cherre bucket
Client S3 Bucket
The partner or client creates an IAM Role in their AWS account
In the trust policy, specify Cherre’s AWS account/user/role as the trusted principal
Attach permissions to the Role
Attach a policy granting the necessary S3 permissions on the bucket (e.g., s3:GetObject, s3:ListBucket)
Share the Role ARN with Cherre
This allows Cherre’s AWS user to assume the role
Cherre’s ingest can then assume the Role and access data
Nice to have: All files that Cherre needs to ingest should sit in a clean folder structure, so that ingestion can point to a single location in the bucket
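The two policies described for the Client S3 Bucket option can be sketched as plain JSON documents. The snippet below is a minimal illustration, not Cherre’s required configuration: the principal ARN, the bucket name, and the "cherre/" prefix are hypothetical placeholders, and the helper names are made up for this example. The real principal ARN is the one Cherre shares during setup.

```python
import json

# Hypothetical placeholders: replace with the real values exchanged with Cherre.
CHERRE_PRINCIPAL_ARN = "arn:aws:iam::111122223333:user/cherre-ingest"  # assumed ARN
BUCKET_NAME = "example-client-bucket"  # assumed bucket name


def build_trust_policy(principal_arn: str) -> dict:
    """Trust policy that lets the given principal assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_read_policy(bucket: str, prefix: str = "") -> dict:
    """Read-only permissions: list the bucket and get objects under `prefix`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # Print both documents so they can be pasted into the IAM console or CLI.
    print(json.dumps(build_trust_policy(CHERRE_PRINCIPAL_ARN), indent=2))
    print(json.dumps(build_read_policy(BUCKET_NAME, "cherre/"), indent=2))
```

The trust policy is attached to the role itself, while the read policy is attached as the role’s permissions. Note that s3:ListBucket applies to the bucket ARN and s3:GetObject to the object ARNs, which is why the two statements use different resources.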
Introduction to Amazon S3 Buckets
Amazon S3 is a scalable, secure, and reliable service for storing and managing data. S3 buckets can be set up, owned, and managed by Cherre’s partners or clients, but Cherre also hosts S3 buckets that can be used for data delivery where needed. This guide refers to Amazon Resource Names (ARNs) throughout to identify users, roles, and buckets. Details about ARNs can be found in the AWS documentation.
Implementation Checklist
The process of Cherre ingesting data from an S3 bucket varies slightly depending on the ownership of the bucket. Both paths are outlined below:
Cherre S3 Bucket
The partner or client creates an AWS user
The partner or client shares the user’s ARN with Cherre
Cherre grants the user ARN permission to access the Cherre bucket
The partner or client uploads data to the Cherre S3 bucket
Cherre’s ingest pulls the data using an AWS access key and secret
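For the Cherre S3 Bucket path, the grant in the third step could take the form of a bucket policy that names the partner’s user ARN as the principal. The sketch below shows one possible shape under stated assumptions: the source does not specify how Cherre actually implements the grant, and the user ARN, bucket name, prefix, and function name are all hypothetical.

```python
import json


def build_upload_bucket_policy(user_arn: str, bucket: str, prefix: str) -> dict:
    """Bucket policy letting one user upload to, and list, a single prefix."""
    principal = {"AWS": user_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow uploads only under the agreed prefix.
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                # Allow listing, restricted to the same prefix.
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
        ],
    }


if __name__ == "__main__":
    policy = build_upload_bucket_policy(
        "arn:aws:iam::444455556666:user/partner-uploader",  # assumed partner user ARN
        "example-cherre-delivery-bucket",  # assumed bucket name
        "partner-a/",  # assumed mutually agreed prefix
    )
    print(json.dumps(policy, indent=2))
```

Scoping both statements to a prefix matches the idea of a mutually agreed bucket structure: each partner only sees and writes to its own path.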
Client S3 Bucket
Cherre shares a user ARN with the client for the purposes of data access
The partner or client creates an AWS role, grants Cherre's user ARN permission to assume the role, and grants the role permission to access and read data from the bucket
The partner or client shares the role ARN and bucket name with Cherre
Cherre’s ingest pulls the data using an AWS access key, secret, and the role ARN
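A partner or client can verify the Client S3 Bucket setup before handing over the role ARN by reproducing the same assume-role flow. The following is a rough sketch using boto3 (a third-party package, installed separately), not Cherre’s actual ingest code: the function names and the session name are made up for this example, and the role ARN, bucket, and prefix passed to `main` would be your own values.

```python
def credentials_from_role(sts_client, role_arn, session_name="cherre-ingest-example"):
    """Assume the role and return keyword arguments for a new boto3 client."""
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def list_keys(s3_client, bucket, prefix=""):
    """Return every object key under `prefix`, following pagination."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


def main(role_arn, bucket, prefix=""):
    """Assume the role with the caller's default credentials and list the data."""
    import boto3  # imported here so the helpers above can be tested without it

    sts = boto3.client("sts")
    s3 = boto3.client("s3", **credentials_from_role(sts, role_arn))
    for key in list_keys(s3, bucket, prefix):
        print(key)
```

Calling `main` with the role ARN and bucket name from the third step should print the keys that the role can see. An access-denied error at the assume-role call usually points to the trust policy, while one at the listing call points to the role’s S3 permissions.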
Best Practices
Bucket Ownership by Data Partner or Client: It is generally smoother for the data partner or client to own the S3 bucket.
Organized Bucket Structure: Maintaining a well-organized bucket structure with specific paths can significantly simplify data ingestion, regardless of bucket ownership.
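As an illustration of an organized bucket structure, the helper below builds object keys from a fixed pattern so that every delivery lands under one predictable prefix. The layout (root, dataset, delivery date, filename) and the function name are hypothetical; the actual structure is whatever is agreed with Cherre.

```python
from datetime import date


def build_object_key(dataset: str, delivery_date: date, filename: str, root: str = "cherre") -> str:
    """Build an S3 key shaped like '<root>/<dataset>/<YYYY-MM-DD>/<filename>'."""
    parts = [root, dataset, delivery_date.isoformat(), filename]
    # Strip stray slashes so the key never contains empty path segments.
    cleaned = [part.strip("/") for part in parts if part.strip("/")]
    return "/".join(cleaned)


if __name__ == "__main__":
    print(build_object_key("transactions", date(2024, 1, 31), "part-0001.csv"))
    # prints: cherre/transactions/2024-01-31/part-0001.csv
```

With a layout like this, ingestion can be pointed at a single prefix such as "cherre/transactions/" and pick up each dated delivery beneath it.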