simalexan

How to create an S3 Batch Operations Job with Lambda Invoke

Recently, I have been playing a lot with AWS S3 Batch Operations. The service is great for processing large numbers of objects in parallel. The only downside is that it is not very well documented, and there are a few setup steps to complete before you can start processing your data.

Sometimes I prefer a minimum effective dose of a post, so here is an example of creating an AWS S3 Batch Operations job using the AWS SDK for JavaScript (v3) in TypeScript:

Minimal Example

import { S3ControlClient, CreateJobCommand } from "@aws-sdk/client-s3-control";

// Q: why s3-control and not s3? 
// A: S3ControlClient is used for account-level operations like Batch Operations,
//    while the regular S3Client is for object/bucket-level operations.
//    Since Batch Operations work across multiple objects, we need S3ControlClient.
const client = new S3ControlClient({ region: "us-east-1" });

(async () => {
  try {
    const command = new CreateJobCommand({
      AccountId: '123456789012', // Your AWS account ID
      Manifest: {
        // this defines the manifest format: CSV with Bucket and Key columns
        Spec: {
          Format: "S3BatchOperations_CSV_20180820",
          Fields: ["Bucket", "Key"]
        },
        /**
         * this defines the location of the manifest file.
         * you need to have:
         * - a separate bucket with the manifest file, (same account)
         * - the correct etag of the manifest file.
         * */
        Location: {
          ObjectArn: "arn:aws:s3:::my-manifest-bucket/my-manifest.csv",
          ETag: "b7f9b5a8f7f1c8a9f9e17f654e92859c"
        }
      },
      Operation: {
        // Example: run a Lambda function against each object
        LambdaInvoke: {
          FunctionArn: "arn:aws:lambda:us-east-1:123456789012:function:my-lambda-fn"
        }
      },
      Report: {
        /**
         * this defines where the completion report is written.
         * you should have:
         * - a bucket for the report (same account), given as a bucket ARN,
         * - a prefix under which the report files are stored.
         * */
        Bucket: "arn:aws:s3:::your-report-bucket",
        Format: "Report_CSV_20180820",
        Enabled: true,
        Prefix: "reports",
        ReportScope: "AllTasks",
      },
      Priority: 1,
      RoleArn: "arn:aws:iam::123456789012:role/S3BatchOpsRole",
      // false: the job starts as soon as it is ready, without a manual confirmation step.
      // With true, the job waits until you confirm it before running.
      ConfirmationRequired: false,
      Description: "Your S3 Batch Operations Job"
    });

    const response = await client.send(command);
    console.log("Job created:", response);
  } catch (err) {
    console.error("Error creating job:", err);
  }
})();
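The ETag in Manifest.Location has to match the manifest object that is actually stored in S3. S3 returns ETag values wrapped in double quotes (for example in the ETag field of a HeadObjectCommand response from @aws-sdk/client-s3), while the examples I have seen pass it to CreateJob without them. Here is a minimal sketch of building the Location block from such a value; the helper names normalizeETag and buildManifestLocation are my own, not part of the SDK, and the code makes no AWS calls:

```typescript
/** Shape of the Manifest.Location field used in the CreateJobCommand input. */
interface ManifestLocation {
  ObjectArn: string;
  ETag: string;
}

/** Hypothetical helper: strip the double quotes S3 puts around ETag values. */
function normalizeETag(rawETag: string): string {
  return rawETag.replace(/^"+|"+$/g, "");
}

/** Hypothetical helper: build Manifest.Location from bucket, key and raw ETag. */
function buildManifestLocation(
  bucket: string,
  key: string,
  rawETag: string
): ManifestLocation {
  return {
    ObjectArn: `arn:aws:s3:::${bucket}/${key}`,
    ETag: normalizeETag(rawETag),
  };
}

// Example with the placeholder values from the job definition above.
const location = buildManifestLocation(
  "my-manifest-bucket",
  "my-manifest.csv",
  '"b7f9b5a8f7f1c8a9f9e17f654e92859c"'
);
console.log(location);
```

The result can be passed as Manifest.Location. Re-uploading the manifest changes its ETag, so it is safer to read the ETag right before creating the job.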

For this to run and create an S3 Batch Operations job, we first need:

  • an IAM role (the role that S3 Batch Operations assumes to run the job)
  • a manifest file (the list of all objects that we want to process)
  • a Lambda function (the function that will perform the operations on the objects)
  • a separate bucket for the manifest file (the bucket that will contain the manifest file)
  • a separate bucket for the report file (the bucket that will contain the report file)

IAM Role

Create an IAM role whose trust policy allows the S3 Batch Operations service to assume it:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "batchoperations.s3.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

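Attach the following IAM policy to the role so that it can perform operations on the objects in the bucket. s3:GetObjectVersion is only needed if the bucket uses versioning. The role also needs read access to the manifest object and write access to the report bucket when those live in other buckets:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:PutObjectTagging",
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],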
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME/*"
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:GetObjectTagging",
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::YOUR-BUCKET-NAME",
        "arn:aws:s3:::YOUR-BUCKET-NAME/*"
      ]
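    },
    {
      "Action": "lambda:InvokeFunction",
      "Effect": "Allow",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-lambda-fn"
    }
  ]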
}
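Besides the object permissions, the job's role has to read the manifest, write the completion report, and invoke the Lambda function. As a rough sketch, those job-level statements can be generated from the same names used in the job definition. The JobResources type and buildJobStatements function are hypothetical helpers of mine, and the resource layout (a single manifest object, reports under one prefix) is an assumption you may need to adapt:

```typescript
interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

// Hypothetical input type: the resources one job touches.
interface JobResources {
  manifestBucket: string;
  manifestKey: string;
  reportBucket: string;
  reportPrefix: string;
  functionArn: string;
}

// Hypothetical helper: the job-level statements for a Lambda-invoke job.
function buildJobStatements(r: JobResources): PolicyStatement[] {
  return [
    {
      // Read the manifest object.
      Effect: "Allow",
      Action: ["s3:GetObject", "s3:GetObjectVersion"],
      Resource: [`arn:aws:s3:::${r.manifestBucket}/${r.manifestKey}`],
    },
    {
      // Write the completion report under the configured prefix.
      Effect: "Allow",
      Action: ["s3:PutObject"],
      Resource: [`arn:aws:s3:::${r.reportBucket}/${r.reportPrefix}/*`],
    },
    {
      // Invoke the function named in Operation.LambdaInvoke.
      Effect: "Allow",
      Action: ["lambda:InvokeFunction"],
      Resource: [r.functionArn],
    },
  ];
}

const policy = {
  Version: "2012-10-17",
  Statement: buildJobStatements({
    manifestBucket: "my-manifest-bucket",
    manifestKey: "my-manifest.csv",
    reportBucket: "your-report-bucket",
    reportPrefix: "reports",
    functionArn: "arn:aws:lambda:us-east-1:123456789012:function:my-lambda-fn",
  }),
};
console.log(JSON.stringify(policy, null, 2));
```

The printed JSON can be attached to the role as an additional inline policy.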

Manifest

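Here is an example of a CSV manifest file. The columns follow the order declared in Spec.Fields (Bucket, then Key), and the file itself has no header row, because every line is read as an object entry:

my-bucket,prefix/my-object-1
my-bucket,prefix/my-object-2

Note: the entries in the manifest file are separated by a newline, and object keys must be URL-encoded.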
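A manifest like this can be generated from a list of objects. The sketch below emits one bucket,key row per object and no header row, since as far as I know every line of the CSV is treated as an object entry. It percent-encodes each key segment while keeping the "/" separators, which is my assumption about the safest encoding. ManifestEntry, encodeKey and buildManifestCsv are hypothetical names:

```typescript
// Hypothetical type: one object to process.
interface ManifestEntry {
  bucket: string;
  key: string;
}

// Percent-encode each path segment of the key, leaving "/" untouched.
function encodeKey(key: string): string {
  return key
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
}

// Build the CSV body: one "bucket,key" row per entry, newline separated.
function buildManifestCsv(entries: ManifestEntry[]): string {
  return entries
    .map((entry) => `${entry.bucket},${encodeKey(entry.key)}`)
    .join("\n");
}

const csv = buildManifestCsv([
  { bucket: "my-bucket", key: "prefix/my-object-1" },
  { bucket: "my-bucket", key: "prefix/my object,2" },
]);
console.log(csv);
```

Encoding the key also keeps commas inside object names from being read as column separators.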

Lambda Function

S3 Batch Operations invokes the Lambda function for each entry in the manifest, and the function performs the actual work on that object.
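Here is a minimal sketch of such a handler, assuming invocation schema version 1.0, where the event carries a tasks array and the function must return one result per task. The types are written by hand to keep the example dependency-free, and processObject is a hypothetical placeholder for whatever you want to do with each object:

```typescript
type ResultCode = "Succeeded" | "TemporaryFailure" | "PermanentFailure";

interface BatchTask {
  taskId: string;
  s3Key: string; // URL-encoded object key
  s3VersionId?: string | null;
  s3BucketArn: string; // e.g. arn:aws:s3:::my-bucket
}

interface BatchEvent {
  invocationSchemaVersion: string;
  invocationId: string;
  job: { id: string };
  tasks: BatchTask[];
}

interface TaskResult {
  taskId: string;
  resultCode: ResultCode;
  resultString: string;
}

interface BatchResponse {
  invocationSchemaVersion: string;
  treatMissingKeysAs: ResultCode;
  invocationId: string;
  results: TaskResult[];
}

// Hypothetical placeholder for the real per-object work.
async function processObject(bucket: string, key: string): Promise<string> {
  return `processed s3://${bucket}/${key}`;
}

export const handler = async (event: BatchEvent): Promise<BatchResponse> => {
  const results: TaskResult[] = [];
  for (const task of event.tasks) {
    // The bucket name is the last segment of the bucket ARN.
    const bucket = task.s3BucketArn.split(":").pop() ?? "";
    // Keys arrive URL-encoded; "+" stands for a space.
    const key = decodeURIComponent(task.s3Key.replace(/\+/g, " "));
    try {
      const message = await processObject(bucket, key);
      results.push({ taskId: task.taskId, resultCode: "Succeeded", resultString: message });
    } catch (err) {
      results.push({ taskId: task.taskId, resultCode: "PermanentFailure", resultString: String(err) });
    }
  }
  return {
    invocationSchemaVersion: event.invocationSchemaVersion,
    treatMissingKeysAs: "PermanentFailure",
    invocationId: event.invocationId,
    results,
  };
};
```

Returning TemporaryFailure instead of PermanentFailure for a task asks S3 Batch Operations to retry it, which may suit transient errors such as throttling.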