S3 multipart upload is normally used to upload a large object in parts, but it can also assemble a new object from objects that are already in a bucket. When uploading large files with short-lived STS credentials, an interrupted upload may not be resumable. One multipart upload feature helps in that case: copying existing objects into the parts of a new object entirely within S3, without sending the data again.

Consider a 10GB database file that must be uploaded to S3 over a low-throughput network within an hour (the default STS session duration). A plain aws s3 sync of the file won't work: the token expires before the upload finishes, and the retried upload starts over from the beginning.
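To put a number on "low-throughput": assuming GB means 10^9 bytes and the whole hour is spent transferring, the sustained rate needed to finish within one session is

```latex
\frac{10 \times 10^{9}\ \text{B}}{3600\ \text{s}} \approx 2.78 \times 10^{6}\ \text{B/s} \approx 22.2\ \text{Mbit/s}
```

Over a link that cannot sustain roughly that rate, the credentials expire before a single-shot upload can complete.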

One way to handle this is to split the file into smaller pieces and sync those to S3 instead. If the session expires, the next sync skips the pieces that are already in the bucket and continues with the remaining ones. Once all the pieces are in S3, they can be concatenated server-side with S3's UploadPartCopy API. I put together a CLI utility for that called s3welder; let's go through its usage.
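Why splitting makes the upload resumable can be sketched in a few lines of Python. This is a simplified model, not how aws s3 sync is implemented: here a piece counts as done only if an object with the same name and size already exists in the bucket (the real sync also compares modification times). The name pending_pieces is hypothetical.

```python
from typing import Dict, List


def pending_pieces(local: Dict[str, int], remote: Dict[str, int]) -> List[str]:
    """Return the local piece names that still need uploading.

    local  maps piece file name -> size in bytes (e.g. from os.path.getsize)
    remote maps object key -> size in bytes (e.g. from a bucket listing)

    Simplified sync model: a piece is skipped only when an object with the
    same name and the same size is already in the bucket.
    """
    return sorted(name for name, size in local.items() if remote.get(name) != size)
```

After an interruption, only the pieces returned here would be sent again, so at most one piece's worth of progress is lost instead of the whole file.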

Test case

Split a large file into 10 MiB pieces (every piece except the last must be between 5 MiB and 5 GiB, S3's part size limits). Assumes a bash shell.

$ mkdir parts
$ split -b 10M largefile parts/part-
$ aws s3 sync parts s3://acme-bucket/
upload: parts/part-aa to s3://acme-bucket/part-aa
...
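Before uploading, it may be worth checking that the pieces will later be accepted as parts. Below is a minimal Python sketch of S3's multipart limits (at least 5 MiB for every part except the last, at most 5 GiB per part, at most 10,000 parts). check_piece_sizes and piece_count are hypothetical helpers, not part of s3welder; note that split's 10M means 10 MiB.

```python
from typing import List

MIN_PART = 5 * 1024 ** 2  # 5 MiB: minimum size of every part except the last
MAX_PART = 5 * 1024 ** 3  # 5 GiB: maximum size of any single part
MAX_PARTS = 10_000        # maximum number of parts in one multipart upload


def check_piece_sizes(sizes: List[int]) -> List[str]:
    """Return a list of problems; an empty list means the pieces fit S3's limits."""
    if not sizes:
        return ["no pieces"]
    problems = []
    if len(sizes) > MAX_PARTS:
        problems.append(f"{len(sizes)} pieces exceeds the {MAX_PARTS}-part limit")
    last = len(sizes) - 1
    for i, size in enumerate(sizes):
        if size > MAX_PART:
            problems.append(f"piece {i} is larger than 5 GiB")
        # Only the last piece is allowed to be smaller than 5 MiB.
        if size < MIN_PART and i != last:
            problems.append(f"piece {i} is smaller than 5 MiB")
    return problems


def piece_count(total_bytes: int, piece_bytes: int) -> int:
    """Number of pieces `split -b` produces: ceiling division in integers."""
    return -(-total_bytes // piece_bytes)
```

For the 10GB file above, piece_count(10 * 10**9, 10 * 1024**2) gives 954 pieces, well under the 10,000-part limit.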

Installation

Tested with Python 3.9+.

$ python -m venv venv
$ source venv/bin/activate
(venv) $ pip install https://gitlab.com/sherzoddotcom/s3-welder/-/archive/master/s3-welder-master.tar.gz

Usage

List the objects in the bucket (the pieces of the large file)

(venv) $ s3welder list acme-bucket
part-aa
...
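For reference, a listing like this can be produced with boto3. This is a sketch under assumptions, not s3welder's code: client is assumed to be a boto3 S3 client, and list_keys is a hypothetical name. A paginator is used because ListObjectsV2 returns at most 1,000 keys per call, and the 10GB example above already needs 954 pieces. For general purpose buckets, S3 returns keys in ascending UTF-8 binary order, which for split's part-aa, part-ab, ... names is also the order of the pieces.

```python
from typing import Any, Dict, List


def list_keys(client: Any, bucket: str, prefix: str = "") -> List[str]:
    """Return every object key in `bucket` (optionally under `prefix`).

    `client` is assumed to be boto3.client("s3").
    """
    kwargs: Dict[str, str] = {"Bucket": bucket}
    if prefix:
        kwargs["Prefix"] = prefix
    keys: List[str] = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(**kwargs):
        # A page for an empty result has no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```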

Merge the objects in acme-bucket into largefile and re-list the bucket

(venv) $ s3welder merge acme-bucket --final-object largefile
(venv) $ s3welder list acme-bucket
largefile
part-aa
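Roughly what a merge involves, sketched with boto3 calls. This is an assumption about the approach, not s3welder's source: start a multipart upload for the final key, add each existing object as a part with UploadPartCopy, then complete the upload. client is assumed to be a boto3 S3 client, and merge_objects is a hypothetical name.

```python
from typing import Any, Dict, List


def merge_objects(client: Any, bucket: str, keys: List[str], final_key: str) -> Dict[str, Any]:
    """Concatenate existing objects server-side into `final_key`.

    Objects become parts 1..N in the order given, so `keys` should already be
    sorted. Every object except the last must be at least 5 MiB.
    """
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=final_key)["UploadId"]
    parts = []
    try:
        for number, key in enumerate(keys, start=1):
            # UploadPartCopy: S3 copies the source object into this part,
            # so no object data passes through the client.
            result = client.upload_part_copy(
                Bucket=bucket,
                Key=final_key,
                UploadId=upload_id,
                PartNumber=number,
                CopySource={"Bucket": bucket, "Key": key},
            )
            parts.append({"PartNumber": number, "ETag": result["CopyPartResult"]["ETag"]})
        return client.complete_multipart_upload(
            Bucket=bucket,
            Key=final_key,
            UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so the incomplete upload does not keep accruing storage charges.
        client.abort_multipart_upload(Bucket=bucket, Key=final_key, UploadId=upload_id)
        raise
```

For example, merge_objects(boto3.client("s3"), "acme-bucket", sorted(keys), "largefile"), where keys should exclude largefile itself so that re-running the merge does not copy the previous result into the new object.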

Verify locally that the pieces add up to the original file by comparing MD5 digests. MD5 is used because S3 ETags are also built from MD5 digests.

$ md5sum largefile
7d6cd80952ba0fbadc3903d5d2b0dfb6 largefile
$ cat parts/* | md5sum
7d6cd80952ba0fbadc3903d5d2b0dfb6 - 
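The same local check can be written in Python, which may be handy where md5sum is not available. md5_of_files is a hypothetical helper; it hashes the files' contents as if they were concatenated, which is what cat parts/* | md5sum does.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def md5_of_files(paths: Iterable[Path], chunk_size: int = 1024 * 1024) -> str:
    """MD5 hex digest of the files' contents concatenated in the given order."""
    digest = hashlib.md5()
    for path in paths:
        with open(path, "rb") as f:
            # Read in chunks so multi-GB files are never held in memory at once.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

Usage: md5_of_files([Path("largefile")]) == md5_of_files(sorted(Path("parts").iterdir())) should be True when the pieces are intact.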

s3welder also has a checksum command that computes, from the local parts, the ETag that S3 should assign to the newly formed largefile. Compare its output with the ETag S3 reports for the object.

$ s3welder checksum parts # <-- parts is the folder name
91afb433e56b64be30c7411fe2e87bda-6
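The ETag of a multipart object is not the MD5 of the whole file. The widely observed (though not formally specified by AWS) scheme is: MD5 each part, MD5 the concatenation of the binary digests, then append "-" and the number of parts, which is where the -6 above comes from. A sketch, assuming one local piece per part and no SSE-KMS or SSE-C encryption (whose ETags are not MD5-based); multipart_etag and multipart_etag_of_dir are hypothetical names, not s3welder's implementation.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def multipart_etag(parts: Iterable[bytes]) -> str:
    """Expected ETag for an object assembled from `parts`, in order."""
    digests = [hashlib.md5(p).digest() for p in parts]
    return f"{hashlib.md5(b''.join(digests)).hexdigest()}-{len(digests)}"


def multipart_etag_of_dir(directory: str) -> str:
    """Same, reading pieces from `directory` in sorted name order.

    Each piece is read fully into memory, which is fine for 10 MiB pieces.
    """
    paths = sorted(Path(directory).iterdir())
    return multipart_etag(p.read_bytes() for p in paths)
```

The result can be compared with the ETag field of aws s3api head-object --bucket acme-bucket --key largefile, which S3 returns wrapped in double quotes.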