How to write Pandas Dataframe as compressed zip file to S3?

Rajashree
1 min read · Apr 22, 2021

I thought this would be a simple task, but I couldn't find a reference for writing a .zip file instead of a .gz file.

Not everyone has 7-Zip installed, so I needed to produce a .zip file. I also didn't want to write the compressed file to a temporary file first, which is wasteful for a large pandas DataFrame.
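For comparison, pandas itself can write a .zip archive when the target is a local path: `to_csv` accepts a compression dict, and `archive_name` sets the CSV's name inside the archive. This still creates a file on disk, which is exactly what the in-memory approach below avoids; the path and file names here are just examples.

```python
import os
import tempfile
from zipfile import ZipFile

import pandas as pd

df = pd.DataFrame({'name': ["a", "b"], 'age': [20, 27]})

# write a zip archive containing my_df.csv directly to a local path;
# 'archive_name' controls the file name inside the archive
out_path = os.path.join(tempfile.mkdtemp(), "my_df.zip")
df.to_csv(out_path, index=False,
          compression={'method': 'zip', 'archive_name': 'my_df.csv'})

# the result is an ordinary zip archive any tool can open
with ZipFile(out_path) as zf:
    print(zf.namelist())  # ['my_df.csv']
```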

import pandas as pd
from zipfile import ZIP_DEFLATED, ZipFile
from io import BytesIO
import boto3

my_dict = {'name': ["a", "b", "c", "d", "e", "f", "g"],
           'age': [20, 27, 35, 55, 18, 21, 35],
           'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]}
df = pd.DataFrame(my_dict)

# to_csv with no path returns the CSV content as a string
csv_data = df.to_csv(index=False)

# create an in-memory bytes buffer to hold the zip archive
csv_buffer = BytesIO()

# write the CSV string into the archive as my_df.csv;
# ZIP_DEFLATED is needed because the default (ZIP_STORED) does not compress
with ZipFile(csv_buffer, mode='w', compression=ZIP_DEFLATED) as zfile:
    zfile.writestr("my_df.csv", csv_data)

# rewind the buffer to the beginning before reading it out
csv_buffer.seek(0)

# write the buffer to an S3 object (REGION, bucket and key are placeholders)
s3_resource = boto3.resource("s3", region_name=REGION)
upload_response = s3_resource.Object("bucket", "path/to/my_df.csv.zip").put(Body=csv_buffer.read())
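Before uploading, you can sanity-check the in-memory archive by reading the CSV back out of the buffer and comparing it with the original DataFrame. A minimal sketch with no S3 involved (the archive name `my_df.csv` matches the one used above):

```python
from io import BytesIO
from zipfile import ZIP_DEFLATED, ZipFile

import pandas as pd

df = pd.DataFrame({'name': ["a", "b", "c"], 'age': [20, 27, 35]})
csv_data = df.to_csv(index=False)

# build the zip archive in memory, exactly as in the upload snippet
buf = BytesIO()
with ZipFile(buf, mode='w', compression=ZIP_DEFLATED) as zfile:
    zfile.writestr("my_df.csv", csv_data)
buf.seek(0)

# read the CSV back out of the archive and compare with the original
with ZipFile(buf) as zfile:
    with zfile.open("my_df.csv") as f:
        round_tripped = pd.read_csv(f)

assert round_tripped.equals(df)
```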
