How to write Pandas Dataframe as compressed zip file to S3?

Rajashree
1 min read · Apr 22, 2021

I thought this would be a simple task, but I couldn't find a reference for writing a .zip file instead of a .gz file.

Not everyone has 7-Zip installed, so I needed to produce a .zip file. I also didn't want to write the compressed file to a temporary file first, which is wasteful for a large pandas DataFrame.
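For comparison, pandas itself can write a .zip archive when the target is a local path: `to_csv` accepts a compression dict, and `archive_name` sets the CSV's name inside the archive. This still creates a file on disk, which is exactly what the in-memory approach below avoids; the path and file names here are just examples.

```python
import os
import tempfile
from zipfile import ZipFile

import pandas as pd

df = pd.DataFrame({'name': ["a", "b"], 'age': [20, 27]})

# write a zip archive containing my_df.csv directly to a local path;
# 'archive_name' controls the file name inside the archive
out_path = os.path.join(tempfile.mkdtemp(), "my_df.zip")
df.to_csv(out_path, index=False,
          compression={'method': 'zip', 'archive_name': 'my_df.csv'})

# the result is an ordinary zip archive any tool can open
with ZipFile(out_path) as zf:
    print(zf.namelist())  # ['my_df.csv']
```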

import pandas as pd
from zipfile import ZIP_DEFLATED, ZipFile
from io import BytesIO
import boto3

my_dict = {'name': ["a", "b", "c", "d", "e", "f", "g"],
           'age': [20, 27, 35, 55, 18, 21, 35],
           'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]}
df = pd.DataFrame(my_dict)

# to_csv with no path returns the CSV content as a string
csv_data = df.to_csv(index=False)

# create an in-memory bytes buffer to hold the zip archive
csv_buffer = BytesIO()

# write the CSV string into the archive as my_df.csv;
# ZIP_DEFLATED is needed because the default (ZIP_STORED) does not compress
with ZipFile(csv_buffer, mode='w', compression=ZIP_DEFLATED) as zfile:
    zfile.writestr("my_df.csv", csv_data)

# rewind the buffer to the beginning before reading it out
csv_buffer.seek(0)

# write the buffer to an S3 object (REGION, bucket and key are placeholders)
s3_resource = boto3.resource("s3", region_name=REGION)
upload_response = s3_resource.Object("bucket", "path/to/my_df.csv.zip").put(Body=csv_buffer.read())
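Before uploading, you can sanity-check the in-memory archive by reading the CSV back out of the buffer and comparing it with the original DataFrame. A minimal sketch with no S3 involved (the archive name `my_df.csv` matches the one used above):

```python
from io import BytesIO
from zipfile import ZIP_DEFLATED, ZipFile

import pandas as pd

df = pd.DataFrame({'name': ["a", "b", "c"], 'age': [20, 27, 35]})
csv_data = df.to_csv(index=False)

# build the zip archive in memory, exactly as in the upload snippet
buf = BytesIO()
with ZipFile(buf, mode='w', compression=ZIP_DEFLATED) as zfile:
    zfile.writestr("my_df.csv", csv_data)
buf.seek(0)

# read the CSV back out of the archive and compare with the original
with ZipFile(buf) as zfile:
    with zfile.open("my_df.csv") as f:
        round_tripped = pd.read_csv(f)

assert round_tripped.equals(df)
```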
