Bitquery earlier introduced our new offering, “Blockchain Data on the Cloud”, through which you can have any blockchain data uploaded to AWS.
In the previous article, we saw how to extract blockchain data from AWS S3 buckets where the data was stored in Protobuf format.
This article focuses on extracting blockchain data from AWS S3 buckets in Parquet format.
Step-by-step tutorial:
- Install and import the necessary libraries.
import boto3
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
import parquet
The boto3 library allows us to interact with Amazon S3, the pyarrow library allows us to read and write Parquet files, the pandas library allows us to manipulate DataFrames, and the parquet library provides additional functions for working with Parquet files. The parquet import is a separate package that the remaining steps do not call, so you can omit it if it is not installed.
- Set up your AWS credentials.
For this step, you need to get your AWS Access Keys.
- Go to the AWS Management Console
- Navigate to your profile → Security credentials → Generate Access Key
s3 = boto3.client('s3', aws_access_key_id='your id', aws_secret_access_key='your key', region_name='us-east-1')
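Replace 'your id' and 'your key' with your actual AWS access key ID and secret access key. The access key ID is listed in the AWS Management Console; the secret access key is shown only once, when the key is created, so store it safely at that point.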
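Hardcoding keys in a script makes them easy to leak. As a minimal sketch, assuming the keys are exported as the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, the client arguments can be built like this. The helper names s3_client_kwargs and make_s3_client are hypothetical and not part of boto3.

```python
import os


def s3_client_kwargs(env=None, default_region="us-east-1"):
    """Build keyword arguments for boto3.client('s3', ...) from environment variables.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    kwargs = {"region_name": env.get("AWS_DEFAULT_REGION", default_region)}
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    # Pass the keys explicitly only when both are set; otherwise boto3 falls
    # back to its own lookup (shared credentials file, IAM role, and so on).
    if key_id and secret:
        kwargs["aws_access_key_id"] = key_id
        kwargs["aws_secret_access_key"] = secret
    return kwargs


def make_s3_client(env=None):
    """Create the S3 client; boto3 is imported lazily so the helper above works without it."""
    import boto3

    return boto3.client("s3", **s3_client_kwargs(env))
```

Calling s3 = make_s3_client() then gives a client that can be used in the remaining steps in place of the one created with inline keys.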
- Specify the S3 bucket and file name.
bucket_name = 'parquet-bsc'
bsc_key='dex_trades_tx/2020-09-12_410000_88573E9FA803AF8D_10000.parquet'
The bucket details above are from the sample. Replace parquet-bsc with the name of your S3 bucket and dex_trades_tx/2020-09-12_410000_88573E9FA803AF8D_10000.parquet with the key of the Parquet file you want to download.
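If you do not know the exact file key in advance, one option is to list the objects under a prefix first. The following is a sketch only: it assumes your credentials are allowed to list the bucket, and list_parquet_keys is a hypothetical helper name. It relies on the boto3 list_objects_v2 paginator, which returns results in pages.

```python
def list_parquet_keys(client, bucket, prefix=""):
    """Return the keys under `prefix` in `bucket` that end with '.parquet'.

    `client` is expected to be a boto3 S3 client (or any object exposing
    the same get_paginator interface).
    """
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when nothing matched the prefix.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".parquet"):
                keys.append(obj["Key"])
    return keys
```

With the sample values, the call would look like list_parquet_keys(s3, 'parquet-bsc', prefix='dex_trades_tx/'), and any returned key can be used as bsc_key.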
- Download the Parquet file to a local path.
bsc_local_path = 'C:/Your Path/sample.parquet'
s3.download_file(bucket_name, bsc_key, bsc_local_path)
The download_file() method downloads the Parquet file from the S3 bucket to the local path you specified. It returns nothing, so there is no result to assign to a variable.
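When downloading several files, it can be convenient to derive the local file name from the S3 key rather than typing each path by hand. A minimal sketch, assuming a hypothetical helper named download_to_dir and a target directory of your choice:

```python
from pathlib import Path, PurePosixPath


def download_to_dir(client, bucket, key, directory):
    """Download `key` from `bucket` into `directory` and return the local path.

    The local file name is the last segment of the S3 key. `client` is
    expected to be a boto3 S3 client.
    """
    target_dir = Path(directory)
    # Create the target directory (and any parents) if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # S3 keys always use forward slashes, regardless of the local OS.
    local_path = target_dir / PurePosixPath(key).name
    client.download_file(bucket, key, str(local_path))
    return local_path
```

For the sample key, the file would be saved under the chosen directory as 2020-09-12_410000_88573E9FA803AF8D_10000.parquet.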
- Read the Parquet file into a Pandas DataFrame.
df = pd.read_parquet(bsc_local_path)
The read_parquet() method reads the Parquet file into a Pandas DataFrame.
- Write the DataFrame to a CSV file.
df.to_csv('parquet_output.csv', index=False)
The to_csv() method writes the Pandas DataFrame to a CSV file. The index=False argument tells the method not to write the DataFrame index to the CSV file.
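To see what index=False changes, the following small sketch compares the two outputs on a made-up two-row DataFrame (the column names are hypothetical). When no file path is given, to_csv() returns the CSV text as a string, which makes the difference easy to inspect.

```python
import pandas as pd

df_demo = pd.DataFrame({"tx_hash": ["0xaa", "0xbb"], "amount": [1.5, 2.0]})

# Default behaviour: the index becomes an unnamed first column.
with_index = df_demo.to_csv()
# index=False: only the DataFrame columns are written.
without_index = df_demo.to_csv(index=False)

print(with_index.splitlines()[0])     # ,tx_hash,amount
print(without_index.splitlines()[0])  # tx_hash,amount
```

Leaving the index out is usually the better choice here, because the default integer index carries no information from the original Parquet file.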