How to access blockchain data from AWS S3 - Parquet Format

Bitquery previously introduced a new offering, “Blockchain Data on the Cloud”, which lets you have any blockchain data uploaded to AWS.
The previous article showed how to extract blockchain data stored in Protobuf format from AWS S3 buckets.
This article focuses on extracting blockchain data stored in Parquet format from AWS S3 buckets.

Step-by-step tutorial:

  1. Install and import the necessary libraries.
import boto3
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

The boto3 library lets us interact with Amazon S3, the pyarrow library reads and writes Parquet files (pandas uses it as its Parquet engine), and the pandas library lets us manipulate DataFrames. If any of them are missing, install them first with pip install boto3 pyarrow pandas. The separate parquet package is not needed for any step in this tutorial, so it is not imported here.

  2. Set up your AWS credentials.

For this step, you need your AWS access key ID and secret access key.

s3 = boto3.client('s3', aws_access_key_id='your id', aws_secret_access_key='your key', region_name='us-east-1')

Replace your id and your key with your actual AWS access key ID and secret access key. You can find these values in the AWS Management Console.

  3. Specify the S3 bucket and file name.
bucket_name = 'parquet-bsc'
bsc_key = 'dex_trades_tx/2020-09-12_410000_88573E9FA803AF8D_10000.parquet'

The bucket name and key above come from the sample dataset.
To use your own data, replace parquet-bsc with the name of your S3 bucket and dex_trades_tx/2020-09-12_410000_88573E9FA803AF8D_10000.parquet with the key (path within the bucket) of the Parquet file you want to download.
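
If you do not know the exact key in advance, you can list what is stored under a prefix. The helper below is a sketch, not part of the original tutorial: list_parquet_keys is a hypothetical name, and it assumes your credentials are allowed to list the bucket. It uses the S3 client's list_objects_v2 call and follows continuation tokens, since results are returned in pages.

```python
def list_parquet_keys(s3_client, bucket, prefix):
    """Return every key under `prefix` in `bucket` that ends in .parquet."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when nothing matches the prefix.
        for obj in response.get("Contents", []):
            if obj["Key"].endswith(".parquet"):
                keys.append(obj["Key"])
        # Results are paginated; keep going while the listing is truncated.
        if not response.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

With the client from step 2, a call would look like list_parquet_keys(s3, bucket_name, 'dex_trades_tx/').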

  4. Download the Parquet file to a local path.
bsc_local_path = 'C:/Your Path/sample.parquet'

s3.download_file(bucket_name, bsc_key, bsc_local_path)

The download_file() method downloads the Parquet file from the S3 bucket to the local path you specified. It does not return anything, so there is no result to assign; if the download fails, it raises an exception.
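
The download fails if the target folder does not exist yet. The sketch below wraps the call in a hypothetical helper, download_to_dir, that creates the folder first and reuses the file name from the S3 key. It accepts any object with a download_file(bucket, key, filename) method, such as the boto3 client created in step 2; the folder name in the usage note is only an example.

```python
from pathlib import Path, PurePosixPath


def download_to_dir(s3_client, bucket, key, download_dir):
    """Download `key` from `bucket` into `download_dir` and return the local path."""
    target_dir = Path(download_dir)
    # Create the folder (and any parents) if it is not there yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # S3 keys use "/" as separator on every OS, so parse them as POSIX paths.
    local_path = target_dir / PurePosixPath(key).name
    # download_file returns None; success means no exception was raised.
    s3_client.download_file(bucket, key, str(local_path))
    return local_path
```

For example, download_to_dir(s3, bucket_name, bsc_key, 'downloads') would save the sample file under downloads/ and return its path.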

  5. Read the Parquet file into a Pandas DataFrame.
df = pd.read_parquet(bsc_local_path)

The read_parquet() method will read the Parquet file into a Pandas DataFrame.

  6. Write the DataFrame to a CSV file.
df.to_csv('parquet_output.csv', index=False)

The to_csv() method will write the Pandas DataFrame to a CSV file. The index=False argument tells the method not to write the DataFrame index to the CSV file.