How to access blockchain data from AWS S3 - Parquet Format

Divya · August 29, 2023, 12:27pm

Bitquery recently introduced a new offering, “Blockchain Data on the Cloud”, through which you can have any blockchain data uploaded to AWS.
In the previous article, we saw how to extract blockchain data from AWS S3 buckets where the data was stored in Protobuf format.
This article focuses on extracting blockchain data from AWS S3 buckets in Parquet format.

Step-by-step tutorial:

Install and import the necessary libraries.

import boto3
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd 
import parquet

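The boto3 library lets us interact with Amazon S3, pyarrow reads and writes Parquet files, pandas manipulates DataFrames, and the parquet library provides additional functions for working with Parquet files. The steps below call only boto3 and pandas directly; pandas relies on pyarrow as its Parquet engine when it reads the file. The parquet import is not used in the remaining steps, so it can be omitted if that package is not installed.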
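Before going further, it can help to confirm that the packages are actually installed (for example with pip install boto3 pyarrow pandas). The following is a minimal sketch of such a check using only the standard library; the helper name missing_packages is hypothetical and not part of the original tutorial.

```python
from importlib import metadata


def missing_packages(names=("boto3", "pyarrow", "pandas")):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            # Raises PackageNotFoundError when the distribution is absent.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


print("Missing packages:", missing_packages() or "none")
```

If the list is not empty, install the named packages before running the rest of the steps.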

Set up your AWS credentials.

For this step, you need to get your AWS Access Keys.

Go to the AWS Management Console
Navigate to your profile → Security credentials → Generate Access Key


s3 = boto3.client('s3', aws_access_key_id='your id', aws_secret_access_key='your key', region_name='us-east-1')

Replace your id and your key with your actual AWS access key ID and secret access key. You can find these values in the AWS Management Console.
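Hard-coding keys in a script makes them easy to leak, for example when the file is shared or committed. As an alternative, here is a minimal sketch that reads the same two values from environment variables. The helper names s3_client_kwargs and make_s3_client are hypothetical; AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the standard variable names that boto3 also recognizes on its own.

```python
import os


def s3_client_kwargs(env=os.environ, region_name="us-east-1"):
    """Build keyword arguments for boto3.client('s3', ...) from environment variables."""
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "region_name": region_name,
    }


def make_s3_client(env=os.environ):
    """Create an S3 client equivalent to the one built above with literal keys."""
    import boto3  # imported here so the helper above can be used without boto3

    return boto3.client("s3", **s3_client_kwargs(env))
```

After setting both variables in your shell, s3 = make_s3_client() gives a client that can be used in the remaining steps.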

Specify the S3 bucket and file name.

bucket_name = 'parquet-bsc'
bsc_key='dex_trades_tx/2020-09-12_410000_88573E9FA803AF8D_10000.parquet'

The bucket and file name above come from the sample data.
Replace parquet-bsc with the name of your S3 bucket and dex_trades_tx/2020-09-12_410000_88573E9FA803AF8D_10000.parquet with the key (the full path inside the bucket) of the Parquet file you want to download.
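If you do not know the exact file name, you can list the keys under a prefix first. The sketch below assumes your credentials are allowed to list the bucket, which the original post does not confirm; the function name list_parquet_keys is hypothetical. It uses the boto3 paginator for list_objects_v2 so that prefixes with more than one page of results are handled.

```python
def list_parquet_keys(s3_client, bucket, prefix):
    """Return every object key under `prefix` that ends with '.parquet'."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page when no objects match.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".parquet"):
                keys.append(obj["Key"])
    return keys
```

With the client from the previous step, list_parquet_keys(s3, 'parquet-bsc', 'dex_trades_tx/') would return the Parquet keys stored under that folder of the sample bucket.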

Download the Parquet file to a local path.

bsc_local_path = 'C:/Your Path/sample.parquet'

s3.download_file(bucket_name, bsc_key, bsc_local_path)

The download_file() method downloads the Parquet file from the S3 bucket to the local path you specified. It returns None, so there is no need to store its result in a variable.
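When you run the script repeatedly, you may want to avoid downloading the same file again and to make sure the destination folder exists. This is a minimal sketch of that idea; the function name download_if_missing is hypothetical, and it expects the same kind of client object as the s3 variable created earlier.

```python
from pathlib import Path


def download_if_missing(s3_client, bucket, key, local_path):
    """Download `key` from `bucket` to `local_path` unless the file already exists.

    Returns True when a download happened and False when it was skipped.
    """
    target = Path(local_path)
    if target.exists():
        return False
    # Make sure the destination folder exists before downloading.
    target.parent.mkdir(parents=True, exist_ok=True)
    s3_client.download_file(bucket, key, str(target))
    return True
```

For example, download_if_missing(s3, bucket_name, bsc_key, bsc_local_path) fetches the sample file on the first run and skips it afterwards.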

Read the Parquet file into a Pandas DataFrame.

df = pd.read_parquet(bsc_local_path)

The read_parquet() method will read the Parquet file into a Pandas DataFrame.

Write the DataFrame to a CSV file.

df.to_csv('parquet_output.csv', index=False)

The to_csv() method will write the Pandas DataFrame to a CSV file. The index=False argument tells the method not to write the DataFrame index to the CSV file.
