Datasets

Lists

Stack Exchange Data Dump

Google Cloud Crypto Dataset

https://cloud.google.com/bigquery/public-data/

bigquery-public-data.crypto_bitcoin

Big query:

Each BigQuery dataset contains several tables and views. Views are virtual tables, defined by queries upon actual tables. Unlike tables, views are not persisted. So it is important to store query results into a destination table if you want to persist it (instead of saving as a view).

Standard SQL is the default language for Big Query. It is basically same as the "Legacy SQL", but can be surprising in a lot of ways. Let's start with some of the snippets for our task - extracting address-transaction graph from the complete transactions datastore:

SELECT * FROM `bigquery-public-data.crypto_bitcoin.inputs`
WHERE (
  SELECT COUNT(1) FROM UNNEST(addresses) AS addr
  WHERE addr IN ("1GoXSErTnH7GsaYRCe37mqWU6RK6yn4Fqy",
                 "1C36o6D3VJNfEUT2dznZkYA2yW3ARzMgYe",
                 "12DSitB2SDq5Vid6vjcMw8yzEG7RnfsFGh")
) >= 1
AND
block_number > 570000
LIMIT 10

For each input, it might be associated with multiple addresses that contribute to this portion of money, but normally there is only a single address that pays for that input value. Using the expression UNNEST(addresses) , we transformed the ARRAY type field into a set of rows (for more, check out documentation here). We will count the number of addresses that is one of "1GoXS..", "1C36o..", "12DSi.. " for each input. We return all inputs that have at least such address and has a block_number more than 570000 (at the time of writing, they mean pretty recent blocks).