Import Bitcoin Historical Csv Data to Kdb+
This article is part of a wider tutorial, that outlines the major components that could be used to make a bitcoin trading system based upon kdb+ tick. It's intended as an exercise for those that have completed the TimeStored online kdb+ training course, if you have completed one of our courses contact us to download the source code.
Loading Data into kdb+
Bitcoin Historical Data
Historical data within kdb+ is stored in a process called the HDB. However before we can load up that process we must first import our data from our raw data files then agree on and implement a database structure optimised for our system and the queries we would like to consider.
Parsing and Importing Files
Recommended Tutorials: Casting and Parsing, Importing Data into kdb+
Bitcoin historical data is available as CSV files here. We want to create a function that will:
- Dynamically pull the list of files (hint: use wget)
- Create an empty table on disk (splayed table tutorial)
- Download each zip file, expand, parse the csv's and insert their data to the master table (hint: use gunzip / 7z)
- Finally we'll add exchange columns, a kdb time format column and a currency column
Starting from a CSV file such as mtgoxUSD.csv that has three columns (timestamp, price, size):
Once you have
- Read the data from file and parsed it using 0:
- Pulled the currency and exchange from the filename.
- Converted the unix timestamp to kdb format
Structuring Our Database
Our bitcoin data is tiny (1.5GB in kdb+ format) on a good system we could easily store it fully in RAM. However for the purposes of learning and to allow efficient access from a slow laptop with a single drive and limited RAM, we want to consider partitioning our data. Partioning is logically dividing a database into distinct independent parts to increase manageability and performance. We cover partitioning in our Partitioned Database Tables tutorial, you should remember the diagram:
Partitioning by date as is common with typical stock data is too fine grained for our data, it would result in thousands of files less than 1kb in size.
I suggest partitioning on currency, then sorting the data by time. This will allow fast access for queries such as:
What was the USD Bitcoin rate between 17th and 23rd of June
Pseudocode
The functions you will want to create are:
Function | Description |
---|---|
getCsvFileList | return table of csv zip file names at bitcoincharts (http://api.bitcoincharts.com/v1/csv/)
Columns are: file, d-date, t-time, size, link |
downToFolder:[urls; folder] | Delete folder and then fill with files from urls. |
unzipAndGet[zdir; csvZipFile] | Unzips file to csv directory, parses csv files and returns table
return table with columns ts,price,size,cur,ex,tp where each row is a trade |
These then give a main loop of events:
- Get the csv zip file list.
- Download the files.
- Create empty on disk table.
- Unzip and load each file to disk.
- Sort and apply attributes.