Thursday 15 September 2016

Loading and Manipulating Historical Data From .csv Files

In my last post I said I was going to look at data wrangling my data, and this post outlines what I have done since then.

My problem was that I have numerous csv files containing historical data with different date formats and frequency, e.g. tick level and hourly and daily OHLC, and in the past I have always struggled with this. However, I have finally found a solution using the R quantmod package, which makes it easy to change data into a lower frequency. It took me some time to finally get what I wanted but the code box below shows the relevant R code to convert hourly OHLC, contained in one .csv file, to daily OHLC which is then written to a new .csv file.
library("quantmod", lib.loc="~/R/x86_64-pc-linux-gnu-library/3.3")
price_data = read.csv( "path/to/file.csv" , header = FALSE )
price_data = xts( price_data[,2:6] , order.by = as.Date.POSIXlt( strptime( price_data[,1] , format = "%d/%m/%y %H:%M" , tz = "" ) ) )
price_data_daily = to.daily( price_data , drop.time = TRUE )
write.zoo( price_data_daily , file = "path/to/new/file.csv" , sep = "," , row.names = FALSE , col.names = FALSE )
To finally achieve such a small snippet of working code I can't believe how much time I had to spend reading documentation and looking online.

This next code box shows Octave code to load the above written .csv file into Octave
fid = fopen( 'path/to/file' , 'rt' ) ;
data = textscan( fid , '%s %f %f %f %f' , 'Delimiter' , ',' , 'CollectOutput', 1 ) ;
fclose( fid ) ;
eurusd = [ datenum( data{1} , 'yyyy-mm-dd' ) data{2} ] ;
clear data fid
Hopefully, in both cases, manipulating the format strings "%d/%m/%y %H:%M" and 'yyyy-mm-dd' in these two respective code snippets will save you the hours I spent.

Useful links that helped me are:

Saturday 3 September 2016

Possible Addition of NARX Network to Conditional Restricted Boltzmann Machine

It has been over three months since my last post, due to working away from home for some of the summer, a summer holiday and moving home. However, during this time I have continued with my online reading and some new thinking about my conditional restricted boltzmann machine based trading system has developed, namely the use of a nonlinear autoregressive exogenous model in the bottom layer gaussian units of the CRBM. Some links to reading on the same are shown below.
The exogenous time series I am thinking of using, at least for the major forex pairs and perhaps precious metals, oil and US treasuries, is a currency strength indicator based on the US dollar. In order to create the currency strength indicator I will have to delve into some data wrangling with the historical forex data I have, and this will be the subject of my next post.