Thursday, 20 April 2017

Using the BayesOpt Library to Optimise my Planned Neural Net

Following on from my last post, I have recently been using the BayesOpt library to optimise my planned neural net, and this post is a brief outline, with code, of what I have been doing.

My intent was to design a nonlinear autoregressive exogenous (NARX) model using my currency strength indicator as the main exogenous input, along with other features derived from the use of Savitzky-Golay filter convolution to model velocity, acceleration etc. I decided that rather than model prices directly, I would model the 20 period simple moving average, because it seems reasonable to assume that modelling a smooth function would be easier, and from this average it is a trivial matter to reverse engineer the underlying price.
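As a minimal sketch of that reverse engineering step, assuming the previous 19 prices are known: the latest price is 20 times the average minus the sum of those 19 prices. The series and variable names below are synthetic and purely illustrative.

```octave
% Sketch: recover the latest price from a 20 period simple moving average.
% Assumes the previous 19 prices are known; all names here are illustrative.
n = 20 ;
price = cumsum( randn( 100 , 1 ) ) + 100 ;            % synthetic price series
sma20 = filter( ones( n , 1 ) ./ n , 1 , price ) ;    % valid from bar n onwards

t = 60 ;                                              % any bar with t >= n
recovered = n * sma20( t ) - sum( price( t - n + 1 : t - 1 ) ) ;

printf( 'actual %f, recovered %f\n' , price( t ) , recovered ) ;
```

The same arithmetic applies to a forecast: substituting a predicted value of the average for the next bar gives the implied next price.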

Given that my projected feature space/lookback length/number of nodes combination is/was a triple digit, discrete dimensional problem, I used the "bayesoptdisc" function from the BayesOpt library to perform a discrete Bayesian optimisation over these parameters, the main Octave script for this being shown below.
clear all ;

% all_rel_strengths_non_smooth = [ usd_rel_strength_non_smooth eur_rel_strength_non_smooth gbp_rel_strength_non_smooth chf_rel_strength_non_smooth ...
% jpy_rel_strength_non_smooth aud_rel_strength_non_smooth cad_rel_strength_non_smooth ] ;

% extract relevant data
% price = ( eurusd_daily_bars( : , 3 ) .+ eurusd_daily_bars( : , 4 ) ) ./ 2 ; % midprice
% price = ( gbpusd_daily_bars( : , 3 ) .+ gbpusd_daily_bars( : , 4 ) ) ./ 2 ; % midprice
% price = ( usdchf_daily_bars( : , 3 ) .+ usdchf_daily_bars( : , 4 ) ) ./ 2 ; % midprice
price = ( usdjpy_daily_bars( : , 3 ) .+ usdjpy_daily_bars( : , 4 ) ) ./ 2 ; % midprice
base_strength = all_rel_strengths_non_smooth( : , 1 ) .- 0.5 ;
term_strength = all_rel_strengths_non_smooth( : , 5 ) .- 0.5 ;

% clear unwanted data
% clear eurusd_daily_bars all_rel_strengths_non_smooth ;
% clear gbpusd_daily_bars all_rel_strengths_non_smooth ;
% clear usdchf_daily_bars all_rel_strengths_non_smooth ;
clear usdjpy_daily_bars all_rel_strengths_non_smooth ;

global start_opt_line_no = 200 ;
global stop_opt_line_no = 7545 ;

% get matrix coeffs
slope_coeffs = generalised_sgolay_filter_coeffs( 5 , 2 , 1 ) ;
accel_coeffs = generalised_sgolay_filter_coeffs( 5 , 2 , 2 ) ;
jerk_coeffs = generalised_sgolay_filter_coeffs( 5 , 3 , 3 ) ;

% create features
sma20 = sma( price , 20 ) ;
global targets = sma20 ;
[ sma_max , sma_min ] = adjustable_lookback_max_min( sma20 , 20 ) ;
global sma20r = zeros( size(sma20,1) , 5 ) ;
global sma20slope = zeros( size(sma20,1) , 5 ) ;
global sma20accel = zeros( size(sma20,1) , 5 ) ;
global sma20jerk = zeros( size(sma20,1) , 5 ) ;

global sma20diffs = zeros( size(sma20,1) , 5 ) ;
global sma20diffslope = zeros( size(sma20,1) , 5 ) ;
global sma20diffaccel = zeros( size(sma20,1) , 5 ) ;
global sma20diffjerk = zeros( size(sma20,1) , 5 ) ;

global base_strength_f = zeros( size(sma20,1) , 5 ) ;
global term_strength_f = zeros( size(sma20,1) , 5 ) ;

base_term_osc = base_strength .- term_strength ;
global base_term_osc_f = zeros( size(sma20,1) , 5 ) ;
slope_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_bt_osc_f = zeros( size(sma20,1) , 5 ) ;
accel_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_bt_osc_f = zeros( size(sma20,1) , 5 ) ;
jerk_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_bt_osc_f = zeros( size(sma20,1) , 5 ) ;

slope_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_base_strength_f = zeros( size(sma20,1) , 5 ) ;
accel_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_base_strength_f = zeros( size(sma20,1) , 5 ) ;
jerk_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_base_strength_f = zeros( size(sma20,1) , 5 ) ;

slope_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_term_strength_f = zeros( size(sma20,1) , 5 ) ;
accel_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_term_strength_f = zeros( size(sma20,1) , 5 ) ;
jerk_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_term_strength_f = zeros( size(sma20,1) , 5 ) ;

min_max_range = sma_max .- sma_min ;

for ii = 51 : size( sma20 , 1 ) - 1 % one step ahead is target

targets(ii) = 2 * ( ( sma20(ii+1) - sma_min(ii) ) / min_max_range(ii) - 0.5 ) ;

% scaled sma20
sma20r(ii,:) = 2 .* ( ( flipud( sma20(ii-4:ii,1) )' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ;
sma20slope(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * slope_coeffs ) ;
sma20accel(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * accel_coeffs ) ;
sma20jerk(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * jerk_coeffs ) ;

% scaled diffs of sma20
sma20diffs(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' ) ;
sma20diffslope(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * slope_coeffs ) ;
sma20diffaccel(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * accel_coeffs ) ;
sma20diffjerk(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * jerk_coeffs ) ;

% base strength
base_strength_f(ii,:) = fliplr( base_strength(ii-4:ii)' ) ;
slope_base_strength_f(ii,:) = fliplr( slope_base_strength(ii-4:ii)' ) ;
accel_base_strength_f(ii,:) = fliplr( accel_base_strength(ii-4:ii)' ) ;
jerk_base_strength_f(ii,:) = fliplr( jerk_base_strength(ii-4:ii)' ) ;

% term strength
term_strength_f(ii,:) = fliplr( term_strength(ii-4:ii)' ) ;
slope_term_strength_f(ii,:) = fliplr( slope_term_strength(ii-4:ii)' ) ;
accel_term_strength_f(ii,:) = fliplr( accel_term_strength(ii-4:ii)' ) ;
jerk_term_strength_f(ii,:) = fliplr( jerk_term_strength(ii-4:ii)' ) ;

% base term oscillator
base_term_osc_f(ii,:) = fliplr( base_term_osc(ii-4:ii)' ) ;
slope_bt_osc_f(ii,:) = fliplr( slope_bt_osc(ii-4:ii)' ) ;
accel_bt_osc_f(ii,:) = fliplr( accel_bt_osc(ii-4:ii)' ) ;
jerk_bt_osc_f(ii,:) = fliplr( jerk_bt_osc(ii-4:ii)' ) ;

endfor

% create xset for bayes routine
% raw indicator
xset = zeros( 4 , 5 ) ; xset( 1 , : ) = 1 : 5 ;
to_add = zeros( 4 , 15 ) ;
to_add( 1 , : ) = [ 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 ] ;
to_add( 2 , : ) = [ 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 ] ;
xset = [ xset to_add ] ;
to_add = zeros( 4 , 21 ) ;
to_add( 1 , : ) = [ 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 ] ;
to_add( 2 , : ) = [ 1 1 2 2 1 2 2 3 3 3 1 2 3 4 2 3 3 4 4 4 4 ] ;
to_add( 3 , : ) = [ 1 1 1 2 1 1 2 1 2 3 1 1 1 1 2 2 3 1 2 3 4 ] ;
xset = [ xset to_add ] ;
to_add = zeros( 4 , 70 ) ;
to_add( 1 , : ) = [ 1 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ] ;
to_add( 2 , : ) = [ 1 1 2 2 2 1 2 2 2 3 3 3 3 3 3 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ] ;
to_add( 3 , : ) = [ 1 1 1 2 2 1 1 2 2 1 2 2 3 3 3 1 1 2 2 1 2 2 3 3 3 1 2 2 3 3 3 4 4 4 4 1 1 2 2 1 2 2 3 3 3 1 2 2 3 3 3 4 4 4 4 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 ] ;
to_add( 4 , : ) = [ 1 1 1 1 2 1 1 1 2 1 1 2 1 2 3 1 1 1 2 1 1 2 1 2 3 1 1 2 1 2 3 1 2 3 4 1 1 1 2 1 1 2 1 2 3 1 1 2 1 2 3 1 2 3 4 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 ] ;
xset = [ xset to_add ] ;
% construct all_xset for combinations of indicators and look back lengths
all_zeros = zeros( size( xset ) ) ;
all_xset = [ xset ; repmat( all_zeros , 3 , 1 ) ] ;
all_xset = [ all_xset [ xset ; xset ; all_zeros ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; xset ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; all_zeros ; xset ] ] ;
all_xset = [ all_xset [ xset ; xset ; xset ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; xset ; all_zeros ; xset ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; xset ; xset ] ] ;
all_xset = [ all_xset repmat( xset , 4 , 1 ) ] ;

ones_all_xset = ones( 1 , size( all_xset , 2 ) ) ;

% now add layer for number of neurons and extend as necessary
max_number_of_neurons_in_layer = 20 ;

parameter_matrix = [] ;

for ii = 2 : max_number_of_neurons_in_layer % min no. of neurons is 2, max = max_number_of_neurons_in_layer
parameter_matrix = [ parameter_matrix [ ii .* ones_all_xset ; all_xset ] ] ;
endfor

% now the actual bayes optimisation routine
% set the parameters
params.n_iterations = 190; % bayesopt library default is 190
params.n_init_samples = 10;
params.crit_name = 'cEIa'; % cEI is default. cEIa is an annealed version
params.surr_name = 'sStudentTProcessNIG';
params.noise = 1e-6;
params.kernel_name = 'kMaternARD5';
params.kernel_hp_mean = [1];
params.kernel_hp_std = [10];
params.verbose_level = 1; % levels 0-2 print to stdout, levels 3-5 write to the log file below
params.log_filename = '/home/dekalog/Documents/octave/cplusplus.oct_functions/nn_functions/optimise_narx_ind_lookback_nodes_log';
params.l_type = 'L_MCMC' ; % L_EMPIRICAL is default
params.epsilon = 0.5 ; % probability of performing a random (blind) evaluation of the target function.
% Higher values imply forced exploration while lower values rely more on the exploration/exploitation policy of the criterion. 0 is default

% the function to optimise
fun = 'optimise_narx_ind_lookback_nodes_rolling' ;

% the call to the Bayesopt library function
[ x_out , y_out ] = bayesoptdisc( fun , parameter_matrix , params ) ;
% x_out is the parameter vector at the minimum and y_out is the value of the function at that minimum
What this script basically does is:
1. loads all the relevant data ( in this case a forex pair )
2. creates a set of scaled features
3. creates the parameter matrix required by the discrete optimisation function
4. sets the parameters for the optimisation routine
5. finally, calls the "bayesoptdisc" function
Note that in step 2 all the features are declared as global variables, this being necessary because the "bayesoptdisc" function of the BayesOpt library does not appear to admit passing these variables as inputs to the function.
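To make step 3 more concrete, here is a toy sketch of how such a parameter matrix can be assembled, with each column being one candidate configuration. The two-feature lookback set and the neuron range are made up for illustration and are much smaller than the set used in the script above.

```octave
% Toy version of the parameter matrix: each column is one candidate
% configuration, [ number_of_neurons ; lookback_feature_1 ; lookback_feature_2 ].
% The lookback combinations below are illustrative, not those of the real script.
lookbacks = [ 1 2 2 3 3 3 ;      % lookback for feature 1
              0 1 2 1 2 3 ] ;    % lookback for feature 2 ( 0 = feature unused )
neurons = 2 : 4 ;                % candidate hidden layer sizes

parameter_matrix = [] ;
for n = neurons
  parameter_matrix = [ parameter_matrix , [ n .* ones( 1 , columns( lookbacks ) ) ; lookbacks ] ] ;
endfor

disp( size( parameter_matrix ) ) ;   % 3 rows, 18 columns
```

The optimiser then only ever evaluates the target function at one of these columns, which is what makes the search discrete.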

The actual function to be optimised is given in the following code box, and is basically a looped neural net training routine.
## Copyright (C) 2017 dekalog
##
## This program is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {} {@var{retval} =} optimise_narx_ind_lookback_nodes_rolling (@var{input1})
##
## @seealso{}
## @end deftypefn

## Author: dekalog
## Created: 2017-03-21

function [ retval ] = optimise_narx_ind_lookback_nodes_rolling( input1 )

% declare all the global variables so the function can "see" them
global start_opt_line_no ;
global stop_opt_line_no ;
global targets ;
global sma20r ; % 2
global sma20slope ;
global sma20accel ;
global sma20jerk ;
global sma20diffs ;
global sma20diffslope ;
global sma20diffaccel ;
global sma20diffjerk ;
global base_strength_f ;
global slope_base_strength_f ;
global accel_base_strength_f ;
global jerk_base_strength_f ;
global term_strength_f ;
global slope_term_strength_f ;
global accel_term_strength_f ;
global jerk_term_strength_f ;
global base_term_osc_f ;
global slope_bt_osc_f ;
global accel_bt_osc_f ;
global jerk_bt_osc_f ;

% build feature matrix from the above global variable according to parameters in input1
hidden_layer_size = input1(1) ;

% training targets
Y = targets( start_opt_line_no:stop_opt_line_no , 1 ) ;

% create empty feature matrix
X = [] ;

% which will always have at least one element of the main price series for the NARX
X = [ X sma20r( start_opt_line_no:stop_opt_line_no , 1:input1(2) ) ] ;

% go through input1 values in turn and add to X if necessary
if input1(3) > 0
X = [ X sma20slope( start_opt_line_no:stop_opt_line_no , 1:input1(3) ) ] ;
endif

if input1(4) > 0
X = [ X sma20accel( start_opt_line_no:stop_opt_line_no , 1:input1(4) ) ] ;
endif

if input1(5) > 0
X = [ X sma20jerk( start_opt_line_no:stop_opt_line_no , 1:input1(5) ) ] ;
endif

if input1(6) > 0
X = [ X sma20diffs( start_opt_line_no:stop_opt_line_no , 1:input1(6) ) ] ;
endif

if input1(7) > 0
X = [ X sma20diffslope( start_opt_line_no:stop_opt_line_no , 1:input1(7) ) ] ;
endif

if input1(8) > 0
X = [ X sma20diffaccel( start_opt_line_no:stop_opt_line_no , 1:input1(8) ) ] ;
endif

if input1(9) > 0
X = [ X sma20diffjerk( start_opt_line_no:stop_opt_line_no , 1:input1(9) ) ] ;
endif

if input1(10) > 0 % input for base and term strengths together
X = [ X base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(10) ) ] ;
X = [ X term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(10) ) ] ;
endif

if input1(11) > 0
X = [ X slope_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(11) ) ] ;
X = [ X slope_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(11) ) ] ;
endif

if input1(12) > 0
X = [ X accel_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(12) ) ] ;
X = [ X accel_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(12) ) ] ;
endif

if input1(13) > 0
X = [ X jerk_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(13) ) ] ;
X = [ X jerk_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(13) ) ] ;
endif

if input1(14) > 0
X = [ X base_term_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(14) ) ] ;
endif

if input1(15) > 0
X = [ X slope_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(15) ) ] ;
endif

if input1(16) > 0
X = [ X accel_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(16) ) ] ;
endif

if input1(17) > 0
X = [ X jerk_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(17) ) ] ;
endif

% now the X features matrix has been formed, get its size
X_rows = size( X , 1 ) ; X_cols = size( X , 2 ) ;

X = [ ones( X_rows , 1 ) X ] ; % add bias unit to X

fan_in = X_cols + 1 ; % no. of inputs to a node/unit, including bias
fan_out = 1 ; % no. of outputs from node/unit
r = sqrt( 6 / ( fan_in + fan_out ) ) ;

rolling_window_length = 100 ;
n_iters = 100 ;
n_iter_errors = zeros( n_iters , 1 ) ;
all_errors = zeros( X_rows - ( rolling_window_length - 1 ) - 1 , 1 ) ;
rolling_window_loop_iter = 0 ;

for rolling_window_loop = rolling_window_length : X_rows - 1

rolling_window_loop_iter = rolling_window_loop_iter + 1 ;

% train n_iters no. of nets and put the error stats in n_iter_errors
for ii = 1 : n_iters

% initialise weights
% see https://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network

% One option is Orthogonal random matrix initialization for input_to_hidden weights
% w_i = rand( X_cols + 1 , hidden_layer_size ) ;
% [ u , s , v ] = svd( w_i ) ;
% input_to_hidden = [ ones( X_rows , 1 ) X ] * u ; % adding bias unit to X

% using fan_in and fan_out for tanh
w_i = ( rand( X_cols + 1 , hidden_layer_size ) .* ( 2 * r ) ) .- r ;
input_to_hidden = X( rolling_window_loop - ( rolling_window_length - 1 ) : rolling_window_loop , : ) * w_i ;

% push the input_to_hidden through the chosen sigmoid function
hidden_layer_output = sigmoid_lecun_m( input_to_hidden ) ;

% add bias unit for the output from hidden
hidden_layer_output = [ ones( rolling_window_length , 1 ) hidden_layer_output ] ;

% use hidden_layer_output as the input to a linear regression fit to targets Y
% a la Extreme Learning Machine
% w = ( inv( X' * X ) * X' ) * y ; the "classic" way for linear regression, where
% X = hidden_layer_output, but
w = ( ( hidden_layer_output' * hidden_layer_output ) \ hidden_layer_output' ) * Y( rolling_window_loop - ( rolling_window_length - 1 ) : rolling_window_loop , 1 ) ;
% is quicker and recommended

% use these current values of w_i and w for out of sample test
os_input_to_hidden = X( rolling_window_loop + 1 , : ) * w_i ;
os_hidden_layer_output = sigmoid_lecun_m( os_input_to_hidden ) ;
os_hidden_layer_output = [ 1 os_hidden_layer_output ] ; % add bias
os_output = os_hidden_layer_output * w ;
n_iter_errors( ii ) = abs( Y( rolling_window_loop + 1 , 1 ) - os_output ) ; % index by ii so that each of the n_iters nets records its own error

endfor

all_errors( rolling_window_loop_iter ) = mean( n_iter_errors ) ;

endfor % rolling_window_loop

retval = mean( all_errors ) ;

clear X w_i ;

endfunction

To speed things up for some rapid prototyping, rather than use backpropagation training this function uses the principles of an extreme learning machine (ELM): for each position of a rolling window of length 100 moved across the entire training data set, it trains 100 ELMs on the selected features. Each of the 100 ELMs is tested on the bar immediately following the window ( walk forward cross validation ), the out of sample errors are averaged, and these averages across the whole data set are in turn averaged to provide the function return value. The code was run on daily bars of the four major forex pairs: EURUSD, GBPUSD, USDCHF and USDJPY.
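The core of a single ELM fit and its one step ahead test can be sketched on synthetic data as follows. This is only an illustration under stated assumptions: plain tanh stands in for the sigmoid_lecun_m function, whose definition is not shown in this post, and the data, sizes and variable names are invented.

```octave
% Minimal ELM sketch on synthetic data: random input weights, tanh hidden
% layer, output weights by least squares, then a one step ahead test.
% Plain tanh is an assumed stand-in for sigmoid_lecun_m.
rand( 'state' , 1 ) ; randn( 'state' , 1 ) ;
n_obs = 101 ; n_feat = 3 ; n_hidden = 10 ;
X = [ ones( n_obs , 1 ) , randn( n_obs , n_feat ) ] ;      % bias + features
Y = tanh( X * [ 0.1 ; 0.5 ; -0.3 ; 0.2 ] ) ;               % synthetic target

train_rows = 1 : 100 ; test_row = 101 ;
r = sqrt( 6 / ( ( n_feat + 1 ) + 1 ) ) ;                   % range from fan_in and fan_out
w_i = rand( n_feat + 1 , n_hidden ) .* ( 2 * r ) - r ;     % random, never trained

% hidden layer output plus bias, then linear regression onto the targets
H = [ ones( numel( train_rows ) , 1 ) , tanh( X( train_rows , : ) * w_i ) ] ;
w = H \ Y( train_rows ) ;

% out of sample test on the bar following the training window
os_output = [ 1 , tanh( X( test_row , : ) * w_i ) ] * w ;
os_error = abs( Y( test_row ) - os_output ) ;
printf( 'out of sample abs error: %f\n' , os_error ) ;
```

Because only the output weights are fitted, and by a single linear solve, each net is cheap enough to train thousands of times inside an optimisation loop.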

The results of running the above are quite interesting. The first surprise is that the currency strength indicator and features derived from it were not included in the optimal model for any of the four tested pairs. Secondly, for all pairs, a scaled version of a 20 bar price momentum function, and derived features, was included in the optimal model. Finally, again for all pairs, there was a symmetrically decreasing lookback period across the selected features, and when averaged across all pairs the following pattern results: 10 3 3 2 1 3 3 2 1, which is to be read as:
• 10 nodes (plus a bias node) in the hidden layer
• lookback length of 3 for the scaled values of the SMA20 and the 20 bar scaled momentum function
• lookback length of 3 for the slopes/rates of change of the above
• lookback length of 2 for the "accelerations" of the above
• lookback length of 1 for the "jerks" of the above
So it would seem that the 20 bar momentum function is a better exogenous input than the currency strength indicator. The symmetry across features is quite pleasing, and the selection of these "physical motion" features across all the tested pairs tends to confirm their validity. The fact that the currency strength indicator was not selected does not mean that this indicator is of no value, but perhaps it should not be used for regression purposes, but rather as a filter. More in due course.

Tuesday, 7 February 2017

Update on Currency Strength Smoothing, and a new direction?

Since my last two posts ( currency strength indicator and preliminary tests thereof ) I have been experimenting with different ways of smoothing the indicators without introducing lag, mostly along the lines of using an oscillator leading signal plus various schemes to smooth and compensate for introduced attenuation and making heavy use of my particle swarm optimisation code. Unfortunately I haven't found anything that really works to my satisfaction and so I have decided to forgo any further attempts at this and just use the indicator in its unsmoothed form as neural net input.

In the process of doing the above work I decided that my particle swarm routine wasn't fast enough and I started using the BayesOpt optimisation library, which is written in C++ and has an interface to Octave. Doing this has greatly decreased the time I've had to spend in my various optimisation routines and the framework provided by the BayesOpt library will enable more ambitious optimisations in the future.

Another discovery for me was this Predicting Stock Market Prices with Physical Laws paper, which has some really useful ideas for neural net input features. In particular I think the idea of combining position, velocity and acceleration with the ideas contained in an earlier post of mine on Savitzky Golay filter convolution and using the currency strength indicators as proxies for the arbitrary sine and cosine waves function posited in the paper hold some promise. More in due course.
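As a rough sketch of the position, velocity and acceleration idea, a local quadratic can be fitted to the last few points and its derivatives read off at the most recent bar, which is the principle behind Savitzky-Golay derivative filters. Here polyfit is used instead of precomputed convolution coefficients, and the test series is an exact quadratic chosen so the answer is known.

```octave
% Sketch: position, velocity and acceleration at the most recent bar from a
% local quadratic fit to the last 5 points. polyfit replaces the precomputed
% Savitzky-Golay coefficients here; the data are synthetic.
t = ( -4 : 0 )' ;                       % time index, 0 = most recent bar
x = 2 + 0.5 .* t + 0.25 .* t .^ 2 ;     % noiseless quadratic test series

p = polyfit( t , x , 2 ) ;              % p = [ a b c ] for a*t^2 + b*t + c
position     = p( 3 ) ;                 % fitted value at t = 0
velocity     = p( 2 ) ;                 % first derivative at t = 0
acceleration = 2 * p( 1 ) ;             % second derivative at t = 0

printf( 'position %f, velocity %f, acceleration %f\n' , position , velocity , acceleration ) ;
```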

Tuesday, 8 November 2016

Preliminary Tests of Currency Strength Indicator

Since my last post on the currency strength indicator I have been conducting a series of basic randomisation tests to see if the indicator has better than random predictive ability. The first test was a random permutation test, as described in Aronson's Evidence Based Technical Analysis book, the code for which I have previously posted on my Data Snooping Tests Github page. These results were all disappointing in that the null hypothesis of no predictive ability cannot be rejected. However, looking at a typical chart ( repeated from the previous post but colour coded for signals )
it can be seen that there are a lot of green ( no signal ) bars which, during the randomisation test, can be selected and give equal or greater returns than the signal bars ( blue for longs, red for shorts ). The relative sparsity of the signal bars compared to non-signal bars probably gives the permutation test, in this instance, low power to detect significance, although I am not able to show that this is actually the case.

In the light of the above I decided to conduct a different test, the .m code for which is shown below.
clear all ;

all_random_entry_distribution_results = zeros( 21 , 3 ) ;

tic();

for ii = 1 : 21

clear -x ii all_strengths_quad_smooth_21 all_random_entry_distribution_results ;

if ii == 1
mid_price = ( audcad_daily_bars( : , 3 ) .+ audcad_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 6 ; term_ix = 7 ;
end

if ii == 2
mid_price = ( audchf_daily_bars( : , 3 ) .+ audchf_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 6 ; term_ix = 4 ;
end

if ii == 3
mid_price = ( audjpy_daily_bars( : , 3 ) .+ audjpy_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 6 ; term_ix = 5 ;
end

if ii == 4
mid_price = ( audusd_daily_bars( : , 3 ) .+ audusd_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 6 ; term_ix = 1 ;
end

if ii == 5
mid_price = ( cadchf_daily_bars( : , 3 ) .+ cadchf_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 7 ; term_ix = 4 ;
end

if ii == 6
mid_price = ( cadjpy_daily_bars( : , 3 ) .+ cadjpy_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 7 ; term_ix = 5 ;
end

if ii == 7
mid_price = ( chfjpy_daily_bars( : , 3 ) .+ chfjpy_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 4 ; term_ix = 5 ;
end

if ii == 8
mid_price = ( euraud_daily_bars( : , 3 ) .+ euraud_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 2 ; term_ix = 6 ;
end

if ii == 9
mid_price = ( eurcad_daily_bars( : , 3 ) .+ eurcad_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 2 ; term_ix = 7 ;
end

if ii == 10
mid_price = ( eurchf_daily_bars( : , 3 ) .+ eurchf_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 2 ; term_ix = 4 ;
end

if ii == 11
mid_price = ( eurgbp_daily_bars( : , 3 ) .+ eurgbp_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 2 ; term_ix = 3 ;
end

if ii == 12
mid_price = ( eurjpy_daily_bars( : , 3 ) .+ eurjpy_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 2 ; term_ix = 5 ;
end

if ii == 13
mid_price = ( eurusd_daily_bars( : , 3 ) .+ eurusd_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 2 ; term_ix = 1 ;
end

if ii == 14
mid_price = ( gbpaud_daily_bars( : , 3 ) .+ gbpaud_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 3 ; term_ix = 6 ;
end

if ii == 15
mid_price = ( gbpcad_daily_bars( : , 3 ) .+ gbpcad_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 3 ; term_ix = 7 ;
end

if ii == 16
mid_price = ( gbpchf_daily_bars( : , 3 ) .+ gbpchf_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 3 ; term_ix = 4 ;
end

if ii == 17
mid_price = ( gbpjpy_daily_bars( : , 3 ) .+ gbpjpy_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 3 ; term_ix = 5 ;
end

if ii == 18
mid_price = ( gbpusd_daily_bars( : , 3 ) .+ gbpusd_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 3 ; term_ix = 1 ;
end

if ii == 19
mid_price = ( usdcad_daily_bars( : , 3 ) .+ usdcad_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 1 ; term_ix = 7 ;
end

if ii == 20
mid_price = ( usdchf_daily_bars( : , 3 ) .+ usdchf_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 1 ; term_ix = 4 ;
end

if ii == 21
mid_price = ( usdjpy_daily_bars( : , 3 ) .+ usdjpy_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 1 ; term_ix = 5 ;
end

% the returns vectors suitably aligned with position vector
mid_price_rets = shift( mid_price_rets , -1 ) ;
sma2 = sma( mid_price_rets , 2 ) ; sma2_rets = shift( sma2 , -2 ) ; sma3 = sma( mid_price_rets , 3 ) ; sma3_rets = shift( sma3 , -3 ) ;
all_rets = [ mid_price_rets , sma2_rets , sma3_rets ] ;

% delete burn in and 2016 data ( 2016 reserved for out of sample testing )
all_rets( 7547 : end , : ) = [] ; all_rets( 1 : 50 , : ) = [] ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% simple divergence strategy - be long the uptrending and short the downtrending currency. Uptrends and downtrends determined by crossovers
% of the strengths and their respective smooths
smooth_base = smooth_2_5( all_strengths_quad_smooth_21(:,base_ix) ) ; smooth_term = smooth_2_5( all_strengths_quad_smooth_21(:,term_ix) ) ;
test_matrix = ( all_strengths_quad_smooth_21(:,base_ix) > smooth_base ) .* ( all_strengths_quad_smooth_21(:,term_ix) < smooth_term) ; % +1 for longs
short_vec = ( all_strengths_quad_smooth_21(:,base_ix) < smooth_base ) .* ( all_strengths_quad_smooth_21(:,term_ix) > smooth_term) ; short_vec = find( short_vec ) ;
test_matrix( short_vec ) = -1 ; % -1 for shorts
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% delete burn in and 2016 data
test_matrix( 7547 : end , : ) = [] ; test_matrix( 1 : 50 , : ) = [] ;
[ ix , jx , test_matrix_values ] = find( test_matrix ) ;
no_of_signals = length( test_matrix_values ) ;

% the actual returns performance
real_results = mean( repmat( test_matrix_values , 1 , size( all_rets , 2 ) ) .* all_rets( ix , : ) ) ;

% set up for randomisation test
iters = 5000 ;
imax = size( test_matrix , 1 ) ;
rand_results_distribution_matrix = zeros( iters , size( real_results , 2 ) ) ;

for jj = 1 : iters
rand_idx = randi( imax , no_of_signals , 1 ) ;
rand_results_distribution_matrix( jj , : ) = mean( test_matrix_values .* all_rets( rand_idx , : ) ) ;
endfor

all_random_entry_distribution_results( ii , : ) = ( real_results .- mean( rand_results_distribution_matrix ) ) ./ ...
( 2 .* std( rand_results_distribution_matrix ) ) ;

endfor % end of ii loop

toc()

save -ascii all_random_entry_distribution_results all_random_entry_distribution_results ;

plot(all_random_entry_distribution_results(:,1),'k','linewidth',2,all_random_entry_distribution_results(:,2),'b','linewidth',2,...
all_random_entry_distribution_results(:,3),'r','linewidth',2) ; legend('1 day','2 day','3 day');
What the code basically does is construct null hypothesis distributions of 1, 2 and 3 day returns from n random entries, where n is the number of signal bars ( -1 or +1 ) given by the currency strength indicator. The results are then plotted as a line chart of the distance between the signal return means and the random return means, normalised by 2x the random return standard deviations. In this way values >1 approximately correspond to p values < 0.05. Two typical charts are shown below.

The first chart shows the results of the unsmoothed currency strength indicator and the second the smoothed version. From this I surmise that the delay introduced by the smoothing is/will be detrimental to performance and so for the near future I shall be working on improving the smoothing algorithm used in the indicator calculations.
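The random entry test described above can be reduced to the following sketch on synthetic data. The returns and signals here are random and all names and sizes are illustrative, so no edge is expected and the score should usually land well inside the -1 to +1 band.

```octave
% Sketch of the random entry test on synthetic data. The signals are random,
% so the normalised score is not expected to be significant.
rand( 'state' , 42 ) ; randn( 'state' , 42 ) ;
n_bars = 2000 ; iters = 1000 ;
rets = randn( n_bars , 1 ) .* 0.01 ;                              % next bar returns
signals = sign( randn( n_bars , 1 ) ) .* ( rand( n_bars , 1 ) < 0.2 ) ;

ix = find( signals ) ;
signal_values = signals( ix ) ;                                   % +1 longs, -1 shorts
real_result = mean( signal_values .* rets( ix ) ) ;

rand_results = zeros( iters , 1 ) ;
for jj = 1 : iters
  rand_idx = randi( n_bars , numel( ix ) , 1 ) ;                  % random entry bars
  rand_results( jj ) = mean( signal_values .* rets( rand_idx ) ) ;
endfor

% distance from the null mean in units of 2 standard deviations
score = ( real_result - mean( rand_results ) ) / ( 2 * std( rand_results ) ) ;
printf( 'normalised score: %f\n' , score ) ;
```

A score above 1 means the signal mean sits more than two standard deviations above the random entry mean, which under a normal approximation leaves roughly 2.3% in the upper tail.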

Thursday, 27 October 2016

Currency Strength Indicator

Over the last few weeks I have been looking into creating a currency strength indicator as input to a Nonlinear autoregressive exogenous model. This has involved a fair bit of online research and I have to say that compared to other technical analysis indicators there seems to be a paucity of pages devoted to the methodology of creating such an indicator. Apart from the above linked Wikipedia page I was only really able to find some discussion threads on some forex forums, mostly devoted to the Metatrader platform, and a few of the more enlightening threads are here, here and here. Another website I found, although not exactly what I was looking for, is marketsmadeclear.com and in particular their Currency Strength Matrix.

In the end I decided to create my own relative currency strength indicator, based on the RSI, by making the length of the indicator adaptive to the measured dominant cycle. The optimal theoretical length for the RSI is half the cycle period, and the price series are smoothed with a [ 1 2 2 1 ] FIR filter prior to the RSI calculations. I used the code in my earlier post to calculate the dominant cycle period, the reason being that since I wrote that post I have watched/listened to a podcast in which John Ehlers recommended using this calculation method for dominant cycle measurement.
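The two building blocks mentioned above can be sketched as follows. Several things here are assumptions made for illustration: the dominant cycle period is hard coded rather than measured, the RSI uses the simple sum form rather than Wilder's smoothing, and the price series is a synthetic, steadily rising one.

```octave
% Sketch: [ 1 2 2 1 ] FIR smoothing followed by an RSI whose length is half
% an assumed dominant cycle period. The period is hard coded here; in practice
% it would come from a cycle measurement routine.
price = 100 + cumsum( ones( 60 , 1 ) ) ;              % steadily rising test series
smoothed = filter( [ 1 2 2 1 ] ./ 6 , 1 , price ) ;   % first 3 outputs are warm-up

cycle_period = 20 ;                                   % assumed, not measured
rsi_length = round( cycle_period / 2 ) ;

changes = diff( smoothed( 4 : end ) ) ;               % skip the filter warm-up
window = changes( end - rsi_length + 1 : end ) ;
gains = sum( window( window > 0 ) ) ;
losses = -sum( window( window < 0 ) ) ;
rsi = 100 * gains / ( gains + losses ) ;              % simple sum form of the RSI

printf( 'RSI over the last %d bars: %f\n' , rsi_length , rsi ) ;
```

On a series that only rises there are no losses, so the RSI sits at its upper bound; a real implementation would also need to guard against a window with no price changes at all.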

The screenshots below are of the currency strength indicator applied to approx. 200 daily bars of the EURUSD forex pair; first the price chart,
next, the indicator,
and finally an oscillator derived from the two separate currency strength lines.
I think the utility of this indicator is quite obvious from these simple charts. Crossovers of the strength lines ( or equivalently, zero line crossings of the oscillator ) clearly indicate major directional changes, and additionally changes in the slope of the oscillator provide an early warning of impending price direction changes.
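As an illustrative sketch, the oscillator and its zero line crossings can be computed as below. The two strength lines are synthetic sine waves standing in for the real indicator output.

```octave
% Sketch: oscillator as the difference of two strength lines, with zero line
% crossings flagged. The two strength series are synthetic stand-ins.
t = ( 1 : 100 )' ;
base_strength = 0.5 + 0.3 .* sin( 2 * pi .* ( t - 0.5 ) ./ 40 ) ;
term_strength = 0.5 - 0.3 .* sin( 2 * pi .* ( t - 0.5 ) ./ 40 ) ;

osc = base_strength - term_strength ;
s = sign( osc ) ;

% index of the first bar on the new side of the zero line
cross_up   = find( s( 2 : end ) > 0 & s( 1 : end - 1 ) <= 0 ) + 1 ;
cross_down = find( s( 2 : end ) < 0 & s( 1 : end - 1 ) >= 0 ) + 1 ;

disp( cross_up' ) ; disp( cross_down' ) ;
```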

I will now start to test this indicator and write about these tests and results in due course.

Thursday, 15 September 2016

In my last post I said I was going to look at data wrangling my data, and this post outlines what I have done since then.

My problem was that I have numerous csv files containing historical data with different date formats and frequency, e.g. tick level and hourly and daily OHLC, and in the past I have always struggled with this. However, I have finally found a solution using the R quantmod package, which makes it easy to change data into a lower frequency. It took me some time to finally get what I wanted but the code box below shows the relevant R code to convert hourly OHLC, contained in one .csv file, to daily OHLC which is then written to a new .csv file.
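# load the quantmod package ( the lib.loc path is specific to my machine )
library("quantmod", lib.loc="~/R/x86_64-pc-linux-gnu-library/3.3")
# price_data must already hold the hourly data, with the date-time strings in column 1 and the bar data in columns 2 to 6
price_data = xts( price_data[,2:6] , order.by = as.Date.POSIXlt( strptime( price_data[,1] , format = "%d/%m/%y %H:%M" , tz = "" ) ) )
# aggregate the hourly bars into daily bars
price_data_daily = to.daily( price_data , drop.time = TRUE )
# write the daily bars to a new .csv file without row or column names
write.zoo( price_data_daily , file = "path/to/new/file.csv" , sep = "," , row.names = FALSE , col.names = FALSE )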
I can't believe how much time I had to spend reading documentation and looking online to finally achieve such a small snippet of working code.
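For readers who want to see what the hourly to daily conversion amounts to, below is a minimal C++ sketch of the aggregation logic: the first open, highest high, lowest low and last close of each calendar day. It is only an illustration of the idea and not how quantmod implements to.daily; the Bar struct and the to_daily function are hypothetical names, and the input is assumed to be sorted chronologically with the date already reduced to a "yyyy-mm-dd" string.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One OHLC bar. For hourly bars "date" holds only the calendar day part of
// the time stamp, e.g. "2016-09-15" ( an assumption made for this sketch ).
struct Bar
{
  std::string date ;
  double open , high , low , close ;
} ;

// Collapse chronologically sorted intraday bars into one bar per day.
std::vector<Bar> to_daily( const std::vector<Bar> &hourly )
{
  std::vector<Bar> daily ;
  for ( const Bar &b : hourly )
  {
    if ( daily.empty () || daily.back ().date != b.date )
    {
      daily.push_back( b ) ; // first bar of a new day supplies the open
    }
    else
    {
      Bar &d = daily.back () ;
      d.high = std::max( d.high , b.high ) ; // running high of the day
      d.low = std::min( d.low , b.low ) ;    // running low of the day
      d.close = b.close ;                    // latest close seen so far
    }
  }
  return daily ;
}
```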

This next code box shows the Octave code to load the .csv file written above into Octave.
fid = fopen( 'path/to/file' , 'rt' ) ;
data = textscan( fid , '%s %f %f %f %f' , 'Delimiter' , ',' , 'CollectOutput', 1 ) ;
fclose( fid ) ;
eurusd = [ datenum( data{1} , 'yyyy-mm-dd' ) data{2} ] ;
clear data fid
Hopefully, in both cases, manipulating the format strings "%d/%m/%y %H:%M" and 'yyyy-mm-dd' in these two respective code snippets will save you the hours I spent.

Useful links that helped me are:

Saturday, 3 September 2016

Possible Addition of NARX Network to Conditional Restricted Boltzmann Machine

It has been over three months since my last post, due to working away from home for some of the summer, a summer holiday and moving home. However, during this time I have continued with my online reading, and some new thinking about my conditional restricted Boltzmann machine (CRBM) based trading system has developed, namely the use of a nonlinear autoregressive exogenous (NARX) model in the bottom layer Gaussian units of the CRBM. Some links to reading on the same are shown below.
The exogenous time series I am thinking of using, at least for the major forex pairs and perhaps precious metals, oil and US treasuries, is a currency strength indicator based on the US dollar. In order to create the currency strength indicator I will have to delve into some data wrangling with the historical forex data I have, and this will be the subject of my next post.
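To make the NARX idea a little more concrete, the C++ sketch below builds the regressor for one time step in the usual NARX form, where the next value of the modelled series y is predicted from its own recent lags together with recent lags of the exogenous series u ( here, the currency strength indicator ). The function name and the lag counts ny and nu are illustrative assumptions only; nothing here is specific to the CRBM.

```cpp
#include <cstddef>
#include <vector>

// Build the NARX regressor used to predict y[t]:
// [ y[t-1] ... y[t-ny] , u[t-1] ... u[t-nu] ]
// The caller must ensure that t >= ny and t >= nu.
std::vector<double> narx_regressor( const std::vector<double> &y ,
                                    const std::vector<double> &u ,
                                    std::size_t t , std::size_t ny , std::size_t nu )
{
  std::vector<double> row ;
  row.reserve( ny + nu ) ;
  for ( std::size_t k = 1 ; k <= ny ; k++ )
  {
    row.push_back( y[t-k] ) ; // autoregressive lags of the modelled series
  }
  for ( std::size_t k = 1 ; k <= nu ; k++ )
  {
    row.push_back( u[t-k] ) ; // lags of the exogenous series
  }
  return row ;
}
```

Stacking these rows for successive values of t gives the input matrix, with y[t] as the corresponding target for each row.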

Monday, 16 May 2016

Giving Up on Recursive Sine Formula for Period Calculation

I have spent the last few weeks trying to get my recursive sine wave formula for period calculations to work, but try as I might I can only get it to do so under ideal theoretical conditions. Once any significant noise, trend or combination thereof is introduced the calculations explode and give meaningless results. In light of this, I am no longer going to continue this work.

Apart from the above work I have also been doing my usual online research and have come across John Ehlers' autocorrelation periodogram for period measurement, and below is my Octave C++ .oct implementation of it.
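#include <octave/oct.h>
#include <cmath>

// value of pi used in the filter coefficient and DFT calculations below
#ifndef PI
#define PI 3.14159265358979323846
#endif

DEFUN_DLD ( autocorrelation_periodogram, args, nargout,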
"-*- texinfo -*-\n\
@deftypefn {Function File} {} autocorrelation_periodogram (@var{input_vector})\n\
This function takes an input vector ( price ) and outputs the dominant cycle period,\n\
calculated from the autocorrelation periodogram spectrum.\n\
@end deftypefn" )

{
octave_value_list retval_list ;
int nargin = args.length () ;

// check the input arguments
if ( nargin != 1 ) // there must be a price vector only
{
error ("Invalid arguments. Input is a price vector only.") ;
return retval_list ;
}

if ( args(0).length () < 50 ) // the initialisation loop and lagged correlations below need roughly 50 bars of history
{
error ("Invalid argument length. Input is a price vector of length >= 50.") ;
return retval_list ;
}

if ( error_state )
{
error ("Invalid argument. Input is a price vector of length >= 50.") ;
return retval_list ;
}
// end of input checking

ColumnVector input = args(0).column_vector_value () ;
ColumnVector hp = args(0).column_vector_value () ; hp.fill( 0.0 ) ;
ColumnVector smooth = args(0).column_vector_value () ; smooth.fill( 0.0 ) ;
ColumnVector corr ( 49 ) ; corr.fill( 0.0 ) ;
ColumnVector cosine_part ( 49 ) ; cosine_part.fill( 0.0 ) ;
ColumnVector sine_part ( 49 ) ; sine_part.fill( 0.0 ) ;
ColumnVector sq_sum ( 49 ) ; sq_sum.fill( 0.0 ) ;
ColumnVector R1 ( 49 ) ; R1.fill( 0.0 ) ;
ColumnVector R2 ( 49 ) ; R2.fill( 0.0 ) ;
ColumnVector pwr ( 49 ) ; pwr.fill( 0.0 ) ;
ColumnVector dominant_cycle = args(0).column_vector_value () ; dominant_cycle.fill( 0.0 ) ;

double avglength = 3.0 ;
double M ;
double X ; double Y ;
double Sx ; double Sy ; double Sxx ; double Syy ; double Sxy ;
double denom ;
double max_pwr = 0.0 ;
double Spx ; double Sp ;

// variables for highpass filter, hard coded for a high cutoff period of 48 bars and low cutoff of 10 bars
double high_cutoff = 48.0 ; double low_cutoff = 10.0 ;
double alpha_1 = ( cos( 0.707 * 2.0 * PI / high_cutoff ) + sin( 0.707 * 2.0 * PI / high_cutoff ) - 1.0 ) / cos( 0.707 * 2.0 * PI / high_cutoff ) ;
double beta_1 = ( 1.0 - alpha_1 / 2.0 ) * ( 1.0 - alpha_1 / 2.0 ) ;
double beta_2 = 2.0 * ( 1.0 - alpha_1 ) ;
double beta_3 = ( 1.0 - alpha_1 ) * ( 1.0 - alpha_1 ) ;

// variables for super smoother
double a1 = exp( -1.414 * PI / low_cutoff ) ;
double b1 = 2.0 * a1 * cos( 1.414 * PI / low_cutoff ) ;
double c2 = b1 ;
double c3 = -a1 * a1 ;
double c1 = 1.0 - c2 - c3 ;

// calculate the automatic gain control factor, K
double K = 0.0 ;
double accSlope = -1.5 ; //acceptableSlope = 1.5 dB
double halfLC = low_cutoff / 2.0 ;
double halfHC = high_cutoff / 2.0 ;
double ratio = pow( 10 , accSlope / 20.0 ) ;

if( halfHC - halfLC > 0.0 )
{
K = pow( ratio , 1.0 / ( halfHC - halfLC ) ) ;
}

// loop to initialise hp and smooth
for ( octave_idx_type ii ( 2 ) ; ii < 50 ; ii++ ) // initial loop
{
// highpass filter components whose periods are < 48 bars
hp(ii) = beta_1 * ( input(ii) - 2.0 * input(ii-1) + input(ii-2) ) + beta_2 * hp(ii-1) - beta_3 * hp(ii-2) ;

// smooth with a super smoother filter
smooth(ii) = c1 * ( hp(ii) + hp(ii-1) ) / 2.0 + c2 * smooth(ii-1) + c3 * smooth(ii-2) ;
} // end of initial loop

// start at bar 50 so that the longest lookback, smooth(ii-(48+2)), stays in bounds when avglength = 3
for ( octave_idx_type ii ( 50 ) ; ii < args(0).length () ; ii++ ) // main loop
{
// highpass filter components whose periods are < 48 bars
hp(ii) = beta_1 * ( input(ii) - 2.0 * input(ii-1) + input(ii-2) ) + beta_2 * hp(ii-1) - beta_3 * hp(ii-2) ;

// smooth with a super smoother filter
smooth(ii) = c1 * ( hp(ii) + hp(ii-1) ) / 2.0 + c2 * smooth(ii-1) + c3 * smooth(ii-2) ;

// Pearson correlation for each value of lag
for ( octave_idx_type lag (0) ; lag <= high_cutoff ; lag++ )
{
// set the averaging length as M
M = avglength ;
if ( avglength == 0)
{
M = double( lag ) ;
}

Sx = 0.0 ; Sy = 0.0 ; Sxx = 0.0 ; Syy = 0.0 ; Sxy = 0.0 ;

// sum over exactly M samples so that the sums agree with the M used in the correlation formula below
for ( octave_idx_type count (0) ; count < M ; count++ )
{
X = smooth(ii-count) ; Y = smooth(ii-(lag+count)) ;
Sx += X ;
Sy += Y ;
Sxx += X * X ;
Sxy += X * Y ;
Syy += Y * Y ;
}

denom = ( M * Sxx - Sx * Sx ) * ( M * Syy - Sy * Sy ) ;
if ( denom > 0.0 )
{
corr(lag) = ( M * Sxy - Sx * Sy ) / sqrt( denom ) ;
}

} // end of Pearson correlation loop
/*
The DFT is accomplished by correlating the autocorrelation at each value of lag with the cosine and sine of each period of interest.
The sum of the squares of each of these values represents the relative power at each period.
*/
for ( octave_idx_type period (low_cutoff) ; period <= high_cutoff ; period++ )
{
cosine_part( period ) = 0.0 ; sine_part( period ) = 0.0 ;

for ( octave_idx_type N (3) ; N <= high_cutoff ; N++ )
{
cosine_part( period ) += corr( N ) * cos( 2.0 * PI * double( N ) / double( period ) ) ;
sine_part( period ) += corr( N ) * sin( 2.0 * PI * double( N ) / double( period ) ) ;
} // end of N loop

sq_sum( period ) = cosine_part( period ) * cosine_part( period ) + sine_part( period ) * sine_part( period ) ;

} // end of first period loop

// EMA is used to smooth the power measurement at each period
for ( octave_idx_type period (low_cutoff) ; period <= high_cutoff ; period++ )
{
R2( period ) = R1( period ) ;
R1( period ) = 0.2 * sq_sum( period ) * sq_sum( period ) + 0.8 * R2( period ) ;
} // end of second period loop

// Find maximum power level for normalisation
max_pwr = 0.0 ;

for ( octave_idx_type period (low_cutoff) ; period <= high_cutoff ; period++ )
{
if ( R1( period ) > max_pwr )
{
max_pwr = K * R1( period ) ;
}
} // end of third period loop

// normalisation of power
for ( octave_idx_type period (low_cutoff) ; period <= high_cutoff ; period++ )
{
pwr( period ) = R1( period ) / max_pwr ;
} // end of fourth period loop

// compute the dominant cycle using the centre of gravity of the spectrum
Spx = 0.0 ; Sp = 0.0 ;

for ( octave_idx_type period (low_cutoff) ; period <= high_cutoff ; period++ )
{
if ( pwr( period ) >= 0.5 )
{
Spx += double( period ) * pwr( period ) ;
Sp += pwr( period ) ;
}
} // end of fifth period loop

if ( Sp != 0.0 )
{
dominant_cycle(ii) = Spx / Sp ;
}

} // end of main loop

retval_list( 0 ) = dominant_cycle ;

return retval_list ;

} // end of function 
When applied directly to a theoretical but noisy sine wave series with a trend I find that this autocorrelation method performs better than my current period measurement algo, but on detrended data it is not as good. Since it is trivial to detrend price data, for now I am going to stick with my current method.
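As an example of how little is involved in detrending, the C++ sketch below removes the straight line joining the first and last points of a window of data. This is only one of several possible detrending methods and is an assumption on my part for illustration; the function name is hypothetical, and a least squares line fit would be an equally simple alternative.

```cpp
#include <cstddef>
#include <vector>

// Detrend a window of data by subtracting the straight line that joins its
// first and last points, so that both end points of the output are zero.
std::vector<double> detrend_endpoints( const std::vector<double> &x )
{
  std::vector<double> out( x.size () , 0.0 ) ;
  if ( x.size () < 2 )
  {
    return out ; // nothing to detrend
  }

  const double slope = ( x.back () - x.front () ) / static_cast<double>( x.size () - 1 ) ;
  for ( std::size_t ii = 0 ; ii < x.size () ; ii++ )
  {
    out[ii] = x[ii] - ( x.front () + slope * static_cast<double>( ii ) ) ;
  }
  return out ;
}
```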