“Computing for Data Analysis” with R on coursera

Just stumbled on across a course on coursera titled “Computing for Data Analysis” taught by Roger D. Peng the Johns Hopkins Bloomberg School of Public Health.

Here is the description of the course.

In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment, discuss generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing which includes programming in R, reading data into R, creating informative data graphics, accessing R packages, creating R packages with documentation, writing R functions, debugging, and organizing and commenting R code. Topics in statistical data analysis and optimization will provide working examples.

I just signed up for it! This course looks like a great opportunity to sharpen  skills in R and learn new things.


Alternative to Monte Carlo Testing

When we backtest a strategy on a portfolio, it is a simple analysis of a single period in time. There are ways to “stress test” a strategy such as monte carlo, random portfolios, or shuffling the returns in a random order. I could never really wrap my head around monte carlo and shuffling the returns seemed to be a better approach because the actual returns of the backtest are used, but it misses one important thing… the impact of consecutive periods of returns. If we are backtesting a strategy and we want to minimize max drawdown, consecutive down periods have a significant impact on max drawdown. If, for example, the max drawdown occured due to 4 consecutive months during 2008, we wan’t to keep those 4 months together when shuffling returns.

In my opinion, a better way to shuffle returns is to shuffle “blocks” of returns. This is nothing new, the TradingBlox software does monte carlo analysis this way. I had a look at the boot package and tseries package for their boot functions, but it was not giving me what I wanted. I wanted to visual a number of equity curves with blocks of returns randomly shuffled.

To accomplish this in R, I wrote two functions.  The shuffle_returns function takes an xts object of returns, the number of samples to run (i.e. how many equity curves to generate), and a number for how many periods of returns makes up a ‘block’ as arguments.The ran_gen function function is a function within the shuffle_returns function that is used to generate random blocks of returns.

shuffle_returns returns an xts object with the random blocks of returns so we can do further analysis such as max drawdown, plotting, or pretty much anything in the PerformanceAnalytics package that takes an xts object as an argument.

This is not a perfect implementation of this idea, so if anybody knows of a better way I’d be glad to hear from you.

The example below uses sample data from edhec and generates 100 equity curves with blocks of 5 consecutive period of returns.

R code:


#Function that grabs a random number and then repeats that number r times
ran_gen <- function(x, r){
  #x is an xts object of asset returns
  #r is for how many consecutive returns make up a 'block'
  vec <- c()
  total_length <- length(x)
  n <- total_length/r
  for(i in 1:n){
    vec <- append(vec,c(rep(sample(1:(n*100),1), r)))
  diff <- as.integer(total_length - length(vec))
  vec <- append(vec, c(rep(sample(1:(n*100),1), len = diff)))

shuffle_returns <- function(x, n, r){
  #x is an xts object of asset returns
  #n is the number of samples to run
  #r is for how many consecutive returns make up a 'block' and is passed to ran_gen

  mat <- matrix(data = x, nrow = length(x))
  for(i in 1:n){
    temp_random <- ran_gen(x = x, r = r)
    temp_mat <- as.matrix(cbind(x, temp_random))
    temp_mat <- temp_mat[order(temp_mat[,2]),]
    temp_ret_mat <- matrix(data = temp_mat[,1])
    mat <- cbind(mat, temp_ret_mat)
  final_xts <- xts(mat, index(x))

#get sample data
a <- edhec[,1]
a <- head(a, -1)

start <- Sys.time()
yy <- shuffle_returns(a, 100, 5)
chart.CumReturns(yy[,1:NCOL(yy)], wealth.index = TRUE, ylab = "Equity", main ="Generated Equity Curve of Shuffled Returns")
end <- Sys.time()

Created by Pretty R at inside-R.org