Sparse B Spline Basis Matrix In R

1 minute read

This is slightly old news now, but as of R 2.15.1, the functions spline.des and splineDesign which construct design matrices for sets of b-spline basis functions, now have the option of outputing such design matrices in sparse format, and constructing them more efficiently. The results are in the compressed column oriented format (class dgCMatrix) used by the Matrix package. This is natural, since b-splines are non-zero over only a small interval and in many applications are used in large numbers, requiring the construction of large model matrices.

You’ll need to fire up the splines and Matrix packages first.

library(splines)
library(Matrix)

Then generate lots of data, say 100000 standard normal draws, and then use this simulated data to construct dense and sparse matrices.

x<-rnorm(100000)
Bdense<-spline.des(x, knots = seq(min(x),  max(x), length.out = 100), outer.ok = T)$design
Bsparse<-spline.des(x, knots = seq(min(x),  max(x), length.out = 100), outer.ok = T, sparse = T)$design

There is a speed up in constructing the above,

system.time(spline.des(x, knots = seq(min(x),  max(x), length.out = 100), outer.ok = T))
system.time(spline.des(x, knots = seq(min(x),  max(x), length.out = 100), outer.ok = T, sparse = T))

But the main thing is how much faster things are from here on with Bsparse. Suppose we wish to calculate

t(Bdense) %*% Bdense

Well this is faster either way using crossprod, and blazingly fast for the sparse version:

system.time(t(Bdense) %*% Bdense)
system.time(crossprod(Bdense))
system.time(crossprod(Bsparse))

So far so good. The dense output takes up 73.2Mb compared to only 4.6Mb in the sparse version in this admittedly contrived example (such examples definitely can and do arise, particularly varying coefficient models where many basis functions are needed to describe a parameter’s smooth trajectory as another variable changes, such as with time for example).

object.size(Bdense)/1024^2
object.size(Bsparse)/1024^2

Something that I’ve found handy is the ability to swap between different sparse matrix formats in R. For example, if I’d like to manipulate Bsparse using the spam package, which has some very useful tools for working with Gaussian Markov random fields, the summary function can be used on Matrix objects to extract the triplet form, which can then easily be used to construct a spam object:

library(spam)
trip.Matrix<-summary(Bsparse)
Bsparse.spam<-spam(list(i = trip.Matrix[,1], j = trip.Matrix[,2], trip.Matrix[,3]),nrow = nrow(Bsparse), ncol = ncol(Bsparse))
class(Bsparse.spam)
all.equal(as.matrix(Bsparse.spam), as.matrix(Bsparse), check.attributes = F)

Updated: