
## Generating lag/lead variables

A few days ago, a friend asked me whether R has a function for generating lag/lead variables in a data.frame, something similar to _n in Stata. He wanted to use it to clean up his dataset in R.

From the Stata help manual: _n contains the number of the current observation.
Here’s an example to illustrate what _n does:

set obs 10
generate x = _n
generate x_lag1 = x[_n-1]
generate x_lead1 = x[_n+1]

The data generated would be:
x = {1,2,3,4,5,6,7,8,9,10}
x_lag1 = {NA,1,2,3,4,5,6,7,8,9}
x_lead1 = {2,3,4,5,6,7,8,9,10,NA}

The key feature is that the new vector has the same length as the original vector, so it can be combined directly with the original vector or with other generated vectors.

One application is to create a moving-average (MA) series (just an example; for real work it is better to use a function from a time-series package):
generate x_ma_1 = (x[_n-1] + x[_n]) / 2
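
For comparison, the same padding idea can be sketched in base R. This is only an illustration of the target behaviour, assuming a plain numeric vector; the variable names simply mirror the Stata example.

```r
x <- 1:10

# Pad with NA so every result keeps the length of x
x_lag1  <- c(NA, head(x, -1))   # value from the previous observation
x_lead1 <- c(tail(x, -1), NA)   # value from the next observation

# Two-point moving average; the first element is NA because there is no x[0]
x_ma_1 <- (x_lag1 + x) / 2

length(x_ma_1) == length(x)
```

Because every vector has the same length, they can all be stored side by side in one data.frame.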

I googled for a while. Basically, there are two types of methods for generating lag/lead variables in R (reference):

1> Functions that generate a shorter vector (e.g. embed(), or running() in gtools).
2> Functions in ts, zoo, xts, dynlm, dlm.
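
A small sketch of why these two families do not directly give a same-length column: embed() drops the incomplete rows, and lag() on a ts object only moves the time index and leaves the values untouched. It uses base R only; the variable names are illustrative.

```r
x <- 1:10

# embed() returns one row per complete window, so rows are lost
e <- embed(x, 2)
nrow(e)          # 9, not 10

# stats::lag() on a ts keeps the values and shifts the time stamps instead
y     <- ts(x)
y_lag <- stats::lag(y, k = -1)
tsp(y)           # start 1, end 10, frequency 1
tsp(y_lag)       # start 2, end 11, frequency 1
```

Neither result can be assigned straight into a 10-row data.frame as a lagged column without further alignment.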

However, neither approach solved his problem, so I wrote a “shift” function to do the task:

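```r
shift<-function(x,shift_by){
    stopifnot(is.numeric(shift_by))
    stopifnot(is.numeric(x))

    # vectorized over shift_by: one column per shift
    if (length(shift_by)>1)
        return(sapply(shift_by,shift, x=x))

    out<-NULL
    abs_shift_by=abs(shift_by)
    if (shift_by > 0 )
        # lead: drop the first values, pad the end with NA
        out<-c(tail(x,-abs_shift_by),rep(NA,abs_shift_by))
    else if (shift_by < 0 )
        # lag: pad the start with NA, drop the last values
        out<-c(rep(NA,abs_shift_by),head(x,-abs_shift_by))
    else
        out<-x
    out
}
```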
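
As an aside, the same shift can also be sketched with index arithmetic. The helper below, `shift_idx`, is a hypothetical name and not part of the original post; it assumes a single shift value and is meant only to show an alternative that also accepts non-numeric vectors and shifts longer than the vector.

```r
# Hypothetical alternative to shift(): look up x at position i + shift_by
shift_idx <- function(x, shift_by) {
  stopifnot(is.numeric(shift_by), length(shift_by) == 1)
  idx <- seq_along(x) + shift_by
  # positions that fall outside the vector become NA
  idx[idx < 1 | idx > length(x)] <- NA
  x[idx]
}

shift_idx(1:10, 2)            # lead by 2
shift_idx(1:10, -2)           # lag by 2
shift_idx(c("a", "b", "c"), -1)
shift_idx(1:3, 5)             # all NA, still length 3
```

The trade-off is one extra index vector in exchange for not needing separate branches for positive, negative and zero shifts.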
```
# Example
d<-data.frame(x=1:10)
#generate lead and lag variables
d$df_lead2<-shift(d$x,2)
d$df_lag2<-shift(d$x,-2)

> d
    x df_lead2 df_lag2
1   1        3      NA
2   2        4      NA
3   3        5       1
4   4        6       2
5   5        7       3
6   6        8       4
7   7        9       5
8   8       10       6
9   9       NA       7
10 10       NA       8

# shift_by is vectorized
> shift(d$x,-2:2)
      [,1] [,2] [,3] [,4] [,5]
 [1,]   NA   NA    1    2    3
 [2,]   NA    1    2    3    4
 [3,]    1    2    3    4    5
 [4,]    2    3    4    5    6
 [5,]    3    4    5    6    7
 [6,]    4    5    6    7    8
 [7,]    5    6    7    8    9
 [8,]    6    7    8    9   10
 [9,]    7    8    9   10   NA
[10,]    8    9   10   NA   NA
```
```r
# Test
library(testthat)
expect_identical(shift(1:10,2), c(3:10,NA,NA))
expect_identical(shift(1:10,-2), c(NA,NA,1:8))
expect_identical(shift(1:10,0), 1:10)
expect_identical(shift(1:10,1:2), cbind(c(2:10,NA),c(3:10,NA,NA)))
```

Note that the result depends on how the data.frame is sorted, because shift() works purely on row position.
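
One way to guard against this, sketched here with base R only, is to sort first and then lag within each group. The data.frame and its columns (`id`, `year`, `x`) are made up for illustration, and the inline padding stands in for a one-step lag.

```r
# Toy panel, deliberately out of order
d <- data.frame(
  id   = c("a", "b", "a", "b", "a"),
  year = c(2002, 2001, 2001, 2002, 2003),
  x    = c(12, 21, 11, 22, 13)
)

# Sort by group and time so that "previous row" means "previous year"
d <- d[order(d$id, d$year), ]

# Lag within each id; ave() applies FUN per group and keeps row order
d$x_lag1 <- ave(d$x, d$id, FUN = function(v) c(NA, head(v, -1)))

d
```

The first row of every group gets NA, so values never leak from one id into the next.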

1. March 12, 2012 at 12:39

Coming to R from Stata, I was really disappointed at how difficult it was to include lags/leads in ad-hoc analysis. I’m sure this would be of use to a great many R beginners. IMO, I’d like to see something like this in r-core. I’d personally prefer it as two functions, though: one named lag() and another lead(), since I find the negative number in the second parameter a little strange.

That said, excellent work. It even works put right into a regression so one can quickly test the relationship between y, x, and a lag of x with lm(y ~ x + shift(x,-1)).
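
A self-contained sketch of that regression use, on simulated data. The helper `lag1` is a hypothetical stand-in for shift(x, -1), and the coefficients used to simulate `y` are arbitrary.

```r
set.seed(1)

# Hypothetical one-step lag helper, equivalent in spirit to shift(x, -1)
lag1 <- function(v) c(NA, head(v, -1))

x <- rnorm(50)
y <- 1 + 2 * x + 0.5 * lag1(x) + rnorm(50, sd = 0.1)

# lm() drops the first row by default because the lagged value is NA
fit <- lm(y ~ x + lag1(x))
nobs(fit)   # 49
coef(fit)
```

Keep in mind that the NA padding silently shortens the estimation sample by the number of lags.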

2. March 12, 2012 at 17:30

lag <- function(x, k) {
  if (k >= 0) {
    return(c(rep(NA, k), x[1:(length(x) - k)]))
  } else {
    return(c(x[(1 - k):length(x)], rep(NA, -k)))
  }
}

3. March 12, 2012 at 19:22

@ Andrew:

Like you, I also came from Stata. My feeling is that Stata is more convenient than R for some operations (like cleaning economics data). This is because Stata is designed for those purposes, while R serves a more general purpose. I think R can do as well as Stata if there’s a package for those tasks.

Another reason is that Stata has only one data.frame. Every command you type applies to that data.frame directly, so you don’t have to worry about anything else. If you only have one data.frame, Stata’s environment is very good for that. However, it will drive you crazy if you want to do something more complicated (just imagine your data spread over several data.frames).

R, on the other hand, stores everything in objects. It may be hard for people without a programming background to understand what is going on. Simple tasks can be a bit more difficult, but complicated tasks are much easier.

I think it’s fine that r-core does not have those features, but it would be good to have a package that implements similar features to Stata’s. I would like to try that if I have more ideas and time in the future.

Lastly, if you want to have lag and lead separately, you can create:
lag <- function(x,lag) { shift(x,-lag) }
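
A matching pair of wrappers could look like the sketch below. The names `lag_by` and `lead_by` are hypothetical and chosen so that they do not mask stats::lag(); the `shift` defined here is a condensed copy of the post's function for a single shift value, included only so the block runs on its own.

```r
# Condensed stand-in for the post's shift(), single shift_by only
shift <- function(x, shift_by) {
  n <- abs(shift_by)
  if (shift_by > 0) c(tail(x, -n), rep(NA, n))
  else if (shift_by < 0) c(rep(NA, n), head(x, -n))
  else x
}

# Hypothetical wrappers with a positive k in both directions
lag_by  <- function(x, k = 1) shift(x, -k)
lead_by <- function(x, k = 1) shift(x, k)

lag_by(1:5, 2)
lead_by(1:5)
```

Defining a function called lag, as above, hides stats::lag() for the rest of the session, which matters if ts objects are used in the same script.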

4. March 14, 2012 at 07:49

A way to do this with built-in R functions lag() and ts.union():

# create a time series variable
y1 <- ts(1:10)

# create lead variable
y1.lead <- lag(y1, k=2)

# create lag variable
y1.lag <- lag(y1, k=-2)

# combine the time series variables
ts.union(y1, y1.lead, y1.lag)

Time Series:
Start = -1
End = 12
Frequency = 1
   y1 y1.lead y1.lag
-1 NA       1     NA
 0 NA       2     NA
 1  1       3     NA
 2  2       4     NA
 3  3       5      1
 4  4       6      2
 5  5       7      3
 6  6       8      4
 7  7       9      5
 8  8      10      6
 9  9      NA      7
10 10      NA      8
11 NA      NA      9
12 NA      NA     10
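
The union spans 14 time points, four more than the original series. If a result aligned with the original 10 observations is wanted, one option is to cut it back with window(). This is a sketch using base R only; stats::lag() is called explicitly in case lag has been redefined elsewhere.

```r
y1      <- ts(1:10)
y1.lead <- stats::lag(y1, k = 2)
y1.lag  <- stats::lag(y1, k = -2)

u <- ts.union(y1, y1.lead, y1.lag)

# Keep only the time points of the original series
w <- window(u, start = 1, end = 10)
dim(w)              # 10 rows, 3 columns

# Plain data.frame for use outside the ts framework
d <- as.data.frame(w)
```

After trimming, the second and third columns hold the same values as a lead and a lag of 2 padded with NA.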

5. February 11, 2013 at 03:59

Magnificent web site. Plenty of useful information here. I am sending it to some friends and also sharing it on Delicious. And certainly, thanks for your sweat!

• February 13, 2013 at 20:11

Thank you!

6. February 14, 2013 at 10:41

Hello,

I read your blog only after I had found my own way of lagging variables in a panel. To share it with other users who have panel data they want to lag, my solution is below.

In particular I have panel data (‘year’ giving the time dimension and ‘ID’ the cross section).
The variable ‘myvariable’ is the one I want to lag. All Variables are saved in the data.frame ‘mydata’.

# first construct an index
sort1<- paste(mydata$year, mydata$ID) # ID can be a character, year must be numeric
sort2<- paste(mydata$year -1, mydata$ID)
index_lag<-match(sort2, sort1)
rm(sort1, sort2) # we don't need them anymore

# note: this stores the change from the previous year (current minus lagged value);
# mydata$myvariable[index_lag] on its own is the lagged value
mydata$myvariable.Lag <- mydata$myvariable - mydata$myvariable[index_lag]
rm(index_lag)
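
To see what the key matching does, here is a small worked sketch on made-up data with a gap in one panel. The column names follow the comment above; `myvariable.lag1` and `myvariable.diff1` are hypothetical names used to keep the lag and the difference apart.

```r
mydata <- data.frame(
  ID         = c("a", "a", "a", "b", "b"),
  year       = c(2001, 2002, 2003, 2001, 2003),   # "b" has no 2002 row
  myvariable = c(1, 2, 3, 10, 30)
)

key_now   <- paste(mydata$year, mydata$ID)
key_prev  <- paste(mydata$year - 1, mydata$ID)
index_lag <- match(key_prev, key_now)   # row of the same ID one year earlier

mydata$myvariable.lag1  <- mydata$myvariable[index_lag]
mydata$myvariable.diff1 <- mydata$myvariable - mydata$myvariable.lag1

mydata
```

Because the lookup is by year and ID, not by row position, the 2003 row of "b" gets NA and does not pick up the 2001 value, and the data does not need to be sorted first.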

7. August 22, 2013 at 02:18

thank you!! it’s really helpful

8. October 2, 2013 at 23:01

TszKin Julian,

Great post and your blog is very professional, congrats.

I invite you to my blogs.

Regards, Sergio

9. October 2, 2013 at 23:02

TszKin Julian,

Great post, very helpful.

Regards, Sergio
