# How to Do egen (Stata Command) in R

`egen` is a Stata command that computes a summary statistic by group and stores it in a new variable. For example, suppose the data have three variables, id, time and y, and we want to compute the mean of y for each id and store it as a new variable mean_y.

In Stata, the command is:

```stata
egen mean_y = mean(y), by(id)
```

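In R, this task can be done with `ave()`. `ave(x, g)` applies a function (by default `mean`) to `x` within each group defined by `g` and returns a vector of the same length as `x`, so every row receives its group's value.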
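To see the shape of what `ave()` returns before involving a data frame, here is a minimal sketch on plain vectors. The `id` and `y` values are copied from the example data printed in this post; nothing else is assumed.

```r
# id and y copied from the example data in this post
id <- rep(1:3, each = 3)
y  <- c(4, 1, 4, 2, 3, 3, 4, 4, 3)

# ave() splits y by id, applies FUN (mean by default) within each group,
# and writes each group's result back to every position of that group
mean_y <- ave(y, id)
print(mean_y)
```

Because the result has one value per row rather than one per group, it can be stored directly as a new column, which is what `egen` does.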

Generate a dataset:

```r
id <- rep(1:3, each = 3)
t <- rep(1:3, 3)
# y is drawn at random, so your values will differ unless you call set.seed() first
y <- sample(1:5, 9, replace = TRUE)
my_data <- data.frame(id = id, time = t, y = y)
```

Original data:

```r
> my_data
  id time y
1  1    1 4
2  1    2 1
3  1    3 4
4  2    1 2
5  2    2 3
6  2    3 3
7  3    1 4
8  3    2 4
9  3    3 3
```

Compute the mean of y by id and store it as mean_y:

```r
> within(my_data, { mean_y <- ave(y, id) })
  id time y   mean_y
1  1    1 4 3.000000
2  1    2 1 3.000000
3  1    3 4 3.000000
4  2    1 2 2.666667
5  2    2 3 2.666667
6  2    3 3 2.666667
7  3    1 4 3.666667
8  3    2 4 3.666667
9  3    3 3 3.666667
```
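If you prefer not to use `within()`, a minimal base-R alternative is to assign the result of `ave()` straight to a new column with `$<-`. The sketch below fixes y to the values printed above so the results are reproducible; the second column name `mean_y_t` is a hypothetical name chosen for illustration, grouping by time instead of id.

```r
# Example data, with y fixed to the values printed in this post
my_data <- data.frame(
  id   = rep(1:3, each = 3),
  time = rep(1:3, 3),
  y    = c(4, 1, 4, 2, 3, 3, 4, 4, 3)
)

# Counterpart of: egen mean_y = mean(y), by(id)
my_data$mean_y <- ave(my_data$y, my_data$id)

# Same idea grouped by time (mean_y_t is a hypothetical name)
my_data$mean_y_t <- ave(my_data$y, my_data$time)

print(my_data)
```

Here the assignment updates `my_data` itself, so no extra step is needed to keep the new columns.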

The default summary statistic is `mean`, but any function can be supplied through the `FUN` argument. For example, to compute the standard deviation of y by id:

```r
> within(my_data, { sd_y <- ave(y, id, FUN = sd) })
  id time y      sd_y
1  1    1 4 1.7320508
2  1    2 1 1.7320508
3  1    3 4 1.7320508
4  2    1 2 0.5773503
5  2    2 3 0.5773503
6  2    3 3 0.5773503
7  3    1 4 0.5773503
8  3    2 4 0.5773503
9  3    3 3 0.5773503
```
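`FUN` can also be an anonymous function, which matters for missing values: R's `mean()` returns `NA` for any group containing an `NA`, whereas, to my understanding, Stata's `egen mean()` excludes missing values from the calculation. The sketch below assumes a hypothetical version of the example y with one value set to missing. A scalar result from `FUN` is recycled across the whole group.

```r
id <- rep(1:3, each = 3)
# Same y as in the post, but with one value set to missing (hypothetical)
y  <- c(4, NA, 4, 2, 3, 3, 4, 4, 3)

# Default mean: the NA makes the whole first group NA
mean_default <- ave(y, id)

# Drop missing values within each group, closer to egen's behaviour
mean_narm <- ave(y, id, FUN = function(v) mean(v, na.rm = TRUE))

# Any summary works as FUN; here the within-group range (max minus min)
range_y <- ave(y, id, FUN = function(v) max(v, na.rm = TRUE) - min(v, na.rm = TRUE))

print(mean_default)
print(mean_narm)
print(range_y)
```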

Remark: `within()` evaluates an expression in an environment built from the data frame and returns a modified copy of the data frame (in our case, with the new variable mean_y or sd_y added). The original `my_data` is not changed unless you assign the result back to it.
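A minimal sketch of the copy behaviour described in the remark, again with y fixed to the values printed in this post. The name `with_mean` is a hypothetical variable introduced only for the comparison.

```r
my_data <- data.frame(
  id   = rep(1:3, each = 3),
  time = rep(1:3, 3),
  y    = c(4, 1, 4, 2, 3, 3, 4, 4, 3)
)

# within() returns a modified copy (with_mean is a hypothetical name)
with_mean <- within(my_data, { mean_y <- ave(y, id) })

print("mean_y" %in% names(my_data))    # FALSE: the original is untouched
print("mean_y" %in% names(with_mean))  # TRUE: the copy has the new column

# To keep the new column, assign the result back to the original name
my_data <- within(my_data, { mean_y <- ave(y, id) })
```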

TszKin Julian Chan