在R中为for loop, apply loop增加进度条(progress bar) 2

参考:http://ryouready.wordpress.com/2009/03/16/r-monitor-function-progress-with-a-progress-bar/
http://ryouready.wordpress.com/2010/01/11/progress-bars-in-r-part-ii-a-wrapper-for-apply-functions/

最简单的,for循环,其中需要用到txtProgressBar,使用它,需要先初始化一个进度条,然后在程序中update它,最后关闭进度条。

total <- 20
# create progress bar
pb <- txtProgressBar(min = 0, max = total, style = 3)
for(i in 1:total){
   Sys.sleep(0.1)
   # update progress bar
   setTxtProgressBar(pb, i)
}
close(pb)

但是我们都知道,在R中,for loop是不被鼓励的,因为它的执行效率非常低。大家习惯使用apply, sapply, lapply或者mapply。在R中,Peter Solymos写了一个轻量化的在apply时显示进度条的包,但它的最大缺点是会有一点点耗时。但是如果单个循环计算量大时,它的耗时又不是特别明显。

> l <- sapply(1:20000, function(x) list(rnorm(1000)))
> system.time(sapply(l, mean))
   user  system elapsed 
  0.311   0.003   0.313 
> library(pbapply)
> system.time(pbsapply(l, mean))
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
   user  system elapsed 
  1.611   0.012   1.675 
> df <- data.frame(rnorm(90000), rnorm(90000))
> system.time(apply(df, 1, sd))
   user  system elapsed 
  2.787   0.015   2.830 
> system.time(pbapply(df, 1, sd))
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
   user  system elapsed 
  8.592   0.060   8.861

但是,遗憾的是,它没有写一个可以用于mapply的进度条。下面,我们就试着写一个可以用于mapply的进度条。

##### created by Jianhong Ou @UMMSMED @oct, 2013 ########
pbmapply <- function(FUN, ..., MoreArgs=NULL, SIMPLIFY=TRUE, USE.NAMES=TRUE){
    env <- environment() 
    dots <- list(...)
    pb_Total <- length(dots[[1]])
    if(!(interactive() && pb_Total>=1))
        return(mapply(FUN, ..., MoreArgs=NULL, SIMPLIFY=TRUE, USE.NAMES=TRUE))
 
    counter <- 0
    pb <- txtProgressBar(min=0, max=pb_Total, style=3)
    on.exit(close(pb))
    ##wrapper around FUN
    wrapper <- function(...){
        curVal <- get("counter", envir=env) + 1
        assign("counter", curVal, envir=env)
        setTxtProgressBar(get("pb", envir=env), curVal)
        FUN(...)
    }
    mapply(wrapper, ..., MoreArgs=NULL, SIMPLIFY=TRUE, USE.NAMES=TRUE)
}

这时,我们再运行pbmapply,

> system.time(mapply(mean, l))
   user  system elapsed 
  0.512   0.027   0.537 
> system.time(pbmapply(mean, l))
  |===============================================| 100%
   user  system elapsed 
  1.884   0.083   2.012 
> dl <- lapply(rep(5, 90000), rnorm)
> system.time(mapply(sd, dl))
   user  system elapsed 
  2.346   0.011   2.352 
> system.time(pbmapply(sd, dl))
  |================================================| 100%
   user  system elapsed 
  5.711   0.047   5.841 
>

这一段代码,类似的,它使用wrapper来对mapply的子线程及主environment进行通讯,每一个子线程产生,都在主environment的中的counter上加1,从而现进度条的更新。

2 thoughts on “在R中为for loop, apply loop增加进度条(progress bar)

  1. Reply wxx0316 10月 22,2013 8:49 下午

    你好,我读了你的很多博文,感觉非常有用,在此先表示感谢~
    我想转载其中一些文章,不知是否可以?

Leave a Reply

  

  

  

%d 博主赞过: