如何使用bioconductor做liftover 2

当我们有不同版本的bed文件,比如说有的bed文件map到hg18, 有的bed文件map到hg19,或者对于小鼠来说有些文件map到mm9,有些文件map到mm10。我们如何实现对比对到不同基因组版本的数据进行比较呢?在比较之前,第一个步骤就是将不同的版本的比对文件统一至一个版本,这就需要使用到UCSC提供的liftOver工具。其可执行工具在:http://hgdownload.cse.ucsc.edu/admin/exe/, chain文件在:http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads

那么,在bioconductor中如何实现这一操作呢?这里我们需要使用到的工具是rtracklayer。接下来,我们就来看看示例代码。

> library(rtracklayer)
> ##下载一个bed文件(hg18)
> download.file("http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt", "Demo.bed")
trying URL 'http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt'
Content type 'text/plain; charset=UTF-8' length 707 bytes
opened URL
==================================================
downloaded 707 bytes
 
> bedfile <- read.delim("Demo.bed", header=F, skip=3)
> write.table(bedfile, "Demo.bed", quote=F, sep="\t", row.names=F, col.names=F)
> ##使用rtracklayer::import来读取它。之前的处理只是因为这个import函数不能有效的处理有注释的bed文件
> bed.hg18 <- import("Demo.bed", format="BED")
> ##下载chain文件,并读入。
> download.file("http://hgdownload.cse.ucsc.edu/goldenPath/hg18/liftOver/hg18ToHg19.over.chain.gz", "hg18ToHg19.over.chain.gz")
trying URL 'http://hgdownload.cse.ucsc.edu/goldenPath/hg18/liftOver/hg18ToHg19.over.chain.gz'
Content type 'application/x-gzip' length 140346 bytes (137 Kb)
opened URL
==================================================
downloaded 137 Kb
 
> library(GEOquery)
> gunzip("hg18ToHg19.over.chain.gz")
> chain <- import.chain("hg18ToHg19.over.chain")
> ##做liftOver
> bed.hg19 <- liftOver(bed.hg18, chain)
> bed.hg18
GRanges with 9 ranges and 4 metadata columns:
      seqnames                 ranges strand |        name     score     itemRgb                  thick
         <Rle>              <IRanges>  <Rle> | <character> <numeric> <character>              <IRanges>
  [1]     chr7 [127471197, 127472363]      + |        Pos1         0     #FF0000 [127471197, 127472363]
  [2]     chr7 [127472364, 127473530]      + |        Pos2         0     #FF0000 [127472364, 127473530]
  [3]     chr7 [127473531, 127474697]      + |        Pos3         0     #FF0000 [127473531, 127474697]
  [4]     chr7 [127474698, 127475864]      + |        Pos4         0     #FF0000 [127474698, 127475864]
  [5]     chr7 [127475865, 127477031]      - |        Neg1         0     #0000FF [127475865, 127477031]
  [6]     chr7 [127477032, 127478198]      - |        Neg2         0     #0000FF [127477032, 127478198]
  [7]     chr7 [127478199, 127479365]      - |        Neg3         0     #0000FF [127478199, 127479365]
  [8]     chr7 [127479366, 127480532]      + |        Pos5         0     #FF0000 [127479366, 127480532]
  [9]     chr7 [127480533, 127481699]      - |        Neg4         0     #0000FF [127480533, 127481699]
  ---
  seqlengths:
   chr7
     NA
> bed.hg19
GRangesList of length 9:
$1 
GRanges with 1 range and 4 metadata columns:
      seqnames                 ranges strand |        name     score     itemRgb                  thick
         <Rle>              <IRanges>  <Rle> | <character> <numeric> <character>              <IRanges>
  [1]     chr7 [127683961, 127685127]      + |        Pos1         0     #FF0000 [127471197, 127472363]
 
$2 
GRanges with 1 range and 4 metadata columns:
      seqnames                 ranges strand | name score itemRgb                  thick
  [1]     chr7 [127685128, 127686294]      + | Pos2     0 #FF0000 [127472364, 127473530]
 
$3 
GRanges with 1 range and 4 metadata columns:
      seqnames                 ranges strand | name score itemRgb                  thick
  [1]     chr7 [127686295, 127687461]      + | Pos3     0 #FF0000 [127473531, 127474697]
 
...
<6 more elements>
---
seqlengths:
 chr7
   NA
> unlist(bed.hg19)
GRanges with 9 ranges and 4 metadata columns:
    seqnames                 ranges strand |        name     score     itemRgb                  thick
       <Rle>              <IRanges>  <Rle> | <character> <numeric> <character>              <IRanges>
  1     chr7 [127683961, 127685127]      + |        Pos1         0     #FF0000 [127471197, 127472363]
  2     chr7 [127685128, 127686294]      + |        Pos2         0     #FF0000 [127472364, 127473530]
  3     chr7 [127686295, 127687461]      + |        Pos3         0     #FF0000 [127473531, 127474697]
  4     chr7 [127687462, 127688628]      + |        Pos4         0     #FF0000 [127474698, 127475864]
  5     chr7 [127688629, 127689795]      - |        Neg1         0     #0000FF [127475865, 127477031]
  6     chr7 [127689796, 127690962]      - |        Neg2         0     #0000FF [127477032, 127478198]
  7     chr7 [127690963, 127692129]      - |        Neg3         0     #0000FF [127478199, 127479365]
  8     chr7 [127692130, 127693296]      + |        Pos5         0     #FF0000 [127479366, 127480532]
  9     chr7 [127693297, 127694463]      - |        Neg4         0     #0000FF [127480533, 127481699]
  ---
  seqlengths:
   chr7
     NA

2 thoughts on “如何使用bioconductor做liftover

  1. Reply Xiaofei 12月 6,2014 4:02 上午

    你好,我用你的方法做的liftOver,但是在export结果时一直出问题。

    “Error in export(object, FileForFormat(con, format), …) :
    在为’export’函数选择方法时评估’con’参数出了错: Error in FileForFormat(con, format) : Format ‘BED’ unsupported ”

    请问这是怎么回事?或者怎么导出得到的结果?

Leave a Reply

  

  

  

%d 博主赞过: