如何使用R将DAVID功能分析中的function annotation cluster结果2D图合并显示 21

标题太长了,我实在是无奈:)

首先是数据生成,先去DAVID网站Start Analysis 下点选Demolist1.
Screen Shot 2015-04-06 at 9.07.50 PM
然后点击function annotation cluster,跳出下图的窗口。
Screen Shot 2015-04-06 at 9.08.27 PM
点击右侧的Download File按钮。

或者您可以直接从本站DAVID.output

而后打开R

> setwd("~/Downloads/qiuworld.com/") #设置工作路径为下载文件所在的路径。
> david <- read.delim("DAVID.output.csv", header=FALSE, stringsAsFactors=FALSE)#读取文件。虽然文件以csv结尾,但是它的格式其实是tab间隔的文本文件。
> david[1:5, 1:5]
                    V1                                   V2    V3
1 Annotation Cluster 1 Enrichment Score: 4.8102175510122995      
2             Category                                 Term Count
3       UP_SEQ_FEATURE                       signal peptide    50
4      SP_PIR_KEYWORDS                               signal    50
5       UP_SEQ_FEATURE                       disulfide bond    45
                 V4                   V5
1                                       
2                 %               PValue
3 32.25806451612903 6.549106601734033E-7
4 32.25806451612903 8.552629036683175E-7
5 29.03225806451613 1.245664195299075E-6
> clusterIDs <- grepl("Annotation Cluster", david$V1) #在这里,我们需要得到Annotation Cluster文字所在的行,然后依照它们将文件分割成list.
> sum(clusterIDs)
[1] 72
> cluster <- diff(c(which(clusterIDs), length(clusterIDs)+1))
> length(cluster)
[1] 72
> cluster <- rep(which(clusterIDs), cluster)
> length(cluster)==nrow(david)
[1] TRUE
> david <- split(david, cluster)
> david[[72]][,1:5]
                       V1                                     V2    V3
768 Annotation Cluster 72 Enrichment Score: 0.004486343864196002      
769              Category                                   Term Count
770         GOTERM_BP_FAT GO:0050877~neurological system process     7
771         GOTERM_BP_FAT          GO:0007600~sensory perception     4
772         GOTERM_BP_FAT                   GO:0050890~cognition     4
                    V4                 V5
768                                      
769                  %             PValue
770  4.516129032258064 0.9846981772714716
771 2.5806451612903225  0.989172043715323
772 2.5806451612903225 0.9953274937922806
> names(david) <- 1:length(david)
> enrichmentScore <- gsub("Enrichment Score: ", "", sapply(david, `[`, 1, 2))
> tail(enrichmentScore)
                    67                     68                     69 
 "0.03932168648330851" "0.030350453200157605" "0.022377527829699675" 
                    70                     71                     72 
"0.022175278411917944" "0.012798620974751502" "0.004486343864196002" 
> david <- lapply(david, function(.ele) {colnames(.ele) <- .ele[2,]; .ele[-(1:2),]})
> david[[72]][,1:5]
         Category                                   Term Count
770 GOTERM_BP_FAT GO:0050877~neurological system process     7
771 GOTERM_BP_FAT          GO:0007600~sensory perception     4
772 GOTERM_BP_FAT                   GO:0050890~cognition     4
                     %             PValue
770  4.516129032258064 0.9846981772714716
771 2.5806451612903225  0.989172043715323
772 2.5806451612903225 0.9953274937922806
> david <- mapply(function(.ele, id, score) cbind(.ele, cluster=id, enrichmentScore=score), david, names(david), enrichmentScore, SIMPLIFY=FALSE)
> david <- do.call(rbind, david) #至此,我们将文件转换成为了一个data.frame
> tail(david)
            Category                                             Term Count
71.765 GOTERM_BP_FAT                  GO:0008104~protein localization     6
71.766 GOTERM_BP_FAT                     GO:0015031~protein transport     4
71.767 GOTERM_BP_FAT GO:0045184~establishment of protein localization     4
72.770 GOTERM_BP_FAT           GO:0050877~neurological system process     7
72.771 GOTERM_BP_FAT                    GO:0007600~sensory perception     4
72.772 GOTERM_BP_FAT                             GO:0050890~cognition     4
                        %             PValue
71.765  3.870967741935484 0.9447238129398181
71.766 2.5806451612903225  0.983901594737508
71.767 2.5806451612903225 0.9847990254750114
72.770  4.516129032258064 0.9846981772714716
72.771 2.5806451612903225  0.989172043715323
72.772 2.5806451612903225 0.9953274937922806
                                                                      Genes
71.765      33371_S_AT, 1974_S_AT, 40271_AT, 38236_AT, 41703_R_AT, 40294_AT
71.766                            33371_S_AT, 1974_S_AT, 40271_AT, 40294_AT
71.767                            33371_S_AT, 1974_S_AT, 40271_AT, 40294_AT
72.770 38604_AT, 38236_AT, 32418_AT, 40317_AT, 32279_AT, 1901_S_AT, 1786_AT
72.771                                38604_AT, 38236_AT, 32418_AT, 1786_AT
72.772                                38604_AT, 38236_AT, 32418_AT, 1786_AT
       List Total Pop Hits Pop Total     Fold Enrichment Bonferroni
71.765        136      882     13528  0.6766706682673069        1.0
71.766        136      762     13528  0.5221553188204415        1.0
71.767        136      769     13528  0.5174022795073816        1.0
72.770        136     1210     13528  0.5754496840058337        1.0
72.771        136      810     13528  0.4912127814088598        1.0
72.772        136      909     13528 0.43771435967126127        1.0
                Benjamini   FDR cluster      enrichmentScore
71.765 0.9999996063719517 100.0      71 0.012798620974751502
71.766 0.9999999988540363 100.0      71 0.012798620974751502
71.767 0.9999999990037772 100.0      71 0.012798620974751502
72.770 0.9999999990427183 100.0      72 0.004486343864196002
72.771 0.9999999997991603 100.0      72 0.004486343864196002
72.772  0.999999999996184 100.0      72 0.004486343864196002
> david.s <- david[david$FDR<1, c("Term", "Genes", "cluster")] #我们将数据框简化一下,只保留后面需要用的列
> david.sg <- strsplit(as.character(david.s$Genes), ", ") #将每一行中的genes分割开
> david.sg[1:2]
[[1]]
 [1] "38646_S_AT" "31621_S_AT" "33077_AT"   "33284_AT"   "35915_AT"  
 [6] "37968_AT"   "1355_G_AT"  "1855_AT"    "35956_S_AT" "33979_AT"  
[11] "40350_AT"   "38236_AT"   "37454_AT"   "1403_S_AT"  "1112_G_AT" 
[16] "32250_AT"   "1901_S_AT"  "36766_AT"   "1645_AT"    "37172_AT"  
[21] "1786_AT"    "34546_AT"   "31987_AT"   "31512_AT"   "37096_AT"  
[26] "1890_AT"    "37898_R_AT" "40294_AT"   "1372_AT"    "36436_AT"  
[31] "37105_AT"   "35566_F_AT" "33963_AT"   "38926_AT"   "33684_AT"  
[36] "33530_AT"   "35903_AT"   "37408_AT"   "37061_AT"   "679_AT"    
[41] "34618_AT"   "31506_S_AT" "32469_AT"   "38604_AT"   "31793_AT"  
[46] "1331_S_AT"  "35169_AT"   "34623_AT"   "36703_AT"   "36328_AT"  
[51] "38508_S_AT" "2090_I_AT" 
 
[[2]]
 [1] "33963_AT"   "38646_S_AT" "38926_AT"   "33530_AT"   "31621_S_AT"
 [6] "33077_AT"   "33284_AT"   "37408_AT"   "37968_AT"   "34467_G_AT"
[11] "35915_AT"   "35367_AT"   "1355_G_AT"  "37061_AT"   "679_AT"    
[16] "35090_G_AT" "35956_S_AT" "37983_AT"   "35091_AT"   "37953_S_AT"
[21] "31506_S_AT" "33979_AT"   "32439_AT"   "40350_AT"   "31793_AT"  
[26] "37454_AT"   "1403_S_AT"  "1112_G_AT"  "32250_AT"   "1331_S_AT" 
[31] "1901_S_AT"  "36766_AT"   "37172_AT"   "34623_AT"   "1786_AT"   
[36] "34546_AT"   "31987_AT"   "37096_AT"   "31512_AT"   "36703_AT"  
[41] "36328_AT"   "1890_AT"    "37898_R_AT" "40317_AT"   "38508_S_AT"
[46] "1372_AT"    "37105_AT"   "35566_F_AT"
 
> allgenes <- unique(unlist(david.sg)) #获取所有基因我名字,这里其实是Affymetrix的ID。
> length(allgenes)
[1] 82
> allterms <- unique(names(david.sg)) #获取所有的GO terms的名字
> length(allterms)
[1] 26
> #构建一个matrix,后面调pheatmap,它的输入要求是一个data.frame.
> #这里,我们使用基因名为行名,使用GO terms为列名
> david.m <- matrix(0, nrow=length(allgenes), ncol=length(allterms), dimnames=list(allgenes, allterms)) 
> david.m[1:5, 1:5]
           signal disulfide bond GO:0005576~extracellular region
38646_S_AT      0              0                               0
31621_S_AT      0              0                               0
33077_AT        0              0                               0
33284_AT        0              0                               0
35915_AT        0              0                               0
           GO:0044421~extracellular region part Secreted
38646_S_AT                                    0        0
31621_S_AT                                    0        0
33077_AT                                      0        0
33284_AT                                      0        0
35915_AT                                      0        0
> names(david.sg) <- david.s$Term
> for(i in 1:length(david.sg)){david.m[david.sg[[i]], names(david.sg)[i]] <- 1} #对所有出现过的基因名及相应的GO term值赋为1, 绘图时显示为黑色。
> david.m[1:5, 1:5]
           signal disulfide bond GO:0005576~extracellular region
38646_S_AT      1              1                               0
31621_S_AT      1              1                               1
33077_AT        1              1                               0
33284_AT        1              1                               1
35915_AT        1              1                               1
           GO:0044421~extracellular region part Secreted
38646_S_AT                                    0        0
31621_S_AT                                    1        1
33077_AT                                      0        0
33284_AT                                      1        0
35915_AT                                      0        1
> annotation_col <- david.s[match(colnames(david.m), david.s[,1]), "cluster", drop=FALSE] #为图增加一条cluster信息。
> rownames(annotation_col) <- colnames(david.m)
> annotation_col$cluster <- as.numeric(as.character(annotation_col$cluster))
> head(annotation_col)
                                     cluster
signal                                     1
disulfide bond                             1
GO:0005576~extracellular region            1
GO:0044421~extracellular region part       1
Secreted                                   1
GO:0005615~extracellular space             1
> annotation_col$cluster <- annotation_col$cluster %% 2==1
> annotation_col$cluster <- as.numeric(annotation_col$cluster)
> library(pheatmap) #绘图
> pheatmap(david.m, color=c("white", "black"), legend=FALSE, cluster_col=FALSE, cluster_row=FALSE, show_rownames=T, show_colnames=T, fontsize=5, annotation_col=annotation_col, annotation_colors=list(cluster=c("gray80", "gray30")), annotation_legend=FALSE)

下图就是我们得到的结果:
heatmap

21 thoughts on “如何使用R将DAVID功能分析中的function annotation cluster结果2D图合并显示

  1. Reply super 4月 8,2015 9:05 上午

    欧老师,按照您第一条命令david <- read.delim("DAVID.output.csv", header=FALSE, stringsAsFactors=FALSE),得到这个错误:
    Error in `[.data.frame`(david, 1:5, 1:5) : undefined columns selected
    而head(david) 得到的是
    V1
    1 Annotation Cluster 1\tEnrichment Score: 4.8102175510122995,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    2 Category\tTerm\tCount\t%\tPValue\tGenes\tList Total\tPop Hits\tPop Total\tFold Enrichment\tBonferroni\tBenjamini\tFDR,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    3 UP_SEQ_FEATURE\tsignal peptide\t50\t32.25806451612903\t6.549106601734033E-7\t38646_S_AT, 31621_S_AT, 33077_AT, 33284_AT, 35915_AT, 37968_AT, 1355_G_AT, 1855_AT, 35956_S_AT, 33979_AT, 40350_AT, 38236_AT, 37454_AT, 1403_S_AT, 1112_G_AT, 32250_AT, 1901_S_AT, 36766_AT, 1645_AT, 37172_AT, 1786_AT, 34546_AT, 31987_AT, 31512_AT, 37096_AT, 1890_AT, 37898_R_AT, 40294_AT, 1372_AT, 36436_AT, 37105_AT, 35566_F_AT, 33963_AT, 38926_AT, 33684_AT, 33530_AT, 35903_AT, 37408_AT, 37061_AT, 679_AT, 34618_AT, 31506_S_AT, 32469_AT, 38604_AT, 31793_AT, 1331_S_AT, 35169_AT, 34623_AT, 36703_AT, 36328_AT, 38508_S_AT, 2090_I_AT\t146\t3250\t19113\t2.0140147523709166\t4.2494684211979994E-4\t4.2494684211979994E-4\t9.776372745906592E-4,,,

    并没有消去‘,’,请问缘由?

  2. Reply ygc 4月 8,2015 9:32 下午

    在用户的feature request下,clusterProfiler也支持david了,不过必须指出,他们的数据有几年没更新了。

    http://ygc.name/2015/03/16/david-functional-analysis-with-clusterprofiler/

  3. Reply super 4月 13,2015 5:28 上午

    欧老师选择基因和GO的时候为何要设$FDR<1?可以设BH或者Bonferroni么?

  4. Reply super 4月 13,2015 8:39 上午

    请问能否解释一下function annotation cluster和function annotation的区别?
    后者不是把不同的genes预测到了指定的GO term了么?需要前者cluster什么?
    得到的cluster有没有名字呢?还就是我看到的cluster1,2,3,…这些?

    • Reply admin 4月 13,2015 3:47 下午

      cluster就是在function annotation的基础上再对gene做了一次cluster。具体可以参见:http://david.abcc.ncifcrf.gov/helps/2D_Introduction_files/additional_file_13.doc

  5. Reply chenzhnh 9月 24,2015 8:30 下午

    老师您好,
    最后一步使用pheatmap()绘制热图时,无法显示border,默认参数下border应该是显示的呀,而且无论我将border_color参数改成什么颜色都无法显示出来,请问这是为什么?另外一个关于pheatmap的问题是如何改变annotation_legend的大小,这个问题我之前向您发邮件请教,您可能没收到邮件,所以我一直没找到解决方法。谢谢您的帮助!

    • Reply admin 9月 25,2015 12:23 下午

      默认的border_color是灰色。应该是有显示的。至于你的这个问题,很奇怪的。你可以把代码和sessionInfo发上来吗?
      关于改变annotation_legend大小的问题,我猜可以通过改变字体大小的方式来改变。


      library(pheatmap)
      test = matrix(rnorm(200), 20, 10)
      test[1:10, seq(1, 10, 2)] = test[1:10, seq(1, 10, 2)] + 3
      test[11:20, seq(2, 10, 2)] = test[11:20, seq(2, 10, 2)] + 2
      test[15:20, seq(2, 10, 2)] = test[15:20, seq(2, 10, 2)] + 4
      colnames(test) = paste("Test", 1:10, sep = "")
      rownames(test) = paste("Gene", 1:20, sep = "")
      pheatmap(test, legend = FALSE)
      annotation_col = data.frame(
      CellType = factor(rep(c("CT1", "CT2"), 5)),
      Time = 1:5
      )
      rownames(annotation_col) = paste("Test", 1:10, sep = "")

      annotation_row = data.frame(
      GeneClass = factor(rep(c("Path1", "Path2", "Path3"), c(10, 4, 6)))
      )
      rownames(annotation_row) = paste("Gene", 1:20, sep = "")

      pheatmap(test, annotation_col = annotation_col, annotation_row = annotation_row)
      pheatmap(test, annotation_col = annotation_col, annotation_row = annotation_row, fontsize=15)
      pheatmap(test, annotation_col = annotation_col, annotation_row = annotation_row, fontsize=5)

      • Reply chenzhnh 9月 25,2015 3:40 下午

        命令全部和您教程中的一样,出的heatmap除了无法显示border外都和您的结果一样。sessioninfo如下,谢谢您!
        R version 3.2.2 (2015-08-14)
        Platform: x86_64-w64-mingw32/x64 (64-bit)
        Running under: Windows 7 x64 (build 7601) Service Pack 1

        locale:
        [1] LC_COLLATE=Chinese (Simplified)_People’s Republic of China.936
        [2] LC_CTYPE=Chinese (Simplified)_People’s Republic of China.936
        [3] LC_MONETARY=Chinese (Simplified)_People’s Republic of China.936
        [4] LC_NUMERIC=C
        [5] LC_TIME=Chinese (Simplified)_People’s Republic of China.936

        attached base packages:
        [1] stats graphics grDevices utils datasets methods base

        other attached packages:
        [1] pheatmap_1.0.7

        loaded via a namespace (and not attached):
        [1] colorspace_1.2-6 scales_0.3.0 plyr_1.8.3 tools_3.2.2
        [5] gtable_0.1.2 RColorBrewer_1.1-2 Rcpp_0.12.1 grid_3.2.2
        [9] munsell_0.4.2

      • Reply chenzhnh 9月 25,2015 3:48 下午

        另关于pheatmap legend大小的问题。的确通过改变字体大小进而压缩或者扩展图形,可以间接改变legend大小,但是我发现我画出的图的legend和您那篇heatmap教程画出的是不一样的,我的是几个色块连一起的,您那个是几个色块分开的。这是因为pheatmap package的version不同导致的,还是由参数设置的?谢谢您!

        • Reply admin 9月 28,2015 9:17 上午

          也有可能是版本问题。但是我建议你先试着使用pdf输出一下你的图像,看看是不是还是没有办法画出border。

  6. Reply R菜鸟 10月 13,2015 2:34 上午

    R version 3.2.0 alpha (2015-03-31 r68131)
    Copyright (C) 2015 The R Foundation for Statistical Computing
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type ‘license()’ or ‘licence()’ for distribution details.

    R is a collaborative project with many contributors.
    Type ‘contributors()’ for more information and
    ‘citation()’ on how to cite R or R packages in publications.

    Type ‘demo()’ for some demos, ‘help()’ for on-line help, or
    ‘help.start()’ for an HTML browser interface to help.
    Type ‘q()’ to quit R.

    [Workspace loaded from ~/.RData]

    > setwd(“G:/plink”)
    > david david[1:5,1:5]
    V1 V2
    1 Annotation Cluster 1 Enrichment Score: 4.8102175510122995
    2 Category Term
    3 UP_SEQ_FEATURE signal peptide
    4 SP_PIR_KEYWORDS signal
    5 UP_SEQ_FEATURE disulfide bond
    V3 V4 V5
    1
    2 Count % PValue
    3 50 32.25806451612903 6.549106601734033E-7
    4 50 32.25806451612903 8.552629036683175E-7
    5 45 29.03225806451613 1.245664195299075E-6
    > clusterIDs sum(clusterIDs)
    [1] 72
    > cluster length(cluster)
    [1] 72
    > cluster length(cluster)==nrow(david)
    [1] TRUE
    > david david[[72]][,1:5]
    V1
    768 Annotation Cluster 72
    769 Category
    770 GOTERM_BP_FAT
    771 GOTERM_BP_FAT
    772 GOTERM_BP_FAT
    V2 V3
    768 Enrichment Score: 0.004486343864196002
    769 Term Count
    770 GO:0050877~neurological system process 7
    771 GO:0007600~sensory perception 4
    772 GO:0050890~cognition 4
    V4 V5
    768
    769 % PValue
    770 4.516129032258064 0.9846981772714716
    771 2.5806451612903225 0.989172043715323
    772 2.5806451612903225 0.9953274937922806
    > names(david) enrichmentScore tail(enrichmentScore)
    67 68
    ” 0.039321686483308464″ ” 0.030350453200157605″
    69 70
    ” 0.022377527829699675″ ” 0.022175278411917944″
    71 72
    ” 0.01279862097475155″ ” 0.004486343864196002″
    > david<-lapply(david,function(.ele){colnames(.ele) david[[72]][,1:5]
    Category Term Count
    770 GOTERM_BP_FAT GO:0050877~neurological system process 7
    771 GOTERM_BP_FAT GO:0007600~sensory perception 4
    772 GOTERM_BP_FAT GO:0050890~cognition 4
    % PValue
    770 4.516129032258064 0.9846981772714716
    771 2.5806451612903225 0.989172043715323
    772 2.5806451612903225 0.9953274937922806
    > david david david david david tail(david)
    Category
    71.765 GOTERM_BP_FAT
    71.766 GOTERM_BP_FAT
    71.767 GOTERM_BP_FAT
    72.770 GOTERM_BP_FAT
    72.771 GOTERM_BP_FAT
    72.772 GOTERM_BP_FAT
    Term Count
    71.765 GO:0008104~protein localization 6
    71.766 GO:0015031~protein transport 4
    71.767 GO:0045184~establishment of protein localization 4
    72.770 GO:0050877~neurological system process 7
    72.771 GO:0007600~sensory perception 4
    72.772 GO:0050890~cognition 4
    % PValue
    71.765 3.870967741935484 0.9447238129398181
    71.766 2.5806451612903225 0.983901594737508
    71.767 2.5806451612903225 0.9847990254750114
    72.770 4.516129032258064 0.9846981772714716
    72.771 2.5806451612903225 0.989172043715323
    72.772 2.5806451612903225 0.9953274937922806
    Genes
    71.765 33371_S_AT, 1974_S_AT, 40271_AT, 38236_AT, 41703_R_AT, 40294_AT
    71.766 33371_S_AT, 1974_S_AT, 40271_AT, 40294_AT
    71.767 33371_S_AT, 1974_S_AT, 40271_AT, 40294_AT
    72.770 38604_AT, 38236_AT, 32418_AT, 40317_AT, 32279_AT, 1901_S_AT, 1786_AT
    72.771 38604_AT, 38236_AT, 32418_AT, 1786_AT
    72.772 38604_AT, 38236_AT, 32418_AT, 1786_AT
    List Total Pop Hits Pop Total Fold Enrichment
    71.765 136 882 13528 0.6766706682673069
    71.766 136 762 13528 0.5221553188204415
    71.767 136 769 13528 0.5174022795073816
    72.770 136 1210 13528 0.5754496840058337
    72.771 136 810 13528 0.4912127814088598
    72.772 136 909 13528 0.43771435967126127
    Bonferroni Benjamini FDR cluster
    71.765 1.0 0.9999996063719517 100.0 71
    71.766 1.0 0.9999999988540363 100.0 71
    71.767 1.0 0.9999999990037772 100.0 71
    72.770 1.0 0.9999999990427183 100.0 72
    72.771 1.0 0.9999999997991603 100.0 72
    72.772 1.0 0.999999999996184 100.0 72
    enrichmentScore
    71.765 0.01279862097475155
    71.766 0.01279862097475155
    71.767 0.01279862097475155
    72.770 0.004486343864196002
    72.771 0.004486343864196002
    72.772 0.004486343864196002
    > david.s<-david[david$FDR david.sg david.sg david.sg[1:2]
    [[1]]
    [1] “38646_S_AT” ” 31621_S_AT” ” 33077_AT” ” 33284_AT”
    [5] ” 35915_AT” ” 37968_AT” ” 1355_G_AT” ” 1855_AT”
    [9] ” 35956_S_AT” ” 33979_AT” ” 40350_AT” ” 38236_AT”
    [13] ” 37454_AT” ” 1403_S_AT” ” 1112_G_AT” ” 32250_AT”
    [17] ” 1901_S_AT” ” 1645_AT” ” 37172_AT” ” 36766_AT”
    [21] ” 1786_AT” ” 34546_AT” ” 31987_AT” ” 31512_AT”
    [25] ” 37096_AT” ” 1890_AT” ” 37898_R_AT” ” 40294_AT”
    [29] ” 1372_AT” ” 36436_AT” ” 37105_AT” ” 35566_F_AT”
    [33] ” 33963_AT” ” 38926_AT” ” 33684_AT” ” 33530_AT”
    [37] ” 35903_AT” ” 37408_AT” ” 37061_AT” ” 679_AT”
    [41] ” 34618_AT” ” 31506_S_AT” ” 32469_AT” ” 38604_AT”
    [45] ” 31793_AT” ” 1331_S_AT” ” 35169_AT” ” 34623_AT”
    [49] ” 36703_AT” ” 36328_AT” ” 38508_S_AT” ” 2090_I_AT”

    [[2]]
    [1] “33963_AT” ” 38646_S_AT” ” 38926_AT” ” 33530_AT”
    [5] ” 31621_S_AT” ” 33077_AT” ” 33284_AT” ” 37408_AT”
    [9] ” 37968_AT” ” 34467_G_AT” ” 35915_AT” ” 35367_AT”
    [13] ” 1355_G_AT” ” 37061_AT” ” 679_AT” ” 35090_G_AT”
    [17] ” 35956_S_AT” ” 37983_AT” ” 35091_AT” ” 37953_S_AT”
    [21] ” 31506_S_AT” ” 33979_AT” ” 32439_AT” ” 40350_AT”
    [25] ” 31793_AT” ” 37454_AT” ” 1403_S_AT” ” 1112_G_AT”
    [29] ” 32250_AT” ” 1331_S_AT” ” 1901_S_AT” ” 36766_AT”
    [33] ” 37172_AT” ” 34623_AT” ” 1786_AT” ” 34546_AT”
    [37] ” 31987_AT” ” 37096_AT” ” 31512_AT” ” 36703_AT”
    [41] ” 36328_AT” ” 1890_AT” ” 37898_R_AT” ” 40317_AT”
    [45] ” 38508_S_AT” ” 1372_AT” ” 37105_AT” ” 35566_F_AT”

    > allgenes length(allgenes)
    [1] 90
    > allterms length(allterms)
    [1] 0
    > david.m<-matrix(0,nrow=length(allgenes),ncol=length(allterms,dimnames=list(allgenes,allterms))
    + david.m[1:5,1:5]
    Error: unexpected symbol in:
    "david.m david.m david.m[1:5,1:5]
    Error in david.m[1:5, 1:5] : subscript out of bounds
    > david.m[1:5,1:5]
    Error in david.m[1:5, 1:5] : subscript out of bounds
    > david.m[1:5,1:5]
    Error in david.m[1:5, 1:5] : subscript out of bounds
    > david.m length(allterms)
    [1] 0
    > length(allgenes)
    [1] 90
    >
    > R version 3.2.0 alpha (2015-03-31 r68131)
    Error: unexpected symbol in “R version”
    > Copyright (C) 2015 The R Foundation for Statistical Computing
    Error: unexpected numeric constant in “Copyright (C) 2015”
    > Platform: x86_64-w64-mingw32/x64 (64-bit)
    Error: object ‘Platform’ not found
    >
    > R is free software and comes with ABSOLUTELY NO WARRANTY.
    Error: unexpected symbol in “R is”
    > You are welcome to redistribute it under certain conditions.
    Error: unexpected symbol in “You are”
    > Type ‘license()’ or ‘licence()’ for distribution details.
    Error: unexpected string constant in “Type ‘license()'”
    >
    > R is a collaborative project with many contributors.
    Error: unexpected symbol in “R is”
    > Type ‘contributors()’ for more information and
    Error: unexpected string constant in “Type ‘contributors()'”
    > ‘citation()’ on how to cite R or R packages in publications.
    Error: unexpected symbol in “‘citation()’ on”
    >
    > Type ‘demo()’ for some demos, ‘help()’ for on-line help, or
    Error: unexpected string constant in “Type ‘demo()'”
    > ‘help.start()’ for an HTML browser interface to help.
    Error: unexpected ‘for’ in “‘help.start()’ for”
    > Type ‘q()’ to quit R.
    Error: unexpected string constant in “Type ‘q()'”
    >
    > [Workspace loaded from ~/.RData]
    Error: unexpected ‘[‘ in “[”
    >
    > > setwd(“G:/plink”)
    Error: unexpected ‘>’ in “>”
    > > david’ in “>”
    > > david[1:5,1:5]
    Error: unexpected ‘>’ in “>”
    >
    > R version 3.2.0 alpha (2015-03-31 r68131)
    Error: unexpected symbol in “R version”
    > Copyright (C) 2015 The R Foundation for Statistical Computing
    Error: unexpected numeric constant in “Copyright (C) 2015”
    > Platform: x86_64-w64-mingw32/x64 (64-bit)
    Error: object ‘Platform’ not found
    >
    > R is free software and comes with ABSOLUTELY NO WARRANTY.
    Error: unexpected symbol in “R is”
    > You are welcome to redistribute it under certain conditions.
    Error: unexpected symbol in “You are”
    > Type ‘license()’ or ‘licence()’ for distribution details.
    Error: unexpected string constant in “Type ‘license()'”
    >
    > R is a collaborative project with many contributors.
    Error: unexpected symbol in “R is”
    > Type ‘contributors()’ for more information and
    Error: unexpected string constant in “Type ‘contributors()'”
    > ‘citation()’ on how to cite R or R packages in publications.
    Error: unexpected symbol in “‘citation()’ on”
    >
    > Type ‘demo()’ for some demos, ‘help()’ for on-line help, or
    Error: unexpected string constant in “Type ‘demo()'”
    > ‘help.start()’ for an HTML browser interface to help.
    Error: unexpected ‘for’ in “‘help.start()’ for”
    > Type ‘q()’ to quit R.
    Error: unexpected string constant in “Type ‘q()'”
    >
    > [Workspace loaded from ~/.RData]
    Error: unexpected ‘[‘ in “[”
    >
    > > setwd(“G:/plink”)
    Error: unexpected ‘>’ in “>”
    > > david’ in “>”
    > > david[1:5,1:5]
    Error: unexpected ‘>’ in “>”
    >
    > R version 3.2.0 alpha (2015-03-31 r68131)
    Error: unexpected symbol in “R version”
    > Copyright (C) 2015 The R Foundation for Statistical Computing
    Error: unexpected numeric constant in “Copyright (C) 2015”
    > Platform: x86_64-w64-mingw32/x64 (64-bit)
    Error: object ‘Platform’ not found
    >
    > R is free software and comes with ABSOLUTELY NO WARRANTY.
    Error: unexpected symbol in “R is”
    > You are welcome to redistribute it under certain conditions.
    Error: unexpected symbol in “You are”
    > Type ‘license()’ or ‘licence()’ for distribution details.
    Error: unexpected string constant in “Type ‘license()'”
    >
    > R is a collaborative project with many contributors.
    Error: unexpected symbol in “R is”
    > Type ‘contributors()’ for more information and
    Error: unexpected string constant in “Type ‘contributors()'”
    > ‘citation()’ on how to cite R or R packages in publications.
    Error: unexpected symbol in “‘citation()’ on”
    >
    > Type ‘demo()’ for some demos, ‘help()’ for on-line help, or
    Error: unexpected string constant in “Type ‘demo()'”
    > ‘help.start()’ for an HTML browser interface to help.
    Error: unexpected ‘for’ in “‘help.start()’ for”
    > Type ‘q()’ to quit R.
    Error: unexpected string constant in “Type ‘q()'”
    >
    > [Workspace loaded from ~/.RData]
    Error: unexpected ‘[‘ in “[”
    >
    > > setwd(“G:/plink”)
    Error: unexpected ‘>’ in “>”
    > > david’ in “>”
    > > david[1:5,1:5]
    Error: unexpected ‘>’ in “>”
    >
    > R version 3.2.0 alpha (2015-03-31 r68131)
    Error: unexpected symbol in “R version”
    > Copyright (C) 2015 The R Foundation for Statistical Computing
    Error: unexpected numeric constant in “Copyright (C) 2015”
    > Platform: x86_64-w64-mingw32/x64 (64-bit)
    Error: object ‘Platform’ not found
    >
    > R is free software and comes with ABSOLUTELY NO WARRANTY.
    Error: unexpected symbol in “R is”
    > You are welcome to redistribute it under certain conditions.
    Error: unexpected symbol in “You are”
    > Type ‘license()’ or ‘licence()’ for distribution details.
    Error: unexpected string constant in “Type ‘license()'”
    >
    > R is a collaborative project with many contributors.
    Error: unexpected symbol in “R is”
    > Type ‘contributors()’ for more information and
    Error: unexpected string constant in “Type ‘contributors()'”
    > ‘citation()’ on how to cite R or R packages in publications.
    Error: unexpected symbol in “‘citation()’ on”
    >
    > Type ‘demo()’ for some demos, ‘help()’ for on-line help, or
    Error: unexpected string constant in “Type ‘demo()'”
    > ‘help.start()’ for an HTML browser interface to help.
    Error: unexpected ‘for’ in “‘help.start()’ for”
    > Type ‘q()’ to quit R.
    Error: unexpected string constant in “Type ‘q()'”
    >
    > [Workspace loaded from ~/.RData]
    Error: unexpected ‘[‘ in “[”
    > > setwd(“G:/plink”)
    Error: unexpected ‘>’ in “>”
    > > david’ in “>”
    > > david[1:5,1:5]
    Error: unexpected ‘>’ in “>”
    > setwd(“G:/plink”)
    > david david[1:5,1:5]
    V1 V2
    1 Annotation Cluster 1 Enrichment Score: 4.8102175510122995
    2 Category Term
    3 UP_SEQ_FEATURE signal peptide
    4 SP_PIR_KEYWORDS signal
    5 UP_SEQ_FEATURE disulfide bond
    V3 V4 V5
    1
    2 Count % PValue
    3 50 32.25806451612903 6.549106601734033E-7
    4 50 32.25806451612903 8.552629036683175E-7
    5 45 29.03225806451613 1.245664195299075E-6
    > david
    > david[1:5,1:5]
    V1 V2
    1 Annotation Cluster 1 Enrichment Score: 4.8102175510122995
    2 Category Term
    3 UP_SEQ_FEATURE signal peptide
    4 SP_PIR_KEYWORDS signal
    5 UP_SEQ_FEATURE disulfide bond
    V3 V4 V5
    1
    2 Count % PValue
    3 50 32.25806451612903 6.549106601734033E-7
    4 50 32.25806451612903 8.552629036683175E-7
    5 45 29.03225806451613 1.245664195299075E-6
    > clusterIDs sum(clusterIDs)
    [1] 72
    > cluster length(cluster)
    [1] 72
    > cluster length(cluster)==nrow(david)
    [1] TRUE
    > david david[[72]][,1:5]
    V1
    768 Annotation Cluster 72
    769 Category
    770 GOTERM_BP_FAT
    771 GOTERM_BP_FAT
    772 GOTERM_BP_FAT
    V2 V3
    768 Enrichment Score: 0.004486343864196002
    769 Term Count
    770 GO:0050877~neurological system process 7
    771 GO:0007600~sensory perception 4
    772 GO:0050890~cognition 4
    V4 V5
    768
    769 % PValue
    770 4.516129032258064 0.9846981772714716
    771 2.5806451612903225 0.989172043715323
    772 2.5806451612903225 0.9953274937922806
    > names(david) enrichmentScore tail(enrichmentScore)
    67 68 69
    ” 0.039321686483308464″ ” 0.030350453200157605″ ” 0.022377527829699675″
    70 71 72
    ” 0.022175278411917944″ ” 0.01279862097475155″ ” 0.004486343864196002″
    > david<-lapply(david,function(.ele){colnames(.ele) david[[72]][,1:5]
    Category Term Count
    770 GOTERM_BP_FAT GO:0050877~neurological system process 7
    771 GOTERM_BP_FAT GO:0007600~sensory perception 4
    772 GOTERM_BP_FAT GO:0050890~cognition 4
    % PValue
    770 4.516129032258064 0.9846981772714716
    771 2.5806451612903225 0.989172043715323
    772 2.5806451612903225 0.9953274937922806
    > david david tail(david)
    Category
    71.765 GOTERM_BP_FAT
    71.766 GOTERM_BP_FAT
    71.767 GOTERM_BP_FAT
    72.770 GOTERM_BP_FAT
    72.771 GOTERM_BP_FAT
    72.772 GOTERM_BP_FAT
    Term Count
    71.765 GO:0008104~protein localization 6
    71.766 GO:0015031~protein transport 4
    71.767 GO:0045184~establishment of protein localization 4
    72.770 GO:0050877~neurological system process 7
    72.771 GO:0007600~sensory perception 4
    72.772 GO:0050890~cognition 4
    % PValue
    71.765 3.870967741935484 0.9447238129398181
    71.766 2.5806451612903225 0.983901594737508
    71.767 2.5806451612903225 0.9847990254750114
    72.770 4.516129032258064 0.9846981772714716
    72.771 2.5806451612903225 0.989172043715323
    72.772 2.5806451612903225 0.9953274937922806
    Genes
    71.765 33371_S_AT, 1974_S_AT, 40271_AT, 38236_AT, 41703_R_AT, 40294_AT
    71.766 33371_S_AT, 1974_S_AT, 40271_AT, 40294_AT
    71.767 33371_S_AT, 1974_S_AT, 40271_AT, 40294_AT
    72.770 38604_AT, 38236_AT, 32418_AT, 40317_AT, 32279_AT, 1901_S_AT, 1786_AT
    72.771 38604_AT, 38236_AT, 32418_AT, 1786_AT
    72.772 38604_AT, 38236_AT, 32418_AT, 1786_AT
    List Total Pop Hits Pop Total Fold Enrichment
    71.765 136 882 13528 0.6766706682673069
    71.766 136 762 13528 0.5221553188204415
    71.767 136 769 13528 0.5174022795073816
    72.770 136 1210 13528 0.5754496840058337
    72.771 136 810 13528 0.4912127814088598
    72.772 136 909 13528 0.43771435967126127
    Bonferroni Benjamini FDR cluster
    71.765 1.0 0.9999996063719517 100.0 71
    71.766 1.0 0.9999999988540363 100.0 71
    71.767 1.0 0.9999999990037772 100.0 71
    72.770 1.0 0.9999999990427183 100.0 72
    72.771 1.0 0.9999999997991603 100.0 72
    72.772 1.0 0.999999999996184 100.0 72
    enrichmentScore
    71.765 0.01279862097475155
    71.766 0.01279862097475155
    71.767 0.01279862097475155
    72.770 0.004486343864196002
    72.771 0.004486343864196002
    72.772 0.004486343864196002
    > david.s<-david[david$FDR david.sg
    > david.sg[1:2]
    [[1]]
    [1] “38646_S_AT” ” 31621_S_AT” ” 33077_AT” ” 33284_AT”
    [5] ” 35915_AT” ” 37968_AT” ” 1355_G_AT” ” 1855_AT”
    [9] ” 35956_S_AT” ” 33979_AT” ” 40350_AT” ” 38236_AT”
    [13] ” 37454_AT” ” 1403_S_AT” ” 1112_G_AT” ” 32250_AT”
    [17] ” 1901_S_AT” ” 1645_AT” ” 37172_AT” ” 36766_AT”
    [21] ” 1786_AT” ” 34546_AT” ” 31987_AT” ” 31512_AT”
    [25] ” 37096_AT” ” 1890_AT” ” 37898_R_AT” ” 40294_AT”
    [29] ” 1372_AT” ” 36436_AT” ” 37105_AT” ” 35566_F_AT”
    [33] ” 33963_AT” ” 38926_AT” ” 33684_AT” ” 33530_AT”
    [37] ” 35903_AT” ” 37408_AT” ” 37061_AT” ” 679_AT”
    [41] ” 34618_AT” ” 31506_S_AT” ” 32469_AT” ” 38604_AT”
    [45] ” 31793_AT” ” 1331_S_AT” ” 35169_AT” ” 34623_AT”
    [49] ” 36703_AT” ” 36328_AT” ” 38508_S_AT” ” 2090_I_AT”

    [[2]]
    [1] “33963_AT” ” 38646_S_AT” ” 38926_AT” ” 33530_AT”
    [5] ” 31621_S_AT” ” 33077_AT” ” 33284_AT” ” 37408_AT”
    [9] ” 37968_AT” ” 34467_G_AT” ” 35915_AT” ” 35367_AT”
    [13] ” 1355_G_AT” ” 37061_AT” ” 679_AT” ” 35090_G_AT”
    [17] ” 35956_S_AT” ” 37983_AT” ” 35091_AT” ” 37953_S_AT”
    [21] ” 31506_S_AT” ” 33979_AT” ” 32439_AT” ” 40350_AT”
    [25] ” 31793_AT” ” 37454_AT” ” 1403_S_AT” ” 1112_G_AT”
    [29] ” 32250_AT” ” 1331_S_AT” ” 1901_S_AT” ” 36766_AT”
    [33] ” 37172_AT” ” 34623_AT” ” 1786_AT” ” 34546_AT”
    [37] ” 31987_AT” ” 37096_AT” ” 31512_AT” ” 36703_AT”
    [41] ” 36328_AT” ” 1890_AT” ” 37898_R_AT” ” 40317_AT”
    [45] ” 38508_S_AT” ” 1372_AT” ” 37105_AT” ” 35566_F_AT”

    > allgenes length(allgenes)
    [1] 90
    > allterms length(allterms)
    [1] 0
    为甚么老师得到的是26,我的却是0呢

  7. Reply Bingyu 4月 28,2016 4:14 上午

    annotation_col$cluster <- annotation_col$cluster %% 2==1
    annotation_col$cluster <- as.numeric(annotation_col$cluster)
    老师您好,这两条处理没有看懂是什么意思。这是要按出现的次数转换成1,0数值形式么?为什么第一步要看cluster是否出现偶数次?
    基础知识比较缺乏,麻烦指点下~~~多谢啦

    • Reply admin 5月 2,2016 2:18 下午

      你可以试一试比较一下去掉annotation_col的效果,比如:pheatmap(david.m, color=c(“white”, “black”), legend=FALSE, cluster_col=FALSE, cluster_row=FALSE, show_rownames=T, show_colnames=T, fontsize=5)

Leave a Reply

  

  

  

%d 博主赞过: