MeSH分析

在2013年,MeSH ORA framework在Bioconductor发布了。它很恐怖的一次性发布了123多个包,其中有120个是数据包。MeSH可以做些什么呢?它可以把一个entrezID下所有的发布的文章列出来。你可以直接拿到摘要,甚至可以直接下载到文章。这就厉害了,你想想如果我拿出一堆差异表达的基因,以前只能做做什么GO富集分析啊,或者说是pathway富集分析啊。现在我们做个MeSH关键字富集分析如何?而MeSH ORA fromework就可以做到这一点。我们还可以想象,如果再复杂一点,我们通过做homologous,把不同种的同源蛋白可以非常简单地全部拿出来做关键词分析,也许可以扩展思路呢。以后他们会做些数据的图形化显示,比如tag cloud啊之类的。也许我们也可以做一些事情,比如meta-analyze一下文章的摘要。如果感兴趣,在回复中把自己的想法实现后的package链接给出来吧。

好了,本文就只是粗略地介绍一下如果做一堆基因的关键词富集分析。

# 安装包
library(BiocInstaller)
biocLite("meshr")
library(meshr)
## Loading required package: fdrtool
## Loading required package: Category
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
##     as.data.frame, cbind, colnames, do.call, duplicated, eval,
##     evalq, get, grep, grepl, intersect, is.unsorted, lapply,
##     lengths, mapply, match, mget, order, paste, pmax, pmax.int,
##     pmin, pmin.int, rank, rbind, rownames, sapply, setdiff, sort,
##     table, tapply, union, unique, unsplit
## Loading required package: AnnotationDbi
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: IRanges
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:base':
## 
##     colMeans, colSums, expand.grid, rowMeans, rowSums
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following object is masked from 'package:S4Vectors':
## 
##     expand
## Loading required package: cummeRbund
## Loading required package: RSQLite
## Loading required package: DBI
## Loading required package: ggplot2
## Loading required package: reshape2
## Loading required package: fastcluster
## 
## Attaching package: 'fastcluster'
## The following object is masked from 'package:stats':
## 
##     hclust
## Loading required package: rtracklayer
## Loading required package: GenomicRanges
## Loading required package: GenomeInfoDb
## Loading required package: Gviz
## Loading required package: grid
## 
## Attaching package: 'cummeRbund'
## The following object is masked from 'package:GenomicRanges':
## 
##     promoters
## The following object is masked from 'package:IRanges':
## 
##     promoters
## The following object is masked from 'package:Biobase':
## 
##     samples
## The following object is masked from 'package:BiocGenerics':
## 
##     conditions
## Loading required package: org.Hs.eg.db
## 
## Loading required package: MeSH.db
## Loading required package: MeSHDbi
## 
## Attaching package: 'MeSHDbi'
## The following object is masked from 'package:utils':
## 
##     packageName
## Loading required package: MeSH.AOR.db
## Loading required package: MeSH.PCR.db
## Loading required package: MeSH.Hsa.eg.db
## Loading required package: MeSH.Aca.eg.db
## Loading required package: MeSH.Bsu.168.eg.db
## Loading required package: MeSH.Syn.eg.db
data(geneid.cummeRbund) # 所有的基因,
data(sig.geneid.cummeRbund) # 差异表达的基因
# 我们看一下都有多少条
dim(geneid.cummeRbund)
## [1] 303   2
dim(sig.geneid.cummeRbund)
## [1] 104   2
head(geneid.cummeRbund)
##    SYMBOL ENTREZID
## 1   OR4F5    79501
## 2  OR4F16    81399
## 3  SAMD11   148398
## 4  KLHL17   339451
## 5 PLEKHN1    84069
## 6   ISG15     9636
head(sig.geneid.cummeRbund)
##    SYMBOL ENTREZID
## 1  KLHL17   339451
## 2 PLEKHN1    84069
## 3    AGRN   375790
## 4  TTLL10   254173
## 5  SCNN1D     6339
## 6    VWA1    64856
# 需要载入一些包来进行计算。
library("fdrtool") 
library("MeSH.Hsa.eg.db") # 这个Hsa就是Hsapiens的前三个字母。
# 和topGO类似的,它也需要先构建一个MeSHHyperGParams object来指定富集时的参数。
meshParams <- new("MeSHHyperGParams", #
                  geneIds = sig.geneid.cummeRbund[, 2], #
                  universeGeneIds = geneid.cummeRbund[, 2], #
                  annotation = "MeSH.Hsa.eg.db", #
                  category = "C", # 类似于GO的BP, MF, CC之类的分类
                  database = "gendoo", #可以是gendoo, gene2pubmed或者RBBH
                  pvalueCutoff = 0.05, #
                  pAdjust = "none") # pAdjust设置成none只是因为这个例子很小,大多数的时候,我们可以把它设置成BH,QV或者FDR
meshR <- meshHyperGTest(meshParams) # 超几何分布算一下pvale, 然后通过p value来过滤结果
meshR # 看一下它是个什么对象,是不是和topGO很类似?
## MeSH enrichment analysis for category Diseases 
## Annotation package used:  MeSH.Hsa.eg.db 
## The correspondance is retrived from:  gendoo 
## Number of MeSH terms identified:  9
head(summary(meshR))
##        MESHID       Pvalue OddsRatio ExpCount Count Size
## 15774 D009361 0.0002619206  3.942771 11.32673    21   33
## 15775 D009361 0.0002619206  3.942771 11.32673    21   33
## 15776 D009361 0.0002619206  3.942771 11.32673    21   33
## 15777 D009361 0.0002619206  3.942771 11.32673    21   33
## 15778 D009361 0.0002619206  3.942771 11.32673    21   33
## 15779 D009361 0.0002619206  3.942771 11.32673    21   33
##                    MESHTERM GENEID SOURCEID
## 15774 Neoplasm Invasiveness   2069     <NA>
## 15775 Neoplasm Invasiveness   3106     <NA>
## 15776 Neoplasm Invasiveness 145957     <NA>
## 15777 Neoplasm Invasiveness   5579     <NA>
## 15778 Neoplasm Invasiveness    843     <NA>
## 15779 Neoplasm Invasiveness   3294     <NA>

我们来把类换成G (Phenomena and Processes).这些类别有:

Abbreviations Categories No. of terms Examples

A

Anatomy

    2882

Muscles, Skeleton

B

Organisms

    5169

Gram-negative Bacteria

C

Diseases

  111257

Leukemia, Burns

D

Chemicals and Drugs

   20633

Fatty Acids, Ligases

E

Analytical Diagnostic and Therapeutic

    4720

Dental Care

Techniques

F

Psychiatry and Psychology

    1127

Behavior, Motivation

G

Phenomena and Processes

    3352

Antibody Formation

H

Disciplines and Occupations

     495

Biology, Clinical Medicine

I

Anthropology, Education, Sociology

     622

Economics, Culture

Social

J

Technology and Food and Beverages

     597

Bread, Coffee, Tea

K

Humanities

     216

Music, Religion

L

Information Science

     505

Communications Media

M

Persons

     245

Adult Children, Drug Users

N

Health Care

    2297

Oral Health, Women’s Health

V

Publication Type

     180

Book Illustrations, Letter

Z

Geographical Locations

     546

Japan, China, Taiwan

category(meshParams) <- "G"
database(meshParams) <- "gene2pubmed"
meshR <- meshHyperGTest(meshParams)
meshR
## MeSH enrichment analysis for category Phenomena and Processes 
## Annotation package used:  MeSH.Hsa.eg.db 
## The correspondance is retrived from:  gene2pubmed 
## Number of MeSH terms identified:  9
head(summary(meshR))
##          MESHID       Pvalue OddsRatio ExpCount Count Size
## 1533148 D060066 4.585389e-06  3.256735 60.75248    79  177
## 1533149 D060066 4.585389e-06  3.256735 60.75248    79  177
## 1533150 D060066 4.585389e-06  3.256735 60.75248    79  177
## 1533151 D060066 4.585389e-06  3.256735 60.75248    79  177
## 1533152 D060066 4.585389e-06  3.256735 60.75248    79  177
## 1533153 D060066 4.585389e-06  3.256735 60.75248    79  177
##                         MESHTERM GENEID SOURCEID
## 1533148 Protein Interaction Maps   5901 22939629
## 1533149 Protein Interaction Maps   8061 21988832
## 1533150 Protein Interaction Maps   6122 23443559
## 1533151 Protein Interaction Maps   9730 23443559
## 1533152 Protein Interaction Maps 389114 26186194
## 1533153 Protein Interaction Maps   6146 21853274

我们再来看一组例子

library(mesh)
library(fdrtool)
library(MeSH.Hsa.eg.db)
allege <- select(MeSH.Hsa.eg.db, columns="GENEID", keytype="SOURCED", keys="gene2pubmed")
siggene <- sig.geneid.cummeRbund[, 2]
meshParams <- new("MeSHHyperGParams", 
                  geneIds = siggene,
                  universeGeneIds = allgene,
                  annotation = "MeSH.Hsa.eg.db", 
                  category = "G", 
                  database = "gene2pubmed",
                  pvalueCutoff = 0.05,
                  pAdjust = "QV")
output <- meshHyperGTest(meshParams)
save.pdf(output) # 输出所有的文献,慎用。

Leave a Reply

  

  

  

%d 博主赞过: