没文化真可怕。之前写[R中的pipelining（chaining）概念](/archives/3965)的时候，我就一直在想，为什么pipe的包要叫一个magrittr?然后就在网上查呀查，发现怎么没人解释一下。

后来查magritte的时候空然发现一幅图，这幅图在很多人介绍magrittr包的时候都用到了。我再一查，明白了。原来，这个叫Ren'{e} Magritte比利时超现实主义[……]

Skip to content
# 标签：生物信息学

# magrittr – Ceci n’est pas un pipe.

# 关于GenomicRanges::GRanges的几点

# tophat library type 应该怎么设置

# ChIP-Seq 数据分析

# 如何使用R将DAVID功能分析中的function annotation cluster结果2D图合并显示

# 使用Biostrings的PDict搜索一系列小片段

# 如何使用biomaRt做hg19注释

# R/Bioconductor开发中import和depend的区别

# bioconductor开发指导

# slide of BioC 2014 flashlight talk for motifStack

欢迎来到糗糗的世界

各类生物软件应用心得

当我们拿到一个测序结果，实验人员也搞不清楚他的结果的library type怎么运行tophat呢？

依照手册，你可以先访问实验人员，他使用的是什么样的测序手段，是用的illumina还是SOLiD。如果是SOLiD，多半是second strand. 如果是illumina，可能还是分辨一下，主要是分辨unstraned还是firststrand.

这可能需要分三步走：

1 r[……]

标题太长了，我实在是无奈：）

首先是数据生成，先去DAVID网站Start Analysis 下点选Demolist1.

然后点击function annotation cluster，跳出下图的窗口。

点击右侧的Download File按钮。

或者您可以直接从本站DAVID.output。

而后打开R

5b1bd79312a3edfb5adddf77418c01[……]

今天遇到一个小问题，在给定的一些序列中，搜索CRISPR Toolbox中GeCKO library中已有的guideRNA。这正好符合PDict的要求，PDict要求所有的片段长度必须一致，而guide sequence的长度正好都是20。

代码如下：

> library(Biostrings) > seq <- dir("tobeSearched", pattern=".seq$") > seq [1] "1-F_A01_015_2015-03-06.seq" "1-R_C02_012_2015-03-06.seq" "10-F_B02_014_2015-03-06.seq" [4] "10-R_D03_025_2015-03-06.seq" "2-F_B01_013_2015-03-06.seq" "2-R_D02_010_2015-03-06.seq" [7] "3-F_C01_011_2015-03-06.seq" "3-R_E02_008_2015-03-06.seq" "4-F_D01_009_2015-03-06.seq" [10] "4-R_F02_006_2015-03-06.seq" "5-F_E01_007_2015-03-06.seq" "5-R_G02_004_2015-03-06.seq" [13] "6-F_F01_005_2015-03-06.seq" "6-R_H02_002_2015-03-06.seq" "7-F_G01_003_2015-03-06.seq" [16] "7-R_A03_031_2015-03-06.seq" "8-F_H01_001_2015-03-06.seq" "8-R_B03_029_2015-03-06.seq" [19] "9-F_A02_016_2015-03-06.seq" "9-R_C03_027_2015-03-06.seq" > sequences <- sapply(seq, function(.ele){ readLines(file.path("tobeSearched", .ele))}) > sequences[1:2] $`1-F_A01_015_2015-03-06.seq` [1] "NNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNGNNNNNNNNNNNNNNNNNNNNNNNAATATCCCCCNGACGGNNNGNGNN" [2] "NNNNNNNNNNNNTTCAGGCTGNNNNNNNCACNNNNNNCGAGNCATCANNNNNAGGGNGNGNTTGNNNCTCANNNNNNCAG" [3] "GTNCTGNNNGNNGCGTTNAGTGNCATCNCCGGTNGGNNNNNNNNNGGNNAAACCNCNCGNANNCCCNNCNNGAAGCNNNN" [4] "ACNTTNNTTNNNCCCNANNNGCCCNAAATGTTTNANCAANANNTNNNNNNNNNNNGNNNNCTTNTNNTNNNNNNNNNNNN" [5] "NNNNNNGNNNNNCGGNNGGNACNAANCNNCCNNNNNNNNANCANANGCNGGGGNNAAATTTNNCCCCNNCANNNNNNNNA" [6] "ANGGGNNCCCCNGNAANNTTTTTNNNAANNNCTTTCCCCCCCCNNAGGCCTTTCCCCCNNNAANNCCNCCCCCNNNNNNN" [7] "NNNNNGGNAAAAAAAAAAAAAAANNNNNNNAAAAANNNNGNNNNNNNNNNNAAAAAGNATNNNTNNTGNNNNNNNNNNTG" [8] "NNNNNNNNNANNNNAANNNNNNNNNTTNNGAANNANNGNNNGNNNNNNNNGTNNTANNNNNGNGNNTATTNNGNTTANNG" [9] "GNNNNGNGNNTNNNNNNNNGNGNNNNNNNNCNGTNNNTNNNNNNNTNNNNNNNNNNGNNGNNGTNGNNGTNNNNNNNGNN" [10] "NNGGNNNNNNGNNGGGNGNNNGGNNNNNNNNNGNCNNNTNNNNNNGNGTNGNNGNGNNNGNNNNNTANTNTNTNNTAANA" [11] "ANGNGNNNNGTGNTNTTTNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNGNNNTTNNNNNNNNNNNNNNNNNN" $`1-R_C02_012_2015-03-06.seq` [1] "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCCGGCNNAAGNCATTATGCTCGGNNAAAGNACTTCCTGGCGCGTGNTTAC" [2] "CAGGGCGCACCGATTGCCATCACCGTTGAAGGGGCTAACATTCTGACCCGCNGCATGATGNTCTTCGGACAAGGNGCGAT" [3] "TCGTTGCCATCCGTACGTGCTGGANGAGANGGAAGCGGCGAANAANAATGANGTCAACGCGTTCGATAAACTGTTGTTCA" [4] "AACATATCGGTCACGNCGGTAGCAACAAAGNTCGCAGCTTCTGGCTGGNCNNGANGCGCGGNTTAACCAGCAGCACGNCA" [5] "ACCGGCGATGCCACTAAACGCTACTATCAGCACCTGANCNGCNGAGCGCCNNCCNCGCNNTGNTNNTGANGNNTCGNNNN" [6] "NGNNNNGGNGNNNNNNAANNNCNNNNNNNNNNGNNNNNTTNTTTNNNNNNANNCNNCCCNCCNNNNNNNCNGNNNNNNNN" [7] "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGNNNNNNN" > sequences <- lapply(sequences, paste, collapse="") > sequences <- unlist(sequences) > sequences[1] 1-F_A01_015_2015-03-06.seq "NNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNGNNNNNNNNNNNNNNNNNNNNNNNAATATCCCCCNGACGGNNNGNGNNNNNNNNNNNNNNTTCAGGCTGNNNNNNNCACNNNNNNCGAGNCATCANNNNNAGGGNGNGNTTGNNNCTCANNNNNNCAGGTNCTGNNNGNNGCGTTNAGTGNCATCNCCGGTNGGNNNNNNNNNGGNNAAACCNCNCGNANNCCCNNCNNGAAGCNNNNACNTTNNTTNNNCCCNANNNGCCCNAAATGTTTNANCAANANNTNNNNNNNNNNNGNNNNCTTNTNNTNNNNNNNNNNNNNNNNNNGNNNNNCGGNNGGNACNAANCNNCCNNNNNNNNANCANANGCNGGGGNNAAATTTNNCCCCNNCANNNNNNNNAANGGGNNCCCCNGNAANNTTTTTNNNAANNNCTTTCCCCCCCCNNAGGCCTTTCCCCCNNNAANNCCNCCCCCNNNNNNNNNNNNGGNAAAAAAAAAAAAAAANNNNNNNAAAAANNNNGNNNNNNNNNNNAAAAAGNATNNNTNNTGNNNNNNNNNNTGNNNNNNNNNANNNNAANNNNNNNNNTTNNGAANNANNGNNNGNNNNNNNNGTNNTANNNNNGNGNNTATTNNGNTTANNGGNNNNGNGNNTNNNNNNNNGNGNNNNNNNNCNGTNNNTNNNNNNNTNNNNNNNNNNGNNGNNGTNGNNGTNNNNNNNGNNNNGGNNNNNNGNNGGGNGNNNGGNNNNNNNNNGNCNNNTNNNNNNGNGTNGNNGNGNNNGNNNNNTANTNTNTNNTAANAANGNGNNNNGTGNTNTTTNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNGNNNTTNNNNNNNNNNNNNNNNNN" > sequences <- DNAStringSet(sequences) > head(sequences) A DNAStringSet instance of length 6 width seq names [1] 872 NNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNG...NNNNNNNNGNNNTTNNNNNNNNNNNNNNNNNN 1-F_A01_015_2015-... [2] 528 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCCG...NNNNNNNNNNNNNNNNNNNNNNNNGNNNNNNN 1-R_C02_012_2015-... [3] 543 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNG...NNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNN 10-F_B02_014_2015... [4] 525 GNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNC...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 10-R_D03_025_2015... [5] 697 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNGNNNNNNNNNCTTTTTTTTTTNNNNNNAN 2-F_B01_013_2015-... [6] 527 NNNNNNNNNNNNNNNNNNNNNNNNGNNNNNNNC...NNNNNNNNNNNNNNNNNNNNNNNNNNAANNNN 2-R_D02_010_2015-... > library1 <- read.csv("mouse_geckov2_library_b_1.10000000053.csv", stringsAsFactors=F) > library2 <- read.csv("mouse_geckov2_library_a_2.10000000053.csv", stringsAsFactors=F) > head(library1) gene_id UID seq 1 0610007P14Rik MGLibB_00001 TGTCTAAGGTTTCTCAATCC 2 0610007P14Rik MGLibB_00002 CAGTGAATGGCCTCCAAGCC 3 0610007P14Rik MGLibB_00003 GGTGCTTACTTGCTACCATC 4 0610009B22Rik MGLibB_00004 ACCTCGTCGACGAAAACATG 5 0610009B22Rik MGLibB_00005 GTTCATAGCTCATGCTGCTC 6 0610009B22Rik MGLibB_00006 GAGCAGCATGAGCTATGAAC > head(library2) gene_id UID seq 1 0610007P14Rik MGLibA_00001 TCCTGAATGTGTTACGAAGC 2 0610007P14Rik MGLibA_00002 GGTCGGGCTCCGGTACCTAG 3 0610007P14Rik MGLibA_00003 GCCAGCTTCGTAACACATTC 4 0610009B22Rik MGLibA_00004 TCATCATGCTGCATGACGTG 5 0610009B22Rik MGLibA_00005 ATTCAATGAGTGGTTCGTCT 6 0610009B22Rik MGLibA_00006 TCGTCACGGCTGGGCACATG > dim(library1) [1] 62804 3 > dim(library2) [1] 67405 3 > library1.DNA <- library1$seq > library2.DNA <- library2$seq > names(library1.DNA) <- library1$UID > names(library2.DNA) <- library2$UID > library1.DNA <- DNAStringSet(library1.DNA) > library2.DNA <- DNAStringSet(library2.DNA) > head(library1.DNA) A DNAStringSet instance of length 6 width seq names [1] 20 TGTCTAAGGTTTCTCAATCC MGLibB_00001 [2] 20 CAGTGAATGGCCTCCAAGCC MGLibB_00002 [3] 20 GGTGCTTACTTGCTACCATC MGLibB_00003 [4] 20 ACCTCGTCGACGAAAACATG MGLibB_00004 [5] 20 GTTCATAGCTCATGCTGCTC MGLibB_00005 [6] 20 GAGCAGCATGAGCTATGAAC MGLibB_00006 > head(library2.DNA) A DNAStringSet instance of length 6 width seq names [1] 20 TCCTGAATGTGTTACGAAGC MGLibA_00001 [2] 20 GGTCGGGCTCCGGTACCTAG MGLibA_00002 [3] 20 GCCAGCTTCGTAACACATTC MGLibA_00003 [4] 20 TCATCATGCTGCATGACGTG MGLibA_00004 [5] 20 ATTCAATGAGTGGTTCGTCT MGLibA_00005 [6] 20 TCGTCACGGCTGGGCACATG MGLibA_00006 > library1.PDict <- PDict(library1.DNA) > library2.PDict <- PDict(library2.DNA) > mT.1 <- lapply(sequences, function(.ele){ whichPDict(library1.PDict, .ele)}) > mT.2 <- lapply(sequences, function(.ele){ whichPDict(library2.PDict, .ele)}) > sapply(mT.1, length) 1-F_A01_015_2015-03-06.seq 1-R_C02_012_2015-03-06.seq 10-F_B02_014_2015-03-06.seq 0 0 0 10-R_D03_025_2015-03-06.seq 2-F_B01_013_2015-03-06.seq 2-R_D02_010_2015-03-06.seq 0 0 0 3-F_C01_011_2015-03-06.seq 3-R_E02_008_2015-03-06.seq 4-F_D01_009_2015-03-06.seq 0 0 0 4-R_F02_006_2015-03-06.seq 5-F_E01_007_2015-03-06.seq 5-R_G02_004_2015-03-06.seq 0 0 0 6-F_F01_005_2015-03-06.seq 6-R_H02_002_2015-03-06.seq 7-F_G01_003_2015-03-06.seq 0 0 0 7-R_A03_031_2015-03-06.seq 8-F_H01_001_2015-03-06.seq 8-R_B03_029_2015-03-06.seq 1 0 0 9-F_A02_016_2015-03-06.seq 9-R_C03_027_2015-03-06.seq 0 0 > sapply(mT.2, length) 1-F_A01_015_2015-03-06.seq 1-R_C02_012_2015-03-06.seq 10-F_B02_014_2015-03-06.seq 0 0 0 10-R_D03_025_2015-03-06.seq 2-F_B01_013_2015-03-06.seq 2-R_D02_010_2015-03-06.seq 0 0 0 3-F_C01_011_2015-03-06.seq 3-R_E02_008_2015-03-06.seq 4-F_D01_009_2015-03-06.seq 0 0 0 4-R_F02_006_2015-03-06.seq 5-F_E01_007_2015-03-06.seq 5-R_G02_004_2015-03-06.seq 0 0 0 6-F_F01_005_2015-03-06.seq 6-R_H02_002_2015-03-06.seq 7-F_G01_003_2015-03-06.seq 0 1 0 7-R_A03_031_2015-03-06.seq 8-F_H01_001_2015-03-06.seq 8-R_B03_029_2015-03-06.seq 0 0 0 9-F_A02_016_2015-03-06.seq 9-R_C03_027_2015-03-06.seq 0 0 |

因为现在ensembl已经更新了human的注释库，所以当你直接使用

library("biomaRt") database <- listDatasets(useMart("ensembl")) database[database[,1]=='hsapiens_gene_ensembl',] |

dataset description version 32 hsapiens_gene_ensembl Homo sapiens genes (GRCh38) GRCh38 |

你会得到的是GRCh38（hg38）的注释库。

那么如何转换至hg19注释库呢？

使用

mart <- useMart("ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice") database <- listDatasets(mart) database[database[,1]=='hsapiens_gene_ensembl',] |

[……]

import:将命名空间载入

depends:将命名空间载入，并且放入搜索目录。

两者在使用上的区别是：import只载入对开发者有用的，而depends需要载入对用户有用的。比如说GenomicRanges，

如果你开发的包使用到了GRanges类，并且它会成为结果返回给用户，那么你就必须使用depends了，如果你只是用了GenomicRanges中的类或者函数，比如说GRang[……]

youtube上的，所以，需要翻墙。全英文…

去参加了BioC2014，在开发者会上讲了一下motifStack。

今天把PPT整理了一下，发上来宣传一下。

motifStack, A tool for graphic representation of multiple motifs