Circos Tutorial Series (2): Chromosome Ideograms

The goal of this section is to draw the figure below.

Figure: a simple ideogram drawn with Circos

Basics: the Circos workflow

Figure: Circos workflow diagram

Definition:
The symbolic representations of chromosomes are called ideograms.

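To draw ideograms accurately, Circos needs to know how each chromosome is defined, where it sits, how large it is, and how it should be displayed. All of these must be defined in the data file.[……]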
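As a rough illustration of what such a definition looks like, Circos reads chromosome definitions from a karyotype file in which each chromosome is one line of the form `chr - ID LABEL START END COLOR`. The sketch below generates such lines in Python. The helper names and the toy chromosomes (IDs, sizes, colors) are made up for illustration and are not taken from the tutorial.

```python
def karyotype_line(chrom_id, label, start, end, color):
    """Format one chromosome definition line for a Circos karyotype file."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return f"chr - {chrom_id} {label} {start} {end} {color}"


def build_karyotype(chromosomes):
    """Join chromosome definitions into the text of a karyotype file."""
    return "\n".join(karyotype_line(*chrom) for chrom in chromosomes) + "\n"


# Toy chromosomes with made-up IDs and sizes, for illustration only.
toy_chromosomes = [
    ("toy1", "1", 0, 5000000, "red"),
    ("toy2", "2", 0, 3000000, "blue"),
]

if __name__ == "__main__":
    # Print the karyotype text; redirect it to a file to use it with Circos.
    print(build_karyotype(toy_chromosomes), end="")
```

The ID is what other data files use to refer to the chromosome, while the label is the text drawn next to the ideogram.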


A Strange Problem Installing Rgraphviz (Solved)

After installing Graphviz, I installed Rgraphviz into R with this command:

[ouj@qiuworld ~]$ sudo R CMD INSTALL Rgraphviz_1.30.1.tar.gz 
[sudo] password for ouj: 
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables... 
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for stdbool.h that conforms to C99... yes
checking for _Bool... yes
checking for whether compiler has bool... yes
configure: No --with-graphviz option was specified. Trying to find Graphviz using other methods.
checking for pkg-config... /usr/bin/pkg-config
Package libgvc was not found in the pkg-config search path.
Perhaps you should add the directory containing `libgvc.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libgvc' found
configure: pkg-config was not able to find the Graphviz library libgvc. This either indicates that Graphviz is old or that something is wrong. Verify Graphviz is installed and that PKG_CONFIG_PATH is correct.
checking for dotneato-config... no
configure: dotneato-config not found in PATH.
configure: Using default directory /usr/local, consider specifiying --with-graphviz
configure: Found Graphviz version '2.28.0'.
configure: Graphviz major version is '2' and minor version is '28'.
configure: Using the following compilation and linking flags for Rgraphviz
configure:    PKG_CPPFLAGS=-I/usr/local/include/graphviz
configure:    PKG_LIBS=-L/usr/local/lib/graphviz -L/usr/local/lib -lgvc
configure:    GVIZ_DEFS= -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28 
configure: Setting Graphviz Build version to '2.28.0'.
configure: creating ./config.status
config.status: creating R/graphviz_build_version.R
config.status: creating src/Makevars
** libs
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c LL_funcs.c -o LL_funcs.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c Rgraphviz.c -o Rgraphviz.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c RgraphvizInit.c -o RgraphvizInit.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c agopen.c -o agopen.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c agread.c -o agread.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c agwrite.c -o agwrite.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c bezier.c -o bezier.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c buildEdgeList.c -o buildEdgeList.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c buildNodeList.c -o buildNodeList.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c doLayout.c -o doLayout.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c graphvizVersion.c -o graphvizVersion.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c init.c -o init.o
gcc -std=gnu99 -shared -L/usr/local/lib64 -o Rgraphviz.so LL_funcs.o Rgraphviz.o RgraphvizInit.o agopen.o agread.o agwrite.o bezier.o buildEdgeList.o buildNodeList.o doLayout.o graphvizVersion.o init.o -L/usr/local/lib/graphviz -L/usr/local/lib -lgvc
installing to /usr/local/lib64/R/library/Rgraphviz/libs
** R
** inst
** preparing package for lazy loading
Creating a new generic function for "head" in "Rgraphviz"
Creating a new generic function for "tail" in "Rgraphviz"
Creating a new generic function for "lines" in "Rgraphviz"
Creating a new generic function for "plot" in "Rgraphviz"
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
Error : .onLoad failed in loadNamespace() for 'Rgraphviz', details:
  call: value[[3L]](cond)
  error: unable to load shared object '/usr/local/lib64/R/library/Rgraphviz/libs/Rgraphviz.so':
  libgvc.so.6: cannot open shared object file: No such file or directory
 
  Check that (1) graphviz is installed on your system; (2) the
  installed version of graphviz matches '2.28.0'; this is the version
  used to build this Rgraphviz package; (3) graphviz is accessible to
  R, e.g., the path to the graphviz 'bin' directory is in the system
  'PATH' variable.  See additional instructions in the 'README' file of
  the Rgraphviz 'source' distribution, available at
 
  http://bioconductor.org/packages/release/bioc/html/Rgraphviz.html
 
  Ask further questions on the Bioconductor mailing list
 
  http://bioconductor.org/docs/mailList.html
 
 
Error: loading failed
Execution halted
ERROR: loading failed

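I had no idea how to fix this. After two days of searching online and asking on the mailing list, the conclusion was that this is a dynamic library problem: the package compiles and links, but libgvc.so.6 cannot be found when the package is loaded. I unpacked the source package and tested: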

[ouj@qiuworld ~]$ R CMD ldd /usr/local/lib/libgvc.so.6
	linux-vdso.so.1 =>  (0x00007fff173fc000)
	libxdot.so.4 => /usr/local/lib/libxdot.so.4 (0x00002b5c100e0000)
	libgraph.so.5 => /usr/local/lib/libgraph.so.5 (0x00002b5c102e4000)
	libcdt.so.5 => /usr/local/lib/libcdt.so.5 (0x00002b5c104f0000)
	libpathplan.so.4 => /usr/local/lib/libpathplan.so.4 (0x00002b5c106f5000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00002b5c10914000)
	libexpat.so.0 => /lib64/libexpat.so.0 (0x00002b5c10b18000)
	libz.so.1 => /usr/lib64/libz.so.1 (0x00002b5c10d3a000)
	libm.so.6 => /lib64/libm.so.6 (0x00002b5c10f4f000)
	libc.so.6 => /lib64/libc.so.6 (0x00002b5c111d2000)
	/lib64/ld-linux-x86-64.so.2 (0x000000349cc00000)
[ouj@qiuworld ~]$ sudo R CMD ldd Rgraphviz/src/Rgraphviz.so
[sudo] password for ouj: 
	linux-vdso.so.1 =>  (0x00007fff58b40000)
	libgvc.so.6 => not found
	libc.so.6 => /lib64/libc.so.6 (0x00002aacd51ae000)
	/lib64/ld-linux-x86-64.so.2 (0x000000349cc00000)

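Sure enough, Rgraphviz.so cannot resolve libgvc.so.6. A library path set with export is not passed through sudo, so for now I set it directly on the sudo command line: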

[ouj@qiuworld ~]$ sudo LD_LIBRARY_PATH=/usr/lib:/usr/local/lib R CMD INSTALL Rgraphviz_1.30.1.tar.gz
...

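That gets the package installed. The next question is how to set LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib for every user so that it takes effect at every login. Otherwise, R has to be started as LD_LIBRARY_PATH=/usr/lib:/usr/local/lib R whenever the Rgraphviz package is needed.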
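The ldd check used above can also be scripted, which is handy when several shared objects need checking. The following is a minimal sketch that scans ldd output for entries marked "not found". The function name is hypothetical, and the sample text is copied from the transcript above.

```python
def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Lines look like "libfoo.so.1 => /path/libfoo.so.1 (0x...)".
        name, arrow, target = line.strip().partition("=>")
        if arrow and target.strip() == "not found":
            missing.append(name.strip())
    return missing


# Output of `R CMD ldd Rgraphviz/src/Rgraphviz.so` from the transcript above.
sample_output = """
    linux-vdso.so.1 =>  (0x00007fff58b40000)
    libgvc.so.6 => not found
    libc.so.6 => /lib64/libc.so.6 (0x00002aacd51ae000)
    /lib64/ld-linux-x86-64.so.2 (0x000000349cc00000)
"""

if __name__ == "__main__":
    for library in missing_libraries(sample_output):
        print(f"unresolved: {library}")
```

An empty result means every dependency was resolved with the library search path that was in effect when ldd ran.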
[……]


From RNA-seq Reads to Differential Expression

Translated from: From RNA-seq reads to differential expression, Oshlack et al. Genome Biology 2010, 11:220

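High-throughput sequencing, also known as next-generation sequencing, has become a fairly routine experimental tool in modern biology. It has greatly advanced research in genomics, epigenomics, and transcriptomics. RNA-seq studies an RNA sample by sequencing its steady-state RNA, which avoids many shortcomings of earlier methods; microarrays and PCR, for example, require prior knowledge of the sequences being measured. RNA-seq can also reach areas that were previously out of reach, such as transcripts with complex structures. It can be applied to the study of: 1. SNPs; 2. novel transcripts; 3. alternative splicing; 4. RNA editing. Even so, its most common use is comparing gene-level expression between two groups of samples, for example wild type versus mutant, treated versus control, different tissues, or cancer cells versus normal cells. We refer to this gene-level differential expression as DE for short (differential expression; note that it is DE, not ED).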

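Commonly used RNA-seq platforms include Illumina GA/HiSeq, SOLiD, and Roche 454. All of them extract the RNA, purify it, fragment it, reverse-transcribe it into cDNA, and then sequence it. The sequencing output is called short reads. A read is typically 25-300 bp long. Sequencing only one end of a fragment can make alignment difficult, so these platforms also offer sequencing from both ends. The resulting reads come in pairs separated by a gap of a certain size, and because the sequenced length doubles, alignment becomes much more accurate. Such reads are called 'paired-end' reads. In general, the sequencing results are converted directly into short sequences of letters, line by line, in formats such as FASTA or FASTQ.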
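To make the FASTQ format concrete: each read occupies four lines, namely a header starting with `@`, the sequence, a separator line starting with `+`, and a quality string of the same length as the sequence. The sketch below parses that four-line layout in Python. It assumes unwrapped records, and the read names and sequences in the example are made up.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (no wrapped sequences)."""
    stripped = (line.rstrip("\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(stripped, None)
        separator = next(stripped, None)
        quality = next(stripped, None)
        if sequence is None or separator is None or quality is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, quality)


# Two made-up reads, for illustration only.
example_lines = [
    "@read1/1", "ACGTACGT", "+", "IIIIIIII",
    "@read2/1", "GGCATTAC", "+", "IIHHGGFF",
]

if __name__ == "__main__":
    for record in parse_fastq(example_lines):
        print(record.name, len(record.sequence))
```

For paired-end data the two mates usually arrive in two such files with records in the same order, so the same parser can be run over both files in step.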

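However, analyzing the massive amount of data this technology produces is a real challenge for biologists. A single sequencing result file ranges from a few GB to tens of GB. Alignment and assembly alone take several hours, and deriving differential expression results afterwards takes still more time and effort, more than most experimental biologists can handle. Bioinformatics researchers have therefore written software to make the analysis easier. Even so, you still need a reasonably detailed understanding of the analysis process to use these tools correctly and obtain results that are close to the truth.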

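In general, the DE workflow after RNA-seq looks like this (Figure 1): first, map the short reads to their positions on the genome; second, summarize the mapped reads at the gene, exon, and transcript level; next, run the statistics and, after normalization, generate an expression-level report; finally, biologists interpret the results using systems biology knowledge.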

Figure 1. RNA-seq analysis workflow

Software and methods involved in each step:

Analysis step            Method                                                              Software
Mapping                  General aligner                                                     GMAP/GSNAP, BFAST, BOWTIE, CloudBurst, GNUmap, MAQ/BWA, PerM, RazerS, mrFAST/mrsFAST, SOAP/SOAP2, SHRiMP
                         De novo annotator                                                   QPALMA/GenomeMapper/PALMapper, SpliceMap, SOAPals, G-Mo.R-Se, TopHat, SplitSeek
                         De novo transcript assembler                                        Oases, MIRA
Summarization            Isoform-based                                                       Cufflinks, ALEXA-seq
                         Gene-based: count exons only
                         Gene-based: exon junction libraries
Normalization            Library size
                         RPKM: reads per kilobase of exon model per million mapped reads     ERANGE
                         TMM: trimmed mean of M-values                                       edgeR
                         Upper quartile                                                      Myrna
Differential expression  Poisson GLM (generalized linear model)                              DEGseq, Myrna
                         Negative binomial                                                   edgeR, DESeq, baySeq
Systems biology          Gene Ontology analysis                                              GOseq
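As a worked example of one normalization method from the table, RPKM (reads per kilobase of exon model per million mapped reads) divides a gene's read count by its exon length in kilobases and by the number of mapped reads in millions. The sketch below is a direct transcription of that definition. The function name and the numbers in the example are made up for illustration.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    kilobases = exon_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)


if __name__ == "__main__":
    # Made-up numbers: 500 reads on a 2,000 bp gene, 10 million mapped reads.
    print(rpkm(500, 2_000, 10_000_000))
```

With those numbers the gene spans 2 kb and the library holds 10 million mapped reads, so the value is 500 / (2 × 10) = 25.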

[……]


MACS (Model-based Analysis of ChIP-Seq) User Guide

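Once the ChIP-Seq reads have been aligned with Bowtie, the next step is to locate the peaks with MACS or ERANGE. Because ERANGE is rather complicated to configure, MACS is the more popular choice.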

Let us first look at how MACS works (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2592715/); without that background, many of its parameters are hard to understand.


Bowtie Short-Read Alignment Tool: User Guide

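Bowtie is an ultrafast, fairly memory-efficient tool for aligning short reads to a reference genome. When aligning 35-base reads, it can reach a rate of about 25 million reads per hour.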

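Bowtie is not a general-purpose alignment tool, and in that respect it differs from Blast and similar programs. What it does well is align short sequences to large genomes. The longest read it accepts is 1024 bases. In other words, bowtie is a very good fit for next-generation sequencing.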

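Before using bowtie, you need to build an index of the reference with bowtie-build. If you are aligning against a common genome, you can simply download the pre-built index files you need from http://bowtie-bio.sourceforge.net/manual.shtml.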
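The two steps, building an index and then aligning against it, can be sketched as command lines assembled in Python. The argument order follows the basic forms `bowtie-build <reference> <index basename>` and `bowtie <index basename> <reads> <hits>`. The helper names and all file names are placeholders chosen for this example, and the sketch only builds the argument lists without running anything.

```python
MAX_READ_LENGTH = 1024  # longest read bowtie accepts, as stated above


def bowtie_build_command(reference_fasta, index_basename):
    """Argument list that builds an index from a FASTA reference."""
    return ["bowtie-build", reference_fasta, index_basename]


def bowtie_align_command(index_basename, reads_file, hits_file):
    """Argument list that aligns reads against a previously built index."""
    return ["bowtie", index_basename, reads_file, hits_file]


def reads_fit_bowtie(read_lengths):
    """Check that no read exceeds the maximum length bowtie accepts."""
    return all(length <= MAX_READ_LENGTH for length in read_lengths)


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(bowtie_build_command("reference.fa", "my_index")))
    print(" ".join(bowtie_align_command("my_index", "reads.fq", "hits.txt")))
```

Each list can be handed to `subprocess.run(command, check=True)` once bowtie is installed and on the PATH. With a downloaded pre-built index, only the second command is needed.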

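As mentioned above, bowtie is suited to aligning short reads to a large reference, especially a genome. The reference must be at least 1024 bases long, and a read must not be longer than 1024 bases. Bowtie was designed around three assumptions: 1) each read has at least one best match on the genome; 2) most reads are of fairly high quality; 3) ideally, each read has only one best-matching position on the genome. These criteria are broadly in line with what RNA-seq, ChIP-seq, and other emerging sequencing or resequencing technologies require.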

[……]


100 Classic English Typefaces and a Few Chinese Ones

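Choosing fonts is something you get better at over time. Finding the fonts that suit you within a single day is hard. Among popular and good-looking Chinese fonts, 华康 has a fun set, the classics being 华康娃娃体, 少女体, and 童童体; 金梅 also has an interesting set, mainly its seal script (印篆); then there are 方正铁筋隶 and 微软雅黑 (Microsoft YaHei); among personal handwriting fonts, I like 毛泽东体 and 徐静蕾体.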

Figure: Chinese fonts I like

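More often, though, the fonts we actually use are English ones, and yet we know little about them. Today I came across someone's list of one hundred classic English typefaces, so I am recording it here for future reference.

They are: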

[……]
