How to separate paired-end reads from an SRA file

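We often need to extract paired-end sequencing data from NCBI SRA archives. But when we run the fastq-dump tool from the SRA Toolkit, we often get only one file rather than two. How can this file be split into two or more files? Do we have to write our own code?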

The answer is: not necessarily. First try fastq-dump's --split-3 option; only if that does not work should you consider writing your own code. As for --split-3[……]
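Both routes can be sketched roughly as follows. The comment at the top shows the --split-3 call; the file name SRR000001.sra is only a placeholder. The function below it is one possible hand-written fallback, not a method from the original post. It assumes that every record in the single FASTQ file holds both mates concatenated, with mate 1 occupying the first N bases. The function name split_concat_fastq and the output file naming are invented for this sketch.

```shell
# Route 1: let fastq-dump do the splitting (file name is a placeholder):
#   fastq-dump --split-3 SRR000001.sra
# This is expected to write SRR000001_1.fastq and SRR000001_2.fastq,
# plus SRR000001.fastq for reads without a mate.

# Route 2 (fallback sketch, assumes both mates are concatenated per record).
# Usage: split_concat_fastq INPUT.fastq MATE1_LENGTH OUTPUT_PREFIX
split_concat_fastq() {
    awk -v n="$2" -v p="$3" '
        BEGIN { f1 = p "_1.fastq"; f2 = p "_2.fastq" }
        # Line 1 of each record: the @header, copied to both files.
        NR % 4 == 1 { print > f1; print > f2; next }
        # Line 3 of each record: the "+" separator line.
        NR % 4 == 3 { print "+" > f1; print "+" > f2; next }
        # Lines 2 and 4: sequence and quality, both cut at position n.
        { print substr($0, 1, n) > f1; print substr($0, n + 1) > f2 }
    ' "$1"
}
```

For example, `split_concat_fastq reads.fastq 36 reads` would write reads_1.fastq and reads_2.fastq, assuming mate 1 is 36 bases long. Headers are copied unchanged, so any length field in them will no longer match the split reads.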


Approaches to analyzing microRNA deep-sequencing data

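What is a microRNA? MicroRNAs (miRNAs) are RNA molecules about 20-22 nucleotides long that negatively regulate gene expression in eukaryotes. They play a critical role in development, cell differentiation, morphogenesis, apoptosis, chromosome structure, and resistance to viruses.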

Recent studies have also found RNAs that act through a similar mechanism in prokaryotes and viruses.

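microRNAs are highly conserved. The proportion of genes regulated by these highly conserved RNAs is very high: about 1-5% of the predicted genes in the genome, 10% ([……]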


How to install R on CentOS 6 (CentOS 6.2)

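First add a package repository, EPEL (Extra Packages for Enterprise Linux). EPEL is an add-on repository for Enterprise Linux maintained by the Fedora project; the URL below points to a mirror hosted at Duke University. It includes R. Pick the release package that matches your system; I used 6/x86_64.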

rpm -Uvh http://archive.linux.duke.edu/pub/epel//6/x86_64/epel-release-6-8.noarch.rpm
yum install R

edgeR case study: deepSAGE analysis

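I was given a learning task: get to grips with two R packages, edgeR and REDseq. This post covers one of the case studies in edgeR, namely case study 12 in the edgeR package documentation.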

To learn more about SAGE (Serial Analysis of Gene Expression), read "DeepSAGE—digital transcriptomics with high sensitivity, simple experimental protocol and multiplexing of samples".

Installing and loading edgeR

> source("http://bioconductor.org/biocLite.R")
BiocInstaller version 1.4.3, ?biocLite for help
> biocLite("edgeR")
BioC_mirror: http://bioconductor.org
Using R version 2.15, BiocInstaller version 1.4.3.
Installing package(s) 'edgeR'
trying URL 'http://www.bioconductor.org/packages/2.10/bioc/bin/macosx/leopard/contrib/2.15/edgeR_2.6.0.tgz'
Content type 'application/x-gzip' length 1558515 bytes (1.5 Mb)
opened URL
==================================================
downloaded 1.5 Mb
 
 
The downloaded binary packages are in
	/var/folders/Dj/Dj+bWjS7HxiNJ0kYFeKdTE+++TI/-Tmp-//RtmpcteSIZ/downloaded_packages
> library(edgeR)
Loading required package: limma

We need to prepare a text file named targets.txt that describes the samples. The next step is to read in the data.

> library(GEOquery)
Loading required package: Biobase
Loading required package: BiocGenerics
Welcome to Bioconductor
 
    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.
 
Setting options('download.file.method.GEOquery'='auto')
> gset<-getGEO("GSE10782",GSEMatrix = F)
File stored at: 
/var/folders/Dj/Dj+bWjS7HxiNJ0kYFeKdTE+++TI/-Tmp-//RtmpBoLGWD/GSE10782.soft.gz
Parsing....
Found 9 entities...
GPL9185 (1 of 9 entities)
GSM272105 (2 of 9 entities)
GSM272106 (3 of 9 entities)
GSM272318 (4 of 9 entities)
GSM272319 (5 of 9 entities)
GSM272320 (6 of 9 entities)
GSM272321 (7 of 9 entities)
GSM272322 (8 of 9 entities)
GSM272323 (9 of 9 entities)
There were 50 or more warnings (use warnings() to see the first 50)
> slotNames(gset)
[1] "header" "gsms"   "gpls"  
> x<-slot(gset,"gsms")
> setwd("~/Documents/shRNA/edgeR")
> for(i in 1:length(x)){
+	name<-Accession(x[[i]]);
+	data<-Table(dataTable(x[[i]]));
+	write.table(data,paste(name,".txt",sep=""),quote=F,sep="\t",row.names=F)
+ }
> targets<-read.delim("targets.txt",stringsAsFactors=F)
> targets
          files group                          description
1 GSM272105.txt  DCLK transgenic (Dclk1) mouse hippocampus
2 GSM272106.txt    WT          wild-type mouse hippocampus
3 GSM272318.txt  DCLK transgenic (Dclk1) mouse hippocampus
4 GSM272319.txt    WT          wild-type mouse hippocampus
5 GSM272320.txt  DCLK transgenic (Dclk1) mouse hippocampus
6 GSM272321.txt    WT          wild-type mouse hippocampus
7 GSM272322.txt  DCLK transgenic (Dclk1) mouse hippocampus
8 GSM272323.txt    WT          wild-type mouse hippocampus
> d<-readDGE(targets)
> colnames(d)
[1] "1" "2" "3" "4" "5" "6" "7" "8"
> colnames(d)<-c("DCLK1","WT1","DCLK2","WT2","DCLK3","WT3","DCLK4","WT4")
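The targets.txt file read in the session above is a plain tab-separated table with the columns files, group and description. As a minimal sketch, it could be generated from the shell like this; the function name write_targets is invented for illustration, and the rows simply reproduce the listing printed by R above.

```shell
# Write the tab-separated sample sheet that read.delim("targets.txt") reads.
# Usage: write_targets [OUTPUT_FILE]   (defaults to targets.txt)
write_targets() {
    out="${1:-targets.txt}"
    {
        printf 'files\tgroup\tdescription\n'
        i=0
        for gsm in GSM272105 GSM272106 GSM272318 GSM272319 \
                   GSM272320 GSM272321 GSM272322 GSM272323; do
            # Samples alternate DCLK, WT, DCLK, WT, ... as in the listing.
            if [ $((i % 2)) -eq 0 ]; then
                printf '%s.txt\tDCLK\ttransgenic (Dclk1) mouse hippocampus\n' "$gsm"
            else
                printf '%s.txt\tWT\twild-type mouse hippocampus\n' "$gsm"
            fi
            i=$((i + 1))
        done
    } > "$out"
}
```

Running `write_targets` in the working directory (here ~/Documents/shRNA/edgeR) before calling read.delim would produce the file next to the GSM*.txt tables written by the loop.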


Two small problems encountered when installing R

1. The error "Error in readRDS(file) : error reading from the connection" appears.
Cause: the machine froze during installation, leaving the installation incomplete.
Solution:
In R, run

> .libPaths()
[1] "/usr/local/lib64/R/library"

Change to /usr/local/lib64/R/library and delete every library that was installed most recently. BiocInsta[……]
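To find which libraries were installed most recently, one option is to sort the library directory by modification time. This is only a sketch: it assumes that a directory's modification time is a good enough proxy for when the package was installed, and the helper name list_recent_libs is invented here.

```shell
# Print the N most recently modified entries in an R library directory,
# newest first. Usage: list_recent_libs LIBRARY_DIR [N]   (N defaults to 5)
list_recent_libs() {
    ls -1t "$1" | head -n "${2:-5}"
}

# Example (path taken from the .libPaths() output above):
#   list_recent_libs /usr/local/lib64/R/library 10
```

Review the printed names before deleting anything, since the listing only reports timestamps and cannot tell which installs were actually interrupted.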


Circos tutorial series (3): Highlights

The goal of this section is to draw the figure below.

[Figure: highlights]

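Highlighting means drawing attention to groups of data with blocks of color, usually ones that contrast strongly or that follow color psychology. When drawing a genome with Circos, this technique can be used to highlight genes that belong to the same group but lie in different regions.[……]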
