Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing
Gordon Robertson, Martin Hirst, Matthew Bainbridge, Misha Bilenky, Yongjun Zhao, Thomas Zeng, Ghia Euskirchen, Bridget Bernier, Richard Varhol, Allen Delaney, Nina Thiessen, Obi L. Griffith, Ann He, Marco Marra, Michael Snyder, and Steven Jones
Nature Methods. In press. (doi:10.1038/nmeth1068)
We developed a method, ChIP-sequencing (ChIP-seq), combining chromatin immunoprecipitation (ChIP) and massively parallel sequencing to identify mammalian DNA sequences bound by transcription factors in vivo. We used ChIP-seq to map STAT1 targets in interferon-γ (IFN-γ)-stimulated and unstimulated human HeLa S3 cells, and compared the method's performance to ChIP-PCR and to ChIP-chip for four chromosomes. By ChIP-seq, using 15.1 and 12.9 million uniquely mapped sequence reads, and an estimated false discovery rate of less than 0.001, we identified 41,582 and 11,004 putative STAT1-binding regions in stimulated and unstimulated cells, respectively. Of the 34 loci known to contain STAT1 interferon-responsive binding sites, ChIP-seq found 24 (71%). ChIP-seq targets were enriched in sequences similar to known STAT1 binding motifs. Comparisons with two ChIP-PCR data sets suggested that ChIP-seq sensitivity was between 70% and 92% and specificity was at least 95%.
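The sensitivity and specificity figures above come from tabulating ChIP-seq calls against loci that ChIP-PCR scored as bound or unbound. The following is a minimal Python sketch of that tabulation. It assumes each data set has already been reduced to a set of locus identifiers; the function names and toy locus IDs are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # ChIP-PCR positives that ChIP-seq also called
    fn: int  # ChIP-PCR positives that ChIP-seq missed
    tn: int  # ChIP-PCR negatives that ChIP-seq did not call
    fp: int  # ChIP-PCR negatives that ChIP-seq called anyway


def tabulate(called: set, positives: set, negatives: set) -> Confusion:
    """Compare ChIP-seq calls with a ChIP-PCR reference set of locus IDs."""
    return Confusion(
        tp=len(positives & called),
        fn=len(positives - called),
        tn=len(negatives - called),
        fp=len(negatives & called),
    )


def sensitivity(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    return c.tn / (c.tn + c.fp)


# Toy example with hypothetical locus IDs.
c = tabulate(called={"L1", "L2", "L3", "N1"},
             positives={"L1", "L2", "L3", "L4"},
             negatives={"N1", "N2", "N3", "N4"})
print(sensitivity(c), specificity(c))  # 0.75 0.75

# Recall against the 34 known STAT1 sites: 24 of 34 found.
known_site_recall = 24 / 34
print(f"{known_site_recall:.0%}")  # 71%
```

The last lines check the recall figure in the abstract: 24 of 34 known sites is 70.6%, which rounds to 71%.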
cisRED: A database system for genome-scale computational discovery of regulatory elements
Robertson AG, Bilenky M, Lin K, He A, Yuen W, Dagpinar M, Varhol R, Teague K, Griffith OL, Zhang X, Pan Y, Hassel M, Sleumer MC, Pan W, Pleasance ED, Chuang M, Hao H, Li YY, Robertson N, Fjell C, Li B, Montgomery SB, Astakhova T, Zhou J, Sander J, Siddiqui AS, Jones SJM.
Nucleic Acids Res. 2006 Jan 1;34(Database issue):D68-73.
We describe cisRED, a database for conserved regulatory elements that are identified and ranked by a genome-scale computational system (www.cisred.org). The database and high-throughput predictive pipeline are designed to address diverse target genomes in the context of rapidly evolving data resources and tools. Motifs are predicted in promoter regions using multiple discovery methods applied to sequence sets that include corresponding sequence regions from vertebrates. We estimate motif significance by applying discovery and post-processing methods to randomized sequence sets that are adaptively derived from target sequence sets, retain motifs with p-values below a threshold, and identify groups of similar motifs and co-occurring motif patterns. The database offers information on atomic motifs, motif groups and patterns. It is web-accessible, and can be queried directly, downloaded, or installed locally.
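The abstract does not spell out how cisRED adaptively derives its randomized sequence sets. This Python sketch therefore uses a simple stand-in to show the general idea of estimating motif significance against randomized data. Each sequence is shuffled in place, which preserves its base composition. A motif's empirical p-value is then the fraction of randomized sets in which an exact-match count is at least as high as in the real set. The scoring function, shuffling scheme, threshold and toy sequences are all assumptions for illustration.

```python
import random


def count_motif(seqs, motif):
    """Total overlapping occurrences of an exact motif string in a sequence set."""
    k = len(motif)
    return sum(1 for s in seqs for i in range(len(s) - k + 1) if s[i:i + k] == motif)


def shuffle_set(seqs, rng):
    """Randomized copy of a sequence set; each sequence keeps its base composition."""
    out = []
    for s in seqs:
        bases = list(s)
        rng.shuffle(bases)
        out.append("".join(bases))
    return out


def empirical_p(seqs, motif, n_random=1000, seed=0):
    """Fraction of randomized sets scoring at least as high as the real set.

    The +1 terms count the observed set itself, so p is never exactly zero.
    """
    rng = random.Random(seed)
    observed = count_motif(seqs, motif)
    hits = sum(count_motif(shuffle_set(seqs, rng), motif) >= observed
               for _ in range(n_random))
    return (hits + 1) / (n_random + 1)


def retain(motifs, seqs, threshold=0.01):
    """Keep motifs whose empirical p-value falls below the threshold."""
    return [m for m in motifs if empirical_p(seqs, m) < threshold]


# Toy promoter set (hypothetical) with a planted, repeated 8-mer.
seqs = ["TTCCGGAA" * 5 + "ACGTACGT"] * 10
print(retain(["TTCCGGAA", "GGGGGGGG"], seqs))  # ['TTCCGGAA']
```

A composition-preserving shuffle is only the simplest possible null model. Real pipelines often preserve dinucleotide or higher-order structure instead, because a mononucleotide shuffle can overstate the significance of motifs in CpG-rich or repetitive promoters.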
Sockeye: a 3D environment for comparative genomics
Montgomery SB, Astakhova T, Bilenky M, Birney E, Fu T, Hassel M, Melsopp C, Rak M, Robertson AG, Sleumer M, Siddiqui AS, Jones SJ.
Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization.
Assessment and integration of publicly available SAGE, cDNA microarray, and oligonucleotide microarray expression data for global coexpression analyses.
Griffith OL, Pleasance ED, Fulton DL, Oveisi M, Ester M, Siddiqui AS, Jones SJ.
Background: Large amounts of gene expression data from several different platforms are being made available to the scientific community and increasingly used as tools for validation and integration of other studies. Several studies have compared two or three platforms to evaluate the consistency of expression profiles for a single tissue or sample series but few have determined if these translate into reliable gene co-expression patterns across many conditions.
Results: We have analyzed Homo sapiens data from 1202 cDNA microarray experiments, 242 SAGE libraries and 667 Affymetrix oligonucleotide microarray experiments. Using standard co-expression analysis methods, we have assessed each platform for internal consistency, performed inter-platform comparisons, and tested each platform's predictions against the Gene Ontology. An overall correlation of correlations (rc) analysis showed that the platforms agree significantly better than random (p<0.001, 1000 randomizations) but with very low correlations of rc < 0.102. A rank analysis also showed significant but poor agreement with only 3-8% better performance than randomized data. Comparison against the Gene Ontology (GO) revealed that all three platforms identify more co-expressed gene pairs with common biological processes than random data and as the Pearson correlation for a gene pair increased it was more likely to be confirmed by GO.
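The correlation-of-correlations statistic (rc) can be read as follows. For the genes shared by two platforms, compute the Pearson correlation of every gene pair within each platform. Then correlate the two resulting lists of pair correlations. The Python sketch below assumes that each platform is a dict mapping a gene ID to an expression profile over that platform's own conditions. It also assumes that the randomization test permutes gene labels on one platform. The abstract reports 1000 randomizations but does not say what was shuffled, so that choice is an assumption. The toy data are hypothetical.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pair_correlations(platform, genes):
    """Pearson r for every gene pair, always in the same pair order."""
    return [pearson(platform[g], platform[h])
            for g, h in itertools.combinations(genes, 2)]


def rc(p1, p2, genes):
    """Correlation of the two platforms' pair-correlation lists."""
    return pearson(pair_correlations(p1, genes), pair_correlations(p2, genes))


def rc_with_randomization(p1, p2, genes, n_random=1000, seed=0):
    """Observed rc plus the fraction of label-permuted runs reaching it."""
    rng = random.Random(seed)
    observed = rc(p1, p2, genes)
    hits = 0
    for _ in range(n_random):
        labels = list(genes)
        rng.shuffle(labels)
        # Gene g on platform 2 is given the profile of a randomly chosen gene.
        p2_null = {g: p2[s] for g, s in zip(genes, labels)}
        if rc(p1, p2_null, genes) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_random + 1)


genes = ["g1", "g2", "g3", "g4"]
platform_a = {"g1": [1, 2, 3, 4], "g2": [2, 1, 4, 3],
              "g3": [4, 3, 2, 1], "g4": [1, 3, 2, 5]}
# A second "platform" that is a rescaled copy, so every pair correlation agrees.
platform_b = {g: [10 * v + 3 for v in platform_a[g]] for g in genes}
observed, p = rc_with_randomization(platform_a, platform_b, genes, n_random=200)
```

In this toy case `platform_b` is only a rescaling of `platform_a`, so `observed` is 1 up to rounding. With four genes, the label permutations are too few for the p-value to mean much. Real cross-platform data agree far less closely.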
Conclusions: The three datasets compared demonstrate significant but low levels of global concordance. When evaluated for biological relevance, the Affymetrix dataset performed best with gene pairs of correlation 0.9-1.0 confirmed by GO in 74% of cases. However, our results suggest that all three datasets may provide some biologically relevant predictions of co-expression. Researchers are cautioned against using any one dataset exclusively for their analyses.
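The GO-based evaluation bins gene pairs by Pearson correlation and asks, within each bin, what fraction of pairs share a Gene Ontology biological-process annotation. An example is the 74% reported for the 0.9-1.0 bin. The following is a minimal Python sketch. It assumes that pair correlations and a gene-to-GO-term mapping are already available. The gene names and GO term placeholders are hypothetical. Here "confirmed" is taken to mean sharing any term, which may be looser than the criterion used in the paper.

```python
import bisect
from collections import defaultdict


def go_confirmation_by_bin(pairs, go_terms, n_bins=10):
    """Fraction of gene pairs per correlation bin that share a GO term.

    pairs:    iterable of (gene_a, gene_b, pearson_r)
    go_terms: dict mapping gene -> set of GO biological-process term IDs
    Returns {(lower_edge, upper_edge): fraction} for non-empty bins.
    """
    edges = [i / n_bins for i in range(n_bins + 1)]  # 0.0, 0.1, ..., 1.0
    shared, total = defaultdict(int), defaultdict(int)
    for a, b, r in pairs:
        if r < 0:
            continue  # this sketch bins positive correlations only
        # Bisecting against the same edge values keeps r == 0.9 in the bin
        # starting at 0.9; r == 1.0 is folded into the top bin.
        k = min(bisect.bisect_right(edges, r) - 1, n_bins - 1)
        total[k] += 1
        if go_terms.get(a, set()) & go_terms.get(b, set()):
            shared[k] += 1
    return {(edges[k], edges[k + 1]): shared[k] / total[k] for k in sorted(total)}


# Hypothetical annotations and pair correlations.
go = {"A": {"GO:x"}, "B": {"GO:x"}, "C": {"GO:y"}, "D": set()}
pairs = [("A", "B", 0.95), ("A", "C", 0.92), ("B", "C", 1.0), ("C", "D", 0.15)]
print(go_confirmation_by_bin(pairs, go))
# {(0.1, 0.2): 0.0, (0.9, 1.0): 0.3333333333333333}
```

To check a platform against random expectation, the same function can be run on pairs drawn from randomized data.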
An application of peer-to-peer technology to the discovery, use and assessment of bioinformatics programs.
Montgomery SB, Fu T, Guan J, Lin K, Jones SJ.
We have created an open-source peer-to-peer system for bioinformatics analysis. Our system enables researchers across the globe to freely access state-of-the-art algorithms and computational resources. Algorithms are found and new jobs are submitted to remote servers through either BioPerl scripting or a sophisticated Java-based user interface. Furthermore, this system has been designed to provide support to applications that require access to a diverse range of bioinformatics functionality.
Currently, over 20 algorithms are accessible via this network at multiple locations. Each node's ability to advertise and upgrade new services ensures that users of our system are accessing the most current versions of algorithms (in a few cases directly from the authors). Additionally, each node can customize the methods of annotation and sequence retrieval for its clients; typically, we use EnsEMBL for sequence retrieval.
We hypothesize that the peer-to-peer approach can facilitate improved communication between the biologists who want to use bioinformatics tools and the authors of such techniques themselves.
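The abstract does not describe the protocol by which nodes advertise and upgrade their services. As a hypothetical illustration, the Python sketch below shows how re-advertising an upgraded service could keep clients on the newest version. It keeps an in-memory registry keyed by (node, algorithm) and picks the highest advertised version. The class names, node names and version tuples are invented. A real peer-to-peer system would also need discovery, expiry of stale entries and a network transport, none of which is modeled here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Advertisement:
    node: str        # hypothetical node name
    algorithm: str   # service offered by that node
    version: tuple   # comparable version, e.g. (1, 4, 0)


@dataclass
class Registry:
    # (node, algorithm) -> latest advertisement from that node
    ads: dict = field(default_factory=dict)

    def advertise(self, ad: Advertisement) -> None:
        # Re-advertising after an upgrade replaces the node's older entry.
        self.ads[(ad.node, ad.algorithm)] = ad

    def newest(self, algorithm: str):
        """Return the advertisement with the highest version, or None."""
        offers = [a for a in self.ads.values() if a.algorithm == algorithm]
        return max(offers, key=lambda a: a.version, default=None)


registry = Registry()
registry.advertise(Advertisement("node-a", "aligner", (1, 2, 0)))
registry.advertise(Advertisement("node-b", "aligner", (1, 1, 0)))
print(registry.newest("aligner").node)  # node-a

# node-b upgrades and re-advertises; clients now get its newer version.
registry.advertise(Advertisement("node-b", "aligner", (1, 3, 0)))
print(registry.newest("aligner").node)  # node-b
```

Keying the registry by (node, algorithm) means an upgrade overwrites the node's own entry instead of accumulating stale versions.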