Overview
The cisRED database holds conserved sequence motifs identified by genome-scale motif discovery, similarity,
clustering, co-occurrence and coexpression calculations. Sequence inputs include low-coverage genome sequence
data and ENCODE data.
A Nucleic Acids Research article describes the system architecture; please use this publication to cite cisRED.
PubMed publications that cite cisRED are listed here.
cisRED makes three levels of information available for regulatory elements:
- 'Atomic' motifs: These are conserved, over-represented sequence sets, typically 6 to 12 bp long, that have been discovered in a 'search region' sequence set.
- Groups of 'similar' motifs: These are identified either by a) annotating motifs with site sequences from TRANSFAC, JASPAR and ORegAnno databases (annotation-based groups), or by b) 'de novo' hierarchical clustering with the OPTICS algorithm ('de novo' groups).
- Patterns of motif group labels that co-occur in many search regions: These putative regulatory modules are ranked using genome-scale statistical and functional properties. Motifs in highly ranked patterns are likely the most reliable predictions.
In promoter-based cisRED databases, sequence search regions for motif discovery extend from 1.5 kb
upstream to 200 bp downstream of a transcription start site, net of most types of repeats and of coding exons.
Many transcription factor binding sites are located in such regions. For each target gene's search region,
we use a base set of probabilistic ab initio discovery tools, in parallel, to
find over-represented atomic motifs. Discovery methods use comparative genomics with over 40
vertebrate input genomes.
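The promoter search-region definition above can be sketched as a small coordinate calculation. This is a minimal illustration, not cisRED's own code: the function name, the 1-based inclusive coordinate convention, and the clamping at position 1 are assumptions, and the masking of repeats and coding exons is not modelled.

```python
from typing import Tuple

UPSTREAM_BP = 1500   # 1.5 kb upstream of the TSS, as stated in the text
DOWNSTREAM_BP = 200  # 200 bp downstream of the TSS, as stated in the text


def promoter_search_region(tss: int, strand: str,
                           upstream: int = UPSTREAM_BP,
                           downstream: int = DOWNSTREAM_BP) -> Tuple[int, int]:
    """Return (start, end) of the search region in forward-strand coordinates.

    Assumes 1-based, inclusive coordinates (a hypothetical convention).
    On the reverse strand, "upstream" means higher coordinates.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at the first base so a TSS near a chromosome start stays valid.
    return max(1, start), end


if __name__ == "__main__":
    print(promoter_search_region(10000, "+"))
    print(promoter_search_region(10000, "-"))
```

The strand check matters because the same TSS yields different genomic intervals for forward-strand and reverse-strand genes.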
In ChIP-seq-based cisRED databases, sequence search regions for motif discovery
correspond to significant peaks that represent genome-wide sites of protein-DNA binding.
Because such peaks occur in a wide range of genic and intergenic locations, ChIP-seq and promoter-based
databases are complementary. Currently, motif discovery for ChIP-seq data uses scan-based approaches
that make more explicit use of sets of sequences known to be functional transcription factor binding
sites, and that consider a wide range of levels of conservation. For the human STAT1 ChIP-seq database,
search regions in the target species (human) were selected as +/- 300 bp around the ChIP-seq peak maximum.
Repeats and coding regions were masked. Multiple sequence alignments were used to assemble orthologous input
sequences from other species.
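As a rough sketch of the STAT1 search-region step described above, the following takes a window of +/- 300 bp around a peak maximum and masks selected intervals with 'N'. The function names, the coordinate conventions, and the use of 'N' as the masking character are assumptions made for illustration; the text does not specify how cisRED implements masking.

```python
from typing import Iterable, Optional, Tuple

FLANK_BP = 300  # +/- 300 bp around the peak maximum, as stated in the text


def peak_window(peak_max: int, flank: int = FLANK_BP,
                chrom_length: Optional[int] = None) -> Tuple[int, int]:
    """Return (start, end) around a peak maximum, 1-based inclusive (assumed)."""
    start = max(1, peak_max - flank)
    end = peak_max + flank
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


def mask_sequence(seq: str, intervals: Iterable[Tuple[int, int]]) -> str:
    """Replace bases in each interval with 'N'.

    Intervals are offsets into `seq`, 0-based and half-open (Python slicing
    convention), e.g. repeat or coding-exon positions within the window.
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(max(0, start), min(len(chars), end)):
            chars[i] = "N"
    return "".join(chars)


if __name__ == "__main__":
    print(peak_window(1000))
    print(mask_sequence("ACGTACGT", [(2, 4)]))
```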
You can access cisRED's data in three ways:
- view predicted regulatory elements directly in cisRED's web user interface. From this interface, motifs can be viewed 'live' in the UCSC or Ensembl genome browsers.
- download the data and SQL structure for each species' MySQL 4.x database, with a schema diagram and example SQL queries, from the Databases and Methods tab.
- query the databases directly with SQL at db.cisred.org. Queries can be run from the command line or from graphical clients (e.g. the MySQL QueryBrowser), or programmatically from Perl, Python, Java, Ruby, etc. The username is 'anonymous' and the password should be left blank.
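For the direct SQL route, one way to drive a query from a script is to assemble an invocation of the standard `mysql` command-line client. The sketch below only builds the argument list and prints it; it does not contact the server. The host and username come from the text above, while the database name `cisred_example_db` is a hypothetical placeholder (actual database names should be taken from the Databases and Methods tab), and the function name is an assumption.

```python
from typing import List

HOST = "db.cisred.org"  # from the text
USER = "anonymous"      # from the text; no password option is passed


def mysql_command(database: str, query: str) -> List[str]:
    """Build an argv list for the `mysql` client without executing it."""
    return [
        "mysql",
        f"--host={HOST}",
        f"--user={USER}",
        f"--execute={query}",
        database,  # the database name is given as a positional argument
    ]


if __name__ == "__main__":
    # "cisred_example_db" is a placeholder, not a real cisRED database name.
    cmd = mysql_command("cisred_example_db", "SHOW TABLES")
    print(" ".join(cmd))
    # To actually run it, pass `cmd` to subprocess.run(cmd, check=True).
```

Passing the command as a list rather than a single string avoids shell quoting problems when the SQL contains spaces or quotes.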
cisRED human motifs are available as a native data type at the Ensembl genome browser.
cisRED is an ongoing project. Updates will be released frequently.
Usage Notes
Filters and Cookies
cisRED manages your 'Filter' settings via a browser 'cookie'.
You must allow your web browser to accept cookies from cisred.org for your filter settings to take effect.
News
- C. elegans v4 tables are now public (August 25, 2008): The new C. elegans database has been added to our public MySQL server.
- C. elegans v4 database (July 18, 2008): This version of the C. elegans cisRED database features 8 nematode genomes and 3847 highly conserved transcripts.
- New mouse v4 database (September 26, 2007): The v3.1 motif coordinates were 'lifted' to the NCBI m37 (mm9) assembly. The v4 motifs are compatible with (and will be available at) Ensembl 47.
- New human v9 database (July 26, 2007): The new human database offers regulatory modules predicted with sequence data from 41 vertebrate species.
- New browse/search options for the mouse database (June 20, 2007): You can now browse/search modules using GO terms, and by chromosome.
- Browse by location (June 20, 2007): You can now find all search regions that overlap a genomic region defined by coordinates.
- Search region masks (June 14, 2007): Our UCSC genome browser images now show the search region masks.
- Improved search tools (June 13, 2007): You can now use IUPAC symbols in a motif search, or search by cisRED group ID or cisRED module ID.
ORegAnno is an open access, LGPL/open source database that holds annotations for regulatory elements and polymorphisms. Its data are submitted and reviewed by a global user community. cisRED uses ORegAnno data for optimizing and assessing predictive performance, and for annotating computationally predicted DNA sequence motifs.