Motif discovery and post-processing for the promoter regions of 18,779 protein-coding target genes (Ensembl Build 40, NCBI v36, hg18) returned 236,227 conserved DNA sequence motifs for 18,676 genes after applying an empirical discovery p-value threshold of 0.1.

Motifs were discovered in sequence sets consisting of the region around the transcription start site (TSS) of a single canonical transcript for each gene, together with the corresponding regions from other species. Input sequence sets were assembled from genome sequence data for 41 vertebrate species, taken from Ensembl, ENCODE and low-coverage read files. Input sequence sets contained a median of 20 and a mode of 22 vertebrate species. Search regions extended from 1.5 kb upstream to 200 bp downstream of the TSS (-1.5 kb/+200 bp), net of coding sequences and most types of repeats, which were masked. For v9.0, 'annotation-based' modules were added, i.e. co-occurring patterns of motifs that were similar to known transcription factor binding site models in TRANSFAC, JASPAR and ORegAnno.
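
To illustrate how a -1.5 kb/+200 bp search region depends on gene orientation, the following minimal Python sketch computes the window around a TSS. It is not the pipeline's own code. The function name search_region, the 1-based inclusive coordinate convention, the inclusion of the TSS base itself, and the clipping at position 1 are all assumptions made for illustration; masking of repeats and coding sequences is not shown.

```python
def search_region(tss, strand, upstream=1500, downstream=200):
    """Return (start, end) of a promoter search window around a TSS.

    Assumptions (hypothetical, not taken from the source text):
    coordinates are 1-based and inclusive on the forward strand,
    and the TSS base itself is part of the window.
    """
    if strand == "+":
        # Upstream lies at lower coordinates on the forward strand.
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the reverse strand, upstream lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the first base of the chromosome.
    return max(1, start), end


if __name__ == "__main__":
    print(search_region(10000, "+"))
    print(search_region(10000, "-"))
```

Under these assumptions, a forward-strand gene with its TSS at position 10000 gets the window 8500 to 10200, and a reverse-strand gene with the same TSS gets 9800 to 11500, so the longer flank always falls on the upstream side of the gene.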