Multiple method motif discovery
Three motif discovery programs are currently used in the predictive pipeline: CONSENSUS [1], MEME [2] and MotifSampler [3]. For all programs we include both strands as separate sequences.
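As an illustration of treating both strands as separate sequences, the sketch below appends the reverse complement of each input sequence as its own record. It is a minimal sketch, not the pipeline's actual code: the function names and the "_rc" naming suffix are assumptions made for this example, and only the bases A, C, G, T and N are handled.

```python
# Complement table for A, C, G, T and N in either case
# (assumption: the input contains no other IUPAC symbols).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def with_both_strands(records: dict) -> dict:
    """Return the input records plus their reverse complements as separate records."""
    out = {}
    for name, seq in records.items():
        out[name] = seq
        # The "_rc" suffix is a hypothetical naming choice for this sketch.
        out[name + "_rc"] = reverse_complement(seq)
    return out


if __name__ == "__main__":
    print(with_both_strands({"s1": "AAGC"}))
```

Each motif discovery program then receives twice as many sequences as there are input regions, one record per strand.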
CONSENSUS
We run CONSENSUS for four different motif widths (6, 8, 10 and 12 bp) and use the following motif occurrence models:
OOPS - One Occurrence Per Sequence
OMOPS - One or More Occurrences Per Sequence
ZMOPS - Zero or More Occurrences Per Sequence
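The CONSENSUS settings above can be enumerated as a small grid. The sketch below assumes that every width is combined with every occurrence model, which the text suggests but does not state outright. The names WIDTHS, MODELS and consensus_runs are illustrative and do not correspond to CONSENSUS command-line options.

```python
from itertools import product

WIDTHS = (6, 8, 10, 12)              # motif widths in bp, as listed in the text
MODELS = ("OOPS", "OMOPS", "ZMOPS")  # occurrence models, as listed in the text


def consensus_runs():
    """List one (width, model) setting per run, assuming a full grid."""
    return [{"width": w, "model": m} for w, m in product(WIDTHS, MODELS)]


if __name__ == "__main__":
    runs = consensus_runs()
    print(len(runs))  # 4 widths x 3 models = 12 settings
    print(runs[0])
```

Under that assumption, CONSENSUS is run 12 times per input set.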
MEME
Motif widths range from 6 to 12 bp. Three different modes are used:
OOPS - One Occurrence Per Sequence
ZOOPS - Zero or One Occurrence Per Sequence
TCM - Two-Component Mixture; each sequence may contain any number of non-overlapping occurrences of each motif
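The three modes differ only in how many occurrences of a motif each sequence is allowed to contain. The following sketch makes that constraint explicit by checking a list of per-sequence occurrence counts against a mode. It illustrates the definitions above and is not part of MEME; the function name satisfies_model is hypothetical.

```python
def satisfies_model(counts, model):
    """Check whether per-sequence occurrence counts of one motif fit a mode.

    counts: number of (non-overlapping) motif occurrences in each sequence.
    model:  "OOPS", "ZOOPS" or "TCM", as defined in the text.
    """
    if any(c < 0 for c in counts):
        raise ValueError("occurrence counts cannot be negative")
    if model == "OOPS":
        return all(c == 1 for c in counts)       # exactly one per sequence
    if model == "ZOOPS":
        return all(c in (0, 1) for c in counts)  # zero or one per sequence
    if model == "TCM":
        return True                              # any number per sequence
    raise ValueError("unknown model: " + model)


if __name__ == "__main__":
    for counts in ([1, 1, 1], [1, 0, 1], [2, 0, 1]):
        print(counts, [m for m in ("OOPS", "ZOOPS", "TCM") if satisfies_model(counts, m)])
```

Any set of counts allowed under OOPS is also allowed under ZOOPS, and any set allowed under ZOOPS is also allowed under TCM, so the modes run from most to least restrictive.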
MotifSampler
The program is run for widths of 6, 8, 10 and 12 bp. The prior probability of finding one instance of a motif is set to p = 0.3. The program is run for 30 iterations, and 20 distinct motifs are reported per iteration.
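To give a sense of how many candidate motifs these settings produce, the sketch below enumerates the MotifSampler runs. It assumes that the 30 iterations are repeated for each of the four widths, which the text does not state explicitly. The dictionary keys are illustrative names, not MotifSampler options.

```python
WIDTHS = (6, 8, 10, 12)    # motif widths in bp, from the text
PRIOR = 0.3                # prior probability of one motif instance, from the text
ITERATIONS = 30            # iterations of the program, from the text
MOTIFS_PER_ITERATION = 20  # distinct motifs reported per iteration, from the text


def motifsampler_runs():
    """List one setting per run, assuming 30 iterations for every width."""
    return [
        {"width": w, "prior": PRIOR, "iteration": i, "n_motifs": MOTIFS_PER_ITERATION}
        for w in WIDTHS
        for i in range(1, ITERATIONS + 1)
    ]


if __name__ == "__main__":
    runs = motifsampler_runs()
    print(len(runs))                          # number of runs under this assumption
    print(sum(r["n_motifs"] for r in runs))   # upper bound on reported motifs
```

Under this reading there are 120 runs and at most 2400 reported motifs; if the 30 iterations are instead shared across all widths, the totals are correspondingly smaller.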
References:
1. Hertz,G.Z. and Stormo,G.D. (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563-577.
2. Bailey,T.L. and Elkan,C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2, 28-36.
3. Thijs,G., Marchal,K., Lescot,M., Rombauts,S., De Moor,B., Rouze,P. and Moreau,Y. (2002) A Gibbs sampling method to detect over-represented motifs in upstream regions of coexpressed genes. J. Comp. Biol. (special issue Recomb'2001) 9, 447-464.