Multiple method motif discovery

Three motif discovery programs are currently used in the predictive pipeline: CONSENSUS [1], MEME [2] and MotifSampler [3]. For all programs, both strands of each input sequence are included as separate sequences.
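
Including both strands means that each input sequence is accompanied by its reverse complement. A minimal Python sketch, assuming the sequences are held in a dictionary keyed by identifier; the `_rc` suffix on the added entries is a hypothetical naming choice, not necessarily the pipeline's convention:

```python
# Map each nucleotide to its Watson-Crick complement; N (unknown) maps to itself.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def with_both_strands(records: dict[str, str]) -> dict[str, str]:
    """Return a new dict holding every input sequence plus its reverse
    complement as a separate entry (hypothetical '_rc' ID suffix)."""
    out: dict[str, str] = {}
    for seq_id, seq in records.items():
        out[seq_id] = seq
        out[f"{seq_id}_rc"] = reverse_complement(seq)
    return out


if __name__ == "__main__":
    promoters = {"geneA": "ATGCGTACGT", "geneB": "TTGACA"}  # toy input
    for seq_id, seq in with_both_strands(promoters).items():
        print(seq_id, seq)
```

Treating the two strands as independent sequences lets programs that scan only the given strand still find motifs located on the opposite strand.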

CONSENSUS

CONSENSUS is run for four motif widths (6, 8, 10 and 12 bp) with each of the following motif occurrence models:

OOPS - One Occurrence Per Sequence

OMOPS - One or More Occurrences Per Sequence

ZMOPS - Zero or More Occurrences Per Sequence
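
The three occurrence models differ only in how many motif sites they permit in each sequence. The sketch below is an illustration, not CONSENSUS's own implementation or options: it encodes each model as a predicate on per-sequence site counts and enumerates the width/model combinations. The function names are hypothetical.

```python
from itertools import product

WIDTHS = (6, 8, 10, 12)  # motif widths in bp

# Occurrence model -> predicate on the number of motif sites in one sequence.
OCCURRENCE_MODELS = {
    "OOPS": lambda n: n == 1,   # exactly one occurrence per sequence
    "OMOPS": lambda n: n >= 1,  # one or more occurrences per sequence
    "ZMOPS": lambda n: n >= 0,  # zero or more occurrences per sequence
}


def site_counts_allowed(model: str, counts: list[int]) -> bool:
    """True if every per-sequence site count is permitted by the model."""
    rule = OCCURRENCE_MODELS[model]
    return all(rule(n) for n in counts)


def consensus_runs() -> list[tuple[int, str]]:
    """Enumerate every (width, model) combination: 4 widths x 3 models."""
    return list(product(WIDTHS, OCCURRENCE_MODELS))


if __name__ == "__main__":
    counts = [1, 2, 0]  # hypothetical site counts in three sequences
    for model in OCCURRENCE_MODELS:
        print(model, site_counts_allowed(model, counts))
    print(len(consensus_runs()), "CONSENSUS runs")
```

For the toy counts above only ZMOPS accepts the data, since one sequence has two sites (ruling out OOPS) and another has none (ruling out OMOPS).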

MEME

Motif widths range from 6 to 12 bp. Three occurrence models (modes) are used:

OOPS - One Occurrence Per Sequence

ZOOPS - Zero or One Occurrence Per Sequence

TCM - Two-Component Mixture; each sequence may contain any number of non-overlapping occurrences of each motif
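
OOPS and ZOOPS bound the number of sites per sequence, whereas TCM allows any number of sites as long as they do not overlap. A minimal sketch, assuming the sites of a single motif are given as 0-based start positions within one sequence; `sites_valid` is a hypothetical helper for illustration, not part of MEME:

```python
def sites_valid(model: str, starts: list[int], width: int) -> bool:
    """Check the start positions of one motif's sites in one sequence
    against a MEME-style occurrence model (OOPS, ZOOPS or TCM)."""
    if model == "OOPS":
        return len(starts) == 1  # exactly one site
    if model == "ZOOPS":
        return len(starts) <= 1  # zero or one site
    if model == "TCM":
        ordered = sorted(starts)
        # Any number of sites, but each must begin at or after the end
        # of the previous one, i.e. sites may not overlap.
        return all(nxt - prev >= width for prev, nxt in zip(ordered, ordered[1:]))
    raise ValueError(f"unknown model: {model}")


MEME_WIDTHS = range(6, 13)  # motif widths 6-12 bp inclusive

if __name__ == "__main__":
    print(list(MEME_WIDTHS))
    print(sites_valid("ZOOPS", [], 8))         # no site is allowed
    print(sites_valid("TCM", [3, 20, 11], 8))  # sites at 3, 11, 20 do not overlap
    print(sites_valid("TCM", [3, 9], 8))       # site at 9 overlaps positions 3-10
```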

MotifSampler

MotifSampler is run for motif widths of 6, 8, 10 and 12 bp. The prior probability of finding one instance of a motif is set to p = 0.3. Thirty iterations of the program are run, and 20 distinct motifs are reported per iteration.
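
The run protocol can be summarized as a small configuration object. This is a hypothetical sketch: the field names are not MotifSampler's option names, and it assumes the 30 iterations are repeated for each of the four widths, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifSamplerSettings:
    """Hypothetical container for the MotifSampler settings described above;
    field names are illustrative, not MotifSampler's own options."""
    widths: tuple[int, ...] = (6, 8, 10, 12)
    prior_one_instance: float = 0.3  # prior probability of one motif instance
    iterations: int = 30             # independent runs per width (assumed)
    motifs_per_iteration: int = 20   # distinct motifs reported per iteration

    def jobs(self) -> list[tuple[int, int]]:
        """All (width, iteration_index) combinations to execute."""
        return [(w, i) for w in self.widths for i in range(self.iterations)]

    def max_reported_motifs(self) -> int:
        """Upper bound on the motifs reported for one input sequence set."""
        return len(self.jobs()) * self.motifs_per_iteration


if __name__ == "__main__":
    settings = MotifSamplerSettings()
    print(len(settings.jobs()), settings.max_reported_motifs())
```

Under that assumption a single sequence set yields 120 runs and at most 2,400 reported motifs.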

References:

1. Hertz,G.Z. and Stormo,G.D. (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563-577.

2. Bailey,T.L. and Elkan,C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2, 28-36.

3. Thijs,G., Marchal,K., Lescot,M., Rombauts,S., De Moor,B., Rouzé,P. and Moreau,Y. (2002) A Gibbs sampling method to detect over-represented motifs in upstream regions of coexpressed genes. J. Comput. Biol. 9, 447-464 (special issue RECOMB 2001).