BAMBE
Bayesian Analysis in Molecular Biology and Evolution
Version 2.03 beta, January 2001
© Copyright 2000, 2001, Donald Simon & Bret Larget,
Department of Mathematics and Computer Science, Duquesne University.
There is no guarantee that the algorithms here
will correctly sample from the desired posterior distributions
in a finite run.
While theory says that the long-run frequencies
will converge to the desired posterior probabilities,
inferences from insufficiently long runs can be biased
if the chain has not moved sufficiently far from the initial state
and converged to near stationarity.
Obtaining consistent results from several different starting points
with different random seeds is a minimal criterion for confidence
that the numerical results provided by BAMBE
are close to the true posterior values.
We generally follow something like the following procedure when analyzing new data sets.
- Categorize the sites in a biologically relevant way.
- Complete one short run with the global algorithm during burn-in and the local algorithm
in the main loop, updating parameters every cycle and running long enough to reach approximate stationarity.
- Graph the .lpd file.
(The free software Gnuplot or packages such as S-PLUS or MATLAB are good choices.)
- If the graph does not finish in a plateau, the run was too short.
- Note the approximate level of the final plateau.
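As an illustrative sketch, the plateau check can be partly automated. The snippet below assumes an .lpd layout of whitespace-separated lines whose last field is the log posterior density for one cycle; that format is an assumption, so adjust the parsing to match your files.

```python
# Sketch of a plateau check on a BAMBE .lpd trace.
# ASSUMPTION: each line is whitespace-separated and its last field
# is the log posterior density for one cycle.

def read_lpd(path):
    """Return the log-posterior trace as a list of floats."""
    vals = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields:
                vals.append(float(fields[-1]))
    return vals

def plateau_level(trace, tail_frac=0.2):
    """Mean of the first and second halves of the final tail_frac of the trace.
    If the two halves differ markedly, the run probably has not yet
    settled into its plateau and was too short."""
    tail = trace[int(len(trace) * (1 - tail_frac)):]
    half = len(tail) // 2
    mean_a = sum(tail[:half]) / half
    mean_b = sum(tail[half:]) / (len(tail) - half)
    return mean_a, mean_b
```

For a quick visual check instead, a one-line Gnuplot invocation such as `gnuplot -e "plot 'run1.lpd' with lines; pause -1"` serves the same purpose.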
- Look at the .out file.
- Verify that the number of unique sites, number of constant sites,
and nucleotide base distributions are reasonable.
- Look at differences in number of constant sites
and nucleotide base distributions by category.
- Look at the .par file.
- Adjust the initial parameter values in the rc file
so they are approximately equal to final values achieved by the simulation.
- Note the tree and parameter acceptance rates.
It is good if they are between about 15% and 40%.
Tuning parameters may require adjustment,
but we will fine-tune them later.
- Beginning at the last tree from the initial run,
make several short runs (burn=0, cycles=1000)
with update-kappa=true and all other parameter updating turned off.
Adjust kappa-delta until the parameter acceptance rate is
between about 50% and 70%.
Values from 0.05 to 0.2 are usually acceptable.
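For concreteness, a run control file for this step might contain lines like the sketch below. Only the option names that appear in this section (burn, cycles, update-kappa, update-theta, update-pi, kappa-delta) are taken from this document; the kappa-delta value is just a starting guess inside the usual 0.05 to 0.2 range, and any other options in your rc file should be left as tuned so far.

```
burn=0
cycles=1000
update-kappa=true
update-theta=false
update-pi=false
kappa-delta=0.1
```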
- Beginning at the last tree from the initial run,
make several short runs (burn=0, cycles=1000)
with update-theta=true and all other parameter updating turned off.
Adjust theta-const until the parameter acceptance rate is
between about 50% and 70%.
- Beginning at the last tree from the initial run,
make several short runs (burn=0, cycles=1000)
with update-pi=true and all other parameter updating turned off.
Adjust pi-const until the parameter acceptance rate is
between about 50% and 70%.
Values in the thousands are usually acceptable.
- Beginning at the last tree from the initial run,
make one slightly longer run (burn=0, cycles=5000)
with all parameter updating turned on.
The parameter acceptance rate should be between 15% and 40%;
you may need to go back and adjust the tuning parameters.
- With the run control file up to date with tuning parameters and initial values,
make several (three or four) runs from random starts for many thousands of cycles
(perhaps burn=5000 and cycles=50000).
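A sketch of the corresponding rc-file settings, using only option names that appear earlier in this document; each run should also be given its own random seed and starting point through whatever options your run control file provides for them.

```
burn=5000
cycles=50000
update-kappa=true
update-theta=true
update-pi=true
```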
- Plot all .lpd files to make sure the plateaus are in about the same place.
Determine a common number of cycles to discard for all runs
so that each run is well into its plateau.
- Run meansd on the tail of the .par files to check that the plateaus
are actually similar and to readjust the initial values for the parameters.
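If meansd is not at hand, the same check can be sketched in a few lines of Python. The .par format assumed here (whitespace-separated numeric columns, one sample per line) is a guess; adapt the parsing to your files.

```python
# Minimal stand-in for the meansd check: per-column mean and standard
# deviation over the tail of a .par file.
# ASSUMPTION: whitespace-separated numeric columns, one sample per line.
import math

def load_par(path):
    """Parse a .par file into rows of floats (assumed format)."""
    with open(path) as f:
        return [[float(x) for x in line.split()] for line in f if line.strip()]

def tail_mean_sd(rows, discard):
    """rows: list of rows of floats; discard: samples to drop from the front.
    Returns (means, sds) per column, using the n-1 denominator."""
    kept = rows[discard:]
    ncol = len(kept[0])
    means, sds = [], []
    for j in range(ncol):
        col = [r[j] for r in kept]
        m = sum(col) / len(col)
        var = sum((x - m) ** 2 for x in col) / (len(col) - 1)
        means.append(m)
        sds.append(math.sqrt(var))
    return means, sds
```

Comparing the per-run means and standard deviations column by column is the "plateaus are actually similar" check; large disagreements between runs suggest the chains have not converged to the same region.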
- Run summarize on the different .top files (discarding many initial sample points)
to check whether the same tree topologies are being selected at roughly the same
posterior probabilities.
- Examine the .sum files after running summarize
to see whether the named clades are reasonably defined;
resummarize if necessary.
Look at the transition tables between subtree topologies to see whether mixing
is adequate and similar across runs.
- If at this point you find that the runs are converging to very different places,
you may wish to start at a nonrandom tree found by another method.
It may be that the algorithms in BAMBE are inadequate for your data.
- If all seems in order, it is time to do several (three or four) longer runs
to save for inference.
The sample rate should be as small as possible.
The ultimate size of the files you produce should be the limiting factor.
We often produce files with 200,000 tree topologies, and there is no reason
not to make them bigger.
They may be compressed later or discarded after the essential summarization has occurred.
The runs should be long enough to achieve the accuracy you desire.
You can judge time requirements from the previous exploratory runs.
This page was most recently updated on January 19, 2001.
bambe@mathcs.duq.edu