Dr. Luke Sheneman



Academic Interests

Broadly, I am interested in developing novel algorithms that estimate the evolutionary history and relationships of biological, molecular sequences. This has applications in disease diagnoses, drug discovery, evolutionary studies, and phylogeography.

In particular, I am interested in the problems of multiple sequence alignment (MSA) and phylogenetic inferencing. In addition to proposing and implementing novel approaches to these problems, I explore the limitations and biases of existing techniques and provide alternatives which mitigate those limitations. My work is largely based on empirical simulation studies.

Currently, I am working on answering these and related questions:

  • What is the relationship of guide tree quality to alignment quality?
    • Do true phylogenies make the best guide trees?
    • If so, when and why?
  • How important is the guide tree as compared to other parameters in progressive MSA?
  • Do progressive MSA algorithms that iteratively refine guide trees through incremental topological transformations work and why?
    • Are they worth the computational time and effort?
    • Under what conditions?
  • Can Evolutionary Computation/Genetic Algorithms (GAs) efficiently and effectively evolve guide trees which in turn result in highly accurate sequence alignments?
    • How effective are tree-based GA recombination operators?
      • How can we measure schema disruption for GA recombination operators?
  • Pairwise profile alignment using Dynamic Programming (DP) can introduce systematic bias by resolving ties in an arbitrary and deterministic way.
    • Explore the one-to-many relationship between guide tree and alignments produced from that alignment due to arithmetic DP ties.
    • Deterministic resolution of DP ties commonly results in sub-optimal alignments with respect to the inferred guide tree.
    • Develop an importance sampling strategy for selecting better alignments from distribution of possible alignments derived from a single guide tree.


Recent Publications
Clearcut: a fast implementation of relaxed neighbor joining.   Bioinformatics 2006; doi: 10.1093/bioinformatics/btl478

Sheneman, L., JA. Foster (2006) Estimating the Destructiveness of Crossover on Binary Tree Representations, Graduate student workshop in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '06), Seattle WA., July 8th - July 13th, 2006

Evans, J., L. Sheneman, and J.A. Foster (2006) Relaxed Neighbor-Joining: A Fast Distance-Based Phylogenetic Tree Construction Method, J. Mol. Evol, 62:785-792

Sheneman, L., J.A. Foster (2004) Evolving Better Multiple Sequence Alignments, In the Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2004), Seattle, WA.

Sheneman, L., J.A. Foster (2004) Evolving Better Alignments, One page poster abstract In the Proceedings of the Pacific Symposium on Biocomputing (PSB 2004), Fairmont Orchid, Big Island of Hawaii, Jan 6th - 10th, 2004

Shyu, C., L. Sheneman, J.A. Foster (2003) Multiple Sequence Alignment with Evolutionary Computation, Genetic Programming and Evolvable Machines special issue on Biological Applications of Evolutionary Computation.



EVALYN -- EVolved ALYNments

Download here (C source code)

EVALYN is a genetic algorithm that performs progressive multiple sequence alignment. EVALYN maintains a steady-state population of guide trees. The fitness of every guide tree is determined by the sum-of-pairs score of the alignment which results from aligning the input sequences in the order dictated by the guide tree. The theory behind EVALYN is that through recombining important guide tree features (building blocks), better guide trees can be evolved, ultimately resulting in better alignments. EVALYN is capable of performing on par with other progressive alignment techniques, as measured by benchmarks such as BAliBASE.


CLEARCUT -- Relaxed Neighbor Joining Implementation

Download here (C source code)

Clearcut is the reference implementation for the Relaxed Neighbor Joining (RNJ) algorithm. RNJ is a variation on the tradtitional neighbor joining (NJ) distance-based approach to tree reconstruction. RNJ opportunistically joins nodes without having to search for a global minimum between every join. As with traditional NJ, RNJ will always reconstruct the true tree if the distance matrix is additive. RNJ has a typical asymptotic runtime complexity of O(n^2logn), while traditional neighbor joining is O(n^3). Results for non-additive inputs are nearly indistinguishable between the two algorithms.


Work Experience

I have 7 years of experience in the software and internet industries in various technical and management roles.
My resume can be found here.



Copyright (C) 2006 Luke Sheneman