# Invited talks

#### Alessandra Carbone

##### Laboratory of Computational and Quantitative Biology, Université Pierre et Marie Curie-CNRS

**How to find similarities in large datasets of sequences?**

The overwhelming number of genomic sequences (reaching today hundreds of billions of nucleotides) demands their classification based on the
similarities with sequences that we already know. Such classifications inform protein function and protein structure, and thus have strong
connections to our fundamental understanding of biological systems.

Because of the complex, and yet unknown, processes to which sequences have been submitted during evolution, the diversity of such
sequences became important and hardly detectable by current computational methods. Due to the large databases available, this problem became
a bottleneck in computational biology.

We discuss the power of a new computational strategy that explores existing data (coming from complete genomes or from environmental samples)
by creating a large number of probabilistic models of sequences, by exploiting learning and combinatorial optimisation and pointing to a solution
of the problem.

The consequences of this approach are multiple and we shall present the impact of the computational approach to functional metagenomic sequences
of the oceans and to the possibility to functionally characterize them.

#### Erik D. Demaine

##### MIT Computer Science and Artificial Intelligence Laboratory

**Replicators, transformers, and robot swarms: Science fiction through geometric algorithms**

Science fiction is a great inspiration for science. How can we build reconfigurable robots like Transformers or Terminator 2? How can we build
Star Trek-style replicators that duplicate or mass-produce a given shape at the nano scale? How can we orchestrate the motion of a large swarm of robots?
Recently we've been exploring possible answers to these questions through computational geometry, in the settings of reconfigurable robots
(both modular and folding robots that can become any possible shape), robot swarms (which may be so small and simple that they have no identity),
and self-assembly (building computers and replicators out of DNA tiles).

#### Kurt Mehlhorn

##### Max-Planck-Institut für Informatik

**New Results on Self-Organizing Binary Search Trees**

The dynamic optimality conjecture is perhaps the most fundamental open question about binary search trees (BST).
It postulates the existence of an asymptotically optimal online BST, i.e. one that is constant factor competitive with any BST on any input access sequence.
The two main candidates for dynamic optimality in the literature are splay trees [Sleator and Tarjan, 1985], and Greedy [Lucas, 1988; Munro, 2000;
Demaine et al. 2009]. Despite BSTs being among the simplest data structures in computer science, and despite extensive effort over the past three decades,
the conjecture remains elusive. Dynamic optimality is trivial for almost all sequences: the optimum access cost of most length-*n* sequences is
*Θ(n log n)*, achievable by any balanced BST.

Thus, the obvious missing step towards the conjecture is an understanding of the "easy" access sequences, and indeed the most fruitful research direction
so far has been the study of specific sequences, whose "easiness" is captured by a parameter of interest.
For instance, splay provably achieves the bound of *O(n d)* when *d* roughly measures the distances between consecutive accesses (dynamic finger),
the average entropy (static optimality), and the delays between multiple accesses of an element (working set). The difficulty of proving dynamic optimality
is witnessed by other highly restricted special cases that remain unresolved; one prominent example is the traversal conjecture [Sleator and Tarjan, 1985],
which states that preorder sequences (whose optimum is linear) are linear-time accessed by splay trees; no online BST is known to satisfy this conjecture.

We prove two different relaxations of the traversal conjecture for Greedy:
(i) Greedy with an arbitrary initial tree is almost linear for traversal conjecture.
(ii) Greedy with a fixed initial tree is in fact linear.
These statements are corollaries of our more general results that express the complexity of access sequences in terms of a pattern avoidance
parameter *k*. Pattern avoidance is a well-established concept in combinatorics, and the classes of input sequences thus defined are rich, e.g.
the *k=3* case includes preorder sequences. For any sequence *X* with parameter *k*, our most general result shows that Greedy achieves the
cost *n 2^{α(n)^{O(k^2)}}* where *α* is the inverse Ackermann function.
Furthermore, a broad subclass of parameter-*k* sequences has a natural combinatorial interpretation as *k*-decomposable sequences.
For this class of inputs, we obtain the *n 2^{O(k^2)}* bound for Greedy with a fixed initial tree. For *k=3*, these results imply (i) and (ii).

Further studying the intrinsic complexity of *k*-decomposable sequences, we make several observations.
First, in order to obtain an offline optimal BST, it is enough to bound Greedy on non-decomposable access sequences.
Furthermore, we show that the optimal cost for *k*-decomposable sequences is *Θ(n log k)*, which is well below the proven performance
of all known BST algorithms. Hence, sequences in this class can be seen as a "candidate counterexample" to dynamic optimality.

Joint work with Parinya Chalermsook, Mayank Goswami, Lazlo Kosma, and Thatchaphol Saranurak.