May 10, 2013

Context-based bioinformatics

The goal of bioinformatics is to develop innovative and practical methods and algorithms for answering biological questions. In many cases, these questions are driven by new biotechnological techniques, especially genome-wide and cell-wide high-throughput experiments.

In principle, there are two approaches:

1. Reduction and abstraction of the question to a clearly defined optimization problem, which can be solved with appropriate and efficient algorithms.

2. Development of context-based methods that incorporate as much contextual knowledge as possible into the algorithms, and derivation of practical solutions to relevant biological questions from high-throughput data. These methods can often be supported by appropriate software tools and visualizations that allow experts to evaluate the results interactively.

Context-based methods are often much more complex and require more involved algorithmic techniques to obtain practically relevant and efficient solutions for real-world problems, since in many cases even the simplified abstraction of a problem already results in NP-hard problem instances. To solve these complex problems, one often needs to combine efficient data structures and heuristic search methods with efficient (polynomial-time) optimization of clearly defined sub-problems, for example by dynamic programming, greedy algorithms, or path and tree algorithms.
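To illustrate the kind of polynomial-time sub-problem optimization mentioned above, the following is a minimal, generic sketch of dynamic programming for global sequence alignment (Needleman-Wunsch style) with a linear gap penalty. It is not Vorolign, PPM, or any other method from this thesis; the function name `global_alignment_score` and the scoring parameters `match`, `mismatch`, and `gap` are hypothetical choices made only for illustration.

```python
def global_alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch style DP).

    Illustrative sketch only; the scoring scheme is a hypothetical assumption.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against an empty string costs one gap per character.
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    # Each cell is the best of three moves: align two characters, or gap in either sequence.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                dp[i - 1][j] + gap,    # gap in b
                dp[i][j - 1] + gap,    # gap in a
            )
    return dp[n][m]
```

For example, `global_alignment_score("ACGT", "ACGT")` returns 4 under the default parameters, one point per matched position. The table has (n+1)(m+1) cells, each filled in constant time, so the running time is O(nm). This polynomial bound is what makes such sub-problems tractable even when the full problem is NP-hard.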

In this thesis, we present new methods and analyses that address open questions in bioinformatics from different contexts by incorporating the corresponding contextual knowledge.

The two main contexts in this thesis are protein structure similarity (Part I) and the network-based interpretation of high-throughput data (Part II).

In the protein structure similarity context (Part I), we analyze the consistency of gold-standard structure classification systems and derive a consistent benchmark set usable for different applications. We introduce two methods, Vorolign and PPM, for the protein structure similarity recognition problem, each based on different features of the structures.

Building on the idea and results of Vorolign, we introduce the concept of a contact neighborhood potential, which aims to improve protein fold recognition and threading.

For the re-scoring of predicted structure models, we introduce the method Vorescore, which clearly improves fold-recognition performance and enables the evaluation of the contact neighborhood potential for structure prediction methods in general.

We introduce ccVorolign, a contact-consistent variant of Vorolign, which further improves structure-based fold recognition performance and will enable direct optimization of the neighborhood potential in the future. Because it enforces contact consistency, ccVorolign has a much higher computational complexity than the polynomial-time Vorolign method; this is the cost of computing interpretable and consistent alignments.

Finally, we introduce a novel structural alignment method, PPM, that enables the explicit modeling and handling of phenotypic plasticity in protein structures. We use PPM to analyze the effects of alternative splicing on protein structures. With the help of PPM, we test the hypothesis that splice isoforms of the same protein can lead to protein structures with different folds (fold transitions).

In Part II of the thesis, we present methods that generate and use context information for the interpretation of high-throughput experiments.

To generate context information on molecular regulation, we introduce novel text-mining approaches that automatically extract relations from scientific publications.

In addition to the fast named entity recognition (NER) method syngrep, we present a novel, fully ontology-based, context-sensitive method (SynTree) that allows context-specific disambiguation of ambiguous synonyms and achieves much better identification performance.

This context information is important f


By Ludwig-Maximilians-Universität München
