July 20, 2006

Contextual Analysis of Gene Expression Data

Although measurement of gene expression using microarrays has become a standard high-throughput

method in molecular biology, the analysis of gene expression data is still a very active

area of research in bioinformatics and statistics. Despite some issues in quality and reproducibility

of microarray and derived data, microarrays are

still considered one of the most promising experimental techniques for the understanding

of complex molecular mechanisms.

This work approaches the problem of expression data analysis using contextual information.

While all analyses must be based on sound statistical data processing, it is also

important to include biological knowledge to arrive at biologically interpretable results.

After an introduction and some biological background, chapter 2 reviews some standard

methods for the analysis of microarray data, including normalization, computation

of differentially expressed genes, and clustering. The first source of context

information that is used to aid in the interpretation of the data is the functional annotation

of genes. Such information is often represented using ontologies such as the Gene Ontology (GO). GO annotations are provided by many gene and

protein databases and have been used to find functional groups that are significantly enriched

in differentially expressed or otherwise conspicuous genes. In gene clustering approaches,

functional annotations have been used to find enriched functional classes within

each cluster. In chapter 3, a clustering method for the samples of an expression data set

is described that uses GO annotations during the clustering process in order to find functional

classes that imply a particularly strong separation of the samples. The resulting

clusters can be interpreted more easily in terms of GO classes. The clustering method was

developed in joint work with Henning Redestig.
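
The enrichment of a functional class among conspicuous genes, as mentioned above, is commonly assessed with a hypergeometric (one-sided Fisher) test. The following is a minimal sketch of that standard test, not necessarily the procedure used in this work; the function name and the toy numbers are assumptions made for illustration.

```python
from math import comb


def enrichment_p_value(n_genes, n_in_class, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_genes     -- all genes measured on the array
    n_in_class  -- genes annotated with the functional class
    n_selected  -- conspicuous (e.g. differentially expressed) genes
    n_overlap   -- selected genes that carry the annotation
    """
    upper = min(n_in_class, n_selected)
    total = comb(n_genes, n_selected)
    # Sum the probabilities of seeing n_overlap or more annotated genes
    # when n_selected genes are drawn at random without replacement.
    tail = sum(
        comb(n_in_class, k) * comb(n_genes - n_in_class, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 100 genes, 10 in the class, 20 selected, 6 shared.
p = enrichment_p_value(n_genes=100, n_in_class=10, n_selected=20, n_overlap=6)
print(f"enrichment p-value: {p:.4g}")
```

When many functional classes are tested at once, the resulting p-values would normally need a multiple-testing correction, which this sketch omits.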

More complex biological information that covers interactions between biological objects

is contained in networks. Such networks can be obtained from public databases of metabolic

pathways, signaling cascades, transcription factor binding sites, or high-throughput measurements

for the detection of protein-protein interactions, such as yeast two-hybrid experiments.

Furthermore, networks can be inferred using literature mining approaches or

network inference from expression data. The information contained in such networks is

very heterogeneous with respect to the type, the quality, and the completeness of the contained

data. ToPNet, a software tool for the interactive analysis of networks and gene

expression data, has been developed in cooperation with Daniel Hanisch. The basic analysis and visualization methods as well as some important concepts

of this tool are described in chapter 4.

In order to access the heterogeneous data represented as networks with annotated experimental

data and functions, it is important to provide advanced querying functionality.

Pathway queries allow the formulation of

network templates that can include functional annotations as well as expression data. The

pathway search algorithm finds all instances of the template in a given network. In order

to do so, a special case of the well-known subgraph isomorphism problem has to be

solved. Although the algorithm has exponential running time in the worst case, some implementation

tricks make it run fast enough for practical purposes. Often, a pathway query

has many matching instances, and it is important to assess the statistical significance of

the individual instances with respect to expression data or other criteria. In chapter 5

the pathway query language and the pathway search algorithm are described in detail and

some theoretical properties are derived. Furthermore, some scoring methods that have

been implemented are described. The possibility of combining different scoring schemes

for different parts of the query results in very flexible scoring capabilities.
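
To illustrate what matching a network template involves, here is a minimal backtracking sketch. It is not the pathway search algorithm of this work or ToPNet's query language: the function `find_instances`, the representation of the template as node predicates plus directed edges, and the toy network, annotations, and expression values are all assumptions made for illustration.

```python
def find_instances(template_edges, node_tests, network):
    """Enumerate all injective mappings of template nodes to network nodes.

    template_edges -- list of directed (source, target) template node pairs
    node_tests     -- dict: template node -> predicate on a network node
    network        -- dict: network node -> set of successor nodes
    """
    template_nodes = list(node_tests)
    results = []

    def consistent(t_node, n_node, mapping):
        # The candidate must satisfy the node constraint (annotation,
        # expression, ...) and every template edge to an already-mapped node.
        if not node_tests[t_node](n_node):
            return False
        for a, b in template_edges:
            if a == t_node and b in mapping:
                if mapping[b] not in network.get(n_node, ()):
                    return False
            if b == t_node and a in mapping:
                if n_node not in network.get(mapping[a], ()):
                    return False
        return True

    def extend(i, mapping):
        if i == len(template_nodes):
            results.append(dict(mapping))
            return
        t_node = template_nodes[i]
        for n_node in network:
            if n_node in mapping.values():
                continue  # keep the mapping injective
            if consistent(t_node, n_node, mapping):
                mapping[t_node] = n_node
                extend(i + 1, mapping)  # worst case is exponential
                del mapping[t_node]

    extend(0, {})
    return results


# Hypothetical toy data: a small regulatory network with annotations.
network = {"TF1": {"G1", "G2"}, "TF2": {"G2"}, "G1": set(), "G2": set()}
annotation = {"TF1": "transcription factor", "TF2": "transcription factor",
              "G1": "enzyme", "G2": "enzyme"}
expression = {"TF1": 0.2, "TF2": 1.5, "G1": 2.1, "G2": 1.8}

# Template: an annotated transcription factor pointing at an up-regulated gene.
node_tests = {
    "regulator": lambda n: annotation[n] == "transcription factor",
    "target": lambda n: expression[n] > 1.0,
}
instances = find_instances([("regulator", "target")], node_tests, network)
for inst in instances:
    print(inst)
```

Checking node constraints before recursing prunes most candidates early, which is one simple way such a search can stay practical despite its exponential worst case.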

In chapter 6, some applications of the methods are described, us


By Ludwig-Maximilians-Universität München
