Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02

Graph Kernels


Listen Later

As new graph structured data is constantly being generated, learning and data mining on graphs have become a challenge in application areas such as molecular biology, telecommunications, chemoinformatics, and social network analysis. The central algorithmic problem in these areas, measuring similarity of graphs, has therefore received extensive attention in the recent past. Unfortunately, existing approaches are slow, lacking in expressivity, or hard to parameterize.
Graph kernels have recently been proposed as a theoretically sound and promising approach to the problem of graph comparison. Their attractivity stems from the fact that by defining a kernel on graphs, a whole family of data mining and machine learning algorithms becomes applicable to graphs.
These kernels on graphs must respect both the information represented by the topology and the node and edge labels of the graphs, while being efficient to compute. Existing methods fall woefully short; they miss out on important topological information, are plagued by runtime issues, and do not scale to large graphs. Hence the primary goal of this thesis is to make learning and data mining with graph kernels feasible.
In the first half of this thesis, we review and analyze the shortcomings of state-of-the-art graph kernels. We then propose solutions to overcome these weaknesses. As highlights of our research, we
- speed up the classic random walk graph kernel from O(n^6) to O(n^3), where n is the number of nodes in the larger graph, and by a factor of up to 1,000 in CPU runtime, by extending concepts from Linear Algebra to Reproducing Kernel Hilbert Spaces,
- define novel graph kernels based on shortest paths that avoid tottering and outperform random walk kernels in accuracy,
- define novel graph kernels that estimate the frequency of small subgraphs within a large graph and that work on large graphs hitherto not handled by existing graph kernels.
In the second half of this thesis, we present algorithmic solutions to two novel problems in graph mining. First, we define a two-sample test on graphs. Given two sets of graphs, or a pair of graphs, this test lets us decide whether these graphs are likely to originate from the same underlying distribution. To solve this so-called two-sample-problem, we define the first kernel-based two-sample test. Combined with graph kernels, this results in the first two-sample test on graphs described in the literature.
Second, we propose a principled approach to supervised feature selection on graphs. As in feature selection on vectors, feature selection on graphs aims at finding features that are correlated with the class membership of a graph. Towards this goal, we first define a family of supervised feature selection algorithms based on kernels and the Hilbert-Schmidt Independence Criterion. We then show how to extend this principle of feature selection to graphs, and how to combine it with gSpan, the state-of-the-art method for frequent subgraph mining. On several benchmark datasets, our novel procedure manages to select a small subset of dozens of informative features among thousands and millions of subgraphs detected by gSpan. In classification experiments, the features selected by our method outperform those chosen by other feature selectors in terms of classification accuracy.
Along the way, we also solve several problems that can be deemed contributions in their own right:
- We define a unifying framework for describing both variants of random walk graph kernels proposed in the literature.
- We present the first theoretical connection between graph kernels and molecular descriptors from chemoinformatics.
- We show how to determine sample sizes for estimating the frequency of certain subgraphs within a large graph with a given precision and confidence, which promises to be a key to the solution of important problems in data mining and bioinformatics.
Three branches of computer science immediately benefit from our findings:
...more
View all episodesView all episodes
Download on the App Store

Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02By Ludwig-Maximilians-Universität München

  • 5
  • 5
  • 5
  • 5
  • 5

5

1 ratings


More shows like Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02

View all
Medizinische Fakultät - Digitale Hochschulschriften der LMU - Teil 06/19 by Ludwig-Maximilians-Universität München

Medizinische Fakultät - Digitale Hochschulschriften der LMU - Teil 06/19

0 Listeners

LMU Kapitalgesellschaftsrecht by Dr. jur. Timo Fest, LL.M. (Pennsylvania)

LMU Kapitalgesellschaftsrecht

0 Listeners

LMU Grundkurs Strafrecht I (L-Z) WS 2014/15 by Prof. Dr. jur. Helmut Satzger

LMU Grundkurs Strafrecht I (L-Z) WS 2014/15

0 Listeners

MCMP – Metaphysics and Philosophy of Language by MCMP Team

MCMP – Metaphysics and Philosophy of Language

2 Listeners

Hegel lectures by Robert Brandom, LMU Munich by Robert Brandom, Axel Hutter

Hegel lectures by Robert Brandom, LMU Munich

7 Listeners

John Lennox - Hat die Wissenschaft Gott begraben? by Professor John C. Lennox, University of Oxford

John Lennox - Hat die Wissenschaft Gott begraben?

4 Listeners

LMU Rechtsphilosophie by Prof. Dr. jur. Dr. jur. h.c. mult. Bernd Schünemann

LMU Rechtsphilosophie

0 Listeners

MCMP – Philosophy of Mathematics by MCMP Team

MCMP – Philosophy of Mathematics

2 Listeners

LMU Wiederholung und Vertiefung zum Schuldrecht - Lehrstuhl für Bürgerliches Recht by Professor Dr. Stephan Lorenz

LMU Wiederholung und Vertiefung zum Schuldrecht - Lehrstuhl für Bürgerliches Recht

0 Listeners

Epistemology and Philosophy of Science: Prof. Dr. Stephan Hartmann – HD by Ludwig-Maximilians-Universität München

Epistemology and Philosophy of Science: Prof. Dr. Stephan Hartmann – HD

1 Listeners