Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02

Efficient Similarity Search in Structured Data


Listen Later

Modern database applications are characterized by two major aspects: the use of complex data types
with internal structure and the need for new data analysis methods. The focus of database users has shifted from
simple queries to complex analyses of the data, known as knowledge discovery in databases. Important
tasks in this area are the grouping of data objects (clustering), the classification of new data objects or the
detection of exceptional data objects (outlier detection). Most algorithms for solving those problems are based on
similarity search in databases. This makes efficient similarity search in large databases of structured objects
an important basic operation for modern database applications.
In this thesis we develop efficient methods for similarity search in large databases of
structured data and improve the efficiency of existing query processing techniques.
For the data objects, only a tree or graph structure is assumed which can be extended with arbitrary attribute
information.
Starting with an analysis of the demands from two example applications, several important requirements for
similarity measures are identified. One aspect is the adaptability of the similarity search method to the
requirements of the user and the application domain. This can even imply a change of the similarity measure
between two successive queries of the same user. An explanation component which makes clear why objects are considered
similar by the system is a necessary precondition for a purposeful adaption of the measure. Consequently,
the edit distance, well-known from string processing, is a common similarity measure for graph structured
objects. Its feature to allow a visualization of corresponding substructures and the possibility to weight
single operations are the reason for this popularity.
But it turns out that the edit distance and similar measures for tree structures are computationally extremely
complex which makes them unsuitable for today's large and even growing databases. Therefore, we develop a
multi-step query processing architecture which reduces the number of necessary distance calculations significantly.
This is achieved by employing suitable filter methods.
Furthermore, we show that by easing certain restrictions
on the similarity measure, a significant performance gain can be obtained without reducing the quality of the
measure. To achieve this, matchings of substructures (vertices or edges) of the data objects are determined.
An additional cost function for those matchings allows to derive a similarity measure for structured data, called
the edge matching distance, from
the cost optimal matching of the substructures. But even for this new similarity measure, efficiency can be improved
significantly by using a multi-step query processing approach. This allows the use of the edge matching distance for
knowledge discovery applications in large databases. Within the thesis, the properties of our new similarity search methods
are proved both theoretically and through experiments.
...more
View all episodesView all episodes
Download on the App Store

Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02By Ludwig-Maximilians-Universität München

  • 5
  • 5
  • 5
  • 5
  • 5

5

1 ratings


More shows like Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02

View all
Medizinische Fakultät - Digitale Hochschulschriften der LMU - Teil 06/19 by Ludwig-Maximilians-Universität München

Medizinische Fakultät - Digitale Hochschulschriften der LMU - Teil 06/19

0 Listeners

LMU Kapitalgesellschaftsrecht by Dr. jur. Timo Fest, LL.M. (Pennsylvania)

LMU Kapitalgesellschaftsrecht

0 Listeners

LMU Grundkurs Strafrecht I (L-Z) WS 2014/15 by Prof. Dr. jur. Helmut Satzger

LMU Grundkurs Strafrecht I (L-Z) WS 2014/15

0 Listeners

MCMP – Metaphysics and Philosophy of Language by MCMP Team

MCMP – Metaphysics and Philosophy of Language

2 Listeners

Hegel lectures by Robert Brandom, LMU Munich by Robert Brandom, Axel Hutter

Hegel lectures by Robert Brandom, LMU Munich

6 Listeners

John Lennox - Hat die Wissenschaft Gott begraben? by Professor John C. Lennox, University of Oxford

John Lennox - Hat die Wissenschaft Gott begraben?

4 Listeners

LMU Rechtsphilosophie by Prof. Dr. jur. Dr. jur. h.c. mult. Bernd Schünemann

LMU Rechtsphilosophie

0 Listeners

MCMP – Philosophy of Mathematics by MCMP Team

MCMP – Philosophy of Mathematics

2 Listeners

LMU Wiederholung und Vertiefung zum Schuldrecht - Lehrstuhl für Bürgerliches Recht by Professor Dr. Stephan Lorenz

LMU Wiederholung und Vertiefung zum Schuldrecht - Lehrstuhl für Bürgerliches Recht

0 Listeners

Epistemology and Philosophy of Science: Prof. Dr. Stephan Hartmann – HD by Ludwig-Maximilians-Universität München

Epistemology and Philosophy of Science: Prof. Dr. Stephan Hartmann – HD

1 Listeners