Medizin - Open Access LMU - Teil 14/22

A simple approach for protein name identification: prospects and limits



Background: Significant parts of biological knowledge are available only as unstructured text in articles of biomedical journals. By automatically identifying gene and gene product (protein) names and mapping these to unique database identifiers, it becomes possible to extract and integrate information from articles and various data sources. We present a simple and efficient approach that identifies gene and protein names in texts and returns database identifiers for matches. It was evaluated by an independent jury in the recent BioCreAtIvE entity extraction and mention normalization task.

Methods: Our approach is based on synonym lists that map the unique database identifier of each gene/protein to its different synonyms. For yeast and mouse, we used the synonym lists provided by the organizers, who generated them from public model organism databases. The synonym list for fly was generated directly from the corresponding organism database. The lists were then extensively curated in a largely automated procedure and matched against MEDLINE abstracts by exact text matching. Rule-based and support vector machine-based post filters were designed and applied to improve precision.

Results: Our procedure showed high recall and precision, with F-measures of 0.897 for yeast and 0.764/0.773 for mouse in the BioCreAtIvE assessment (Task 1B), and 0.768 for fly in a post-evaluation.

Conclusion: The results were close to the best across all submissions. Depending on the properties of the synonyms, it can be crucial to consider context and to filter out erroneous matches. This is especially important for fly, whose nomenclature is very challenging for the protein name identification task. Here, the support vector machine-based post filter proved to be very effective.
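The matching step described in the Methods (synonym lists, exact text matching, then a rule-based post filter) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The identifiers, the toy synonym list, the stopword set, and the minimum-length rule are all assumptions chosen for illustration, and the support vector machine-based filter is not shown.

```python
import re
from collections import defaultdict

# Toy synonym list: database identifier -> synonyms.
# The identifiers are placeholders, not real database accessions.
SYNONYMS = {
    "ID:0001": ["CDC28", "CDK1"],
    "ID:0002": ["white", "w"],
}

# Hypothetical rule-based post filter: discard very short synonyms and
# synonyms that are also common English words.
STOPWORDS = {"white", "was", "for"}
MIN_LENGTH = 3


def build_index(synonyms):
    """Invert the synonym list: synonym -> set of identifiers."""
    index = defaultdict(set)
    for identifier, names in synonyms.items():
        for name in names:
            index[name].add(identifier)
    return index


def keep(name):
    """Return True if a synonym passes the illustrative filter rules."""
    return len(name) >= MIN_LENGTH and name.lower() not in STOPWORDS


def find_identifiers(text, index):
    """Return sorted (offset, synonym, identifier) tuples for exact matches."""
    hits = []
    for name, identifiers in index.items():
        if not keep(name):
            continue
        # Case-sensitive exact match that must not sit inside a longer token.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            for identifier in sorted(identifiers):
                hits.append((match.start(), name, identifier))
    return sorted(hits)


if __name__ == "__main__":
    abstract = "Deletion of CDC28 was lethal; the white background was unaffected."
    for offset, name, identifier in find_identifiers(abstract, build_index(SYNONYMS)):
        print(offset, name, identifier)
```

In this toy example the synonyms "white" and "w" are filtered out because they would match ordinary English text. That is the kind of erroneous match the abstract identifies as a particular problem for fly nomenclature. A fixed filter like this one also suppresses genuine mentions of such names, which is one reason context can matter.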

Medizin - Open Access LMU - Teil 14/22, by Ludwig-Maximilians-Universität München

