Annotating genes using textual patterns

Ali Cakmak*, Gultekin Ozsoyoglu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

8 Citations (Scopus)

Abstract

Annotating genes with Gene Ontology (GO) terms is crucial for biologists to characterize the traits of genes in a standardized way. However, manual curation of textual data, the most reliable form of gene annotation by GO terms, requires significant human effort, is very costly, and cannot keep pace with the growth of the biomedical literature. In this paper, we present GEANN, a system that automatically infers new GO annotations for genes from biomedical papers, based on the supporting evidence linked to PubMed, a biological literature database of 14 million papers. GEANN (i) extracts from text the significant terms and phrases associated with a GO term, (ii) uses the extracted terms to construct textual extraction patterns with reliability scores for GO terms, (iii) expands the pattern set through pattern crosswalks, (iv) employs semantic rather than syntactic pattern matching, which allows phrases with close meanings to be recognized, and (v) annotates genes according to the quality of the pattern matched to the genomic entity occurring in the text. On average, in our experiments, GEANN reached a precision of 78% at a recall of 57%.
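The pipeline above can be illustrated with a toy sketch. It is not GEANN's implementation: the abstract does not specify the term-significance measure, the reliability formula, the crosswalk procedure, or the semantic similarity used, so the smoothed log-ratio term score, the reliability defined as the share of correct training matches, the synonym table `SYNONYMS`, and the helper names (`significant_terms`, `Pattern`, `match_pattern`, `reliability`, `annotate`) are all hypothetical. Pattern crosswalks (step iii) are not modeled.

```python
import math
from collections import Counter
from dataclasses import dataclass

# Hypothetical synonym table standing in for semantic matching (step iv);
# GEANN's actual similarity measure is not given in the abstract.
SYNONYMS = {
    "binds": {"recognizes", "contacts"},
    "kinase": {"phosphotransferase"},
}


def significant_terms(go_docs, background_docs, top_k=5):
    """Step (i), assumed scoring: rank terms by the smoothed log ratio of their
    relative frequency in GO-annotated text versus background text."""
    go_counts = Counter(t for d in go_docs for t in d.lower().split())
    bg_counts = Counter(t for d in background_docs for t in d.lower().split())
    go_total, bg_total = sum(go_counts.values()), sum(bg_counts.values())
    scores = {
        t: math.log(((c + 1) / (go_total + 1)) / ((bg_counts[t] + 1) / (bg_total + 1)))
        for t, c in go_counts.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


@dataclass
class Pattern:
    tokens: tuple             # e.g. ("<GENE>", "binds", "dna"); "<GENE>" is the gene slot
    go_term: str
    reliability: float = 0.0  # step (ii): hypothetical score, e.g. from reliability() below


def token_match(pattern_tok, text_tok):
    """Exact match, or a listed synonym (hypothetical semantic matching)."""
    return pattern_tok == text_tok or text_tok in SYNONYMS.get(pattern_tok, set())


def match_pattern(pattern, tokens, genes):
    """Return the genes filling the <GENE> slot at every position the pattern matches."""
    n, hits = len(pattern.tokens), []
    for i in range(len(tokens) - n + 1):
        gene = None
        for p, t in zip(pattern.tokens, tokens[i:i + n]):
            if p == "<GENE>" and t in genes:
                gene = t
            elif p == "<GENE>" or not token_match(p, t):
                break  # slot not filled by a gene, or word mismatch
        else:  # no break: every pattern token matched
            if gene is not None:
                hits.append(gene)
    return hits


def reliability(pattern, labeled, genes):
    """Assumed score: share of the pattern's training matches that are true annotations.
    `labeled` holds (sentence, set of correct (gene, go_term) pairs)."""
    total = correct = 0
    for sentence, truth in labeled:
        for gene in match_pattern(pattern, sentence.lower().split(), genes):
            total += 1
            correct += (gene, pattern.go_term) in truth
    return correct / total if total else 0.0


def annotate(sentence, patterns, genes, threshold=0.5):
    """Step (v): keep (gene, GO term) pairs whose most reliable matching
    pattern scores at least `threshold`. Gene names must be lowercase."""
    tokens = sentence.lower().split()  # naive whitespace tokenization
    best = {}
    for pat in patterns:
        for gene in match_pattern(pat, tokens, genes):
            key = (gene, pat.go_term)
            best[key] = max(best.get(key, 0.0), pat.reliability)
    return {k: v for k, v in best.items() if v >= threshold}


genes = {"p53", "cdk2"}
patterns = [
    Pattern(("<GENE>", "binds", "dna"), "GO:0003677", 0.8),       # DNA binding
    Pattern(("<GENE>", "is", "a", "kinase"), "GO:0016301", 0.4),  # kinase activity
]
print(annotate("We show that p53 recognizes DNA in vivo", patterns, genes))
# → {('p53', 'GO:0003677'): 0.8}
print(annotate("cdk2 is a phosphotransferase", patterns, genes))
# → {}
```

In the first call, "recognizes" matches "binds" only through the synonym table, which is the point of matching on meaning rather than exact wording. In the second, the kinase pattern matches but its reliability (0.4) falls below the threshold; raising `threshold` in this sketch discards matches from less reliable patterns, which typically trades recall for precision.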

Original language: English
Title of host publication: Pacific Symposium on Biocomputing 2007, PSB 2007
Pages: 221-232
Number of pages: 12
Publication status: Published - 2007
Externally published: Yes
Event: Pacific Symposium on Biocomputing, PSB 2007 - Maui, HI, United States
Duration: 3 Jan 2007 to 7 Jan 2007

Publication series

Name: Pacific Symposium on Biocomputing 2007, PSB 2007

Conference

Conference: Pacific Symposium on Biocomputing, PSB 2007
Country/Territory: United States
City: Maui, HI
Period: 3/01/07 to 7/01/07
