Biological knowledge is scattered in heterogeneous database formats and locked in unstructured natural language documents.
The aim of BOOTStrep is to bring together existing biological fact databases and various terminological repositories, and to implement a text analysis system that continuously increases their coverage by analysing biological documents.
The intended integration of biological knowledge into a homogeneous conceptual framework will ease access to otherwise fragmented knowledge and substantially increase its usability for R&D purposes, e.g. in the European biotech and pharmaceutical industries.
Knowledge integration and reuse in the biology domain are the main goals of the BOOTStrep project.
In particular, BOOTStrep aims at:
- exploiting existing terminological resources (thesauri, classification systems, etc.) and combining them within a common, standardized conceptual representation framework; based on this domain-specific background knowledge, advanced natural language technologies are employed to analyse biological documents and fill conceptual gaps in these resources by automatically acquiring new terms, concepts and relations;
- creating, incrementally maintaining and continuously updating a repository of biological facts, using a comprehensive bio-lexicon and a standards-based formal bio-ontology for text analysis; facts are extracted from biological documents fully automatically and are subsequently filtered and validated for novelty, redundancy, contradiction, etc.;
- developing resources and resource-building NLP tools for text-based knowledge harvesting in order to support information extraction and text mining in the biology domain;
- allowing multilingual public access to continuously updated and validated biological fact repositories.