Abstract:
The exponential growth of biomedical literature makes it challenging for end
users to find short and precise information quickly. Biomedical search engines,
such as PubMed and Quertle, are unable to retrieve the exact information, so a
paradigm shift to question answering (QA) systems is required to find short and
crisp answers to users’ questions. A biomedical question answering system
usually comprises three components: question processing, candidate retrieval,
and answer processing. In question processing, the QA system performs query
formulation and lexical answer type (LAT) prediction. The candidate retrieval
stage
uses a search engine and a document index to retrieve relevant documents and
snippets. Finally, the answer processing stage performs candidate answer
generation and scoring. Biomedical terminology is ever evolving, so retrieving
relevant documents is challenging for the candidate retrieval step and requires
effective query formulation techniques. In addition, answers to biomedical
questions are labeled with more than one semantic class, which requires
multi-label LAT prediction. This study addresses these two challenges in the
question processing stage by incorporating semantic information. We use a
discriminative term-selection query expansion technique with
word-embedding-based semantic filtering during query formulation to improve the
performance of biomedical document retrieval. Furthermore, we propose a LAT
prediction pipeline for factoid and list type questions that introduces
focus-driven semantic features, which significantly enhance appropriate answer
selection during the answer processing stage. We evaluate the proposed LAT
prediction methodology using the state-of-the-art Open Architecture for Question
Answering (OAQA) system and achieve better performance on 80% of the test
batches than state-of-the-art QA systems. We also compare the proposed system
with an online biomedical question answering system, EAGLi, and attain the best
performance for factoid and list type questions on Mean Reciprocal Rank (MRR)
and F1 measure, respectively.
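
To illustrate the two evaluation measures named above, the following is a minimal Python sketch, not the evaluation code used in this study. It assumes exact string matching between candidate and gold answers, and the function names (reciprocal_rank, mean_reciprocal_rank, list_f1) are hypothetical. MRR rewards placing a correct factoid answer near the top of the ranked candidates, while F1 balances precision and recall over the returned items of a list answer.

```python
from typing import Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """Return 1/rank of the first correct candidate, or 0.0 if none is correct."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average the reciprocal rank over (ranked_candidates, gold_answers) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / len(runs)


def list_f1(predicted: Iterable[str], gold: Set[str]) -> float:
    """Harmonic mean of precision and recall for one list-type question."""
    predicted_set = set(predicted)  # assumption: duplicates count once
    true_positives = len(predicted_set & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_set)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, a factoid question whose correct answer appears second in the ranking contributes a reciprocal rank of 0.5, and a list answer that returns three items, two of which are among three gold items, has precision and recall of 2/3 each and therefore an F1 of 2/3.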