Question Answering (Part 2): Taxonomy and Comparison of Different Categories of Question Answering Systems
In the first part of this series we looked at how Question Answering systems can help organizations unlock the information available in their data stores to improve productivity and end user experience. In this part let us look at the different ways to build Question Answering systems. Question Answering systems are based on one of two paradigms: knowledge based and information retrieval based [1]. Knowledge based question answering systems typically answer a query using knowledge from the document corpora that has been captured in a knowledge graph. Information retrieval based question answering systems retrieve the answer directly from the documents in the corpora.
Supervised Machine Learning Based Question Answer Models
Information retrieval based systems typically use supervised machine learning based Question Answer models. A Question Answer model takes as input a passage of text and a question. It extracts the span of text in the passage that answers the question, and returns the start and end positions of that span in the input passage.

Knowledge Based vs Information Retrieval Based Question Answering

Knowledge based question answering systems require an additional layer to transform the natural language query into a query language such as SPARQL that can be understood by the underlying layers. Information retrieval question answering systems based on deep learning do not require an additional layer to understand the natural language query.
Knowledge based question answering systems can only provide information that has been captured from the unstructured document corpora in the underlying structured knowledge layer. Information retrieval question answering systems based on deep learning can answer questions from any part of the unstructured document corpora.
Information retrieval-based question answering systems lack the ability to answer questions that require reasoning on the information in the text. Knowledge based question answering systems are designed to answer questions that require reasoning on the text.
Creating and maintaining knowledge based question answering systems is expensive and error prone. Creating and maintaining an ontology to capture the knowledge is tedious and time consuming, and populating the ontology using machine learning is error prone.
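To make the query translation layer that knowledge based systems need more concrete, here is a deliberately simple sketch that maps a natural language question to a SPARQL query using hand-written templates. Everything specific in it is an assumption made for illustration: the ex: namespace, the ex:foundedBy and ex:headquarteredIn predicates, the TEMPLATES list, and the to_sparql function are hypothetical, and production systems typically use far more robust semantic parsing.

```python
import re
from typing import Optional

# Hypothetical namespace; a real knowledge graph would define its own vocabulary.
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX ex: <http://example.org/>\n"
)

# Hypothetical question patterns paired with SPARQL templates.
# Doubled braces are literal braces in str.format templates.
TEMPLATES = [
    (re.compile(r"^who founded (?P<entity>.+?)\??$", re.IGNORECASE),
     'SELECT ?answer WHERE {{ ?s rdfs:label "{entity}" . ?s ex:foundedBy ?answer . }}'),
    (re.compile(r"^where is (?P<entity>.+?) headquartered\??$", re.IGNORECASE),
     'SELECT ?answer WHERE {{ ?s rdfs:label "{entity}" . ?s ex:headquarteredIn ?answer . }}'),
]


def to_sparql(question: str) -> Optional[str]:
    """Return a SPARQL query for the question, or None if no template matches."""
    for pattern, template in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            # Escape backslashes and double quotes so the entity stays inside the literal.
            entity = match.group("entity").replace("\\", "\\\\").replace('"', '\\"')
            return PREFIXES + template.format(entity=entity)
    return None


print(to_sparql("Who founded Acme?"))
print(to_sparql("What is the weather today?"))  # no matching template
```

The second call returns None, which mirrors the limitation described above: a question that falls outside what the structured layer models cannot be answered.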
Advantages of Information Retrieval Based Question Answering

Getting direct answers to questions asked in natural language is more convenient than reading through a list of documents to find the answer. This is especially useful in business settings where field and support personnel are under pressure to answer queries quickly, efficiently and accurately.
Transfer learning is the application of knowledge learned while solving one problem to other similar problems. Recent deep learning based language representations such as Bidirectional Encoder Representations from Transformers (BERT) enable Question Answer models pre-trained on a corpus from one domain to answer questions from another domain. This makes it easy to introduce support for Question Answering in newer applications using pre-trained models.
When compared to knowledge based systems, information retrieval based systems do not incur upfront creation and maintenance costs such as ontology design. Transfer learning allows information retrieval based systems to scale to different data sets.
Challenges of Information Retrieval Based Question Answering

As mentioned earlier in this article, information retrieval based systems built on deep learning based Question Answer models are inherently incapable of answering questions that require reasoning. These models take the input passage and question and output the start and end positions of the span in the passage that contains the answer. Consequently they cannot answer questions whose answer is spread across the document. For the same reason, they cannot answer questions made up of multiple sub questions whose answers are spread across the document.
In the next part of this series let us look at the standard data sets that can be used to develop supervised question answer models.