What is the Watson Question Answering System?

IBM Watson is a question answering system that won the Jeopardy! challenge in 2011 against former champions Brad Rutter and Ken Jennings. Jeopardy! is a famous American television game show that tests participants on their general knowledge, and Watson proved to be more knowledgeable than its human opponents on that show. Watson uses natural language processing techniques and machine learning algorithms to comprehend questions and answer them.

There are some key differences between question answering systems and search engines. A search engine takes a set of keywords as input, and its output is a list of documents ranked by their relevance to the keywords and their popularity on the World Wide Web. In a question answering system, however, the input is a question in natural language and the answer is a precise piece of text. The main difference is that a question answering system can relate words semantically and provide the same answer to differently phrased questions; e.g., "How tall is the CN Tower?" and "How tall is the tallest Canadian building?" would get the same answer.
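To make this idea concrete, here is a minimal sketch (not Watson's implementation) of how paraphrased questions can be matched by semantic similarity rather than exact keyword overlap. It assumes the sentence-transformers library and an off-the-shelf embedding model; the model name and questions are only illustrative.

```python
# Sketch: semantic matching of paraphrased questions (assumes sentence-transformers is installed)
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf embedding model

q1 = "How tall is the CN Tower?"
q2 = "How tall is the tallest Canadian building?"
q3 = "Who founded General Foods?"

embeddings = model.encode([q1, q2, q3])

# Paraphrases land close together in embedding space; unrelated questions do not.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high similarity
print(util.cos_sim(embeddings[0], embeddings[2]))  # low similarity
```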

How does Watson work?

The Watson Question Answering system is built on an open source framework called Unstructured Information Management Architecture (UIMA). UIMA allows different analysis engines to act as stages, where the analyzed, searched, and scored outputs are saved in a common data structure. Watson's architecture is likewise composed of stages; it is illustrated in Figure 1 and explained further below.
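As a rough illustration of this idea (not UIMA's actual API), each stage can be modeled as a function that reads and enriches a shared analysis structure before passing it on. The stage names and contents below are hypothetical placeholders.

```python
# Sketch of UIMA-style staged analysis: each engine annotates a shared structure (stage names are hypothetical).
def question_analysis(cas):
    cas["keywords"] = cas["question"].lower().split()
    return cas

def hypothesis_generation(cas):
    cas["candidates"] = ["Battle Creek", "Grand Rapids"]   # placeholder candidate answers
    return cas

def answer_scoring(cas):
    cas["scores"] = {c: 0.5 for c in cas["candidates"]}    # placeholder evidence scores
    return cas

pipeline = [question_analysis, hypothesis_generation, answer_scoring]
cas = {"question": "In which Michigan city in 1894 did C.W. Post create his warm cereal drink?"}
for stage in pipeline:
    cas = stage(cas)
print(cas["scores"])
```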


Figure 1. Watson Architecture

The first stage in Watson's architecture is Question Analysis. In this step, a question is decomposed into keywords and its answer type is detected. For example, consider the question: "In which Michigan city in 1894 did C.W. Post create his warm cereal drink?" Watson will divide it into keywords such as: "1894, C.W. Post, created, warm cereal drink, Postum, Michigan, and city." It will also identify the answer type as "Michigan city" for this question, because the question is asking about a city in Michigan. Answer types in a question answering system are usually detected based on a predefined taxonomy of named entities. This taxonomy is extracted from a set of initial sample questions, or it is created separately. Watson uses a grammar parser to extract answer types along with the semantic and syntactic focus of the question. Similarly, a named entity detector identifies common and proper nouns and their entity types (person, place, etc.), and a predicate structure detector identifies the relationships between subjects and objects. All of these become the input for the second stage.
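A rough approximation of this step, using spaCy rather than Watson's own parser, might look like the following; the answer-type heuristic is a deliberate simplification for illustration only.

```python
# Sketch: keyword extraction, named entities, and a crude answer-type guess (assumes spaCy + en_core_web_sm)
import spacy

nlp = spacy.load("en_core_web_sm")
question = "In which Michigan city in 1894 did C.W. Post create his warm cereal drink?"
doc = nlp(question)

# Keywords: content words only (no stop words or punctuation)
keywords = [t.text for t in doc if not t.is_stop and not t.is_punct]

# Named entities with their types (person, place, date, ...)
entities = [(ent.text, ent.label_) for ent in doc.ents]

# Crude answer-type heuristic: the run of nouns/adjectives right after the question word "which"
answer_type = None
for i, tok in enumerate(doc):
    if tok.lower_ == "which":
        j = i + 1
        while j < len(doc) and doc[j].pos_ in ("NOUN", "PROPN", "ADJ"):
            j += 1
        answer_type = doc[i + 1:j].text
        break

print("keywords:", keywords)
print("entities:", entities)
print("answer type:", answer_type)
```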

The second stage is hypothesis generation. In this stage Watson uses the extracted keywords to search millions of documents for relevant passages. This primary search is done using Lucene, Solr, and Prismatic. If a potential passage is not found at this point, it will not be discovered later. In the case of the above example, Watson found 5 relevant documents and 30 passages. Candidate answers are then identified in the search results by ranking the retrieved documents and passages. Ranking is done by matching the keywords, answer types, etc. against the document titles (including a variety of title variants and expansions) and the text of the passages.
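A toy version of this primary search, using TF-IDF similarity in place of Lucene/Solr/Prismatic, could look like the sketch below; the passages are invented for illustration.

```python
# Sketch: primary passage search with TF-IDF similarity (stand-in for Lucene/Solr; passages are invented)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "C.W. Post created the warm cereal drink Postum in Battle Creek, Michigan.",
    "Grand Rapids is a large city in the U.S. state of Michigan.",
    "Will Keith Kellogg founded the Kellogg Company in Battle Creek.",
    "General Foods traces its roots to the Postum Cereal Company.",
]
question = "In which Michigan city in 1894 did C.W. Post create his warm cereal drink?"

vectorizer = TfidfVectorizer(stop_words="english")
passage_vectors = vectorizer.fit_transform(passages)
question_vector = vectorizer.transform([question])

# Rank passages by similarity to the question; the top ones become sources of candidate answers.
scores = cosine_similarity(question_vector, passage_vectors)[0]
for score, passage in sorted(zip(scores, passages), reverse=True):
    print(f"{score:.2f}  {passage}")
```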

The third stage is answer scoring. In this stage, the candidate answers (selected passages from documents) are assigned scores by many answer scoring analytics. For example, type coercion is used to determine whether an answer is of a specific type or not. Figure 2 shows the scores assigned to the candidate answers by the type coercion (Ty Cor), document ranking, and passage ranking scoring mechanisms (mentioned in the previous step).

 

Candidate Answers       Evidence Feature Scores
                        Doc Rank   Pass Rank   Ty Cor
General Foods           0          1           0.1
Post Foods              2          1           0.1
Battle Creek            1          2           0.8
Will Keith Kellogg      -          3           0.1
Grand Rapids            -          -           0.9
1895                    -          0           0.0

Figure 2. Evidence scores for candidate answers

 

The first column in Figure 2 shows the candidate answers (document titles) and the remaining columns show the scores produced by different analytics. In the case of type coercion for our example question, a score is assigned by evaluating whether the passage in the document refers to a city or not; for example: (a) the score for the candidate "General Foods" as a "city" would be 0.1, and (b) the score for the candidate "Grand Rapids" as a "city" would be 0.9. In addition to type coercion, Watson assigns hundreds of such scores to the candidate answers.
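Below is a highly simplified sketch of what a type-coercion scorer could look like. Watson's actual TyCor analytics consult typed knowledge resources, whereas this toy version just looks candidates up in a hand-made type table; the table and score values are invented for illustration.

```python
# Toy type-coercion (TyCor) scorer: does the candidate look like an instance of the expected answer type?
# The type table and score values below are invented for illustration.
ENTITY_TYPES = {
    "General Foods": "company",
    "Post Foods": "company",
    "Battle Creek": "city",
    "Will Keith Kellogg": "person",
    "Grand Rapids": "city",
    "1895": "date",
}

def tycor_score(candidate, expected_type):
    """Return a high score if the candidate's known type matches the expected answer type."""
    candidate_type = ENTITY_TYPES.get(candidate)
    if candidate_type is None:
        return 0.5          # unknown: neither supported nor refuted
    return 0.9 if candidate_type == expected_type else 0.1

for candidate in ENTITY_TYPES:
    print(candidate, tycor_score(candidate, "city"))
```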

The fourth stage deals with final merging and ranking (FMR) of candidate answers. This step is performed on IBM's SPSS servers. As Figure 2 shows, each row for a candidate answer represents a feature vector whose columns are features. The feature vector for each candidate answer is passed to logistic regression based machine learning models, which assign a probability, called confidence, to each candidate answer. The candidate answers are then ranked in decreasing order of confidence and presented to users as the final answers; the answer with the highest confidence is considered the best answer.
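As a sketch of this stage (using scikit-learn rather than the SPSS-based models Watson actually runs), each candidate's evidence scores form a feature vector, and a logistic regression model turns it into a confidence used for ranking. The training data and feature values below are stand-ins, loosely following Figure 2.

```python
# Sketch of final merging and ranking with a logistic regression model (all numbers are illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per candidate: [doc rank score, passage rank score, TyCor score]
candidates = ["General Foods", "Post Foods", "Battle Creek", "Grand Rapids"]
features = np.array([
    [0.0, 1.0, 0.1],
    [2.0, 1.0, 0.1],
    [1.0, 2.0, 0.8],
    [0.0, 0.0, 0.9],
])

# Stand-in training data: feature vectors of past candidates labeled correct (1) / incorrect (0)
X_train = np.array([[1.0, 2.0, 0.9], [0.0, 1.0, 0.1], [2.0, 1.0, 0.2], [1.0, 3.0, 0.8]])
y_train = np.array([1, 0, 0, 1])
model = LogisticRegression().fit(X_train, y_train)

# predict_proba gives each candidate a confidence; candidates are ranked by decreasing confidence.
confidence = model.predict_proba(features)[:, 1]
for conf, name in sorted(zip(confidence, candidates), reverse=True):
    print(f"{conf:.2f}  {name}")
```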

The logistic regression based machine learning models used in the fourth stage are developed from the sample questions and answers provided to Watson during the training phase. During training, users need to create well-formatted documents with titles and passages as answers. These documents are uploaded to Watson, and the users then map sample questions to the appropriate answers in the documents. For each question, Watson goes through stages 1-3 as described above, but this time it knows the correct answer and labels each feature vector as "Yes" or "No" according to the answer mapping provided by the users. Using this collection of labeled feature vectors for the candidate answers of the sample questions, Watson trains the logistic regression based models, which are then used in Stage 4 at test time to answer user questions that were not among the sample questions.
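The labeling step can be sketched as follows: given the user's answer key for a sample question and the candidate feature vectors produced by stages 1-3, each candidate is labeled "Yes" (1) or "No" (0) and added to the training set. The question, answer key, and feature values here are invented for illustration, and the model is a stand-in for Watson's actual FMR models.

```python
# Sketch: building labeled training vectors from a user-provided answer key (all data here is invented).
from sklearn.linear_model import LogisticRegression

# Sample question mapped by the user to its correct answer
training_questions = {
    "In which Michigan city in 1894 did C.W. Post create his warm cereal drink?": "Battle Creek",
}

# Feature vectors produced by stages 1-3 for each candidate answer of that question (made-up numbers)
candidate_features = {
    "General Foods": [0.0, 1.0, 0.1],
    "Battle Creek":  [1.0, 2.0, 0.8],
    "Grand Rapids":  [0.0, 0.0, 0.9],
}

X, y = [], []
for question, correct_answer in training_questions.items():
    for candidate, features in candidate_features.items():
        X.append(features)
        y.append(1 if candidate == correct_answer else 0)  # "Yes"/"No" label from the user's answer mapping

# The labeled vectors train the model that Stage 4 later uses to score candidates for new questions.
model = LogisticRegression().fit(X, y)
print(model.predict_proba([[1.0, 2.0, 0.8]]))
```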

 

Watson is a complex system and can answer questions accurately even if they are not present in its training corpus. However, the Watson QA system also requires extensive training, which can be tedious. This article is extracted from my research paper on Watson: How to Effectively Train IBM Watson: Classroom Experience.