Bi-directional Attention Flow for Machine Comprehension

Paper Link

Overview

BiDAF (Seo et al., 2016) is a machine comprehension model whose key idea is to let attention flow in both directions between the context and the question, Context-to-Question (C2Q) and Question-to-Context (Q2C), with both directions computed from a shared similarity matrix.

Architecture

Figure: the BiDAF model architecture (from the paper).

Assume we have the context hidden states $c_1, \dots, c_N \in \mathbb{R}^{2h}$ and the question hidden states $q_1, \dots, q_M \in \mathbb{R}^{2h}$. We compute the similarity matrix $S \in \mathbb{R}^{N \times M}$, which contains a similarity score $S_{ij}$ for each pair $(c_i, q_j)$:

$$S_{ij} = w_{\mathrm{sim}}^\top \, [c_i;\, q_j;\, c_i \circ q_j] \in \mathbb{R}$$

where $w_{\mathrm{sim}} \in \mathbb{R}^{6h}$ is a learned weight vector, $[;]$ denotes concatenation, and $\circ$ is elementwise multiplication.
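As a concrete illustration, here is a minimal NumPy sketch of the similarity matrix. The function name and the trick of expanding the dot product into three terms are my own; the formula itself is the one above.

```python
import numpy as np

def similarity_matrix(C, Q, w_sim):
    """S[i, j] = w_sim . [c_i ; q_j ; c_i * q_j].

    C: (N, 2h) context hidden states
    Q: (M, 2h) question hidden states
    w_sim: (6h,) learned weight vector
    """
    w_c, w_q, w_cq = np.split(w_sim, 3)  # three (2h,) chunks of w_sim
    # Expanding the dot product term by term avoids materialising the
    # (N, M, 6h) tensor of all concatenated pairs.
    return (C @ w_c)[:, None] + (Q @ w_q)[None, :] + (C * w_cq) @ Q.T  # (N, M)
```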

Context-to-Question Attention (C2Q)

We take a row-wise softmax of $S$ to obtain the attention distributions $\alpha_i$, which are used to take a weighted sum of the question hidden states, yielding the C2Q attention output $a_i$:

$$\alpha_i = \mathrm{softmax}(S_{i,:}) \in \mathbb{R}^M, \qquad a_i = \sum_{j=1}^{M} \alpha_{ij} \, q_j \in \mathbb{R}^{2h}$$

This is very similar to standard attention, except that the scores come from the similarity matrix $S$ rather than a plain dot product. The intuition is that for every word in the context, we compute its similarity to every word in the question, take a softmax over those scores, and use the resulting weights to form a weighted sum of the question hidden states. We do this for every word/token in the context.
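A sketch of C2Q on top of the similarity matrix from the previous snippet, with the softmax written out (using the usual max-subtraction for numerical stability):

```python
def c2q_attention(S, Q):
    """C2Q: for each context word, a weighted summary of the question."""
    # Row-wise softmax over the question positions
    alpha = np.exp(S - S.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)   # alpha[i] is a distribution over q_j
    return alpha @ Q                            # (N, 2h): a_i = sum_j alpha_ij q_j
```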

Question-To-Context Attention (Q2C)

This time, for each context word we take the maximum similarity across the question words, $m_i = \max_j S_{ij} \in \mathbb{R}$, and apply a softmax over the context positions to obtain a single attention distribution $\beta = \mathrm{softmax}(m) \in \mathbb{R}^N$, which is used to take a weighted sum of the context hidden states, yielding the Q2C attention output $c' = \sum_{i=1}^{N} \beta_i \, c_i \in \mathbb{R}^{2h}$.
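And a matching sketch of Q2C, following the max-then-softmax formulation above:

```python
def q2c_attention(S, C):
    """Q2C: a single question-aware summary of the context."""
    m = S.max(axis=1)            # (N,): each context word's best question match
    beta = np.exp(m - m.max())
    beta /= beta.sum()           # softmax over context positions
    return beta @ C              # (2h,): c' = sum_i beta_i c_i
```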

Bi-directional Attention Flow

The intuition is that, for every word in the context, we find the most similar question word; taking a softmax over these maxima gives a weight for every context word, and the weighted sum of the context hidden states yields $c'$, a summary of the context words most relevant to the question. Finally, the two attention directions are combined with the original context states: for each position $i$, the attention flow layer outputs the concatenation $b_i = [c_i;\, a_i;\, c_i \circ a_i;\, c_i \circ c'] \in \mathbb{R}^{8h}$, which is passed on to the modeling layer.
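Putting the pieces together, here is a sketch of the full attention flow layer output per the concatenation above. Shapes use toy sizes, and the helper names come from the earlier snippets, not the paper.

```python
def bidaf_layer(C, Q, w_sim):
    """Combine C2Q and Q2C outputs with the context states."""
    S = similarity_matrix(C, Q, w_sim)
    A = c2q_attention(S, Q)                    # (N, 2h)
    c_prime = q2c_attention(S, C)              # (2h,)
    Cp = np.broadcast_to(c_prime, C.shape)     # tile c' over all positions
    # b_i = [c_i ; a_i ; c_i * a_i ; c_i * c']
    return np.concatenate([C, A, C * A, C * Cp], axis=1)  # (N, 8h)

# Toy check: h = 4, N = 5 context words, M = 3 question words
rng = np.random.default_rng(0)
B = bidaf_layer(rng.normal(size=(5, 8)), rng.normal(size=(3, 8)),
                rng.normal(size=(24,)))
assert B.shape == (5, 32)
```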

Results (SQuAD)

Model                             EM     F1
Dynamic Co-attention Networks     66.2   75.9
BiDAF                             67.7   77.3
BiDAF, no char embedding          65.0   74.4
BiDAF, no C2Q attention           57.2   67.7
BiDAF, no Q2C attention           63.6   73.7

