Dynamic Co-attention Network
- Similar to BiDAF, this paper introduces an attention layer (co-attention) that flows in both directions.
- However, this is a two-level attention: we compute attention on top of the existing attention outputs.
Assume we have the context hidden states and the question hidden states. First, we apply a linear layer with a tanh non-linearity to the question hidden states to get the projected question hidden states (q').
Next, we append sentinel vectors, one for the context and one for the question, both of which are trainable.
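The projection and sentinel step can be sketched in NumPy. This is a minimal illustration with hypothetical shapes (hidden size `h`, question length `n`, context length `m`) and random placeholder weights standing in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
h, n, m = 4, 3, 5                     # hidden size, question len, context len (hypothetical)

Q = rng.normal(size=(n, h))           # question hidden states
C = rng.normal(size=(m, h))           # context hidden states

# Linear layer + tanh to get the projected question states q'
W = rng.normal(size=(h, h))           # placeholder for a learned weight matrix
b = np.zeros(h)                       # placeholder for a learned bias
Q_proj = np.tanh(Q @ W + b)

# Append one sentinel vector to each sequence (trainable in practice)
q_sentinel = rng.normal(size=(1, h))
c_sentinel = rng.normal(size=(1, h))
Q_ext = np.vstack([Q_proj, q_sentinel])   # shape (n + 1, h)
C_ext = np.vstack([C, c_sentinel])        # shape (m + 1, h)
```

The sentinel gives the attention a position to point at when a word should attend to nothing in the other sequence.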
First Level Attention Layer
Similar to BiDAF, we compute the affinity matrix (L), the C2Q attention outputs (ai), and the Q2C attention outputs (bj).
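A hedged NumPy sketch of the first-level step, assuming the affinity matrix is the dot product between every context state and every question state, with C2Q weights normalized over question positions and Q2C weights over context positions:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
h, n, m = 4, 3, 5                     # hypothetical sizes
Q = rng.normal(size=(n + 1, h))       # projected question states + sentinel
C = rng.normal(size=(m + 1, h))       # context states + sentinel

L = C @ Q.T                           # affinity matrix, shape (m + 1, n + 1)
A_cq = softmax(L, axis=1)             # C2Q: one distribution over question words per context word
A_qc = softmax(L, axis=0)             # Q2C: one distribution over context words per question word

a = A_cq @ Q                          # C2Q attention outputs a_i, shape (m + 1, h)
b = A_qc.T @ C                        # Q2C attention outputs b_j, shape (n + 1, h)
```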
Second Level Attention Layer
The advantage of the DCN is that we have a second-level attention layer: attention computed on top of the first-level attention outputs.
Finally, we concatenate the second-level attention outputs si with the first-level C2Q attention outputs ai, and feed the resulting sequence through a BiLSTM. We then return the final hidden state.
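Putting the levels together, a sketch of the second-level attention and the concatenation, following the description above (the C2Q weights are re-applied to the Q2C outputs; the BiLSTM step is only indicated in a comment, and all weights are random placeholders):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(2)
h, n, m = 4, 3, 5                     # hypothetical sizes
Q = rng.normal(size=(n + 1, h))       # projected question states + sentinel
C = rng.normal(size=(m + 1, h))       # context states + sentinel

# First-level attention
L = C @ Q.T
A_cq = softmax(L, axis=1)             # C2Q weights
A_qc = softmax(L, axis=0)             # Q2C weights
a = A_cq @ Q                          # first-level C2Q outputs a_i
b = A_qc.T @ C                        # first-level Q2C outputs b_j

# Second-level attention: apply the C2Q weights to the Q2C outputs
s = A_cq @ b                          # shape (m + 1, h)

# Concatenate second-level outputs s_i with first-level outputs a_i
u = np.concatenate([s, a], axis=1)    # shape (m + 1, 2h); this sequence feeds a BiLSTM
```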
| Model | EM | F1 |
| --- | --- | --- |
| Dynamic Co-attention Networks | 65.4 | 75.6 |