A Structured Self-Attentive Sentence Embedding

Classification Comments

Paper Link

Introduction

Architecture

Self Attention Architecture

H = hidden states of Bi-LSTM (N*2u) a = softmax(V_s2 * tanh(W_s1 * H^T))

where the dimensionality of a is n.

But we might need multiple such vectors, lets say r and hence we set

A = softmax(W_s2 * tanh(W_s1 * H^T))

where the dimensionality of A is r*n

M = AH (r * 2u)

Results

Self Attention Results

Kaushik Rangadurai

Code. Learn. Explore

Share this post

Comments