Attention is All You Need

Paper Link: https://arxiv.org/abs/1706.03762

Overview

Architecture

Scaled Dot-Product Attention

Compute dot products between the queries and keys (each of dimension d_k), scale them by 1/sqrt(d_k), and apply a softmax to obtain weights that are used to form a weighted sum of the values (each of dimension d_v).
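
A minimal NumPy sketch of that computation (the function name, shapes, and masking convention are illustrative, not from the paper):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V, mask=None):
        # Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)
        d_k = Q.shape[-1]
        # Dot products between queries and keys, scaled by sqrt(d_k) so the
        # softmax does not saturate when d_k is large.
        scores = Q @ K.T / np.sqrt(d_k)              # (n_q, n_k)
        if mask is not None:
            scores = np.where(mask, scores, -1e9)    # hide disallowed positions
        # Softmax over the keys gives the attention weights.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V                           # (n_q, d_v)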

Multi-Head Attention
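
The paper runs h = 8 attention heads in parallel, each on learned linear projections of the queries, keys, and values into a reduced dimension d_k = d_model / h, then concatenates the heads and projects back to d_model. A rough sketch of that wiring, reusing the scaled_dot_product_attention function above; the random projection matrices stand in for learned weights:

    def multi_head_attention(Q, K, V, h=8, d_model=512):
        rng = np.random.default_rng(0)
        d_k = d_model // h                 # per-head dimension (64 in the paper)
        heads = []
        for _ in range(h):
            # Placeholder projections; in the model these are learned parameters.
            W_q = rng.normal(size=(d_model, d_k))
            W_k = rng.normal(size=(d_model, d_k))
            W_v = rng.normal(size=(d_model, d_k))
            heads.append(scaled_dot_product_attention(Q @ W_q, K @ W_k, V @ W_v))
        # Concatenate the h heads and project back to d_model.
        W_o = rng.normal(size=(h * d_k, d_model))
        return np.concatenate(heads, axis=-1) @ W_o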

Attentions Used In This Paper

Position-wise Feed-Forward Networks
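
Each encoder and decoder layer also contains a two-layer fully connected network applied to every position independently: FFN(x) = max(0, x W1 + b1) W2 + b2, with d_model = 512 and inner dimension d_ff = 2048. A small sketch under those assumptions (the weights are passed in as placeholders):

    import numpy as np

    def position_wise_ffn(x, W1, b1, W2, b2):
        # x: (seq_len, d_model); the same weights are applied at every position.
        hidden = np.maximum(0.0, x @ W1 + b1)    # ReLU into the inner dimension d_ff
        return hidden @ W2 + b2                  # back down to d_model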

Positional Encoding
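
Because the model has no recurrence or convolution, the paper adds fixed sinusoidal encodings to the input embeddings: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A short NumPy sketch of that table (assumes an even d_model):

    import numpy as np

    def positional_encoding(max_len, d_model):
        pos = np.arange(max_len)[:, None]                  # (max_len, 1)
        i = np.arange(d_model // 2)[None, :]               # (1, d_model / 2)
        angle = pos / np.power(10000.0, 2 * i / d_model)   # (max_len, d_model / 2)
        pe = np.zeros((max_len, d_model))
        pe[:, 0::2] = np.sin(angle)    # even dimensions use sine
        pe[:, 1::2] = np.cos(angle)    # odd dimensions use cosine
        return pe                      # added to the token embeddings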

Miscellaneous Things

Results

BLEU scores on the WMT 2014 English-to-German (EN-DE) and English-to-French (EN-FR) translation tasks:

Model               EN-DE (BLEU)   EN-FR (BLEU)
MoE                 26.03          40.56
Transformer (base)  27.3           38.1
Transformer (big)   28.4           41.0

Kaushik Rangadurai
