BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding


Paper: https://arxiv.org/abs/1810.04805

Overview

BERT (Bidirectional Encoder Representations from Transformers) pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained model can then be fine-tuned with just one additional output layer to reach state-of-the-art results on a wide range of sentence- and token-level tasks.

Architecture

BERT's architecture is a multi-layer bidirectional Transformer encoder. The paper reports two sizes: BERT Base (L=12 layers, H=768 hidden size, A=12 attention heads, 110M parameters) and BERT Large (L=24, H=1024, A=16, 340M parameters).

[Figure: BERT architecture]
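For reference, here is a minimal sketch of the two configurations expressed with the Hugging Face transformers library's BertConfig (an illustrative assumption; the library postdates the paper):

```python
from transformers import BertConfig

# BERT Base: L=12, H=768, A=12 (~110M parameters)
base_config = BertConfig(
    num_hidden_layers=12,
    hidden_size=768,
    num_attention_heads=12,
    intermediate_size=3072,  # feed-forward inner size, 4 * H
)

# BERT Large: L=24, H=1024, A=16 (~340M parameters)
large_config = BertConfig(
    num_hidden_layers=24,
    hidden_size=1024,
    num_attention_heads=16,
    intermediate_size=4096,
)
```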

Input/Output Representation

Inputs use WordPiece tokenization with a 30,000-token vocabulary. Every sequence starts with a [CLS] token, whose final hidden state serves as the aggregate representation for classification, and sentence pairs are packed into a single sequence separated by [SEP]. Each token's input representation is the sum of its token embedding, a learned segment embedding (sentence A or B), and a learned position embedding.

[Figure: BERT input representation as the sum of token, segment, and position embeddings]
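A minimal PyTorch sketch of that sum (not the paper's code; the class name is made up, and the layer normalization and dropout that follow the sum in BERT are omitted for brevity):

```python
import torch
import torch.nn as nn

class BertInputEmbeddings(nn.Module):
    """Sum of token, segment, and position embeddings, BERT-Base sizes."""

    def __init__(self, vocab_size=30522, hidden=768, max_len=512):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)
        self.segment = nn.Embedding(2, hidden)     # sentence A vs. sentence B
        self.position = nn.Embedding(max_len, hidden)

    def forward(self, token_ids, segment_ids):
        # token_ids, segment_ids: (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return (self.token(token_ids)
                + self.segment(segment_ids)
                + self.position(positions))  # broadcasts over the batch
```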

Masked LM

To train a deep bidirectional representation, 15% of the input WordPiece tokens are selected at random and the model is trained to predict the original tokens at those positions. To reduce the mismatch with fine-tuning, where no [MASK] token appears, a selected token is replaced with [MASK] 80% of the time, with a random token 10% of the time, and left unchanged the remaining 10%.
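A sketch of the masking procedure over a list of token ids (mask_tokens is a hypothetical helper; MASK_ID = 103 assumes the standard uncased WordPiece vocabulary, and the -100 ignore label is a common convention, not from the paper):

```python
import random

MASK_ID = 103  # [MASK] id in the standard uncased WordPiece vocab (assumed)

def mask_tokens(token_ids, vocab_size=30522, select_prob=0.15):
    """Select 15% of positions; replace 80% of those with [MASK],
    10% with a random token, and leave 10% unchanged."""
    inputs, labels = list(token_ids), [-100] * len(token_ids)  # -100 = ignore
    for i, tok in enumerate(token_ids):
        if random.random() < select_prob:
            labels[i] = tok  # the model must predict the original token here
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID                       # 80%: [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(vocab_size)  # 10%: random token
            # else: 10% keep the original token
    return inputs, labels
```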

Next Sentence Prediction

To teach the model sentence relationships, pre-training also includes a binarized next-sentence prediction task: for each pair, sentence B is the actual next sentence after A 50% of the time (labeled IsNext) and a random sentence from the corpus otherwise (NotNext). The prediction is made from the final hidden state of the [CLS] token.
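A sketch of how training pairs could be drawn (make_nsp_example and its arguments are hypothetical names for illustration):

```python
import random

def make_nsp_example(doc_sentences, corpus_sentences):
    """Return (sentence_a, sentence_b, label): 1 = IsNext, 0 = NotNext."""
    i = random.randrange(len(doc_sentences) - 1)
    sentence_a = doc_sentences[i]
    if random.random() < 0.5:
        return sentence_a, doc_sentences[i + 1], 1          # IsNext
    return sentence_a, random.choice(corpus_sentences), 0   # NotNext
```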

Data

Pre-training uses the BooksCorpus (800M words) and English Wikipedia (2,500M words), taking only text passages from Wikipedia and ignoring lists, tables, and headers.

Results

On the GLUE test set, BERT Large outperforms OpenAI GPT on every task, improving the average score by 7.0 points:

| Model | MNLI (m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | Average |
|-------|-------------|-----|------|-------|------|-------|------|-----|---------|
| BERT Large | 86.7/85.9 | 72.1 | 92.7 | 94.9 | 60.5 | 86.5 | 89.3 | 70.1 | 82.1 |
| OpenAI GPT | 82.1/81.4 | 70.3 | 87.4 | 91.3 | 45.4 | 80.0 | 82.3 | 56.0 | 75.1 |

