SentencePiece

Tokenization

Paper Link

Introduction

Architecture

SentencePiece has four main components:

Normalizer

Trainer

Encoder

Decoder
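The first component, the normalizer, maps raw text to a canonical form before anything is learned or segmented. The sketch below is a minimal approximation that assumes Unicode NFKC normalization plus collapsing of repeated whitespace. SentencePiece's built-in normalization rules are NFKC-based, but `normalize` here is a hypothetical helper, not the library's API.

```python
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalizer: NFKC plus whitespace cleanup (an assumption)."""
    # Compatibility composition: full-width "Ｈ" becomes "H", the ligature "ﬁ" becomes "fi"
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace into one space and trim both ends
    return " ".join(text.split())


print(normalize("Ｈｅｌｌｏ   world "))  # Hello world
```

Normalizing first means the trainer and encoder see one spelling of visually identical strings, so the vocabulary is not split across equivalent variants.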

Lossless Tokenization
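Lossless tokenization means the original (normalized) text can be recovered exactly from the tokens, i.e. Decode(Encode(Normalize(text))) = Normalize(text). SentencePiece achieves this by treating whitespace as an ordinary symbol, escaping it to the meta-character "▁" (U+2581), so decoding is just concatenation followed by turning "▁" back into spaces. The sketch below is only illustrative: the greedy longest-match segmentation and the vocabulary `VOCAB` are hypothetical stand-ins for a trained BPE or unigram model.

```python
META = "\u2581"  # "▁", the meta-symbol that stands in for a space

# Hypothetical vocabulary; a real model learns its pieces during training
VOCAB = {META, "Hello", META + "world", META + "wor", "ld"}


def encode(text: str, vocab: set) -> list:
    """Escape spaces, then segment greedily by longest vocabulary match (a stand-in)."""
    s = text.replace(" ", META)
    pieces, i = [], 0
    while i < len(s):
        # Try the longest substring first; fall back to a single character
        for j in range(len(s), i, -1):
            if s[i:j] in vocab or j == i + 1:
                pieces.append(s[i:j])
                i = j
                break
    return pieces


def decode(pieces: list) -> str:
    """Concatenate pieces and turn the meta-symbol back into spaces."""
    return "".join(pieces).replace(META, " ")


print(encode("Hello world", VOCAB))  # ['Hello', '▁world']
print(decode(encode("Hello world", VOCAB)))  # Hello world
```

Because spaces survive as "▁" inside the pieces, the decoder needs no language-specific rules for re-inserting whitespace, which is what lets the same pipeline handle languages with and without spaces between words.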

Results

SentencePiece Results

Kaushik Rangadurai

Code. Learn. Explore
