Blockwise Parallel Decoding for Deep Autoregressive Models


Paper Link

Overview

Architecture

Blockwise Parallel Decoding

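  1. Predict - Get the block predictions for the next k steps from the base model and k-1 auxiliary proposal models, all conditioned on the current prefix y_{≤j}.
  2. Verify - Find the largest m (1 ≤ m ≤ k) such that each of the first m predicted tokens matches the base model's greedy (argmax) prediction, given the prefix extended by the proposed tokens before it. The first token always passes, since it comes from the base model itself.
  3. Accept - Extend y with the m verified tokens y_{j+1}, ..., y_{j+m} and set j = j + m.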
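The three steps can be sketched as a small greedy decoding loop. This is a minimal sketch under stated assumptions, not the paper's implementation: `propose` and `base_argmax` are hypothetical callables standing in for the proposal heads and the base model, and a toy `TARGET` sequence replaces a trained network.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces for this sketch (not from the paper or any library):
#   propose(prefix, k)  -> greedy tokens from heads p_1..p_k for the next k positions
#   base_argmax(prefix) -> greedy next token of the base model p_1
ProposeFn = Callable[[Sequence[int], int], List[int]]
BaseFn = Callable[[Sequence[int]], int]


def blockwise_parallel_decode(
    propose: ProposeFn, base_argmax: BaseFn, k: int, max_len: int, eos: int
) -> Tuple[List[int], int]:
    """Greedy blockwise parallel decoding with exact-match verification.

    Returns the decoded tokens and the number of predict/verify iterations.
    """
    y: List[int] = []
    iterations = 0
    while len(y) < max_len and eos not in y:
        iterations += 1
        # 1. Predict: k tokens, all conditioned on the same prefix y.
        block = propose(y, k)
        # 2. Verify: count the leading tokens that match the base model's greedy
        #    choice. The k checks are independent, so a real implementation runs
        #    them as one batched forward pass; here they run in a loop.
        m = 0
        for i in range(k):
            if base_argmax(y + block[:i]) != block[i]:
                break
            m += 1
        if m == 0:
            # Only possible if head 1 is not the base model p_1.
            raise ValueError("head 1 must agree with the base model")
        # 3. Accept: extend y with the m verified tokens (j = j + m),
        #    stopping at the first end-of-sequence token.
        accepted = block[:m]
        if eos in accepted:
            accepted = accepted[: accepted.index(eos) + 1]
        y.extend(accepted)
    return y[:max_len], iterations


# Toy stand-ins (hypothetical): the base model "knows" TARGET, and heads 2..k
# guess wrong at position 6, so one block is only partially accepted.
EOS = 0
TARGET = [5, 3, 8, 1, 9, 2, 7, 4, EOS]


def base_argmax(prefix):
    return TARGET[len(prefix)] if len(prefix) < len(TARGET) else EOS


def propose(prefix, k):
    block = []
    for i in range(k):
        pos = len(prefix) + i
        token = TARGET[pos] if pos < len(TARGET) else EOS
        if i > 0 and pos == 6:
            token = 99  # a wrong guess from an auxiliary head
        block.append(token)
    return block


tokens, steps = blockwise_parallel_decode(propose, base_argmax, k=4, max_len=20, eos=EOS)
print(tokens, steps)  # [5, 3, 8, 1, 9, 2, 7, 4, 0] 3
```

With k=4 the nine tokens are produced in three iterations (4, 2 and 3 accepted tokens), whereas plain greedy decoding (k=1) needs nine.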

Combined Scoring and Proposal Model
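A minimal sketch of the idea, under the assumption that the scoring head and the proposal heads live in one network: the pass that verifies a block (running all heads on the k extended prefixes) already contains the head outputs at the new prefix, so those can serve as the next block without a separate Predict call. The `heads` callable and the toy `TARGET` sequence are hypothetical stand-ins, not the paper's code.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface for this sketch: one network with k heads.
#   heads(prefix, k) -> greedy tokens of heads p_1..p_k at that prefix,
#   where head 1 is the base (scoring) model and heads 2..k are proposals.
HeadsFn = Callable[[Sequence[int], int], List[int]]


def combined_decode(heads: HeadsFn, k: int, max_len: int, eos: int) -> Tuple[List[int], int]:
    """Blockwise decoding where one invocation both verifies and proposes.

    Returns the decoded tokens and the number of model invocations.
    """
    y: List[int] = []
    block = heads(y, k)  # initial Predict step
    calls = 1
    while len(y) < max_len and eos not in y:
        # One invocation: evaluate all heads at the k prefixes y + block[:i],
        # i = 1..k (a single batched pass in practice; a list here).
        outputs = [heads(y + block[:i], k) for i in range(1, k + 1)]
        calls += 1
        # Verify: block[0] came from head 1 at prefix y, so it is accepted for
        # free; block[m] is checked against head 1 at prefix y + block[:m].
        m = 1
        while m < k and outputs[m - 1][0] == block[m]:
            m += 1
        accepted = block[:m]
        if eos in accepted:
            accepted = accepted[: accepted.index(eos) + 1]
        y.extend(accepted)
        # Propose: the heads already evaluated at the new prefix y + block[:m]
        # give the next block.
        block = outputs[m - 1]
    return y[:max_len], calls


# Toy stand-in (hypothetical): head 1 reproduces TARGET exactly; heads 2..k
# guess wrong at position 6.
EOS = 0
TARGET = [5, 3, 8, 1, 9, 2, 7, 4, EOS]


def heads(prefix, k):
    out = []
    for i in range(k):
        pos = len(prefix) + i
        token = TARGET[pos] if pos < len(TARGET) else EOS
        if i > 0 and pos == 6:
            token = 99
        out.append(token)
    return out


tokens, calls = combined_decode(heads, k=4, max_len=20, eos=EOS)
print(tokens, calls)  # [5, 3, 8, 1, 9, 2, 7, 4, 0] 4
```

In this toy run the nine tokens take three iterations and four model invocations; running Predict and Verify as separate calls would need two invocations per iteration, i.e. six.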

Other Details

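  1. Top-k selection - during verification, accept a proposed token as long as it is among the base model's top-k tokens (a cutoff chosen independently of the block size k), rather than requiring it to be the single argmax.
  2. Distance-based selection - accept a proposed token if its distance from the base model's argmax is within a tolerance ε. This makes sense for images, where tokens are pixel intensities with a natural distance between them.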
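Both relaxations only change the per-token test inside Verify. The sketch below compares the exact-match rule with the two relaxed rules on one block; it assumes a hypothetical `base_scores` list holding the base model's scores at each extended prefix, and the function names and numbers are illustrative, not taken from the paper.

```python
from typing import Callable, Sequence

# A criterion decides whether a proposed token is acceptable given the base
# model's scores over the vocabulary at that position.
Criterion = Callable[[int, Sequence[float]], bool]


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda t: scores[t])


def exact_match(token: int, scores: Sequence[float]) -> bool:
    # Default rule: the proposal must be the base model's greedy choice.
    return token == argmax(scores)


def top_k_match(top_k: int) -> Criterion:
    # Relaxed rule: the proposal only has to be among the top_k tokens.
    def criterion(token: int, scores: Sequence[float]) -> bool:
        ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
        return token in ranked[:top_k]
    return criterion


def distance_match(eps: float) -> Criterion:
    # Relaxed rule for ordinal tokens such as pixel intensities: the proposal
    # only has to lie within eps of the greedy choice.
    def criterion(token: int, scores: Sequence[float]) -> bool:
        return abs(token - argmax(scores)) <= eps
    return criterion


def accepted_length(
    block: Sequence[int], base_scores: Sequence[Sequence[float]], criterion: Criterion
) -> int:
    """Length of the longest prefix of `block` that passes `criterion`.

    base_scores[i] holds the base model's scores at prefix y + block[:i].
    """
    m = 0
    for token, scores in zip(block, base_scores):
        if not criterion(token, scores):
            break
        m += 1
    return m


# Toy example (hypothetical numbers): a 6-value "intensity" vocabulary 0..5.
block = [2, 4, 1]
base_scores = [
    [0.10, 0.20, 0.50, 0.10, 0.05, 0.05],  # greedy choice 2
    [0.00, 0.10, 0.20, 0.30, 0.25, 0.15],  # greedy choice 3; 4 is second best
    [0.05, 0.05, 0.40, 0.30, 0.10, 0.10],  # greedy choice 2; 1 ranks last
]
print(accepted_length(block, base_scores, exact_match))        # 1
print(accepted_length(block, base_scores, top_k_match(2)))     # 2
print(accepted_length(block, base_scores, distance_match(1)))  # 3
```

Looser rules accept longer prefixes and therefore need fewer iterations, at the cost of output that may differ from greedy decoding with the base model.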

Results

Model                                  BLEU
Transformer (beam size 4)              28.4
Blockwise parallel decoding (k=4)      28.54
Transformer with distillation (k=1)    29.11

Kaushik Rangadurai

