Gmail Smart Compose: Real-Time Assisted Writing
Overview
- A system for providing real-time, interactive suggestions to help users compose messages quickly and with confidence in Gmail.
- Challenges include latency (90th-percentile latency of 60 ms, under the 100 ms target), scale (more than 1.5 billion users), personalization, fairness and privacy, and metrics design (offline metrics that correlate with online metrics).
Architecture
Data
- Previous E-Mail
- Subject
- Date & time (discrete)
- Locale
Preprocessing
- Language Detection - the message language is detected and text not in the supported language is stripped.
- Segmentation & Tokenization - sentence boundaries are detected, and sentences are broken into words and punctuation marks.
- Normalization - infrequent words and entities such as names, e-mail addresses, URLs and phone numbers are replaced by special tokens (see the sketch after this list).
- Quotation Removal - quoted original message is removed.
- Salutation/Close Removal
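To make the normalization step concrete, here is a minimal sketch in Python; the regular expressions and token names (`<EMAIL>`, `<URL>`, `<PHONE>`) are illustrative assumptions, not the production rules.

```python
import re

# Hypothetical special tokens; the production token names and patterns are not specified here.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"https?://\S+|www\.\S+"), "<URL>"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "<PHONE>"),
]

def normalize(text: str) -> str:
    """Replace entities such as e-mail addresses, URLs and phone numbers with special tokens."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

print(normalize("Reach me at jane.doe@example.com or +1 650-555-0100, see https://example.com"))
# -> "Reach me at <EMAIL> or <PHONE>, see <URL>"
```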
Models

- In LM-A: the input sequence to the LM is the prefix of the current e-mail. The context (subject and previous e-mail) is encoded by a dedicated encoder (averaged word embeddings) and concatenated to the LM input at each time step (see the sketch after this list).
- In LM-B: the input sequence is the concatenation of (subject, previous e-mail, current e-mail), delimited by special tokens. Much simpler, but the sequence is longer.

- In the Seq2Seq model: the encoder input is the subject and previous e-mail body, and the decoder generates the current e-mail body.
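A minimal PyTorch sketch of the LM-A idea: the context is encoded as averaged word embeddings and concatenated to the LM input at every time step. Layer sizes, vocabulary size and class names are illustrative, not the production configuration.

```python
import torch
import torch.nn as nn

class ContextConditionedLM(nn.Module):
    """LM-A style model: averaged context embedding concatenated to each LM input step."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # LM input at each step = token embedding + averaged context embedding
        self.lstm = nn.LSTM(embed_dim * 2, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context_ids, prefix_ids):
        # Context encoder: simple average of word embeddings.
        context = self.embed(context_ids).mean(dim=1)              # (batch, embed_dim)
        tokens = self.embed(prefix_ids)                            # (batch, steps, embed_dim)
        context = context.unsqueeze(1).expand(-1, tokens.size(1), -1)
        hidden, _ = self.lstm(torch.cat([tokens, context], dim=-1))
        return self.proj(hidden)                                   # next-token logits per step

model = ContextConditionedLM()
logits = model(torch.randint(0, 10000, (2, 20)),   # subject + previous e-mail tokens
               torch.randint(0, 10000, (2, 8)))    # current e-mail prefix tokens
print(logits.shape)  # torch.Size([2, 8, 10000])
```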
Triggering
- Generate n-best predictions/candidates using beam search.
- Each candidate sequence is considered complete when a sentence punctuation token or a special end-of-sequence token is generated, or when the candidate reaches a predefined maximum output sequence length.
- The length-normalized log conditional probability is used as the confidence score of each suggestion, and a triggering threshold is set based on a target triggering frequency/coverage (see the sketch below).
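A small sketch of the triggering rule: each candidate's confidence is its length-normalized log probability, and a suggestion is shown only if the best confidence clears a threshold. The scores and the threshold below are made up for illustration.

```python
import math

def confidence(log_probs):
    """Length-normalized log conditional probability of a candidate sequence."""
    return sum(log_probs) / len(log_probs)

def maybe_trigger(candidates, threshold=-0.7):
    """Return the best candidate if its confidence clears the triggering threshold, else None.

    `candidates` maps a suggestion string to the per-token log-probabilities
    assigned during beam search (values here are made up).
    """
    best_text, best_score = None, -math.inf
    for text, log_probs in candidates.items():
        score = confidence(log_probs)
        if score > best_score:
            best_text, best_score = text, score
    return best_text if best_score >= threshold else None

candidates = {
    "you for your help.": [-0.2, -0.3, -0.4, -0.5, -0.1],
    "you so much.": [-0.9, -1.1, -0.8, -0.2],
}
print(maybe_trigger(candidates))  # "you for your help." (confidence -0.30 >= -0.7)
```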
Eval
- Log Perplexity
- ExactMatch@N - the fraction of predicted phrases that exactly match the first N words of the ground-truth text; the paper reports an averaged ExactMatch obtained by weighted-averaging ExactMatch over all lengths up to 15 (see the sketch below).
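A sketch of how an averaged ExactMatch could be computed, assuming ExactMatch@N is the fraction of length-N predictions that exactly match the first N ground-truth words and that the average is weighted by how often each prediction length occurs; the paper's exact weighting may differ.

```python
from collections import defaultdict

def averaged_exact_match(pairs, max_len=15):
    """pairs: list of (predicted_words, ground_truth_words) tuples."""
    hits, counts = defaultdict(int), defaultdict(int)
    for pred, truth in pairs:
        n = len(pred)
        if not 1 <= n <= max_len:
            continue
        counts[n] += 1
        if pred == truth[:n]:
            hits[n] += 1
    total = sum(counts.values())
    # Weighted average of per-length ExactMatch, weighted by length frequency.
    return sum((counts[n] / total) * (hits[n] / counts[n]) for n in counts) if total else 0.0

pairs = [
    (["thank", "you"], ["thank", "you", "for", "your", "help"]),   # match at N=2
    (["see", "you", "soon"], ["talk", "to", "you", "soon"]),        # miss at N=3
]
print(averaged_exact_match(pairs))  # 0.5
```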
Production System
- Context Encoding - the encoding of the subject and previous e-mail is computed once per message and cached.
- Prefix Encoding - takes the cached hidden state plus the newly typed text; because each user is assigned to a fixed server, the work per request is proportional to the text typed since the last request.
- Beam Search - controlled by parameters such as the number of steps, the tokens that end consideration of a beam, and a list of blacklisted words; partial-match filtering can also be applied (see the sketch after this list).
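A rough, toy sketch of the request flow implied by the list above: the context encoding is computed once and cached, and the prefix encoding is advanced by only the text typed since the last request. The DummyModel and all method names are hypothetical stand-ins, not the real serving API.

```python
class DummyModel:
    """Toy stand-in for the real model, only here to make the flow runnable."""
    def encode_context(self, subject, previous_email):
        return f"ctx({subject!r},{previous_email!r})"          # pretend context encoding
    def initial_state(self, context_encoding):
        return {"context": context_encoding, "consumed": ""}
    def advance(self, state, new_text):
        # Pretend to consume only the newly typed characters.
        return {**state, "consumed": state["consumed"] + new_text}
    def beam_search(self, state, max_steps, end_tokens, blocklist):
        return [("you for your help.", -0.3), ("you so much.", -0.75)]  # (text, confidence)


class SuggestionSession:
    """Hypothetical per-user session pinned to a single server."""
    def __init__(self, model, subject, previous_email):
        self.model = model
        # Context encoding: computed and cached only once per message.
        context = model.encode_context(subject, previous_email)
        self.prefix_state = model.initial_state(context)
        self.seen_prefix = ""

    def request(self, current_prefix, threshold=-0.7):
        # Prefix encoding: advance the cached state by the text typed since the last request.
        new_text = current_prefix[len(self.seen_prefix):]
        self.prefix_state = self.model.advance(self.prefix_state, new_text)
        self.seen_prefix = current_prefix
        # Beam search parameters: number of steps, terminal tokens, disallowed words.
        candidates = self.model.beam_search(self.prefix_state, max_steps=10,
                                            end_tokens={".", "!", "?"}, blocklist=set())
        text, score = max(candidates, key=lambda c: c[1])
        return text if score >= threshold else None


session = SuggestionSession(DummyModel(), "Re: dinner", "Are you free on Friday?")
print(session.request("Thank "))       # "you for your help."
print(session.request("Thank you f"))  # only "you f" is fed through the model this time
```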
Personalization
- For each user, train a lightweight language model adapted to that user's personal mail data.
- An n-gram language model with Katz back-off, stored in a compact weighted finite automaton (WFA) format (see the sketch after this list).
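A minimal sketch of how a personal model could be combined with the global model, assuming the two probabilities are blended by linear interpolation; the bigram model below is a simplified stand-in for the Katz back-off / WFA model, and the blending weight is an assumption.

```python
from collections import Counter, defaultdict

class PersonalBigramLM:
    """Simplified per-user bigram LM (stand-in for the Katz back-off / WFA model)."""
    def __init__(self, sent_mail):
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)
        for message in sent_mail:
            words = message.lower().split()
            self.unigrams.update(words)
            for prev, cur in zip(words, words[1:]):
                self.bigrams[prev][cur] += 1

    def prob(self, prev, word):
        context = self.bigrams.get(prev)
        if context and word in context:
            return context[word] / sum(context.values())
        # crude back-off to the unigram distribution
        total = sum(self.unigrams.values())
        return self.unigrams[word] / total if total else 0.0

def blended_prob(global_prob, personal_prob, alpha=0.2):
    """Linear interpolation of global and personal probabilities (alpha is illustrative)."""
    return (1 - alpha) * global_prob + alpha * personal_prob

personal = PersonalBigramLM(["thanks for the update", "thanks for the quick reply"])
print(blended_prob(global_prob=0.05, personal_prob=personal.prob("thanks", "for")))  # 0.24
```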
Multi-lingual Model
- A single multilingual model with a shared multilingual WordPiece vocabulary, for ease of maintenance.

Results
RNN Layers
- Use LSTM layers with residual connections between consecutive layers (see the sketch after this list).
- Adam Optimizer
- Uniform Label Smoothing
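A small PyTorch sketch of stacked LSTM layers with residual connections between consecutive layers; the Adam optimizer and uniform label smoothing from this list appear in the training setup. All sizes and the smoothing value are illustrative.

```python
import torch
import torch.nn as nn

class ResidualLSTMStack(nn.Module):
    """Stacked LSTMs with residual (skip) connections between consecutive layers."""
    def __init__(self, dim=256, num_layers=2):
        super().__init__()
        self.layers = nn.ModuleList([nn.LSTM(dim, dim, batch_first=True) for _ in range(num_layers)])

    def forward(self, x):
        for lstm in self.layers:
            out, _ = lstm(x)
            x = x + out          # residual connection between consecutive layers
        return x

model = ResidualLSTMStack()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Uniform label smoothing over the target distribution (smoothing value is illustrative).
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
print(model(torch.randn(2, 8, 256)).shape)  # torch.Size([2, 8, 256])
```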
Transformer
- Same setup as in Vaswani et al. (2017).

- For the LSTM-2-1024 model, simply concatenating the averaged embeddings of the subject and previous e-mail to the model input reduces test log perplexity by 0.13, showing that contextual information indeed helps improve model quality.
- The LSTM seq2seq model outperforms the LSTM language models with a similar number of parameters, indicating that an encoder-decoder architecture with attention is more effective at modeling the context information.
Future Work
- An adapted version of the Transformer model where self-attention is applied locally over a fixed-size window of steps during decoding, instead of over all previous decoding steps.
- Pretrained language models.
- Use variational autoencoders (VAEs).