Dialog Context Language Modeling With Recurrent Neural Networks
- The goal is to encode dialog context for language modeling.
- Design RNN-based contextual language models that explicitly track the interactions between speakers in a dialog.
- Modeling the utterances in a dialog as a single sequence of inputs may not capture the pauses, turn-taking, and grounding phenomena in a dialog well.
- Mikolov et al. proposed a topic-conditioned RNNLM by connecting a contextual real-valued vector (an LDA representation of the preceding text) to the RNN hidden state.
- Lin et al. proposed a hierarchical RNN for document modeling.
However, all these methods apply context by encoding the preceding text, without modeling the interactions in a dialog.
Context Dependent RNNLM
Let D = (U_1, U_2, ..., U_K) be a dialog with K turns involving 2 speakers. Here a turn is the utterance of a single speaker and may span multiple messages. The kth turn U_k is represented as a sequence of T_k words.
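To make the notation concrete, here is a tiny illustrative snippet (the speakers, token ids, and turn lengths are made up) that represents a dialog D as K turns of T_k word ids each:

```python
dialog = [
    # (speaker, word ids of the turn); turn k has T_k words
    ("A", [12, 7, 93, 4]),      # U_1, T_1 = 4
    ("B", [55, 8]),             # U_2, T_2 = 2
    ("A", [3, 19, 42, 7, 11]),  # U_3, T_3 = 5
    ("B", [61, 2, 9]),          # U_4, T_4 = 3
]

K = len(dialog)                             # number of turns
turn_lengths = [len(w) for _, w in dialog]  # [T_1, ..., T_K]
print(K, turn_lengths)                      # 4 [4, 2, 5, 3]
```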
- In the context dependent RNNLM, a context representation is appended to the input of the RNN (as opposed to being fed into the hidden state).
- Strip the sentence boundaries from the dialog history, run an RNN over it, and use its final hidden state as the context (DRNNLM).
- Alternatively, the last hidden state of the context RNN is fed to the RNN hidden state of the target utterance at each time step (CCDCLM).
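As a sketch of this style of context conditioning, the PyTorch snippet below (an illustration under my own assumptions, not the paper's implementation; the class name, layer sizes, and GRU choice are all invented) encodes the dialog history into a single context vector and appends it to the word embedding of the target turn at every time step:

```python
import torch
import torch.nn as nn

class ContextRNNLM(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=32, ctx_dim=64, hid_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.context_rnn = nn.GRU(emb_dim, ctx_dim, batch_first=True)
        self.lm_rnn = nn.GRU(emb_dim + ctx_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, history_ids, target_ids):
        # Encode the dialog history (boundaries stripped) into one context vector.
        _, ctx = self.context_rnn(self.embed(history_ids))  # (1, B, ctx_dim)
        ctx = ctx.transpose(0, 1)                           # (B, 1, ctx_dim)
        emb = self.embed(target_ids)                        # (B, T, emb_dim)
        # Append the context vector to the RNN input at every time step.
        ctx_rep = ctx.expand(-1, emb.size(1), -1)
        h, _ = self.lm_rnn(torch.cat([emb, ctx_rep], dim=-1))
        return self.out(h)                                  # next-word logits

model = ContextRNNLM()
history = torch.randint(0, 1000, (2, 6))  # batch of 2 histories, 6 tokens each
target = torch.randint(0, 1000, (2, 5))   # target turns, 5 tokens each
print(model(history, target).shape)       # torch.Size([2, 5, 1000])
```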
However, both DRNNLM and CCDCLM treat the dialog history as a flat sequence of inputs, without modeling the interactions between speakers. To address this, the paper proposes 2 different architectures:
In the first proposed model (IDCLM), the context and initial hidden state of the kth turn are defined from the preceding turns: the context is the last hidden state of the other speaker's preceding turn, c_k = h^{(k-1)}_{T_{k-1}}, and the initial hidden state is the last hidden state of the same speaker's previous turn, h^{(k)}_0 = h^{(k-2)}_{T_{k-2}}. This is what lets the model track the interaction between the two speakers across turns.
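A minimal sketch of how these recurrences can be wired, assuming a single shared turn-level GRU and the context vector appended to the input at every step as in the context dependent RNNLM above (this is my reading of the definitions, not the authors' code; all sizes are illustrative):

```python
import torch
import torch.nn as nn

emb_dim, hid_dim, vocab = 32, 64, 1000
embed = nn.Embedding(vocab, emb_dim)
turn_rnn = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
out = nn.Linear(hid_dim, vocab)

# Final hidden states of earlier turns; two zero states stand in for the
# missing turns before the dialog starts.
final_states = [torch.zeros(1, 1, hid_dim), torch.zeros(1, 1, hid_dim)]

dialog = [torch.randint(0, vocab, (1, t)) for t in (4, 2, 5, 3)]  # toy turns
for k, turn_ids in enumerate(dialog, start=2):
    h0 = final_states[k - 2]   # same speaker's previous turn: h^{(k-2)}_{T_{k-2}}
    ctx = final_states[k - 1]  # other speaker's preceding turn: h^{(k-1)}_{T_{k-1}}
    emb = embed(turn_ids)                                      # (1, T_k, emb_dim)
    ctx_rep = ctx.transpose(0, 1).expand(-1, emb.size(1), -1)  # context per step
    h, h_last = turn_rnn(torch.cat([emb, ctx_rep], dim=-1), h0)
    logits = out(h)              # next-word logits for turn k
    final_states.append(h_last)  # store h^{(k)}_{T_k} for the later turns
```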
In the second proposed model (ESIDCLM), an external RNN encodes the turn-level representations, carrying a state from one turn to the next.
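A minimal sketch of that external turn-to-turn recurrence, assuming a single external GRU cell that consumes each turn's final hidden state and supplies the context for the next turn (the paper's exact wiring, e.g. separate external states per speaker, may differ; all names and sizes are illustrative):

```python
import torch
import torch.nn as nn

emb_dim, hid_dim, state_dim, vocab = 32, 64, 64, 1000
embed = nn.Embedding(vocab, emb_dim)
turn_rnn = nn.GRU(emb_dim + state_dim, hid_dim, batch_first=True)
external_rnn = nn.GRUCell(hid_dim, state_dim)  # turn-to-turn recurrence
out = nn.Linear(hid_dim, vocab)

dialog_state = torch.zeros(1, state_dim)       # external state before turn 1
dialog = [torch.randint(0, vocab, (1, t)) for t in (4, 2, 5)]  # toy turns
for turn_ids in dialog:
    emb = embed(turn_ids)                                        # (1, T_k, emb_dim)
    ctx = dialog_state.unsqueeze(1).expand(-1, emb.size(1), -1)  # context per step
    h, h_last = turn_rnn(torch.cat([emb, ctx], dim=-1))
    logits = out(h)                            # next-word logits for this turn
    # The external RNN updates the dialog-level state from the turn representation.
    dialog_state = external_rnn(h_last.squeeze(0), dialog_state)
```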