Effective Incorporation of Speaker Information in Utterance Encoding in Dialog

Nov 01, 2019 Dialog

Knowing who produced which utterance is essential to understanding a dialog.
Conventional methods integrate speaker labels directly into utterance vectors.
A relative speaker modeling method is proposed to address the problems with this approach. It is more useful on the SwDA corpus, where there are many speakers (not just 2).

Absolute Speaker Embedding

The simplest idea is to use one-hot vectors for speakers. With 2 speakers, speaker A is [1, 0] and speaker B is [0, 1].

\[\begin{align*} f^{uttr}(x_i, s_i) = RNN^{uttr} (x_i) \oplus Emb^{abs} (s_i) \end{align*}\]
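
To make the formula concrete, here is a minimal sketch in plain Python. It assumes that the \(\oplus\) operator means vector concatenation and that the speaker embedding is the fixed one-hot vector described above rather than a learned one. The toy Elman-style RNN, the random weights, and the names `rnn_encode`, `one_hot`, and `encode_utterance` are illustrative stand-ins, not the encoder from the paper.

```python
import math
import random


def rnn_encode(tokens, W_in, W_rec):
    """Toy Elman RNN: return the final hidden state for a list of token vectors."""
    hidden = [0.0] * len(W_rec)
    for x in tokens:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_in[j], x))
                + sum(w * hi for w, hi in zip(W_rec[j], hidden))
            )
            for j in range(len(W_rec))
        ]
    return hidden


def one_hot(index, size):
    """Absolute speaker embedding: 1.0 at the speaker's index, 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def encode_utterance(tokens, speaker, W_in, W_rec, speakers=("A", "B")):
    """RNN output concatenated with the speaker's one-hot vector (assumed reading of the formula)."""
    speaker_vec = one_hot(speakers.index(speaker), len(speakers))
    return rnn_encode(tokens, W_in, W_rec) + speaker_vec


# Hypothetical sizes and random weights, only to make the sketch runnable.
rng = random.Random(0)
INPUT_DIM, HIDDEN_DIM = 3, 4
W_in = [[rng.uniform(-1, 1) for _ in range(INPUT_DIM)] for _ in range(HIDDEN_DIM)]
W_rec = [[rng.uniform(-1, 1) for _ in range(HIDDEN_DIM)] for _ in range(HIDDEN_DIM)]

# One utterance of 5 tokens, encoded once as speaker A and once as speaker B.
utterance = [[rng.uniform(-1, 1) for _ in range(INPUT_DIM)] for _ in range(5)]
vec_a = encode_utterance(utterance, "A", W_in, W_rec)
vec_b = encode_utterance(utterance, "B", W_in, W_rec)
```

In this sketch the same words spoken by A and by B share the first `HIDDEN_DIM` components and differ only in the trailing speaker slots, which is what tells a downstream dialog-level encoder who spoke.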

An alternative is to use 2 separate encoders (one RNN encoder for speaker A and one for speaker B).
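
A rough sketch of the two-encoder variant, again in plain Python: each speaker gets an encoder with its own weights, and the speaker label only decides which encoder is used. The helper `make_rnn`, the seeds, and the dimensions are hypothetical choices for illustration; in a real model both encoders would be trained, not randomly initialised.

```python
import math
import random


def make_rnn(input_dim, hidden_dim, seed):
    """Build a toy Elman RNN encoder with its own randomly initialised weights."""
    rng = random.Random(seed)
    W_in = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(hidden_dim)]
    W_rec = [[rng.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(hidden_dim)]

    def encode(tokens):
        hidden = [0.0] * hidden_dim
        for x in tokens:
            hidden = [
                math.tanh(
                    sum(w * xi for w, xi in zip(W_in[j], x))
                    + sum(w * hi for w, hi in zip(W_rec[j], hidden))
                )
                for j in range(hidden_dim)
            ]
        return hidden

    return encode


# One encoder per speaker; the parameters are not shared between them.
encoders = {
    "A": make_rnn(input_dim=3, hidden_dim=4, seed=1),
    "B": make_rnn(input_dim=3, hidden_dim=4, seed=2),
}


def encode_utterance(tokens, speaker):
    """Route the utterance to the encoder that belongs to its speaker."""
    return encoders[speaker](tokens)


utterance = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.6]]
vec_a = encode_utterance(utterance, "A")
vec_b = encode_utterance(utterance, "B")
```

Here the speaker identity is carried implicitly by the encoder weights instead of by extra vector components, so the utterance vector stays at the RNN's hidden size.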

Relative Speaker Embedding
