A Structured Self-attentive Sentence Embedding ① (Abstract & Introduction) | Surveying Research Trends in Applying Deep Learning to Language Processing #35


In introducing the application of deep learning to language processing, this series has so far covered Transformer [2017] and BERT [2018] in #3 to #8, XLNet [2019] in #9 and #10, Transformer-XL [2019] in #11 and #12, RoBERTa [2019] in #13 to #17, Word2Vec [2013] in #18 to #20, ALBERT [2019] in #21 to #24, T5 [2019] in #26 to #30, ERNIE in #31 and #32, and ELMo [2018] in #33 and #34.

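Walkthrough of running the BERT repository samples | Surveying Research Trends in Applying Deep Learning to Language Processing #6 - lib-arts’s diary

XLNet ② (AutoRegressive and Permutation in pre-training) | Surveying Research Trends in Applying Deep Learning to Language Processing #10 - lib-arts’s diary

RoBERTa (paper details ④: RoBERTa, Related Work, Conclusion) | Surveying Research Trends in Applying Deep Learning to Language Processing #17 - lib-arts’s diary

ALBERT ③ (The Elements of ALBERT) | Surveying Research Trends in Applying Deep Learning to Language Processing #23 - lib-arts’s diary

T5 (Text-to-Text Transfer Transformer) ③ (Section2_Setup) | Surveying Research Trends in Applying Deep Learning to Language Processing #28 - lib-arts’s diary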

Starting with #35, in order to examine self-attention, which forms the basis of the Transformer architecture, we cover "A Structured Self-attentive Sentence Embedding".

[1703.03130] A Structured Self-attentive Sentence Embedding

#35 goes through the Abstract and the Introduction.
The table of contents is as follows.
1. Abstract
2. Introduction (Section 1)
3. Summary

1. Abstract
Section 1 goes through the Abstract to get an overview of the paper. Below, we briefly check the content by quoting and paraphrasing each passage.

This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence.

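Paraphrase: The paper proposes a model that extracts an interpretable sentence representation by introducing self-attention. Instead of a single vector, the embedding is a 2-D matrix in which each row attends to a different part of the sentence.
This passage introduces self-attention, which is also used in the Transformer paper. For the details, it is best to read the corresponding sections of the paper.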
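
To make the "2-D matrix instead of a vector" idea concrete, here is a minimal NumPy sketch. It assumes the formulation given later in the paper (not covered in this post), A = softmax(W_s2 tanh(W_s1 H^T)) and M = A H, where H holds the hidden states of the sentence. The sizes and the random values are hypothetical stand-ins, so treat this as an illustration rather than the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def structured_self_attention(H, W_s1, W_s2):
    """Return (A, M) for hidden states H of shape (n, 2u).

    A has shape (r, n): each row is an attention distribution over the n tokens.
    M has shape (r, 2u): the matrix sentence embedding, M = A H.
    """
    A = softmax(W_s2 @ np.tanh(W_s1 @ H.T), axis=1)
    M = A @ H
    return A, M

rng = np.random.default_rng(0)
n, two_u, d_a, r = 6, 8, 5, 3          # hypothetical sizes
H = rng.normal(size=(n, two_u))        # stand-in for encoder hidden states
W_s1 = rng.normal(size=(d_a, two_u))   # random stand-ins for learned weights
W_s2 = rng.normal(size=(r, d_a))

A, M = structured_self_attention(H, W_s1, W_s2)
print(A.shape, M.shape)  # (3, 6) (3, 8)
```

Each of the r rows of A sums to 1 over the tokens, so each row of M is a differently weighted summary of the same sentence.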

We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding.

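Paraphrase: The authors also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding makes it easy to visualize which specific parts of the sentence are encoded into it.
Self-attention is described here as well; again, the details are best understood from the body of the paper.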
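
The Abstract does not spell out the regularization term. As a rough sketch, assuming the form given later in the paper, the squared Frobenius norm of (A A^T - I) for an attention matrix A with one row per attention distribution, it can be computed as below. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def attention_penalty(A):
    """Squared Frobenius norm of (A A^T - I) for an (r, n) attention matrix."""
    r = A.shape[0]
    diff = A @ A.T - np.eye(r)
    return float(np.sum(diff ** 2))

# Two rows that attend to different tokens: no penalty.
A_diverse = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
# Two rows that attend to the same token: penalized.
A_redundant = np.array([[1.0, 0.0, 0.0],
                        [1.0, 0.0, 0.0]])

print(attention_penalty(A_diverse))    # 0.0
print(attention_penalty(A_redundant))  # 2.0
```

The penalty grows when different rows place their weight on the same tokens, which pushes the rows toward covering different parts of the sentence. The rows of A can also be plotted directly over the tokens, which is the kind of visualization the Abstract mentions.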

We evaluate our model on 3 different tasks: author profiling, sentiment classification, and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.

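Paraphrase: The model is evaluated on three different tasks: author profiling, sentiment classification, and textual entailment. The results show a significant performance gain over other sentence embedding methods on all three tasks.
This passage reports the experimental results.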

2. Introduction (Section 1)
Section 2 goes through the Introduction in Section 1 of the paper, one paragraph at a time.


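The first paragraph notes that, while there has been major progress in computing vector representations of words (distributed representations) such as Word2vec, obtaining satisfactory representations of phrases and sentences remains an open problem, and that work on this problem falls into two research categories. The first category is universal sentence embeddings trained with unsupervised learning; ParagraphVector and recursive auto-encoders are cited as concrete examples.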


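The second paragraph presents the second category: models trained for a specific task. Training for a specific task tends to work better, but the availability of a training corpus also has to be taken into account. RNN-based models are introduced as examples.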


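The third paragraph states that, in addition to approaches based on a CNN or LSTM (RNN), some tasks use an attention mechanism. However, in settings with no extra information, such as sentiment classification, the model receives only the sentence itself as input, so this kind of attention cannot be applied directly.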


The fourth paragraph proposes self-attention as the solution for the case raised in the third paragraph, where the only input is a text sequence. With it, attention can be computed without any extra information.
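
The contrast between the third and fourth paragraphs can be sketched as follows. This is a simplified illustration under assumptions of my own: `attend_with_query` stands for attention that needs an external vector `q` (for example, an encoding of a second sentence), and `self_attend` stands for attention computed from the sentence's own hidden states plus learned parameters. The function names, sizes, and random values are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_with_query(H, q):
    """Conventional attention: the weights depend on an external vector q."""
    return softmax(H @ q)

def self_attend(H, W1, w2):
    """Self-attention: the weights depend only on H and learned parameters."""
    return softmax(w2 @ np.tanh(W1 @ H.T))

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))    # 5 tokens, 4-dim hidden states (hypothetical)
q = rng.normal(size=4)         # extra information, unavailable for a single sentence
W1 = rng.normal(size=(3, 4))   # random stand-ins for learned parameters
w2 = rng.normal(size=3)

a_query = attend_with_query(H, q)
a_self = self_attend(H, W1, w2)
print(a_query.shape, a_self.shape)  # (5,) (5,)
```

In a task like sentiment classification there is no `q` to supply, whereas `self_attend` needs nothing beyond the sentence's hidden states.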

3. Summary
In #35 we went through the Abstract and Introduction of "A Structured Self-attentive Sentence Embedding", a paper on self-attention, and got an overview of the paper.
#36 will cover the key points from Related Work onward.