Adam (Adaptive Moment Estimation) | Reading Deep Learning Papers in the Original #13


#12 covered DCGAN (Deep Convolutional GAN).

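#13 covers Adam, an optimization algorithm that has been widely used in recent years. (I have not yet read the paper closely, so this post only works through the Abstract; I will add the rest later.)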

[1412.6980] Adam: A Method for Stochastic Optimization
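The paper's table of contents is listed below. As usual in this series, the Abstract is quoted and paraphrased with supplementary notes, and the other sections are mainly summarized (some sections may be skipped).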

0. ABSTRACT
1. INTRODUCTION
2. ALGORITHM
3. INITIALIZATION BIAS CORRECTION
4. CONVERGENCE ANALYSIS
5. RELATED WORK
6. EXPERIMENTS
7. EXTENSIONS
8. CONCLUSION

0. ABSTRACT

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters.

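Paraphrase: Adam is an algorithm for first-order, gradient-based optimization of stochastic objective functions, built on adaptive estimates of lower-order moments of the gradient. It is straightforward to implement, computationally efficient, needs little memory, is invariant to diagonal rescaling of the gradients, and is well suited to problems that are large in terms of data and/or parameters.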
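To make "adaptive estimates of lower-order moments" and "invariant to diagonal rescaling of the gradients" concrete, here is a minimal pure-Python sketch. The update rule and the default hyperparameters follow Algorithm 1 of the paper, which this post has not covered yet; the function names (`adam_minimize`, `quadratic_grad`, `rescaled_grad`) and the toy objective are my own assumptions for illustration.

```python
import math

def adam_minimize(grad_fn, theta, steps, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run `steps` Adam updates on a list of parameters and return the new list."""
    m = [0.0] * len(theta)  # first-moment (mean) estimate of the gradient
    v = [0.0] * len(theta)  # second raw-moment estimate of the gradient
    theta = list(theta)     # copy so the caller's list is not modified
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for i, g_i in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * g_i
            v[i] = beta2 * v[i] + (1 - beta2) * g_i * g_i
            m_hat = m[i] / (1 - beta1 ** t)  # correct the bias toward zero
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta

def quadratic_grad(theta):
    # Gradient of the toy objective f(x, y) = x^2 + 10 * y^2 (an assumption).
    return [2.0 * theta[0], 20.0 * theta[1]]

def rescaled_grad(theta):
    # Same gradient, with each coordinate multiplied by its own positive constant.
    g = quadratic_grad(theta)
    return [1000.0 * g[0], 0.01 * g[1]]

start = [1.0, -1.0]
plain = adam_minimize(quadratic_grad, start, steps=50, alpha=0.01)
scaled = adam_minimize(rescaled_grad, start, steps=50, alpha=0.01)
print(plain)
print(scaled)
```

Because each coordinate's step is the ratio of the first-moment estimate to the square root of the second-moment estimate, the per-coordinate scale factors cancel and the two runs agree up to the small effect of `eps`.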

The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed.

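Paraphrase: Adam is also appropriate for non-stationary objectives and for problems with very noisy and/or sparse gradients. Its hyperparameters have intuitive interpretations and typically need little tuning. The paper also discusses connections to the related algorithms that inspired Adam.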

We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

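Paraphrase: The paper also analyzes the theoretical convergence properties of the algorithm and gives a regret bound on the convergence rate that is comparable to the best known results in the online convex optimization framework.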
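As a note on what "regret bound" means here: in online convex optimization, regret compares the losses actually incurred against the best fixed parameter chosen in hindsight. The definition below is the standard one and, as far as I understand, the one used in Section 4 of the paper; the symbols are not introduced anywhere in this post, so treat them as assumed notation.

```latex
% f_t      : convex loss revealed at step t
% \theta_t : parameter chosen by the algorithm at step t
% \theta^* : best fixed parameter in hindsight over the feasible set \mathcal{X}
R(T) = \sum_{t=1}^{T} \left[ f_t(\theta_t) - f_t(\theta^*) \right],
\qquad
\theta^* = \arg\min_{\theta \in \mathcal{X}} \sum_{t=1}^{T} f_t(\theta)

% The bound claimed in the paper is R(T) = O(\sqrt{T}), so the average regret vanishes:
\frac{R(T)}{T} = O\!\left(\frac{1}{\sqrt{T}}\right) \to 0 \quad (T \to \infty)
```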

Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

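Paraphrase: Experiments show that Adam works well in practice and compares favorably with other stochastic optimization methods. The paper closes by discussing AdaMax, a variant of Adam based on the infinity norm.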
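For reference, here is a rough sketch of how AdaMax differs from Adam. It follows my reading of Algorithm 2 in the paper (Section 7), which this post does not cover yet: the second-moment estimate is replaced by an exponentially weighted infinity norm of past gradients. The function names and the toy objective are my own assumptions, and the guard against a zero denominator is an addition of this sketch, not part of the paper's algorithm.

```python
def adamax_minimize(grad_fn, theta, steps, alpha=0.002, beta1=0.9, beta2=0.999):
    """Run `steps` AdaMax updates on a list of parameters and return the new list."""
    m = [0.0] * len(theta)  # first-moment estimate, same as in Adam
    u = [0.0] * len(theta)  # exponentially weighted infinity norm of the gradients
    theta = list(theta)     # copy so the caller's list is not modified
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for i, g_i in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * g_i
            u[i] = max(beta2 * u[i], abs(g_i))
            if u[i] > 0.0:  # guard added in this sketch to avoid dividing by zero
                theta[i] -= (alpha / (1 - beta1 ** t)) * m[i] / u[i]
    return theta

def quadratic_grad(theta):
    # Gradient of the toy objective f(x, y) = x^2 + 10 * y^2 (an assumption).
    return [2.0 * theta[0], 20.0 * theta[1]]

result = adamax_minimize(quadratic_grad, [1.0, -1.0], steps=50, alpha=0.01)
print(result)
```

Only the first-moment estimate needs bias correction here, which is why the factor `1 - beta1 ** t` appears but no corresponding factor for `u`.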

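From a quick read, my impression is that the method relies only on first-order gradients, combined with adaptive estimates of their moments, and that the paper puts a lot of emphasis on computational efficiency. I will fill in the remaining sections below later if time permits.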

1. INTRODUCTION
2. ALGORITHM
3. INITIALIZATION BIAS CORRECTION
4. CONVERGENCE ANALYSIS
5. RELATED WORK
6. EXPERIMENTS
7. EXTENSIONS
8. CONCLUSION