RNN Architecture
The encoder consumes all input words before any output is generated:
-> word1 -> word2 ... -> word-n
   h1       h2           hn
The decoder generates one word at a time, updating its hidden state at each step:
-> outWord1 -> outWord2 ... -> outWordFinal
   hn          h (updated)
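
The two diagrams above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not a trained model: the weights are random, the cell is a plain tanh RNN, the decoder picks each word greedily with argmax, and the names rnn_step, encode, decode, start_id and end_id are hypothetical helpers introduced here.

```python
import numpy as np

def rnn_step(x, h, W_x, W_h, b):
    """One vanilla RNN update: h_new = tanh(W_x x + W_h h + b)."""
    return np.tanh(W_x @ x + W_h @ h + b)

def encode(inputs, params):
    """Consume every input vector in order and return the states h1..hn."""
    W_x, W_h, b = params
    h = np.zeros(W_h.shape[0])          # initial hidden state (assumed zero)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_x, W_h, b)
        states.append(h)
    return states

def decode(h, params, W_out, embeddings, start_id, end_id, max_len=10):
    """Generate word ids one at a time, starting from the encoder's hn."""
    W_x, W_h, b = params
    token = start_id
    output = []
    for _ in range(max_len):
        # Feed the previous word back in and update the hidden state.
        h = rnn_step(embeddings[token], h, W_x, W_h, b)
        token = int(np.argmax(W_out @ h))   # greedy choice (an assumption)
        if token == end_id:
            break
        output.append(token)
    return output

# Toy sizes and random weights, purely for illustration.
rng = np.random.default_rng(0)
vocab, emb, hidden = 6, 4, 8
embeddings = rng.normal(size=(vocab, emb))
enc_params = (rng.normal(size=(hidden, emb)),
              rng.normal(size=(hidden, hidden)),
              np.zeros(hidden))
dec_params = (rng.normal(size=(hidden, emb)),
              rng.normal(size=(hidden, hidden)),
              np.zeros(hidden))
W_out = rng.normal(size=(vocab, hidden))

source_ids = [2, 3, 4]                       # word1, word2, word-n
enc_states = encode([embeddings[i] for i in source_ids], enc_params)
generated = decode(enc_states[-1], dec_params, W_out, embeddings,
                   start_id=0, end_id=1)
print(len(enc_states), generated)
```

Only the last encoder state, enc_states[-1], is handed to the decoder, which matches hn appearing at the start of the decoder diagram. Inside decode, h is overwritten on every iteration, which corresponds to "h (updated)".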