Good reads

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. J. Mach. Learn. Res., 3:1137–1155, March 2003. URL: http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf.
One of the first works on word embeddings.
Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa. Natural language processing (almost) from scratch. CoRR, 2011. URL: http://arxiv.org/abs/1103.0398, arXiv:1103.0398.
Graham Neubig. Neural machine translation and sequence-to-sequence models: A tutorial. CoRR, 2017. URL: http://arxiv.org/abs/1703.01619, arXiv:1703.01619.
A detailed tutorial on machine translation.
Introduces a range of language modeling techniques: n-gram models, neural network (NN) models, RNNs, LSTMs, and attention-based models.
Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. On using very large target vocabulary for neural machine translation. CoRR, 2014. URL: http://arxiv.org/abs/1412.2007, arXiv:1412.2007.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. CoRR, 2013. URL: http://arxiv.org/abs/1301.3781, arXiv:1301.3781.

Jaeyoung Kim, Mostafa El-Khamy, and Jungwon Lee. Residual LSTM: design of a deep recurrent architecture for distant speech recognition. CoRR, 2017. URL: http://arxiv.org/abs/1701.03360, arXiv:1701.03360.
Julian G. Zilly, Rupesh Kumar Srivastava, Jan Koutník, and Jürgen Schmidhuber. Recurrent highway networks. CoRR, 2016. URL: http://arxiv.org/abs/1607.03474, arXiv:1607.03474.
Chris Dyer. Notes on noise contrastive estimation and negative sampling. CoRR, 2014. URL: http://arxiv.org/abs/1410.8251, arXiv:1410.8251.
Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, 2012. URL: http://arxiv.org/abs/1207.0580, arXiv:1207.0580.
The paper that introduced dropout.

Rosie Jones and Daniel C. Fain. Query word deletion prediction. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '03, 435–436. New York, NY, USA, 2003. ACM. URL: http://doi.acm.org/10.1145/860435.860538, doi:10.1145/860435.860538.