Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin.
A neural probabilistic language model.
J. Mach. Learn. Res., 3:1137–1155, March 2003.
URL: www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf.
One of the earliest works on word embeddings.
Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa.
Natural language processing (almost) from scratch.
CoRR, 2011.
URL: http://arxiv.org/abs/1103.0398, arXiv:1103.0398.
Introduces various language modeling techniques: n-gram models, NN, RNN, LSTM, and attention-based models.
Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio.
On using very large target vocabulary for neural machine translation.
CoRR, 2014.
URL: http://arxiv.org/abs/1412.2007, arXiv:1412.2007.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean.
Efficient estimation of word representations in vector space.
CoRR, 2013.
URL: http://arxiv.org/abs/1301.3781, arXiv:1301.3781.
NN
Jaeyoung Kim, Mostafa El-Khamy, and Jungwon Lee.
Residual LSTM: design of a deep recurrent architecture for distant speech recognition.
CoRR, 2017.
URL: http://arxiv.org/abs/1701.03360, arXiv:1701.03360.
Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov.
Improving neural networks by preventing co-adaptation of feature detectors.
CoRR, 2012.
URL: http://arxiv.org/abs/1207.0580, arXiv:1207.0580.
The paper that introduced dropout.
Search
Rosie Jones and Daniel C. Fain.
Query word deletion prediction.
In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '03, 435–436. New York, NY, USA, 2003. ACM.
URL: http://doi.acm.org/10.1145/860435.860538, doi:10.1145/860435.860538.