Memory degradation induced by attention in recurrent neural architectures
Harvat, Mykola; Martín Guerrero, José David
This document is an article, created in: 2022
This paper studies the memory mechanisms of recurrent neural architectures when attention models are included. Pure-attention models such as Transformers are increasingly popular because they tend to outperform models with recurrent connections on many tasks. Our conjecture is that attention prevents the recurrent connections from properly transferring information between consecutive time steps. We test this conjecture empirically using five models: a model without attention, a standard Luong attention model, a standard Bahdanau attention model, and our proposal of adding attention to the inputs in order to bridge the gap between recurrent and parallel architectures (applied to both the Luong and the Bahdanau attention models). Eight problems are used to assess the five models: a sequence-reverse copy problem, a sequence-reverse copy problem with repetitions, a filter sequence problem, a sequence-reverse copy problem with bigrams, and four translation problems (English to Spanish, English to French, English to German, and English to Italian). The achieved results reinforce our conjecture on the interaction between attention and recurrence.
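For readers unfamiliar with the two attention variants named in the abstract, the following is a minimal sketch of how their scoring functions are commonly written: a multiplicative (dot-product) score in the Luong style and an additive score in the Bahdanau style. It is not the authors' implementation and does not cover their proposal of attention on the inputs. The function names, the toy vectors and the identity weight matrices are all assumptions chosen for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(W, x):
    return [dot(row, x) for row in W]

def luong_score(dec_state, enc_state):
    # Multiplicative ("dot") score: s^T h
    return dot(dec_state, enc_state)

def bahdanau_score(dec_state, enc_state, W_dec, W_enc, v):
    # Additive score: v^T tanh(W_dec s + W_enc h)
    hidden = [math.tanh(a + b)
              for a, b in zip(matvec(W_dec, dec_state), matvec(W_enc, enc_state))]
    return dot(v, hidden)

def context_vector(weights, enc_states):
    # Attention-weighted sum of the encoder states.
    dim = len(enc_states[0])
    return [sum(w * h[i] for w, h in zip(weights, enc_states)) for i in range(dim)]

# Toy values (hypothetical, for illustration only).
enc_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dec_state = [1.0, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0]]  # identity, an assumption
W_enc = [[1.0, 0.0], [0.0, 1.0]]  # identity, an assumption
v = [1.0, 1.0]

luong_weights = softmax([luong_score(dec_state, h) for h in enc_states])
bahdanau_weights = softmax(
    [bahdanau_score(dec_state, h, W_dec, W_enc, v) for h in enc_states]
)
luong_context = context_vector(luong_weights, enc_states)
```

In both variants the resulting weights form a distribution over encoder positions, and the context vector is the weighted sum of encoder states that is fed to the decoder alongside its recurrent state.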