dc.contributor.author | Harvat, Mykola | |
dc.contributor.author | Martín Guerrero, José David | |
dc.date.accessioned | 2023-06-15T07:52:14Z | |
dc.date.available | 2023-06-16T04:45:06Z | |
dc.date.issued | 2022 | es_ES |
dc.identifier.citation | Harvat, M., & Martín-Guerrero, J. D. (2022). Memory degradation induced by attention in recurrent neural architectures. Neurocomputing, 502, 161-176. | es_ES |
dc.identifier.uri | https://hdl.handle.net/10550/87913 | |
dc.description.abstract | This paper studies the memory mechanisms in recurrent neural architectures when attention models are included. Pure-attention models like Transformers are increasingly popular, as they tend to outperform models with recurrent connections in many different tasks. Our conjecture is that attention prevents the recurrent connections from transferring information properly between consecutive time steps. This conjecture is empirically tested using five different models, namely, a model without attention, a standard Luong attention model, a standard Bahdanau attention model, and our proposal to add attention to the inputs in order to fill the gap between recurrent and parallel architectures (for both Luong and Bahdanau attention models). Eight different problems are considered to assess the five models: a sequence-reverse copy problem, a sequence-reverse copy problem with repetitions, a filter sequence problem, a sequence-reverse copy problem with bigrams, and four translation problems (English to Spanish, English to French, English to German and English to Italian). The achieved results reinforce our conjecture on the interaction between attention and recurrence. | es_ES |
dc.language.iso | en | es_ES |
dc.publisher | Elsevier | es_ES |
dc.subject | long short-term memory networks | es_ES |
dc.subject | attention mechanisms | es_ES |
dc.subject | recurrence | es_ES |
dc.subject | gate activations | es_ES |
dc.subject | forget gate | es_ES |
dc.title | Memory degradation induced by attention in recurrent neural architectures | es_ES |
dc.type | journal article | es_ES |
dc.subject.unesco | UNESCO::CIENCIAS TECNOLÓGICAS | es_ES |
dc.identifier.doi | 10.1016/j.neucom.2022.06.056 | es_ES |
dc.accrualmethod | CI | es_ES |
dc.embargo.terms | 0 days | es_ES |
dc.type.hasVersion | VoR | es_ES |
dc.rights.accessRights | open access | es_ES |