Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulation

Show simple item record

dc.contributor.author Martinez Gil, Francisco Antonio
dc.contributor.author Lozano Ibáñez, Miguel
dc.contributor.author García-Fernández, Ignacio
dc.contributor.author Romero, Pau
dc.contributor.author Serra, Dolors
dc.contributor.author Sebastián Aguilar, Rafael
dc.date.accessioned 2020-10-05T14:06:33Z
dc.date.available 2020-10-05T14:06:33Z
dc.date.issued 2020
dc.identifier.citation Martinez Gil, Francisco Antonio; Lozano Ibáñez, Miguel; García-Fernández, Ignacio; Romero, Pau; Serra, Dolors; Sebastián Aguilar, Rafael (2020). Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulation. Mathematics, 8, 1479.
dc.identifier.uri https://hdl.handle.net/10550/75739
dc.description.abstract Reinforcement learning is one of the most promising machine learning techniques to get intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms adopts the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, sometimes the learned policy does not meet expectations, and the task of authoring is difficult and unsafe because the modification of one value or parameter in the learned value function has unpredictable consequences in the space of the policies it represents. This invalidates direct manipulation of the learned value function as a method to modify the derived behaviors. In this paper, we propose the use of Inverse Reinforcement Learning to incorporate real behavior traces in the learning process to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from trajectories of real pedestrians in the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with the behaviors learned using a Reinforcement Learning classic algorithm (Sarsa(λ)) shows that the Inverse Reinforcement Learning behaviors adjust significantly better to the real trajectories.
dc.language.iso eng
dc.relation.ispartof Mathematics, 2020, vol. 8, num. 1479
dc.subject Learning
dc.subject Computer science
dc.title Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulation
dc.type journal article es_ES
dc.date.updated 2020-10-05T14:06:33Z
dc.identifier.doi 10.3390/math8091479
dc.identifier.idgrec 140666
dc.rights.accessRights open access es_ES
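
The abstract above rests on two mechanisms: deriving a behavior as the greedy policy over a learned value function (as in Sarsa(λ)), and the maximum-entropy "soft" formulation that Soft Q-learning uses when demonstration trajectories are brought into the learning process. The following is a minimal, self-contained sketch of a tabular soft Q-learning backup and the policies involved; it is not the MARL-Ped implementation described in the paper, and the state/action sizes and hyperparameters are illustrative assumptions only.

import numpy as np

# Minimal sketch (not the authors' MARL-Ped code) of a tabular soft
# (maximum-entropy) Q-learning backup, the stochastic policy it implies,
# and the greedy policy classic TD methods derive from a value table.
# All sizes and hyperparameters below are assumptions for the example.

n_states, n_actions = 100, 8            # hypothetical discretisation of the navigation task
alpha, gamma, temperature = 0.1, 0.99, 1.0

Q = np.zeros((n_states, n_actions))     # tabular action-value function
rng = np.random.default_rng(0)

def soft_value(q_row):
    """Soft state value: a log-sum-exp over actions instead of a hard max."""
    return temperature * np.log(np.sum(np.exp(q_row / temperature)))

def soft_q_update(s, a, r, s_next):
    """One soft Q-learning backup; the target uses the soft value of s_next."""
    target = r + gamma * soft_value(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def soft_policy(s):
    """Boltzmann (stochastic) policy implied by the soft Q-values."""
    prefs = Q[s] / temperature
    prefs -= prefs.max()                 # numerical stability
    probs = np.exp(prefs) / np.sum(np.exp(prefs))
    return int(rng.choice(n_actions, p=probs))

def greedy_policy(s):
    """Greedy policy over the value table, as in classic TD learning."""
    return int(np.argmax(Q[s]))

As the temperature approaches zero, the log-sum-exp soft value approaches the hard max and the Boltzmann policy collapses to the greedy one; conversely, editing individual entries of Q changes the induced policy in ways that are hard to predict, which is the authoring difficulty the abstract points out.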
