type: MemoryGym
name: MysteryPath-Grid-v0
frame_skip: 1
last_action_to_obs: False
last_reward_to_obs: False
obs_stacks: 1
grayscale: False
resize_vis_obs: [84, 84]
positional_encoding: False
reset_params:
        start-seed: 200001
        num-seeds: 1
        agent_scale: 0.25
        cardinal_origin_choice: [0, 1, 2, 3]
        show_origin: True
        show_goal: True
        visual_feedback: True
        reward_goal: 1.0
        reward_fall_off: 0.0
        reward_path_progress: 0.0
        seed: 200001
reward_normalization: 0

load_model: False
model_path:
checkpoint_interval: 500
activation: relu
vis_encoder: cnn
vec_encoder: linear
num_vec_encoder_units: 128
hidden_layer: default
num_hidden_layers: 1
num_hidden_units: 512
recurrence:
        layer_type: gru
        sequence_length: -1
        hidden_state_size: 512
        hidden_state_init: zero
        reset_hidden_state: True
        residual: False
        num_layers: 1

n_workers: 32
worker_steps: 512
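
As a rough illustration of how these sampler settings interact with the trainer settings listed below, the following Python sketch works out the amount of data per update. It assumes the common PPO convention that each update collects n_workers * worker_steps transitions and splits them evenly into n_mini_batches. The variable names simply mirror the config keys, and the actual trainer may group recurrent sequences differently.

```python
# Values copied from the config; the arithmetic is an assumption about
# how a typical PPO trainer uses them, not a statement about this code base.
n_workers = 32
worker_steps = 512
n_mini_batches = 8
epochs = 3
updates = 10000
frame_skip = 1

# Transitions gathered before each optimization phase.
samples_per_update = n_workers * worker_steps

# Transitions per mini-batch if the buffer is split evenly.
samples_per_mini_batch = samples_per_update // n_mini_batches

# Gradient steps taken on each freshly sampled buffer.
gradient_steps_per_update = epochs * n_mini_batches

# Total agent steps and environment frames over the whole run.
total_agent_steps = samples_per_update * updates
total_env_frames = total_agent_steps * frame_skip

print(samples_per_update, samples_per_mini_batch,
      gradient_steps_per_update, total_env_frames)
```

Under these assumptions each update uses 16384 transitions, or 2048 per mini-batch, and the full run covers 163,840,000 environment frames.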

algorithm: PPO
resume_at: 0
gamma: 0.995
lamda: 0.95
updates: 10000
epochs: 3
refresh_buffer_epoch: -1
n_mini_batches: 8
advantage_normalization: no
value_coefficient: 0.5
max_grad_norm: 0.25
share_parameters: True
learning_rate_schedule:
        initial: 0.000275
        final: 1e-05
        power: 1.0
        max_decay_steps: 10000
beta_schedule:
        initial: 0.001
        final: 1e-06
        power: 1.0
        max_decay_steps: 10000
clip_range_schedule:
        initial: 0.1
        final: 0.1
        power: 1.0
        max_decay_steps: 10000
obs_reconstruction_schedule:
        initial: 0.0
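
The schedules above share the keys initial, final, power, and max_decay_steps, which suggests a polynomial decay from the initial to the final value. The sketch below is a minimal Python version under that assumption. The function name polynomial_decay and its argument order are hypothetical, and the trainer's own implementation may differ in details such as how steps beyond max_decay_steps are handled.

```python
def polynomial_decay(initial, final, max_decay_steps, power, current_step):
    """Decay a value from `initial` to `final` over `max_decay_steps`.

    Assumed form: (initial - final) * (1 - t / T) ** power + final.
    With power = 1.0 this is a plain linear interpolation.
    """
    # Past the end of the schedule, or for a constant schedule, return `final`.
    if current_step >= max_decay_steps or initial == final:
        return final
    fraction_left = 1.0 - current_step / max_decay_steps
    return (initial - final) * fraction_left ** power + final


# Learning rate schedule from the config, evaluated at a few updates.
for update in (0, 5000, 10000):
    lr = polynomial_decay(0.000275, 1e-05, 10000, 1.0, update)
    print(update, lr)

# The clip range has identical initial and final values, so it stays constant.
clip_range = polynomial_decay(0.1, 0.1, 10000, 1.0, 2500)
```

With power set to 1.0, as in all three schedules, the learning rate would fall linearly from 0.000275 to 1e-05 over the 10000 updates, reaching roughly 0.0001425 at the halfway point, while the clip range remains at 0.1 throughout.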

Mystery Path Grid (Cues on) + Gated Recurrent Unit
