Q-learning Simulation



Stewart Crawford, March 2004

Like it? Email me...

Agents are birthed in the GREEN FIELD and travel out into the world searching for the 100-ingot reward at the GOAL. The world starts in darkness, so agents move randomly; when an agent finds any information about a reward, it discounts that information slightly and passes it back to the square it just came from. The lightness of a square indicates the best expected reward currently known for that square. Light spreads through exploration.
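In Q-learning terms, that "discount it slightly and pass it back" is the standard tabular update. Here is a minimal sketch of the bookkeeping; the grid size, goal location, learning rate, and discount are my own assumptions, not values taken from the applet:

    import numpy as np

    ROWS, COLS = 10, 10                            # assumed grid size
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # 4-connected moves
    GOAL = (0, COLS - 1)                           # assumed goal square
    ALPHA, GAMMA = 0.5, 0.9                        # learning rate, discount factor

    # One Q-value per (square, action); a square's best value is its "lightness".
    Q = np.zeros((ROWS, COLS, len(ACTIONS)))

    def step(state, a):
        """Take action a on a 4-connected grid, staying put at the walls."""
        r, c = state
        dr, dc = ACTIONS[a]
        nr = min(max(r + dr, 0), ROWS - 1)
        nc = min(max(c + dc, 0), COLS - 1)
        reward = 100.0 if (nr, nc) == GOAL else 0.0   # the ingot reward
        return (nr, nc), reward

    def q_update(state, a, reward, next_state):
        """Pass a discounted estimate of the best reward reachable from the
        new square back to the square the agent just left."""
        best_next = 0.0 if next_state == GOAL else Q[next_state].max()
        Q[state][a] += ALPHA * (reward + GAMMA * best_next - Q[state][a])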

Born into the world are EXPLOITERS (yellow-heads) and, less frequently, EXPLORERS (red-heads). EXPLOITERS do just what their name says: they greedily use information, moving from square to square by taking the best possible step given what is known of the world at that time. EXPLORERS purposefully take non-optimal steps in order to explore; they bring light to the darkness where the Exploiters fail to tread. The Explorers, in the long run, enlighten the universe.
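The two castes differ only in how they choose their next step. A rough sketch of the contrast (the function names are mine; the applet's actual Explorers are Boltzmann explorers, described in the technical notes below):

    import numpy as np

    def exploit_step(q_values):
        """EXPLOITER: greedily take the best-looking move from this square."""
        return int(np.argmax(q_values))

    def explore_step(q_values, rng=np.random):
        """EXPLORER: deliberately take a move that is not the current best,
        so that light spreads into squares the exploiters would never visit."""
        best = int(np.argmax(q_values))
        others = [a for a in range(len(q_values)) if a != best]
        return int(rng.choice(others))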

TECHNICAL NOTES: This simulation implements Q-learning. The world is 4-connected (diagonal moves are not allowed). The steady-state values eventually learned for each square are the same as the Dynamic Programming solution; Q-learning is an incremental, exploration-based route to that solution. The red-heads are Boltzmann Explorers: initially they move randomly, but over time they explore less and exploit more.
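A Boltzmann explorer picks moves with probability proportional to exp(Q / T) and lowers the temperature T as it ages, so early behaviour is nearly random and later behaviour nearly greedy. A sketch of that rule, with made-up constants rather than the applet's:

    import numpy as np

    def boltzmann_step(q_values, temperature, rng=np.random):
        """Sample a move with probability proportional to exp(Q / T)."""
        prefs = np.asarray(q_values, dtype=float) / max(temperature, 1e-6)
        prefs -= prefs.max()                  # for numerical stability
        probs = np.exp(prefs)
        probs /= probs.sum()
        return int(rng.choice(len(q_values), p=probs))

    # Cooling schedule: high T (near-random) at birth, low T (near-greedy) later.
    T0, DECAY = 10.0, 0.99                    # assumed values, not the applet's
    def temperature(age):
        return T0 * DECAY ** age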


This applet was generated using AgentSheets® 1.5.2 with Ristretto®.
