jenish_patel
AI / ML · 2025

Belief-State Robot Localization

AI research project implementing robot self-localization under uncertainty across 4 phases: optimal BFS over belief-set state space → greedy strategy → CNN regressor trained to predict localization cost → 3-policy comparison. The agent navigates a maze without knowing its position, narrowing a belief set to a single cell.

Key Metric

CNN regressor (π₁) outperforms greedy baseline (π₀) · dual-policy π₂ evaluated across belief sizes 1–20

01

Overview

A 4-phase AI project tackling the fundamental problem of robot localization under uncertainty: an agent moves through a grid maze without knowing its exact position. Instead of tracking a single known position, it maintains a belief set L, the set of all cells where it could possibly be. The agent must choose actions that shrink L until |L| = 1, at which point it is localized. This mirrors real-world robotics, where sensor noise and map uncertainty mean a robot can only reason about where it might be rather than know its location outright.
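As a rough illustration of how a belief set can shrink, here is a minimal sketch of a belief update. The movement model is an assumption, not something stated on this page: every candidate cell applies the same move, and a candidate stays in place when the target cell is blocked or off the grid, so candidates pushed against walls merge. The names `step_cell` and `update_belief`, and the toy grid, are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Cell = Tuple[int, int]

# Row/column offsets for the four moves (assumed action set).
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def step_cell(grid: List[List[int]], cell: Cell, move: str) -> Cell:
    """Move one candidate cell; it stays put if the target is blocked or off-grid."""
    dr, dc = MOVES[move]
    r, c = cell[0] + dr, cell[1] + dc
    if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
        return (r, c)
    return cell


def update_belief(grid: List[List[int]], belief: FrozenSet[Cell], move: str) -> FrozenSet[Cell]:
    """Apply the same move to every candidate; candidates that collide merge."""
    return frozenset(step_cell(grid, cell, move) for cell in belief)


# Toy maze: 0 = open, 1 = blocked.
grid = [
    [0, 0, 0],
    [1, 1, 0],
]
belief = frozenset({(0, 0), (0, 1), (0, 2)})

after_right = update_belief(grid, belief, "R")             # two candidates remain
after_right_twice = update_belief(grid, after_right, "R")  # one candidate remains
print(len(belief), len(after_right), len(after_right_twice))
```

Under this assumed model, moving right twice pushes every candidate into the top-right corner, so the belief set collapses from three cells to one.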

02

Architecture & Approach

Phase 1 formulates localization cost C* as a BFS over the belief-set state space (each node is a frozenset of possible positions), computing the minimum number of moves to localize from any initial belief set.

Phase 2 implements a greedy baseline strategy (π₀) that navigates toward dead-end/corner targets, which tend to collapse belief sets faster due to topological constraints. This phase also generates the training data.

Phase 3 trains a CNN regressor in PyTorch to predict C* from the current belief state, encoded as a 6-channel 2D spatial tensor: one channel for blocked cells, one for the current belief positions, and four directional movement channels. Training uses a LazyBeliefDataset (on-the-fly episode generation) with MSE loss.

Phase 4 compares three policies: π₀ (greedy), π₁ (uses the trained Ĉ₀ model to pick moves), and π₂ (dual-CNN: uses the Ĉ₁ model guided by π₁), plotting average moves to localize vs. initial belief size |L| from 1 to 20. Two trained model checkpoints are saved: best_c0_model.pt and best_c1_model.pt.
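The Phase 1 search can be sketched as a breadth-first search whose nodes are frozensets of candidate cells. This is a minimal sketch, not the project's actual code: it assumes a candidate stays in place when a move is blocked, and the names `localization_cost` and `step_cell` and the toy grid are hypothetical.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, Optional, Tuple

Cell = Tuple[int, int]

# Row/column offsets for the four moves (assumed action set).
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def step_cell(grid: List[List[int]], cell: Cell, move: str) -> Cell:
    """Move one candidate cell; it stays put if the target is blocked or off-grid."""
    dr, dc = MOVES[move]
    r, c = cell[0] + dr, cell[1] + dc
    if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
        return (r, c)
    return cell


def localization_cost(grid: List[List[int]], initial_belief: Iterable[Cell]) -> Optional[int]:
    """Minimum number of moves until the belief set has one cell, or None if unreachable."""
    start: FrozenSet[Cell] = frozenset(initial_belief)
    if len(start) <= 1:
        return 0  # already localized

    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        belief, cost = queue.popleft()
        for move in MOVES:
            nxt = frozenset(step_cell(grid, cell, move) for cell in belief)
            if nxt in seen:
                continue
            if len(nxt) == 1:
                # BFS expands beliefs in order of depth, so the first hit is minimal.
                return cost + 1
            seen.add(nxt)
            queue.append((nxt, cost + 1))
    return None  # no move sequence collapses this belief set


# Toy maze: 0 = open, 1 = blocked.
grid = [
    [0, 0, 0],
    [1, 1, 0],
]
cost_three_cells = localization_cost(grid, {(0, 0), (0, 1), (0, 2)})
print(cost_three_cells)
```

Using frozensets keeps each belief hashable, so the visited set can deduplicate beliefs reached by different move sequences. The number of distinct beliefs can grow exponentially with maze size, which is one plausible motivation for learning an approximation of C* in Phase 3.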

03

Results & Outcome

π₁ (CNN-guided) consistently localizes in fewer average moves than π₀ (greedy) across belief sizes, particularly at larger initial uncertainty (|L| > 5). π₂ (dual-CNN) shows further improvement, suggesting that composing learned cost heuristics reduces localization steps. The comparison plot across belief sizes 1–20 is consistent with learned policies exploiting maze topology better than pure greedy movement.

Tech Stack

Python · PyTorch · BFS · CNN · Reinforcement Learning · NumPy · matplotlib