Kaiyu Zheng zkytony

Why

To answer two questions: (1) how was the correctness of pomdp_py's implementation of POUCT validated, and (2) does it behave correctly? The investigation was prompted by this issue.

How

In the Tiger domain, with initial belief [0.5, 0.5], compare two quantities: the value at the root of the POUCT search tree built while planning the first action, and the optimal value at that belief produced by pomdp-solve's vi pruning algorithm (an optimal solver). The root value in the POUCT search tree is an estimate of the optimal value, so the two should be close.
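
To make "close to the optimal value" concrete, here is a minimal, dependency-free sketch of a reference value at belief [0.5, 0.5]. It is neither pomdp-solve nor POUCT: it is a brute-force finite-horizon expectimax over beliefs, so it matches what a planner limited to the same depth should approach, not the infinite-horizon value. The Tiger parameters (discount 0.95, listening accuracy 0.85, rewards -1 / +10 / -100, uniform reset after opening a door) are assumed standard values, and the function names are hypothetical.

```python
# Assumed standard Tiger parameters (not stated in the text above).
GAMMA = 0.95          # discount factor
OBS_ACCURACY = 0.85   # probability that listening reports the correct side
R_LISTEN, R_GOOD, R_BAD = -1.0, 10.0, -100.0


def belief_after_listen(b_left, heard_left):
    """Bayes update of P(tiger-left) after one listen action.

    Returns the updated belief and the probability of the observation.
    """
    like_left = OBS_ACCURACY if heard_left else 1.0 - OBS_ACCURACY
    like_right = 1.0 - OBS_ACCURACY if heard_left else OBS_ACCURACY
    evidence = like_left * b_left + like_right * (1.0 - b_left)
    return like_left * b_left / evidence, evidence


def optimal_value(b_left, horizon):
    """Exact finite-horizon value of belief b_left via expectimax."""
    if horizon == 0:
        return 0.0
    # Assumption: opening a door resets the tiger uniformly at random.
    v_reset = GAMMA * optimal_value(0.5, horizon - 1)
    # Opening the door that hides the tiger yields R_BAD.
    q_open_left = b_left * R_BAD + (1.0 - b_left) * R_GOOD + v_reset
    q_open_right = b_left * R_GOOD + (1.0 - b_left) * R_BAD + v_reset
    # Listening costs R_LISTEN, then we average over both observations.
    q_listen = R_LISTEN
    for heard_left in (True, False):
        b_next, prob = belief_after_listen(b_left, heard_left)
        q_listen += GAMMA * prob * optimal_value(b_next, horizon - 1)
    return max(q_open_left, q_open_right, q_listen)


def is_close(estimate, reference, tolerance):
    """True if a planner's root-value estimate is within tolerance."""
    return abs(estimate - reference) <= tolerance


# With horizon 2 the best plan is to listen twice: -1 + 0.95 * -1 = -1.95.
reference = optimal_value(0.5, horizon=2)
```

A root value read from a search tree built with the same depth limit could then be checked with `is_close(root_value, reference, tolerance)`, where the tolerance is a judgment call that depends on the number of simulations.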

	import random
	import pprint
	import pomdp_py
	import seaborn as sns
	import pandas as pd
	import matplotlib.pyplot as plt

	from pomdp_py.algorithms.value_function import expected_reward, belief_observation_model
	from pomdp_py.problems.tiger.tiger_problem import TigerProblem, TigerState