An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

Owner
Yu Shen
UCLA '24
Yu Shen
Script that provides your TESLA access_token and refresh_token

TESLA tokens This script helps you get your TESLA access_token and refresh_token in order to connect to third party applications (Teslamate, TeslaFi,

Bun-Ny TAN 3 Apr 28, 2022
Login-python - Login system made in Python, using native libraries

login-python Sistema de login feito 100% em Python, utilizando bibliotecas nativ

Nicholas Gabriel De Matos Leal 2 Jan 28, 2022
Alisue 299 Dec 06, 2022
Flask user session management.

Flask-Login Flask-Login provides user session management for Flask. It handles the common tasks of logging in, logging out, and remembering your users

Max Countryman 3.2k Dec 28, 2022
JWT authentication for Pyramid

JWT authentication for Pyramid This package implements an authentication policy for Pyramid that using JSON Web Tokens. This standard (RFC 7519) is of

Wichert Akkerman 73 Dec 03, 2021
Pingo provides a uniform API to program devices like the Raspberry Pi, BeagleBone Black, pcDuino etc.

Pingo provides a uniform API to program devices like the Raspberry Pi, BeagleBone Black, pcDuino etc. just like the Python DBAPI provides an uniform API for database programming in Python.

Garoa Hacker Clube 12 May 22, 2022
A JOSE implementation in Python

python-jose A JOSE implementation in Python Docs are available on ReadTheDocs. The JavaScript Object Signing and Encryption (JOSE) technologies - JSON

Michael Davis 1.2k Dec 28, 2022
The ultimate Python library in building OAuth, OpenID Connect clients and servers. JWS,JWE,JWK,JWA,JWT included.

Authlib The ultimate Python library in building OAuth and OpenID Connect servers. JWS, JWK, JWA, JWT are included. Authlib is compatible with Python2.

Hsiaoming Yang 3.4k Jan 04, 2023
A full Rest-API With Oauth2 and JWT for request & response a JSON file Using FastAPI and SQLAlchemy 🔑

Pexon-Rest-API A full Rest-API for request & response a JSON file, Building a Simple WorkFlow that help you to Request a JSON File Format and Handling

Yasser Tahiri 15 Jul 22, 2022
Toolkit for Pyramid, a Pylons Project, to add Authentication and Authorization using Velruse (OAuth) and/or a local database, CSRF, ReCaptcha, Sessions, Flash messages and I18N

Apex Authentication, Form Library, I18N/L10N, Flash Message Template (not associated with Pyramid, a Pylons project) Uses alchemy Authentication Authe

95 Nov 28, 2022
Plotly Dash plugin to allow authentication through 3rd party OAuth providers.

dash-auth-external Integrate your dashboards with 3rd parties and external OAuth providers. Overview Do you want to build a Plotly Dash app which pull

James Holcombe 15 Dec 11, 2022
Social auth made simple

Python Social Auth Python Social Auth is an easy-to-setup social authentication/registration mechanism with support for several frameworks and auth pr

Matías Aguirre 2.8k Dec 24, 2022
Easy and secure implementation of Azure AD for your FastAPI APIs 🔒 Single- and multi-tenant support.

Easy and secure implementation of Azure AD for your FastAPI APIs 🔒 Single- and multi-tenant support.

Intility 220 Jan 05, 2023
Graphical Password Authentication System.

Graphical Password Authentication System. This is used to increase the protection/security of a website. Our system is divided into further 4 layers of protection. Each layer is totally different and

Hassan Shahzad 12 Dec 16, 2022
Some scripts to utilise device code authorization for phishing.

OAuth Device Code Authorization Phishing Some scripts to utilise device code authorization for phishing. High level overview as per the instructions a

Daniel Underhay 6 Oct 03, 2022
Ready to use and customizable Authentications and Authorisation management for FastAPI ⚡

AuthenticationX 💫 Ready-to-use and customizable Authentications and Oauth2 management for FastAPI ⚡ Source Code: https://github.com/yezz123/AuthX Doc

Yasser Tahiri 404 Dec 27, 2022
This is a Token tool that gives you many options to harm the account.

Trabis-Token-Tool This is a Token tool that gives you many options to harm the account. Utilities With this tools you can do things as : ·Delete all t

Steven 2 Feb 13, 2022
Python One-Time Password Library

PyOTP - The Python One-Time Password Library PyOTP is a Python library for generating and verifying one-time passwords. It can be used to implement tw

PyAuth 2.2k Dec 26, 2022
This project is an open-source project which I made due to sharing my experience around the Python programming language.

django-tutorial This project is an open-source project which I made due to sharing my experience around the Django framework. What is Django? Django i

MohammadMasoumi 6 May 12, 2022
API with high performance to create a simple blog and Auth using OAuth2 ⛏

DogeAPI API with high performance built with FastAPI & SQLAlchemy, help to improve connection with your Backend Side to create a simple blog and Cruds

Yasser Tahiri 111 Jan 05, 2023