An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

Owner
Yu Shen
UCLA '24
Yu Shen
A JOSE implementation in Python

python-jose A JOSE implementation in Python Docs are available on ReadTheDocs. The JavaScript Object Signing and Encryption (JOSE) technologies - JSON

Michael Davis 1.2k Dec 28, 2022
Authentication for Django Rest Framework

Dj-Rest-Auth Drop-in API endpoints for handling authentication securely in Django Rest Framework. Works especially well with SPAs (e.g React, Vue, Ang

Michael 1.1k Jan 03, 2023
examify-io is an online examination system that offers automatic grading , exam statistics , proctoring and programming tests , multiple user roles

examify-io is an online examination system that offers automatic grading , exam statistics , proctoring and programming tests , multiple user roles ( Examiner , Supervisor , Student )

Ameer Nasser 4 Oct 28, 2021
Python's simple login system concept - Advanced level

Simple login system with Python - For beginners Creating a simple login system using python for beginners this repository aims to provide a simple ove

Low_Scarlet 1 Dec 13, 2021
CheckList-Api - Created with django rest framework and JWT(Json Web Tokens for Authentication)

CheckList Api created with django rest framework and JWT(Json Web Tokens for Aut

shantanu nimkar 1 Jan 24, 2022
Spotify User Token Generator Template

Spotify User Token Generator Template Quick Start $ pip3 install -r requirements

Arda Soyer 1 Feb 01, 2022
Todo app with authentication system.

todo list web app with authentication system. User can register, login, logout. User can login and create, delete, update task Home Page here you will

Anurag verma 3 Aug 18, 2022
Django CAS 1.0/2.0/3.0 client authentication library, support Django 2.0, 2.1, 2.2, 3.0 and Python 3.5+

django-cas-ng django-cas-ng is Django CAS (Central Authentication Service) 1.0/2.0/3.0 client library to support SSO (Single Sign On) and Single Logou

django-cas-ng 347 Dec 18, 2022
Django Auth Protection This package logout users from the system by changing the password in Simple JWT REST API.

Django Auth Protection Django Auth Protection This package logout users from the system by changing the password in REST API. Why Django Auth Protecti

Iman Karimi 5 Oct 26, 2022
Authentication, JWT, and permission scoping for Sanic

Sanic JWT Sanic JWT adds authentication protection and endpoints to Sanic. It is both easy to get up and running, and extensible for the developer. It

Adam Hopkins 229 Jan 05, 2023
Awesome Django authorization, without the database

rules rules is a tiny but powerful app providing object-level permissions to Django, without requiring a database. At its core, it is a generic framew

1.6k Dec 30, 2022
Crie seus tokens de autenticação com o AScrypt.

AScrypt tokens O AScrypt é uma forma de gerar tokens de autenticação para sua aplicação de forma rápida e segura. Todos os tokens que foram, mesmo que

Jaedson Silva 0 Jun 24, 2022
Mock authentication API that acceccpts email and password and returns authentication result.

Mock authentication API that acceccpts email and password and returns authentication result.

Herman Shpryhau 1 Feb 11, 2022
Login System Using Django

Login System Django

Nandini Chhajed 6 Dec 12, 2021
This python package provides a simple password reset strategy for django rest framework

Django Rest Password Reset This python package provides a simple password reset strategy for django rest framework, where users can request password r

Anexia 363 Dec 24, 2022
Per object permissions for Django

django-guardian django-guardian is an implementation of per object permissions [1] on top of Django's authorization backend Documentation Online docum

3.3k Jan 01, 2023
A generic, spec-compliant, thorough implementation of the OAuth request-signing logic

OAuthLib - Python Framework for OAuth1 & OAuth2 *A generic, spec-compliant, thorough implementation of the OAuth request-signing logic for Python 3.5+

OAuthlib 2.5k Jan 02, 2023
Django-registration (redux) provides user registration functionality for Django websites.

Description: Django-registration provides user registration functionality for Django websites. maintainers: Macropin, DiCato, and joshblum contributor

Andrew Cutler 920 Jan 08, 2023
Flask user session management.

Flask-Login Flask-Login provides user session management for Flask. It handles the common tasks of logging in, logging out, and remembering your users

Max Countryman 3.2k Dec 28, 2022
A Python package, that allows you to acquire your RecNet authorization bearer token with your account credentials!

RecNet-Login This is a Python package, that allows you to acquire your RecNet bearer token with your account credentials! Installation Done via git: p

Jesse 6 Aug 18, 2022