Safe Policy Optimization with Local Features

Overview

Safe Policy Optimization with Local Feature (SPO-LF)

This is the source-code for implementing the algorithms in the paper "Safe Policy Optimization with Local Generalized Linear Function Approximations" which was presented in NeurIPS-21.

Installation

There is requirements.txt in this repository. Except for the common modules (e.g., numpy, scipy), our source code depends on the following modules.

We also provide Dockerfile in this repository, which can be used for reproducing our grid-world experiment.

Simulation configuration

We manage the simulation configuration using hydra. Configurations are listed in config.yaml. For example, the algorithm to run should be chosen from the ones we implemented:

sim_type: {safe_glm, unsafe_glm, random, oracle, safe_gp_state, safe_gp_feature, safe_glm_stepwise}

Grid World Experiment

The source code necessary for our grid-world experiment is contained in /grid_world folder. To run the simulation, for example, use the following commands.

cd grid_world
python main.py sim_type=safe_glm env.reuse_env=False

For the monte carlo simulation while comparing our proposed method with baselines, use the shell file, run.sh.

We also provide a script for visualization. If you want to render how the agent behaves, use the following command.

python main.py sim_type=safe_glm env.reuse_env=True

Safety-Gym Experiment

The source code necessary for our safety-gym experiment is contained in /safety_gym_discrete folder. Our experiment is based on safety-gym. Our proposed method utilize dynamic programming algorithms to solve Bellman Equation, so we modified engine.py to discrtize the environment. We attach modified safety-gym source code in /safety_gym_discrete/engine.py. To use the modified library, please clone safety-gym, then replace safety-gym/safety_gym/envs/engine.py using /safety_gym_discrete/engine.py in our repo. Using the following commands to install the modified library:

cd safety_gym
pip install -e .

Note that MuJoCo licence is needed for installing Safety-Gym. To run the simulation, use the folowing commands.

cd safety_gym_discrete
python main.py sim_idx=0

We compare our proposed method with three notable baselines: CPO, PPO-Lagrangian, and TRPO-Lagrangian. The baseline implementation depends on safety-starter-agents. We modified run_agent.py in the repo source code.

To run the baseline, use the folowing commands.

cd safety_gym_discrete/baseline
python baseline_run.py sim_type=cpo

The environment that agent runs on is generated using generate_env.py. We provide 10 50*50 environments. If you want to generate other environments, you can change the world shape in safety_gym_discrete.py, and running the following commands:

cd safety_gym_discrete
python generate_env.py

Citation

If you find this code useful in your research, please consider citing:

@inproceedings{wachi_yue_sui_neurips2021,
  Author = {Wachi, Akifumi and Wei, Yunyue and Sui, Yanan},
  Title = {Safe Policy Optimization with Local Generalized Linear Function Approximations},
  Booktitle  = {Neural Information Processing Systems (NeurIPS)},
  Year = {2021}
}
Owner
Akifumi Wachi
Akifumi Wachi
IDAPatternSearch adds a capability of finding functions according to bit-patterns into the well-known IDA Pro disassembler based on Ghidra’s function patterns format.

IDA Pattern Search by Argus Cyber Security Ltd. The IDA Pattern Search plugin adds a capability of finding functions according to bit-patterns into th

David Lazar 48 Dec 29, 2022
Huskee: Malware made in Python for Educational purposes

𝐇𝐔𝐒𝐊𝐄𝐄 Caracteristicas: Discord Token Grabber Wifi Passwords Grabber Googl

chew 4 Aug 17, 2022
Learning to compose soft prompts for compositional zero-shot learning.

Compositional Soft Prompting (CSP) Compositional soft prompting (CSP), a parameter-efficient learning technique to improve the zero-shot compositional

Bats Research 32 Jan 02, 2023
ProxyLogon(CVE-2021-26855+CVE-2021-27065) Exchange Server RCE(SSRF->GetWebShell)

ProxyLogon For Python3 ProxyLogon(CVE-2021-26855+CVE-2021-27065) Exchange Server RCE(SSRF-GetWebShell) usage: python ProxyLogon.py --host=exchang

112 Dec 01, 2022
A Burp extension adding a passive scan check to flag parameters whose name or value may indicate a possible insertion point for SSRF or LFI.

BurpParamFlagger A Burp extension adding a passive scan check to flag parameters whose name or value may indicate a possible insertion point for SSRF

Allyson O'Malley 118 Nov 07, 2022
Attack SQL Server through gopher protocol

Attack SQL Server through gopher protocol

hack2fun 17 Nov 30, 2022
(D)arth (S)ide of the (L)og4j (F)orce, the ultimate log4j vulnerabilities assessor

DSLF DSLF stands for (D)arth (S)ide of the (L)og4j (F)orce. It is the ultimate log4j vulnerabilities assessor. It comes with four individual Python3 m

frontal 1 Jan 11, 2022
Fast and easy way to rollout on multiple GitLab project file a particular content.

Volatile Fast and easy way to rollout on multiple GitLab project file a particular content. Why ? After looking for a tool to simply enforce a develop

Lujeni 4 Jan 17, 2022
Security System using OpenCV

Security-System Security System using OpenCV Files in this Repository: email_send.py - This file contains python code to send an email when something

Mehul Patwari 1 Oct 28, 2021
IDA scripts for hypervisor (Hyper-v) analysis and reverse engineering automation

Re-Scripts IA32-VMX-Helper (IDA-Script) IA32-MSR-Decoder (IDA-Script) IA32 VMX Helper It's an IDA script (Updated IA32 MSR Decoder) which helps you to

Behrooz Abbassi 16 Oct 08, 2022
INFO 3350/6350, Spring 2022, Cornell

Information Science 3350/6350 Text mining for history and literature Staff and sections Instructor: Matthew Wilkens Graduate TAs: Federica Bologna, Ro

Wilkens Teaching 6 Feb 21, 2022
LinOTP - the open source solution for two factor authentication

LinOTP LinOTP - the Open Source solution for multi-factor authentication Copyright © 2010-2019 KeyIdentity GmbH Coypright © 2019- arxes-tolina GmbH In

LinOTP 462 Jan 02, 2023
宝塔面板Windows版提权方法

宝塔面板Windows提权方法 本项目整理一些宝塔特性,可以在无漏洞的情况下利用这些特性来增加提权的机会。

298 Dec 14, 2022
This is an advanced backdoor, created with Python

Backdoor This is a Backdoor, created with Python 3. Types of Commands: Downloading / Uploading files. Launching / Deleting / Reading file's content. S

swagkarna 28 Oct 28, 2022
对安卓APP注入MSF PAYLOAD,并且对手机管家进行BYPASS。

520_APK_HOOK 介绍 将msf生成的payload,注入到一个正常的apk文件中,重新打包后进行加固,bypass手机安全管家的检测。 项目地址: https://github.com/cleverbao/520apkhook 作者: BaoGuo 优点 相比于原始的msf远控,此版本ap

BaoGuo 368 Jan 02, 2023
集成crawlergo、xray、dirsearch、nmap等工具的src漏洞挖掘工具,使用docker封装运行;

tools下有几个工具,所以项目文件比较大,如果下载总是中断的话建议拆开下载各个项目然后直接拷贝dockefile和recon.py即可 0x01 hscan介绍 hscan是什么 hscan是一款旨在使用一条命令替代渗透前的多条扫描命令,通过集成crawlergo扫描和xray扫描、dirsear

102 Jan 04, 2023
ssh-audit is a tool for ssh server & client configuration auditing.

SSH server & client auditing (banner, key exchange, encryption, mac, compression, compatibility, security, etc)

Joe Testa 1.4k Dec 31, 2022
VPN Overall Reconnaissance, Testing, Enumeration and eXploitation Toolkit

Vortex VPN Overall Reconnaissance, Testing, Enumeration and Exploitation Toolkit Overview A very simple Python framework, inspired by SprayingToolkit,

315 Dec 28, 2022
Tool To generate Stable Undetected Payload

windowsPayload Tool To generate Stable Undetected Payload Don t Upload to Virus Total :) Follow on Social Media Platforms ScreenShots How to install +

youhacker55 117 Dec 30, 2022