The RAP community of practice includes all analysts and data scientists who are interested in adopting the working practices included in reproducible analytical pipelines (RAP) at NHS Digital.

Overview

Warning - this repository is a snapshot of a repository internal to NHS Digital. This means that links to videos and some URLs may not work.

Repository owner: NHS Digital Analytical Services

Email: [email protected]

To contact us, raise an issue on GitHub or send us an email, and we will respond promptly.

RAP community of practice

Welcome to the landing page for the RAP community of practice repo.

You can learn all about reproducible analytical pipelines (RAP) on our what is RAP page. In a nutshell, RAP is becoming the standard for publishing analytical outputs in government. RAP combines a number of ways of working that help to improve the reliability, transparency, and speed of statistics publications. Reproducible analytical pipelines follow the principles of the AQUA Book guidelines, which revolve around analysis being reproducible, auditable, transparent, and quality assured.

The RAP community of practice includes all analysts and data scientists who are interested in adopting the working practices included in reproducible analytical pipelines (RAP). This repo is a central repository for resources and guidance to help teams adopt RAP practices. There is an associated [MS Teams page] where you can introduce yourself, ask for help, or discuss different approaches. Over time we hope to build up a community of people who can self-support and further develop these ways of working.

The community of practice aims to support teams in adopting RAP practices through:

  1. Offering in-person support as teams establish new working practices
  2. Producing learning materials that offer reusable templates adapted for the NHSD analytical environment

This work is prompted by the observation that teams can struggle to adopt RAP practices without direct support. While no one element of RAP is particularly difficult, learning several new skills at the same time as delivering BAU is challenging. Teams can struggle to find the defended time to embed these practices. See the Statistics Authority report on the barriers to RAP adoption for more information. Luckily, in NHSD we have strong senior support for RAP and many teams have already begun to adopt many of the practices included in RAP. Consequently, we already have a large pool of skilled, enthusiastic analysts who are willing to help others. These resources also aim to support the goals laid out in the Goldacre report Bringing NHS data analysis into the 21st century and to align with Tim Berners-Lee's Five star data principles.

Support and training

If your team is embarking upon a RAP journey, you should look at our what is RAP page and try to complete the self-assessment. From there, we recommend reaching out for some in-person support. The RAP Champion Function (within the Data Science Skilled Team) can offer support in many forms:

  • Reviewing your RAP work and assessing your progress against the levels of RAP
  • Peer review of code
  • Workshops for a specific RAP capability
  • Consultancy style engagement where we plan a migration strategy
  • Pair coding
  • Shadowing another team

If you want to talk about any of this then please reach out on the [RAP community of practice MS Teams] page (internal to NHSD).

We maintain a list of people who are willing to dedicate some time to support others. Please add your name to the mix if you are willing to support someone else. You don't need to be an expert - just willing to share what you know.

Tutorials and resources

As we work alongside teams, we try to produce reusable learning materials aimed specifically at supporting NHSD teams. We try (with partial success) to avoid reproducing guidance that is easily available online. Instead, we link to lots of external resources where you can self-serve, and focus on bespoke guidance that lays out how you would accomplish these practices in the NHSD setting.

Here are some of the initial resources:

These resources are demand-driven so if you want something then please ask on the [MS Teams page]. We would also ask you to contribute if you can improve on any of the resources or can fill in any other gaps.

The resources are not intended to be prescriptive. There are many ways to accomplish a task and teams have valid reasons for choosing other approaches. Instead the intention of the resources provided here is to offer a way in for teams who want to adopt good practices that they have heard about but don't know where to start.

Misc

We have taken inspiration from the NHSD software engineering COP. It has tons of great material so I encourage you to read and reflect on these working practices.

Licence

RAP Community of Practice codebase is released under the MIT License.

The documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.

Comments
  • Dead link

    opened by abbieprescott 4
  • dependency management "not possible in DAE"

    In Levels of RAP it says: Does your repo include dependency management? (i.e. requirements.txt or conda environment for RDS users. Not possible in DAE)

    It's not strictly true that this cannot be described for DAE - though it is more limited. One can describe the cluster used (runtime, libraries etc).

    opened by SamHollings 2
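
To illustrate the point raised in the issue above, here is a minimal sketch of how a pipeline could record its runtime and installed libraries when maintaining a requirements.txt by hand is not practical. It is generic Python rather than anything specific to DAE, and the function names `describe_environment` and `to_requirements` are hypothetical, chosen only for this example.

```python
import platform
from importlib import metadata


def describe_environment() -> dict:
    """Return the Python version and a sorted list of 'name==version' pins."""
    pins = sorted(
        (
            f"{dist.metadata.get('Name')}=={dist.version}"
            for dist in metadata.distributions()
            if dist.metadata.get("Name")  # skip entries with missing metadata
        ),
        key=str.lower,
    )
    return {"python": platform.python_version(), "packages": pins}


def to_requirements(env: dict) -> str:
    """Format the snapshot in requirements.txt style, one pin per line."""
    header = f"# python {env['python']}"
    return "\n".join([header, *env["packages"]]) + "\n"
```

Saving the output of `to_requirements(describe_environment())` alongside each run gives a reviewer a record of what the code ran against, even where the environment itself cannot be rebuilt from that record.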
  • RAP Publishing Checks - Clarify what are credentials and secrets

    We've had some feedback that the part of the publishing checks that says "no credentials or secrets" is not clear, as analysts have not seen these terms before.

    The following text might make things easier to understand:

    Credentials or secrets are essentially passwords that computers use for encrypted communication or access to services. For example, with many APIs (like the Google Maps API) you must supply a credential code to access the service. Often these codes look like long, strange combinations of letters and numbers (l79sDgH9s...). We must not share our passwords publicly, so you should not commit credentials and secrets.

    opened by goodyguts 2
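
One common way to keep credentials out of a repository is to read them from an environment variable at run time instead of writing them in the code. The sketch below shows the idea under stated assumptions: the variable name `EXAMPLE_API_KEY` and the helpers `load_api_key` and `mask` are hypothetical and not part of any NHSD guidance.

```python
import os


def load_api_key(variable: str = "EXAMPLE_API_KEY") -> str:
    """Read a secret from an environment variable instead of hard-coding it."""
    value = os.environ.get(variable)
    if not value:
        # Fail early with a clear message rather than running without access.
        raise RuntimeError(f"Set the {variable} environment variable first.")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Return a version of the secret that is safer to write to logs."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)
```

Because the value lives outside the codebase, the same code can be published openly while each analyst supplies their own credential locally.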
  • Environment and dependency management - needs to be clearer

    In the "levels of RAP", people become confused by environment and dependency management. We need to link to a page which very clearly describes these, what the point of them is, and how teams can know if they are meeting this requirement.

    opened by SamHollings 2
  • Pyspark guidance

    I'm not a fan of referring to it as a "flavour of python" (on the About PySpark page).

    I think PySpark should be contained underneath Python.

    I also think it should make it clear that distribution of processing only occurs if it's set up right: Spark on a normal laptop will not be any more powerful than, say, pandas. On a big cluster in Databricks it is a different story.

    I think this page might also need a reference to other Python data structures, and how there is a right tool for the right job.

    duplicate 
    opened by SamHollings 1
  • Split out Terminal guidance from "git" guidance.

    The terminal guidance is contained within the git guidance - but the terminal is a separate tool which can be used for many purposes - probably better to have it as its own level alongside Python, git etc, and then for these pages to be referenced by the other technologies.

    opened by SamHollings 1
  • code in the open - topics and add to data-analytics-services

    On the "how to publish your code in the open page" - we should tell people they should add their publication to the page: https://github.com/NHSDigital/data-analytics-services and also that they should set appropriate topics for their publication, i.e. nhs-digital-publication

    opened by SamHollings 1
  • Signpost resources to ensure accessibility requirements are met

    This is most relevant for any outputs produced. See guidance.

    As a starting point, the python visualisation guide should include tips on how to make visualisations more accessible:

    • The Home Office has some posters on accessible design
    • There are also countless online resources on accessibility relating to colour-blindness, visual impairments etc.

    We should also consider including a note on accessibility in the design of RAP. A pipeline would be difficult to reproduce if a user could not access any part of the pipeline. This includes README files, as well as output types.

    opened by harrietrs 1
  • Environment management external links

    We should do more to explain how environment management plays into reproducibility.

    This page is quite useful and would save us duplicating: https://realpython.com/python-virtual-environments-a-primer/

    opened by connor1q 1
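
As a rough illustration of how environment management supports reproducibility, the sketch below creates an isolated virtual environment with Python's standard `venv` module, so that a project's packages do not depend on whatever happens to be installed system-wide. The helper name `create_environment` is hypothetical, and `with_pip=False` is used only to keep the example lightweight; a real project would normally want pip available inside the environment.

```python
import sys
import venv
from pathlib import Path


def create_environment(target: Path) -> Path:
    """Create a virtual environment at `target` and return its interpreter path."""
    # with_pip=False keeps the sketch fast; it is an assumption of this example.
    venv.EnvBuilder(with_pip=False).create(target)
    if sys.platform == "win32":
        return target / "Scripts" / "python.exe"
    return target / "bin" / "python"
```

Running the pipeline with the returned interpreter means it only sees the packages installed into that environment, which is what makes a pinned package list meaningful.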
  • Broken link

    https://github.com/NHSDigital/rap-community-of-practice/blob/main/python/project-structure-and-packaging.md#generic-package-template

    There is a broken link to the generic package template in the section above

    opened by connor1q 1
  • Contributions section

    We're keen to encourage external improvements to these resources but we don't yet have a contributions section that explains how we will review and moderate.

    opened by connor1q 1
  • Code review page ideas

    We have recently been doing some code reviewing. Here are a few things that we think might make the page more helpful.

    Code review before merge request

    Code should be reviewed with someone before submitting a merge request. The reviewer should consider whether the code needs to be refactored or redesigned.

    I'm not sure that I always agree with this. Merge requests make it really easy to leave comments on different parts of the code, and in some ways make the life of the reviewer and the merge request submitter easier. Maybe rephrase as

    You don't have to save reviewing your code until the end. You can do small reviewing and also pair programming while developing the ticket. Seeking feedback sooner could mean you save time because you do not have to change as much when the final review happens later.

    Different types of code review

    There are different types of code review that you can get. It may be worth highlighting them.

    1. Merge request code review

      A standard review process that checks whether changes to the codebase are acceptable. You focus only on the code that has changed. It should be relatively quick, and very regular (one every time you implement a new feature). Normally done by a member of the team.

    2. Full code review

      A code review where someone looks at all your code together, and gives you overall feedback. This review allows someone to look at the bigger picture, rather than one individual feature. These reviews take longer, and are less regular. Normally done by members outside your team, so that it is a fresh pair of eyes.

    3. Fitness to publish checks

      A code review to check the code is okay to publish. Note that, in the code review, you will normally limit yourself to making suggestions that you want completed before the code is published. This may mean you avoid suggesting big changes to the code, and instead focus in on checks like ensuring documentation is well written, or removing passwords from the code.

    Maybe split code review checklist into beginner and advanced items?

    One of the items on the code review checklist is

    Documentation is hosted for easy access. GitHub Pages and Read the Docs provide a free service for hosting documentation publicly.

    Even with advanced teams in data services I do not see them doing this. It might be worth prioritizing, so that the checklist is less overwhelming.

    Maybe organise the checklist items by the RAP level the team is aiming for.

    on jira workplan 
    opened by goodyguts 2
  • 03_quality-assuring-analytical-ouputs page not clearly linked with levels of RAP

    The AQUA page (https://github.com/NHSDigital/rap-community-of-practice/blob/main/implementing_RAP/general_guidance/quality-assuring-analytical-ouputs.md) is not clearly associated with the levels of RAP and so people can find it a bit confusing when and how they should be following it.

    We need to link it more clearly into people's workflows when planning out RAP (some of it is beyond RAP and is more general guidance on managing analytical work), and perhaps reduce duplication by removing those bits already covered by the "levels of RAP" and making this clear.

    on jira workplan 
    opened by SamHollings 1
  • Clean code guidance

    Some teams want to use clean code. We need guidance on the best way to approach this for analytical code, why you would want to do it, and what to watch out for.

    on jira workplan 
    opened by SamHollings 2
Releases (v1.1.0)
  • v1.1.0 (Dec 21, 2022)

    What's Changed

    Automatic Release Notes

    • Release v1.1.0 by @xiyaozhuang in https://github.com/NHSDigital/rap-community-of-practice/pull/35

    New Contributors

    • @xiyaozhuang made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/35

    Full Changelog: https://github.com/NHSDigital/rap-community-of-practice/compare/v1.0.0...v1.1.0

  • v1.0.0 (Dec 6, 2022)

    What Changed

    Automatic release notes

    • Hr 1188 r git by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/2
    • Add Intro to R link by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/3
    • Improving layout and expanding rollout section by @connor1q in https://github.com/NHSDigital/rap-community-of-practice/pull/4
    • Cq updates by @connor1q in https://github.com/NHSDigital/rap-community-of-practice/pull/5
    • Hr changes by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/9
    • Hr updates to git by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/10
    • Update publishing code in the open by @harrietrs in https://github.com/NHSDigital/rap-community-of-practice/pull/20
    • Sh new front page by @SamHollings in https://github.com/NHSDigital/rap-community-of-practice/pull/22
    • Restructure and edit files by @abbieprescott in https://github.com/NHSDigital/rap-community-of-practice/pull/23
    • Create gh-pages version by @harrietrs in https://github.com/NHSDigital/rap-community-of-practice/pull/31
    • add two new guides and pr prep by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/32
    • Publishes when to stop coding guide by @josephwilson8-nhs in https://github.com/NHSDigital/rap-community-of-practice/pull/33
    • Added new improved guides on virtual environments by @xiyaozhuang in https://github.com/NHSDigital/rap-community-of-practice/pull/34

    New Contributors

    • @helrich made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/2
    • @connor1q made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/4
    • @harrietrs made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/20
    • @SamHollings made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/22
    • @abbieprescott made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/23
    • @josephwilson8-nhs made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/33
    • @xiyaozhuang made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/34

    Full Changelog: https://github.com/NHSDigital/rap-community-of-practice/commits/v1.0.0

Owner
NHS Digital
NHS Digital Public Repository