A simple XLSX/CSV reader - to dictionary converter

Overview

Test Python package codecov Upload Python Package to PyPI PythonVersions Sourcery Black Snyk Downloads Twitter Follow

sheet2dict

A simple XLSX/CSV reader - to dictionary converter

Installing

To install the package from pip, first run:

python3 -m pip install --no-cache-dir sheet2dict

Required pip packages for sheet2doc: csv, openpyxl

Usage

This library has 2 main features: reading a spreadsheet files and converting them to array of python dictionaries.

- XLSX

Use xlsx_to_dict() method when converting form spreadsheets.
Supported file formats for spreadsheets are: .xlsx,.xlsm,.xltx,.xltm
Spreadsheets with multiple worksheets are supported. If no sheet is specified, the active sheet is selected. If there is only one sheet, it is considered active.

# Import the library
from sheet2dict import Worksheet

# Create an object
ws = Worksheet()

# Convert active sheet (without specifying sheet name)
ws.xlsx_to_dict(path='inventory.xlsx')

# Convert the 'Main Warehouse' sheet of the 'inventory.xslx' spreadsheet file.
ws.xlsx_to_dict(path='inventory.xlsx', select_sheet='Main Warehouse')

# object.header returns first row with the data in a spreadsheet 
print(ws.header)

# object.sheet_items returns converted rows as dictionaries in the array 
print(ws.sheet_items)

You can parse data when worksheet is an object

# Import the library
from sheet2dict import Worksheet

# Example: read spreadsheet as object
path = 'inventory.xlsx'
xlsx_file = open(path, 'rb')
xlsx_file = BytesIO(xlsx_file.read())

# Parse spreadsheet from object
ws = Worksheet()
ws.xlsx_to_dict(path=xlsx_file)
print(ws.header)

- CSV

Use csv_to_dict() method when converting form csv.
CSV is a format with many variations, better handle encodings and delimiters on user side and not within module itself.

# Import the library
from sheet2dict import Worksheet

# Create an object
ws = Worksheet()

# Read CSV file
csv_file = open('inventory.csv', 'r', encoding='utf-8-sig')

# Convert 
ws.csv_to_dict(csv_file=csv_file, delimiter=';')

# object.header returns first row with the data in a spreadsheet 
print(ws.header)

# object.sheet_items returns converted rows as dictionaries in the array 
print(ws.sheet_items)

- Other functions

Worksheet object.header returns first row with the data in a spreadsheet

>> ws.header {'country': 'SK', 'city': 'Bratislava', 'citizens': '400000', 'random_field': 'cc'}">
Python 3.9.1
[Clang 12.0.0 (clang-1200.0.32.28)] on darwin
>>> from sheet2dict import Worksheet
>>> ws = Worksheet()
>>> ws.xlsx_to_dict(path="inventory.xlsx")

>>> ws.header
{'country': 'SK', 'city': 'Bratislava', 'citizens': '400000', 'random_field': 'cc'}

Worksheet object.sanitize_sheet_items removes None or empty dictionary keys from sheet_items

>> ws.sheet_items [ {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', None: '22', 'random_field': 'cc'}, {'country': 'UK', 'city': 'London', 'citizens': '2000000', None: '33', 'random_field': 'cc'} ] >>> ws.sanitize_sheet_items [ {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', 'random_field': 'cc'}, {'country': 'UK', 'city': 'London', 'citizens': '2000000', 'random_field': 'cc'} ]">
>>> from sheet2dict import Worksheet
>>> ws = Worksheet()
>>> ws.xlsx_to_dict(path="inventory.xlsx")

>>> ws.sheet_items
[
  {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', None: '22', 'random_field': 'cc'},
  {'country': 'UK', 'city': 'London', 'citizens': '2000000', None: '33', 'random_field': 'cc'}
]

>>> ws.sanitize_sheet_items
[
  {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', 'random_field': 'cc'},
  {'country': 'UK', 'city': 'London', 'citizens': '2000000', 'random_field': 'cc'}
]

Contributing and Code of Conduct

Contributing to sheet2dict

As an open source project, sheet2dict welcomes contributions of many forms.
Please read and follow our Contributing to sheet2dict

Contributors:

  • Thanks to 白一百 (bái-yī-bǎi) for making sheet2dict work with multi-sheet Excel files.

Code of Conduct

As a contributor, you can help us keep the sheet2dict project open and inclusive.
Please read and follow our Code of Conduct

You might also like...
SCTYMN is a GitHub repository that includes some simple scripts(currently only python scripts) that can be useful.

Simple Codes That You Might Need SCTYMN is a GitHub repository that includes some simple scripts(currently only python scripts) that can be useful. In

Żmija is a simple universal code generation tool.

Żmija Żmija is a simple universal code generation tool. It is intended to be used as a means to generate code that is both efficient and easily mainta

 Simple yet powerful CAD (Computer Aided Design) library, written with Python.
Simple yet powerful CAD (Computer Aided Design) library, written with Python.

Py-MADCAD it's time to throw parametric softwares out ! Simple yet powerful CAD (Computer Aided Design) library, written with Python. Installation

A simple USI Shogi Engine written in python using python-shogi.

Revengeshogi My attempt at creating a USI Shogi Engine in python using python-shogi. Current State of Engine Currently only generating random moves us

A simple document management REST based API for collaboratively interacting with documents

documan_api A simple document management REST based API for collaboratively interacting with documents.

Django-Text-to-HTML-converter - The simple Text to HTML Converter using Django framework

Django-Text-to-HTML-converter This is the simple Text to HTML Converter using Dj

Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files

pyexcel - Let you focus on data, instead of file formats Support the project If your company has embedded pyexcel and its components into a revenue ge

A flask extension using pyexcel to read, manipulate and write data in different excel formats: csv, ods, xls, xlsx and xlsm.

Flask-Excel - Let you focus on data, instead of file formats Support the project If your company has embedded pyexcel and its components into a revenu

Scrapes mcc-mnc.com and outputs 3 files with the data (JSON, CSV & XLSX)

mcc-mnc.com-webscraper Scrapes mcc-mnc.com and outputs 3 files with the data (JSON, CSV & XLSX) A Python script for web scraping mcc-mnc.com Link: mcc

LightCSV - This CSV reader is implemented in just pure Python.

LightCSV Simple light CSV reader This CSV reader is implemented in just pure Python. It allows to specify a separator, a quote char and column titles

JSON and CSV data for Swahili dictionary with over 16600+ words

kamusi JSON and CSV data for swahili dictionary with over 16600+ words. This repo consists of data from swahili dictionary with about 16683 words toge

Sheet Data Image/PDF-to-CSV Converter

Sheet Data Image/PDF-to-CSV Converter

DB-Drive-CSV - This is app is can be used to access CSV file as JSON from Google Drive.

DB Drive CSV This is app is can be used to access CSV file as JSON from Google Drive. How To Use Create file/ upload file to Google Drive There's 2 fi

Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel order of RGB and BGR. Simple Channel Converter for ONNX.
Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel order of RGB and BGR. Simple Channel Converter for ONNX.

scc4onnx Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel

A Python module for creating Excel XLSX files.
A Python module for creating Excel XLSX files.

XlsxWriter XlsxWriter is a Python module for writing files in the Excel 2007+ XLSX file format. XlsxWriter can be used to write text, numbers, formula

A Python module for creating Excel XLSX files.
A Python module for creating Excel XLSX files.

XlsxWriter XlsxWriter is a Python module for writing files in the Excel 2007+ XLSX file format. XlsxWriter can be used to write text, numbers, formula

É uma API feita em Python e Flask que pesquisa informações em uma tabela .xlsx e retorna o resultado.
É uma API feita em Python e Flask que pesquisa informações em uma tabela .xlsx e retorna o resultado.

API de rastreamento de pacotes É uma API feita em Python e Flask que pesquisa informações de rastreamento de pacotes em uma tabela .xlsx e retorna o r

Transfers a image file(.png) to an Excel file(.xlsx)
Transfers a image file(.png) to an Excel file(.xlsx)

Transfers a image file(.png) to an Excel file(.xlsx)

An XLSX spreadsheet renderer for Django REST Framework.

drf-renderer-xlsx provides an XLSX renderer for Django REST Framework. It uses OpenPyXL to create the spreadsheet and returns the data.

Comments
  • Empty cells in xlsx become the non-empty string

    Empty cells in xlsx become the non-empty string "None" in the dictionary

    Thank you for writing this. I have recently moved from using csv files to excel files and this has made the transition easier.

    One issue I have encountered is that empty cells are not being returned as a Falsey value like None, or the empty string ''.

    From what I have been able to understand, openpyxl reads the empty cell as None, and the following code in sheet2dict uses str to convert this into the string 'None'.

    https://github.com/Pytlicek/sheet2dict/blob/93bf731d327d620755ffd5585aedb0b23a17a6a8/sheet2dict/main.py#L30-L34

    I have filtered my own dictionaries to turn 'None' into '' but that will cause issues when any of my sheet really do contain the word None.

    I just did some quick testing, and it seems that the csv.DictReader() in the python standard library uses the empty string '' for missing values in the a csv file, so that might be a good thing to do for these cases.

    enhancement 
    opened by JoshuaCapel 5
  • Code Proposal: Allow User to Specify Sheet (Work with multi-sheet Excel Files)

    Code Proposal: Allow User to Specify Sheet (Work with multi-sheet Excel Files)

    This library is awesome! I'm using it a lot since it makes working with Excel files very simple.

    I have a proposal: Allow the user to specify the sheet they want to modify.

    I write a lot of functional programming and I can't remember if it's possible to overwrite a class function at import, so instead I've modified the sheet2dict to return a specified sheet when openpyxl opens the file.

    in main.py:

    def xlsx_to_dict(self, path, select_sheet=None):
            """
            Read a Worksheet and return it as array of dictionaries
            :param self: Worksheet Object
            :param path: Path to XLSX file
            :return: Array of rows as dictionaries
            """
            book = load_workbook(path)
            if select_sheet == None:
                  sheet = book.active
            else:
                  sheet = book[select_sheet]
    

    The user would then read an Excel file like this:

    ws =Worksheet()
    ws.xlsx_to_dict(path=<insert path>, select_sheet='<sheetname>'
    

    I don't know if this is the most elegant way to select a sheet, but it works for me. I suppose to import all worksheets by default - I think the maximum is 32 - into a larger dictionary, <edit I didn't complete my thought>, that there could be a clever way to put dictionaries in dictionaries, but this might be too convoluted since I assume most users will just use one sheet. Another option is just to continue creating Worksheet() objects with different names and sheets selected.

    enhancement 
    opened by bai-yi-bai 3
  • Add an option to parse all worksheets in one function call

    Add an option to parse all worksheets in one function call

    Added an option to parse all sheets from a workbook at once while keeping the way existing features worked untouched.

    Had to modify sheet_items to a dict with sheet names as keys and sheet's objects as values.

    enhancement 
    opened by 68Kamil68 2
  • Update README.md

    Update README.md

    Changed "the first sheet" to "active sheet" to avoid confusion if someone saves a spreadsheet and the active worksheet is, fe. second in line. Also, what is considered an active worksheet when there is only one worksheet in the spreadsheet. Add an example with and without the "select_sheet" option.

    documentation 
    opened by Pytlicek 1
Releases(0.1.5)
Owner
Tomas Pytel
PyConSK Organizer, Python Developer & DevOps
Tomas Pytel
Automated Integration Testing and Live Documentation for your API

Automated Integration Testing and Live Documentation for your API

ScanAPI 1.3k Dec 30, 2022
Que es S4K Builder?, Fácil un constructor de tokens grabbers con muchas opciones, como BTC Miner, Clipper, shutdown PC, Y más! Disfrute el proyecto. <3

S4K Builder Este script Python 3 de código abierto es un constructor del muy popular registrador de tokens que está en [mi GitHub] (https://github.com

SadicX 1 Oct 22, 2021
A swagger tool for tornado, using python to write api doc!

SwaggerDoc About A swagger tool for tornado, using python to write api doc! Installation pip install swagger-doc Quick Start code import tornado.ioloo

aaashuai 1 Jan 10, 2022
AiiDA plugin for the HyperQueue metascheduler.

aiida-hyperqueue WARNING: This plugin is still in heavy development. Expect bugs to pop up and the API to change. AiiDA plugin for the HyperQueue meta

AiiDA team 3 Jun 19, 2022
Fully reproducible, Dockerized, step-by-step, tutorial on how to mock a "real-time" Kafka data stream from a timestamped csv file. Detailed blog post published on Towards Data Science.

time-series-kafka-demo Mock stream producer for time series data using Kafka. I walk through this tutorial and others here on GitHub and on my Medium

Maria Patterson 26 Nov 15, 2022
Lightweight, configurable Sphinx theme. Now the Sphinx default!

What is Alabaster? Alabaster is a visually (c)lean, responsive, configurable theme for the Sphinx documentation system. It is Python 2+3 compatible. I

Jeff Forcier 670 Dec 19, 2022
NetBox plugin for BGP related objects documentation

Netbox BGP Plugin Netbox plugin for BGP related objects documentation. Compatibility This plugin in compatible with NetBox 2.10 and later. Installatio

Nikolay Yuzefovich 133 Dec 27, 2022
This contains timezone mapping information for when preprocessed from the geonames data

when-data This contains timezone mapping information for when preprocessed from the geonames data. It exists in a separate repository so that one does

Armin Ronacher 2 Dec 07, 2021
level2-data-annotation_cv-level2-cv-15 created by GitHub Classroom

[AI Tech 3기 Level2 P Stage] 글자 검출 대회 팀원 소개 김규리_T3016 박정현_T3094 석진혁_T3109 손정균_T3111 이현진_T3174 임종현_T3182 Overview OCR (Optimal Character Recognition) 기술

6 Jun 10, 2022
An awesome Data Science repository to learn and apply for real world problems.

AWESOME DATA SCIENCE An open source Data Science repository to learn and apply towards solving real world problems. This is a shortcut path to start s

Academic.io 20.3k Jan 09, 2023
the project for the most brutal and effective language learning technique

- "The project for the most brutal and effective language learning technique" (c) Alex Kay The langflow project was created especially for language le

Alexander Kaigorodov 7 Dec 26, 2021
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

applied-ml Curated papers, articles, and blogs on data science & machine learning in production. ⚙️ Figuring out how to implement your ML project? Lea

Eugene Yan 22.1k Jan 03, 2023
Plugins for MkDocs.

Plugins for MkDocs and Python Markdown pip install neoteroi-mkdocs This package includes the following plugins and extensions: Name Description Type m

35 Dec 23, 2022
Use Brainf*ck with python!

Brainfudge Run Brainf*ck code with python! Classes Interpreter(array_len): encapsulate all functions into class __init__(self, array_len: int=30000) -

1 Dec 14, 2021
Service for visualisation of high dimensional for hydrosphere

hydro-visualization Service for visualization of high dimensional for hydrosphere DEPENDENCIES DEBUG_ENV = bool(os.getenv("DEBUG_ENV", False)) APP_POR

hydrosphere.io 1 Nov 12, 2021
PySpark Cheat Sheet - learn PySpark and develop apps faster

This cheat sheet will help you learn PySpark and write PySpark apps faster. Everything in here is fully functional PySpark code you can run or adapt to your programs.

Carter Shanklin 168 Jan 01, 2023
204-python-string-21BCA90 created by GitHub Classroom

204-Python This repository is created for subject "204 Programming Skill" Python Programming. This Repository contain list of programs of python progr

VIDYABHARTI TRUST COLLEGE OF BCA 6 Mar 31, 2022
More detailed upload statistics for Nicotine+

More Upload Statistics A small plugin for Nicotine+ 3.1+ to create more detailed upload statistics. ⚠ No data previous to enabling this plugin will be

Nick 1 Dec 17, 2021
Paper and Code for "Curriculum Learning by Optimizing Learning Dynamics" (AISTATS 2021)

Curriculum Learning by Optimizing Learning Dynamics (DoCL) AISTATS 2021 paper: Title: Curriculum Learning by Optimizing Learning Dynamics [pdf] [appen

Tianyi Zhou 15 Dec 06, 2022
Always know what to expect from your data.

Great Expectations Always know what to expect from your data. Introduction Great Expectations helps data teams eliminate pipeline debt, through data t

Great Expectations 7.8k Jan 05, 2023