Python Eacc is a minimalist but flexible Lexer/Parser tool in Python.

Related tags

Documentationeacc
Overview

Eacc

Python Eacc is a parsing tool it implements a flexible lexer and a straightforward approach to analyze documents. It uses Python code to specify both lexer and grammar for a given document. Eacc can handle succinctly most parsing cases that existing Python parsing tools propose to address.

Documents are split into tokens and a token has a type when a sequence of tokens is matched it evaluates to a specific type then rematcned again against the existing rules. The types can be function objects it means patterns can be evaluated based on extern conditions.

The fact of it being possible to have a grammar rule associated to a type and the type being variable in the context of the program it makes eacc useful for some text analysis problems.

A document grammar is written mostly in an ambiguous manner. The parser has a lookahead mechanism to express precedence when matching rules.

It is possible to extend the document grammar at the time it is being parsed. Such a feature is interesting to handle some edge cases.

The parser also accept some special operators like Except, Only, Times etc. These operators are used to match sequences of tokens based on their token types and length.

Features

  • Fast and flexible Lexer

    • Use class inheritance to extend/modify your existing lexers.
  • Handle broken documents.

    • Useful in some edge cases.
  • Short implementation

    • You can easily extend or modify functionalities.
  • Powerful but easy to learn

    • Learn a few classes workings to implement a parser.
  • Pythonic notation for grammars

    • No need to dig deep into grammar theory.

Note: For a real and more sophisticated example of eacc usage check out.

Crocs is capable of reading a regex string then generating possible matches for the inputed regex.

https://github.com/iogf/crocs

Basic Example

The code below specifies a lexer and a parsing approach for a simple expression calculator. When one of the mathematical operations +, -, * or / is executed then the result is a number

Based on such a simple assertion it is possible to implement our calculator.

from eacc.eacc import Rule, Grammar, Eacc
from eacc.lexer import Lexer, LexTok, XSpec
from eacc.token import Plus, Minus, LP, RP, Mul, Div, Num, Blank, Sof, Eof

class CalcTokens(XSpec):
    # Used to extract the tokens.
    t_plus   = LexTok(r'\+', Plus)
    t_minus  = LexTok(r'\-', Minus)

    t_lparen = LexTok(r'\(', LP)
    t_rparen = LexTok(r'\)', RP)
    t_mul    = LexTok(r'\*', Mul)
    t_div    = LexTok(r'\/', Div)

    t_num    = LexTok(r'[0-9]+', Num, float)
    t_blank  = LexTok(r' +', Blank, discard=True)

    root = [t_plus, t_minus, t_lparen, t_num, 
    t_blank, t_rparen, t_mul, t_div]

class CalcGrammar(Grammar):
    # The token patterns when matched them become
    # ParseTree objects which have a type.
    r_paren = Rule(LP, Num, RP, type=Num)
    r_div   = Rule(Num, Div, Num, type=Num)
    r_mul   = Rule(Num, Mul, Num, type=Num)
    o_div   = Rule(Div)
    o_mul   = Rule(Mul)

    r_plus  = Rule(Num, Plus, Num, type=Num, up=(o_mul, o_div))
    r_minus = Rule(Num, Minus, Num, type=Num, up=(o_mul, o_div))

    # The final structure that is consumed. Once it is
    # consumed then the process stops.
    r_done  = Rule(Sof, Num, Eof)

    root = [r_paren, r_plus, r_minus, r_mul, r_div, r_done]

# The handles mapped to the patterns to compute the expression result.
def plus(expr, sign, term):
    return expr.val() + term.val()

def minus(expr, sign, term):
    return expr.val() - term.val()

def div(term, sign, factor):
    return term.val()/factor.val()

def mul(term, sign, factor):
    return term.val() * factor.val()

def paren(left, expression, right):
    return expression.val()

def done(sof, num, eof):
    print('Result:', num.val())
    return num.val()

if __name__ == '__main__':
    data = '2 * 5 + 10 -(2 * 3 - 10 )+ 30/(1-3+ 4* 10 + (11/1))' 

    lexer  = Lexer(CalcTokens)
    tokens = lexer.feed(data)
    eacc   = Eacc(CalcGrammar)
    
    # Link the handles to the patterns.
    eacc.add_handle(CalcGrammar.r_plus, plus)
    eacc.add_handle(CalcGrammar.r_minus, minus)
    eacc.add_handle(CalcGrammar.r_div, div)
    eacc.add_handle(CalcGrammar.r_mul, mul)
    eacc.add_handle(CalcGrammar.r_paren, paren)
    eacc.add_handle(CalcGrammar.r_done, done)
    
    ptree = eacc.build(tokens)
    ptree = list(ptree)

The defined rule below fixes precedence in the above ambiguous grammar.

    r_plus  = Rule(Num, Plus, Num, type=Num, up=(o_mul, o_div))

The above rule will be matched only if the below rules aren't matched ahead.

    o_div   = Rule(Div)
    o_mul   = Rule(Mul)

In case the above rule is matched then the result has type Num it will be rematched against the existing rules and so on.

When a mathematical expression is well formed it will result to the following structure.

Sof Num Eof

Which is matched by the rule below.

    r_done  = Rule(Sof, Num, Eof)

That rule is mapped to the handle below. It will merely print the resulting value.

def done(sof, num, eof):
    print('Result:', num.val())
    return num.val()

The Sof and Eof are start of file and end of file tokens. These are automatically inserted by the parser.

In case it is not a valid mathematical expression then it raises an exception. When a given document is well formed, the defined rules will consume it entirely.

The lexer is really flexible it can handle some interesting cases in a short and simple manner.

from eacc.lexer import XSpec, Lexer, SeqTok, LexTok, LexSeq
from eacc.token import Keyword, Identifier, RP, LP, Colon, Blank

class KeywordTokens(XSpec):
    t_if = LexSeq(SeqTok(r'if', type=Keyword),
    SeqTok(r'\s+', type=Blank))

    t_blank  = LexTok(r' +', type=Blank)
    t_lparen = LexTok(r'\(', type=LP)
    t_rparen = LexTok(r'\)', type=RP)
    t_colon  = LexTok(r'\:', type=Colon)

    # Match identifier only if it is not an if.
    t_identifier = LexTok(r'[a-zA-Z0-9]+', type=Identifier)

    root = [t_if, t_blank, t_lparen, 
    t_rparen, t_colon, t_identifier]

lex = Lexer(KeywordTokens)
data = 'if ifnum: foobar()'
tokens = lex.feed(data)
print('Consumed:', list(tokens))

That would output:

Consumed: [Keyword('if'), Blank(' '), Identifier('ifnum'), Colon(':'),
Blank(' '), Identifier('foobar'), LP('('), RP(')')]

The above example handles the task of tokenizing keywords correctly. The SeqTok class works together with LexSeq to extract the tokens based on a given regex while LexNode works on its own to extract tokens that do not demand a lookahead step.

Install

Note: Work with python3 only.

pip install eacc

Documentation

You might also like...
Sms Bomber, Tool Encryptor
Sms Bomber, Tool Encryptor

ɴᴏʙɪᴛᴀシ︎ ғᴏʀ ᴀɴʏ ʜᴇʟᴘシ︎ Install pkg install git -y pkg install python -y pip install requests git clone https://github.com/AK27HVAU/akash Run cd Akash

JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates.
JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates.

JTEX JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates. This package uses Jinja2 as the template engine with

Żmija is a simple universal code generation tool.

Żmija Żmija is a simple universal code generation tool. It is intended to be used as a means to generate code that is both efficient and easily mainta

epub2sphinx is a tool to convert epub files to ReST for Sphinx
epub2sphinx is a tool to convert epub files to ReST for Sphinx

epub2sphinx epub2sphinx is a tool to convert epub files to ReST for Sphinx. It uses Pandoc for converting HTML data inside epub files into ReST. It cr

Sphinx-performance - CLI tool to measure the build time of different, free configurable Sphinx-Projects
Sphinx-performance - CLI tool to measure the build time of different, free configurable Sphinx-Projects

CLI tool to measure the build time of different, free configurable Sphinx-Projec

A collection of simple python mini projects to enhance your python skills

A collection of simple python mini projects to enhance your python skills

Repository for learning Python (Python Tutorial)

Repository for learning Python (Python Tutorial) Languages and Tools 🧰 Overview 📑 Repository for learning Python (Python Tutorial) Languages and Too

A python package to avoid writing and maintaining duplicated python docstrings.

docstring-inheritance is a python package to avoid writing and maintaining duplicated python docstrings.

advance python series: Data Classes, OOPs, python

Working With Pydantic - Built-in Data Process ========================== Normal way to process data (reading json file): the normal princiople, it's f

Releases(v3.1.6)
Owner
Iury de oliveira gomes figueiredo
Iury de oliveira gomes figueiredo
YAML metadata extension for Python-Markdown

YAML metadata extension for Python-Markdown This extension adds YAML meta data handling to markdown with all YAML features. As in the original, metada

Nikita Sivakov 14 Dec 30, 2022
Python 3 wrapper for the Vultr API v2.0

Vultr Python Python wrapper for the Vultr API. https://www.vultr.com https://www.vultr.com/api This is currently a WIP and not complete, but has some

CSSNR 6 Apr 28, 2022
MkDocs plugin for setting revision date from git per markdown file

mkdocs-git-revision-date-plugin MkDocs plugin that displays the last revision date of the current page of the documentation based on Git. The revision

Terry Zhao 48 Jan 06, 2023
Projeto em Python colaborativo para o Bootcamp de Dados do Itaú em parceria com a Lets Code

🧾 lets-code-todo-list por Henrique V. Domingues e Josué Montalvão Projeto em Python colaborativo para o Bootcamp de Dados do Itaú em parceria com a L

Henrique V. Domingues 1 Jan 11, 2022
Crystal Smp plugin for show scoreboards

MCDR-CrystalScoreboards Crystal plugin for show scoreboards | Only 1.12 Usage !!s : Plugin help message !!s hide : Hide scoreboard !!s show : Show Sco

CristhianCd 3 Oct 12, 2021
This is a repository for "100 days of code challenge" projects. You can reach all projects from beginner to professional which are written in Python.

100 Days of Code It's a challenge that aims to gain code practice and enhance programming knowledge. Day #1 Create a Band Name Generator It's actually

SelenNB 2 May 12, 2022
A simple XLSX/CSV reader - to dictionary converter

sheet2dict A simple XLSX/CSV reader - to dictionary converter Installing To install the package from pip, first run: python3 -m pip install --no-cache

Tomas Pytel 216 Nov 25, 2022
More detailed upload statistics for Nicotine+

More Upload Statistics A small plugin for Nicotine+ 3.1+ to create more detailed upload statistics. ⚠ No data previous to enabling this plugin will be

Nick 1 Dec 17, 2021
Type hints support for the Sphinx autodoc extension

sphinx-autodoc-typehints This extension allows you to use Python 3 annotations for documenting acceptable argument types and return value types of fun

Alex Grönholm 462 Dec 29, 2022
Create docsets for Dash.app-compatible API browser.

doc2dash: Create Docsets for Dash.app and Clones doc2dash is an MIT-licensed extensible Documentation Set generator intended to be used with the Dash.

Hynek Schlawack 498 Dec 30, 2022
level2-data-annotation_cv-level2-cv-15 created by GitHub Classroom

[AI Tech 3기 Level2 P Stage] 글자 검출 대회 팀원 소개 김규리_T3016 박정현_T3094 석진혁_T3109 손정균_T3111 이현진_T3174 임종현_T3182 Overview OCR (Optimal Character Recognition) 기술

6 Jun 10, 2022
Python code for working with NFL play by play data.

nfl_data_py nfl_data_py is a Python library for interacting with NFL data sourced from nflfastR, nfldata, dynastyprocess, and Draft Scout. Includes im

82 Jan 05, 2023
JMESPath is a query language for JSON.

JMESPath JMESPath (pronounced "james path") allows you to declaratively specify how to extract elements from a JSON document. For example, given this

1.7k Dec 31, 2022
Pystm32ai - A Python wrapper for the stm32ai command-line tool

PySTM32.AI A python wrapper for the stm32ai command-line tool to analyse deep le

Thibaut Vercueil 5 Jul 28, 2022
freeCodeCamp Scientific Computing with Python Project for Certification.

Polygon_Area_Calculator freeCodeCamp Python Project freeCodeCamp Scientific Computing with Python Project for Certification. In this project you will

Rajdeep Mondal 1 Dec 23, 2021
:blue_book: Automatic documentation from sources, for MkDocs.

mkdocstrings Automatic documentation from sources, for MkDocs. Features - Python handler - Requirements - Installation - Quick usage Features Language

1.1k Jan 04, 2023
Python Eacc is a minimalist but flexible Lexer/Parser tool in Python.

Python Eacc is a parsing tool it implements a flexible lexer and a straightforward approach to analyze documents.

Iury de oliveira gomes figueiredo 60 Nov 16, 2022
Easy OpenAPI specs and Swagger UI for your Flask API

Flasgger Easy Swagger UI for your Flask API Flasgger is a Flask extension to extract OpenAPI-Specification from all Flask views registered in your API

Flasgger 3.1k Jan 05, 2023
Sphinx-performance - CLI tool to measure the build time of different, free configurable Sphinx-Projects

CLI tool to measure the build time of different, free configurable Sphinx-Projec

useblocks 11 Nov 25, 2022
Python Deep Dive Course - Accompanying Materials

Python Deep Dive Various Jupyter notebooks and Python sources associated with my Udemy Python 3 Deep Dive course series: Part 1: Mainly functional pro

Fred Baptiste 1.1k Dec 30, 2022