demir.ai Dataset Operations

Overview

demir.ai Dataset Operations

With this application, you can have the empty values (nan/null) deleted or filled before giving your dataset to machine learning algorithms, you can access visual or numerical information about your dataset and have more detailed information about your attributes.

The application is written in Python programming language, Flask framework is used in the backend, Html is used in the frontent. Pandas framework is used to navigate over the dataset, all numerical operations on the dataset were written by me and no ready-made functions were used, while the plots were created from scratch by me using the Opencv framework.

Before running the application, you can install the necessary packages for the application with the following command.

pip3 install -r requirements.txt

You can launch the web application with the following command, and then you can use the application by going to http://localhost:5000/.

python3 main.py

With this web application, you can delete rows or columns with empty values (nan/null) on your dataset or fill these empty values in three different ways.

  • Null value (nan) operations you can do on your dataset with demir.ai Dataset Operations:

    • Column-based deletion of null data (nan/null)
    • Row-based deletion of null data (nan/null)
    • Filling in blank data by mean, median and mode

Again, thanks to this web application, you can reach visual or numerical results about your dataset and have detailed information about your dataset.

  • Information you can learn about your dataset with demir.ai Dataset Operations:

    • Mean of columns
    • Median of columns
    • Mode of columns
    • Frequency of columns
    • Interquartile range value (IQR) of columns
    • Outliers of columns
    • Five number summary of columns
    • Box Chart of columns
    • Variance and standard deviation of columns

Null value (nan/null) operations

  • Column-based deletion of null data (nan/null): The number of nulls is calculated for each column, then the percentage of nulls is calculated and if this percentage is greater than the percentage the user enters, this column is deleted.

  • Row-based deletion of null data (nan/null): The number of nulls is calculated for each line, and if this number of nulls is greater than the number entered by the user, this line is deleted.

  • Filling in blank data by mean, median and mode:

    • Mean: The sum of the non-blank values of the columns is taken and divided by the total number of non-blank values, the average obtained is written instead of the empty values.

    • Median: The median is calculated according to the non-blank values in the columns, and then this median value is written instead of the empty columns.

    • Mode: The mode is calculated according to the non-blank values in the columns, and then this mode value is written instead of the empty columns

Information you can learn about your dataset

  • Mean of columns: The mean is calculated for each column separately and the column mean information is presented to the user.

  • Median of columns: The median is calculated for each column separately and the column median information is presented to the user.

  • Mode of columns: The mode is calculated for each column separately and the column mode information is presented to the user.

  • Frequency of columns: Frequency is calculated for each column and the frequency information of the columns is presented to the user. In this section, frequency visualization is also done by creating a bar plot from scratch with Opencv.

  • Interquartile range value (IQR) of columns: Q1 and Q3 values are found for each column, then the IQR value of the columns is found with Q3-Q1 and presented to the user.

  • Outliers of columns: If the data in the column is less than (Q1-IQR * 1.5) and greater than (Q3+IQR * 1.5), it is called outlier and this information is presented to the user.

  • Five number summary of columns: Minimum, Q1, median, Q3 and Maximum values are calculated and presented to the user.

  • Box Chart of columns: After finding the minimum, Q1, median, Q3 and maximum values for each column, a box chart is created from scratch with Opencv and this chart is presented to the user.

  • Variance and standard deviation of columns: The variance and standard deviation for each column are calculated and presented to the user.

Application video

demirai.mp4
Owner
Ahmet Furkan DEMIR
Hi, my name is Ahmet Furkan DEMIR. I study computer engineering at Necmettin Erbakan University.
Ahmet Furkan DEMIR
Quickly and accurately render even the largest data.

Turn even the largest data into images, accurately Build Status Coverage Latest dev release Latest release Docs Support What is it? Datashader is a da

HoloViz 2.9k Dec 28, 2022
A research of IT labor market based especially on hh.ru. Salaries, rate of technologies and etc.

hh_ru_research Проект реализован в учебных целях анализа рынка труда, в особенности по hh.ru Input data В качестве входных данных используются сериали

3 Sep 07, 2022
Python package to Create, Read, Write, Edit, and Visualize GSFLOW models

pygsflow pyGSFLOW is a python package to Create, Read, Write, Edit, and Visualize GSFLOW models API Documentation pyGSFLOW API documentation can be fo

pyGSFLOW 21 Dec 14, 2022
Painlessly create beautiful matplotlib plots.

Announcement Thank you to everyone who has used prettyplotlib and made it what it is today! Unfortunately, I no longer have the bandwidth to maintain

Olga Botvinnik 1.6k Jan 06, 2023
Set of matplotlib operations that are not trivial

Matplotlib Snippets This repository contains a set of matplotlib operations that are not trivial. Histograms Histogram with bins adapted to log scale

Raphael Meudec 1 Nov 15, 2021
A grammar of graphics for Python

plotnine Latest Release License DOI Build Status Coverage Documentation plotnine is an implementation of a grammar of graphics in Python, it is based

Hassan Kibirige 3.3k Jan 01, 2023
Squidpy is a tool for the analysis and visualization of spatial molecular data.

Squidpy is a tool for the analysis and visualization of spatial molecular data. It builds on top of scanpy and anndata, from which it inherits modularity and scalability. It provides analysis tools t

Theis Lab 251 Dec 19, 2022
Simple implementation of Self Organizing Maps (SOMs) with rectangular and hexagonal grid topologies

py-self-organizing-map Simple implementation of Self Organizing Maps (SOMs) with rectangular and hexagonal grid topologies. A SOM is a simple unsuperv

Jonas Grebe 1 Feb 10, 2022
A Python wrapper of Neighbor Retrieval Visualizer (NeRV)

PyNeRV A Python wrapper of the dimensionality reduction algorithm Neighbor Retrieval Visualizer (NeRV) Compile Set up the paths in Makefile then make.

2 Aug 29, 2021
YOPO is an interactive dashboard which generates various standard plots.

YOPO is an interactive dashboard which generates various standard plots.you can create various graphs and charts with a click of a button. This tool uses Dash and Flask in backend.

ADARSH C 38 Dec 20, 2022
HW 02 for CS40 - matplotlib practice

HW 02 for CS40 - matplotlib practice project instructions https://github.com/mikeizbicki/cmc-csci040/tree/2021fall/hw_02 Drake Lyric Analysis Bar Char

13 Oct 27, 2021
Draw datasets from within Jupyter.

drawdata This small python app allows you to draw a dataset in a jupyter notebook. This should be very useful when teaching machine learning algorithm

vincent d warmerdam 505 Nov 27, 2022
Focus on Algorithm Design, Not on Data Wrangling

The dataTap Python library is the primary interface for using dataTap's rich data management tools. Create datasets, stream annotations, and analyze model performance all with one library.

Zensors 37 Nov 25, 2022
A python-generated website for visualizing the novel coronavirus (COVID-19) data for Greece.

COVID-19-Greece A python-generated website for visualizing the novel coronavirus (COVID-19) data for Greece. Data sources Data provided by Johns Hopki

Isabelle Viktoria Maciohsek 23 Jan 03, 2023
A script written in Python that generate output custom color (HEX or RGB input to x1b hexadecimal)

ColorShell ─ 1.5 Planned for v2: setup.sh for setup alias This script converts HEX and RGB code to x1b x1b is code for colorize outputs, works on ou

Riley 4 Oct 31, 2021
Small project to recursively calculate and plot each successive order of the Hilbert Curve

hilbert-curve Small project to recursively calculate and plot each successive order of the Hilbert Curve. After watching 3Blue1Brown's video on Hilber

Stefan Mejlgaard 2 Nov 15, 2021
Example scripts for generating plots of Bohemian matrices

Bohemian Eigenvalue Plotting Examples This repository contains examples of generating plots of Bohemian eigenvalues. The examples in this repository a

Bohemian Matrices 5 Nov 12, 2022
Leyna's Visualizing Data With Python

Leyna's Visualizing Data Below is information on the number of bilingual students in three school districts in Massachusetts. You will also find infor

11 Oct 28, 2021
JupyterHub extension for ContainDS Dashboards

ContainDS Dashboards for JupyterHub A Dashboard publishing solution for Data Science teams to share results with decision makers. Run a private on-pre

Ideonate 179 Nov 29, 2022
Python Package for CanvasXpress JS Visualization Tools

CanvasXpress Python Library About CanvasXpress for Python CanvasXpress was developed as the core visualization component for bioinformatics and system

Dr. Todd C. Brett 5 Nov 07, 2022