Tools and data for measuring the popularity & growth of various programming languages.

Overview

growth-data

Tools and data for measuring the popularity & growth of various programming languages.

Install the dependencies

$ pip install -r requirements.txt

Example queries

Number of (non-fork) repositories

sqlite> .mode column
sqlite> SELECT
    ds,
    github_search_q AS q,
    MAX(github_search_total_count) AS num_repos
  FROM github_search
  GROUP BY 1, 2
  ORDER BY 3;
ds          q                                  num_repos
----------  ---------------------------------  ---------
2021-12-22  language:tla and fork:false        64       
2021-12-22  language:lean and fork:false       75       
2021-12-22  language:idris and fork:false      140      
2021-12-22  language:agda and fork:false       192      
2021-12-22  language:ada and fork:false        438      
2021-12-22  language:coq and fork:false        509      
2021-12-22  language:erlang and fork:false     2260     
2021-12-22  language:ocaml and fork:false      2278     
2021-12-22  language:fortran and fork:false    3196     
2021-12-22  language:verilog and fork:false    3882     
2021-12-22  language:assembly and fork:false   8654     
2021-12-22  language:haskell and fork:false    10052    
2021-12-22  language:terraform and fork:false  10254    
2021-12-22  language:rust and fork:false       21906    
2021-12-22  language:go and fork:false         67601    
2021-12-22  language:r and fork:false          114942   
2021-12-22  language:c and fork:false          174439   
2021-12-22  language:c++ and fork:false        270351   
2021-12-22  language:python and fork:false     762729   
2021-12-22  language:java and fork:false       943381   
sqlite> 

Stats about the average (non-fork) repository

sqlite> .mode column
sqlite> SELECT
    github_search.ds AS ds,
    github_search_q AS q,
    COUNT(*) AS repos,
    SUM(github_repo_has_issues) AS repos_with_issues,
    SUM(github_repo_has_wiki) AS repos_with_wiki,
    SUM(github_repo_has_pages) AS repos_with_pages,
    SUM(github_repo_license_name != '') AS repos_with_license,
    SUM(github_repo_size) AS sum_repo_size,
    SUM(github_repo_stargazers_count) AS sum_stars,
    AVG(github_repo_stargazers_count) AS avg_stars,
    AVG(github_repo_forks_count) AS avg_forks,
    AVG(github_repo_size) AS avg_size,
    AVG(github_repo_open_issues_count) AS avg_open_issues
  FROM github_search INNER JOIN github_search_repo
  ON github_search.obj_id = github_search_obj_id
  GROUP BY 1, 2
  ORDER BY 3;
ds          q                              repos  repos_with_issues  repos_with_wiki  repos_with_pages  repos_with_license  sum_repo_size  sum_stars  avg_stars         avg_forks         avg_size          avg_open_issues  
----------  -----------------------------  -----  -----------------  ---------------  ----------------  ------------------  -------------  ---------  ----------------  ----------------  ----------------  -----------------
2021-12-22  language:tla and fork:false    64     63                 61               1                 23                  1393879        1937       30.265625         2.34375           21779.359375      0.359375         
2021-12-22  language:lean and fork:false   75     73                 72               5                 22                  1119783        1475       19.6666666666667  1.85333333333333  14930.44          1.61333333333333 
2021-12-22  language:idris and fork:false  140    139                136              4                 63                  108818         1242       8.87142857142857  0.85              777.271428571429  0.728571428571429
2021-12-22  language:agda and fork:false   192    188                187              9                 51                  394233         1725       8.984375          0.90625           2053.296875       0.291666666666667
2021-12-22  language:ada and fork:false    438    421                406              12                155                 2387761        2210       5.04566210045662  1.13926940639269  5451.50913242009  1.09360730593607 
2021-12-22  language:coq and fork:false    509    502                493              42                204                 2894476        4304       8.45579567779961  1.50098231827112  5686.59332023576  0.846758349705305
sqlite>

Stats about the average recently-updated (non-fork) repository

sqlite> .mode column
sqlite> SELECT
    github_search.ds AS ds,
    github_search_q AS q,
    COUNT(*) AS repos,
    SUM(github_repo_has_issues) AS repos_with_issues,
    SUM(github_repo_has_wiki) AS repos_with_wiki,
    SUM(github_repo_has_pages) AS repos_with_pages,
    SUM(github_repo_license_name != '') AS repos_with_license,
    SUM(github_repo_size) AS sum_repo_size,
    SUM(github_repo_stargazers_count) AS sum_stars,
    AVG(github_repo_stargazers_count) AS avg_stars,
    AVG(github_repo_forks_count) AS avg_forks,
    AVG(github_repo_size) AS avg_size,
    AVG(github_repo_open_issues_count) AS avg_open_issues
  FROM github_search INNER JOIN github_search_repo
  ON github_search.obj_id = github_search_obj_id
  WHERE github_repo_updated_at >= '2021-01-01T00:00:00Z'
  GROUP BY 1, 2
  ORDER BY 3;
ds          q                              repos  repos_with_issues  repos_with_wiki  repos_with_pages  repos_with_license  sum_repo_size  sum_stars  avg_stars         avg_forks         avg_size          avg_open_issues  
----------  -----------------------------  -----  -----------------  ---------------  ----------------  ------------------  -------------  ---------  ----------------  ----------------  ----------------  -----------------
2021-12-22  language:tla and fork:false    33     32                 30               1                 18                  1322462        1921       58.2121212121212  4.39393939393939  40074.6060606061  0.636363636363636
2021-12-22  language:idris and fork:false  44     44                 43               3                 23                  33576          1052       23.9090909090909  2.22727272727273  763.090909090909  1.61363636363636 
2021-12-22  language:lean and fork:false   46     44                 43               3                 14                  1116533        1442       31.3478260869565  2.93478260869565  24272.4565217391  2.58695652173913 
2021-12-22  language:agda and fork:false   77     74                 75               8                 24                  310115         1520       19.7402597402597  1.93506493506494  4027.46753246753  0.376623376623377
2021-12-22  language:ada and fork:false    168    165                148              10                82                  1615474        2065       12.2916666666667  2.67261904761905  9615.91666666667  2.80357142857143 
2021-12-22  language:coq and fork:false    211    206                201              32                113                 1962100        4018       19.042654028436   3.22748815165877  9299.05213270142  1.89099526066351 
sqlite> 
nlp基础任务

NLP算法 说明 此算法仓库包括文本分类、序列标注、关系抽取、文本匹配、文本相似度匹配这五个主流NLP任务,涉及到22个相关的模型算法。 框架结构 文件结构 all_models ├── Base_line │   ├── __init__.py │   ├── base_data_process.

zuxinqi 23 Sep 22, 2022
Official codebase for Can Wikipedia Help Offline Reinforcement Learning?

Official codebase for Can Wikipedia Help Offline Reinforcement Learning?

Machel Reid 82 Dec 19, 2022
A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

Reddit text to speech generator A basic reddit tts video generator Current functionality Generate videos for subs based on comments,(askreddit) so rea

Aadvik 17 Dec 19, 2022
PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Chung-Ming Chien 1k Dec 30, 2022
Incorporating KenLM language model with HuggingFace implementation of Wav2Vec2CTC Model using beam search decoding

Wav2Vec2CTC With KenLM Using KenLM ARPA language model with beam search to decode audio files and show the most probable transcription. Assuming you'v

farisalasmary 65 Sep 21, 2022
Code for EMNLP20 paper: "ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training"

ProphetNet-X This repo provides the code for reproducing the experiments in ProphetNet. In the paper, we propose a new pre-trained language model call

Microsoft 394 Dec 17, 2022
This project uses unsupervised machine learning to identify correlations between daily inoculation rates in the USA and twitter sentiment in regards to COVID-19.

Twitter COVID-19 Sentiment Analysis Members: Christopher Bach | Khalid Hamid Fallous | Jay Hirpara | Jing Tang | Graham Thomas | David Wetherhold Pro

4 Oct 15, 2022
CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT This repo provides the code for reproducing the experiments in CodeBERT: A Pre-Trained Model for Programming and Natural Languages. CodeBERT

Microsoft 1k Jan 03, 2023
An attempt to map the areas with active conflict in Ukraine using open source twitter data.

Live Action Map (LAM) An attempt to use open source data on Twitter to map areas with active conflict. Right now it is used for the Ukraine-Russia con

Kinshuk Dua 171 Nov 21, 2022
ByT5: Towards a token-free future with pre-trained byte-to-byte models

ByT5: Towards a token-free future with pre-trained byte-to-byte models ByT5 is a tokenizer-free extension of the mT5 model. Instead of using a subword

Google Research 409 Jan 06, 2023
Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks. It takes raw videos/images + text as inputs, and outputs task predictions. ClipB

Jie Lei 雷杰 612 Jan 04, 2023
Saptak Bhoumik 14 May 24, 2022
PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

PORORO: Platform Of neuRal mOdels for natuRal language prOcessing pororo performs Natural Language Processing and Speech-related tasks. It is easy to

Kakao Brain 1.2k Dec 21, 2022
A versatile token stream for handwritten parsers.

Writing recursive-descent parsers by hand can be quite elegant but it's often a bit more verbose than expected, especially when it comes to handling indentation and reporting proper syntax errors. Th

Valentin Berlier 8 Nov 30, 2022
Beyond Accuracy: Behavioral Testing of NLP models with CheckList

CheckList This repository contains code for testing NLP Models as described in the following paper: Beyond Accuracy: Behavioral Testing of NLP models

Marco Tulio Correia Ribeiro 1.8k Dec 28, 2022
Tool which allow you to detect and translate text.

Text detection and recognition This repository contains tool which allow to detect region with text and translate it one by one. Description Two pretr

Damian Panek 176 Nov 28, 2022
A natural language processing model for sequential sentence classification in medical abstracts.

NLP PubMed Medical Research Paper Abstract (Randomized Controlled Trial) A natural language processing model for sequential sentence classification in

Hemanth Chandran 1 Jan 17, 2022
Blackstone is a spaCy model and library for processing long-form, unstructured legal text

Blackstone Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project f

ICLR&D 579 Jan 08, 2023
Twitter-Sentiment-Analysis - Twitter sentiment analysis for india's top online retailers(2019 to 2022)

Twitter-Sentiment-Analysis Twitter sentiment analysis for india's top online retailers(2019 to 2022) Project Overview : Sentiment Analysis helps us to

Balaji R 1 Jan 01, 2022
Search msDS-AllowedToActOnBehalfOfOtherIdentity

前言 现在进行RBCD的攻击手段主要是搜索mS-DS-CreatorSID,如果机器的创建者是我们可控的话,那就可以修改对应机器的msDS-AllowedToActOnBehalfOfOtherIdentity,利用工具SharpAllowedToAct-Modify 那我们索性也试试搜索所有计算机

Jumbo 26 Dec 05, 2022