2019 Data Science Bowl

Overview

2019 Data Science Bowl

Uncover the factors to help measure how young children learn

Screenshot

Ignite Possibilities.

Uncover new insights in early childhood education and how media can support learning outcomes. Participate in our fifth annual Data Science Bowl, presented by Booz Allen Hamilton and Kaggle.

PBS KIDS, a trusted name in early childhood education for decades, aims to gain insights into how media can help children learn important skills for success in school and life. In this challenge, you’ll use anonymous gameplay data, including knowledge of videos watched and games played, from the PBS KIDS Measure Up! app, a game-based learning tool developed as a part of the CPB-PBS Ready To Learn Initiative with funding from the U.S. Department of Education. Competitors will be challenged to predict scores on in-game assessments and create an algorithm that will lead to better-designed games and improved learning outcomes. Your solutions will aid in discovering important relationships between engagement with high-quality educational media and learning processes.

Data Science Bowl is the world’s largest data science competition focused on social good. Each year, this competition gives Kagglers a chance to use their passion to change the world. Over the last four years, more than 50,000+ Kagglers have submitted over 114,000+ submissions, to improve everything from lung cancer and heart disease detection to ocean health.

For more information on the Data Science Bowl, please visit www.DataScienceBowl.com

Where does the data for the competition come from?

The data used in this competition is anonymous, tabular data of interactions with the PBS KIDS Measure Up! app. Select data, such as a user’s in-app assessment score or their path through the game, is collected by the PBS KIDS Measure Up! app, a game-based learning tool.

PBS KIDS is committed to creating a safe and secure environment that family members of all ages can enjoy. The PBS KIDS Measure Up! app does not collect any personally identifying information, such as name or location. All of the data used in the competition is anonymous. To view the full PBS KIDS privacy policy, please visit: pbskids.org/privacy.

No one will be able to download the entire data set and the participants do not have access to any personally identifiable information about individual users. The Data Science Bowl and the use of data for this year’s competition has been reviewed to ensure that it meets requirements of applicable child privacy regulations by PRIVO, a leading global industry expert in children’s online privacy.

What is the PBS KIDS Measure Up! app?

Screenshot

In the PBS KIDS Measure Up! app, children ages 3 to 5 learn early STEM concepts focused on length, width, capacity, and weight while going on an adventure through Treetop City, Magma Peak, and Crystal Caves. Joined by their favorite PBS KIDS characters, children can also collect rewards and unlock digital toys as they play. To learn more about PBS KIDS Measure Up!, please click here.

PBS KIDS and the PBS KIDS Logo are registered trademarks of PBS. Used with permission. The contents of PBS KIDS Measure Up! were developed under a grant from the Department of Education. However, those contents do not necessarily represent the policy of the Department of Education, and you should not assume endorsement by the Federal Government. The app is funded by a Ready To Learn grant (PR/AWARD No. U295A150003, CFDA No. 84.295A) provided by the Department of Education to the Corporation for Public Broadcasting.

My Solution 460 Features | Simple | Easy | Less_overfit | Fast

Screenshot

Simple, easy and fast and less overfitting solution with 460 features

This notebook shows problem solving approach using LightGBM Regression and 890 features computed by bruno aquino in the following notebook which are later reduced to 460 features in my approach.

https://www.kaggle.com/braquino/890-features

It also uses the regression coefficients from following notebook by artgor.

https://www.kaggle.com/artgor/quick-and-dirty-regression

Apart from these i also have included resultant LightGBM parameters from exhaustive parameter tuning.

If you find this notebook helpful please press that thumbs up button and thank you :)

PLEASE NOTE THIS IMPORTANT POINT "DON'T BELIEVE IN PUBLIC LB" IT'S ONLY 14% of real data that's private!! We should build a model that's less overfittig and still finding the good results."

Your score will be different for different submissions that's because of randomness in gradient boosting! and that's completely normal you must focus on reducing overfitting, gather as much data as possible and ofcourse reduce the number of features as much as possible without sacrificing model validation score and that's exactly what i've done below :)

Thank you!

Owner
Deepak Nandwani
A Machine Learning and Data Science Engineer, my goal is to make a +ve impact on millions of people's daily lives & to be hyper-optimistic about the future.
Deepak Nandwani
Using Python to scrape some basic player information from www.premierleague.com and then use Pandas to analyse said data.

PremiershipPlayerAnalysis Using Python to scrape some basic player information from www.premierleague.com and then use Pandas to analyse said data. No

5 Sep 06, 2021
Statistical Rethinking: A Bayesian Course Using CmdStanPy and Plotnine

Statistical Rethinking: A Bayesian Course Using CmdStanPy and Plotnine Intro This repo contains the python/stan version of the Statistical Rethinking

Andrés Suárez 3 Nov 08, 2022
Approximate Nearest Neighbor Search for Sparse Data in Python!

Approximate Nearest Neighbor Search for Sparse Data in Python! This library is well suited to finding nearest neighbors in sparse, high dimensional spaces (like text documents).

Meta Research 906 Jan 01, 2023
Catalogue data - A Python Scripts to prepare catalogue data

catalogue_data Scripts to prepare catalogue data. Setup Clone this repo. Install

BigScience Workshop 3 Mar 03, 2022
Visions provides an extensible suite of tools to support common data analysis operations

Visions And these visions of data types, they kept us up past the dawn. Visions provides an extensible suite of tools to support common data analysis

168 Dec 28, 2022
Leverage Twitter API v2 to analyze tweet metrics such as impressions and profile clicks over time.

Tweetmetric Tweetmetric allows you to track various metrics on your most recent tweets, such as impressions, retweets and clicks on your profile. The

Mathis HAMMEL 29 Oct 18, 2022
Powerful, efficient particle trajectory analysis in scientific Python.

freud Overview The freud Python library provides a simple, flexible, powerful set of tools for analyzing trajectories obtained from molecular dynamics

Glotzer Group 195 Dec 20, 2022
Python Project on Pro Data Analysis Track

Udacity-BikeShare-Project: Python Project on Pro Data Analysis Track Basic Data Exploration with pandas on Bikeshare Data Basic Udacity project using

Belal Mohammed 0 Nov 10, 2021
Stream-Kafka-ELK-Stack - Weather data streaming using Apache Kafka and Elastic Stack.

Streaming Data Pipeline - Kafka + ELK Stack Streaming weather data using Apache Kafka and Elastic Stack. Data source: https://openweathermap.org/api O

Felipe Demenech Vasconcelos 2 Jan 20, 2022
This cosmetics generator allows you to generate the new Fortnite cosmetics, Search pak and search cosmetics!

COSMETICS GENERATOR This cosmetics generator allows you to generate the new Fortnite cosmetics, Search pak and search cosmetics! Remember to put the l

ᴅᴊʟᴏʀ3xᴢᴏ 11 Dec 13, 2022
Larch: Applications and Python Library for Data Analysis of X-ray Absorption Spectroscopy (XAS, XANES, XAFS, EXAFS), X-ray Fluorescence (XRF) Spectroscopy and Imaging

Larch: Data Analysis Tools for X-ray Spectroscopy and More Documentation: http://xraypy.github.io/xraylarch Code: http://github.com/xraypy/xraylarch L

xraypy 95 Dec 13, 2022
Statistical Analysis 📈 focused on statistical analysis and exploration used on various data sets for personal and professional projects.

Statistical Analysis 📈 This repository focuses on statistical analysis and the exploration used on various data sets for personal and professional pr

Andy Pham 1 Sep 03, 2022
Jupyter notebooks for the book "The Elements of Statistical Learning".

This repository contains Jupyter notebooks implementing the algorithms found in the book and summary of the textbook.

Madiyar 369 Dec 30, 2022
statDistros is a Python library for dealing with various statistical distributions

StatisticalDistributions statDistros statDistros is a Python library for dealing with various statistical distributions. Now it provides various stati

1 Oct 03, 2021
CPSPEC is an astrophysical data reduction software for timing

CPSPEC manual Introduction CPSPEC is an astrophysical data reduction software for timing. Various timing properties, such as power spectra and cross s

Tenyo Kawamura 1 Oct 20, 2021
A stock analysis app with streamlit

StockAnalysisApp A stock analysis app with streamlit. You select the ticker of the stock and the app makes a series of analysis by using the price cha

Antonio Catalano 50 Nov 27, 2022
A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).

This tutorial's purpose is to introduce Pythonistas to methods for scaling their data science and machine learning work to larger datasets and larger models, using the tools and APIs they know and lo

Coiled 102 Nov 10, 2022
First steps with Python in Life Sciences

First steps with Python in Life Sciences This course material is part of the "First Steps with Python in Life Science" three-day course of SIB-trainin

SIB Swiss Institute of Bioinformatics 22 Jan 08, 2023
A tax calculator for stocks and dividends activities.

Revolut Stocks calculator for Bulgarian National Revenue Agency Information Processing and calculating the required information about stock possession

Doino Gretchenliev 200 Oct 25, 2022
An easy-to-use feature store

A feature store is a data storage system for data science and machine-learning. It can store raw data and also transformed features, which can be fed straight into an ML model or training script.

ByteHub AI 48 Dec 09, 2022