Materials to reproduce our findings in our stories, "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up"

Overview

Amazon Brands and Exclusives

This repository contains code to reproduce the findings featured in our story "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up" from our series Amazon's Advantage.

Our methodology is described in "How We Analyzed Amazon’s Treatment of Its Brands in Search Results".

Data that we collected and analyzed is in the data folder.
To use the full input dataset (which is not hosted here), please refer to Download data.

Jupyter notebooks used for data preprocessing and analysis are available in the notebooks folder.
Descriptions for each notebook are outlined in the Notebooks section below.

Installation

Python

Make sure you have Python 3.6+ installed. We used Miniconda to create a Python 3.8 virtual environment.

Then install the Python packages:
pip install -r requirements.txt

Notebooks

These notebooks are intended to be run sequentially, but they are not dependent on one another. If you want a quick overview of the methodology, you only need to concern yourself with the notebooks with an asterisk(*).

0-data-preprocessing.ipynb

This notebook parses Amazon search results and Amazon product pages, and produces the intermediary datasets (data/output/datasets/) used in ranking analysis and random forest classifiers.

1-data-analysis-search-results.ipynb *

Bulk of the ranking analysis and stats in the data analysis.

2-random-forest-analysis.ipynb *

Feature engineering training set, finding optimal hyperparameters, and performing the ablation study on a random forest model. The most predictive feature is verified using three separate methods.

3-survey-results.ipynb

Visualizing the survey results from our national panel of 1,000 adults.

4-limiations-product-page-changes.ipynb

Analysis of how often the Buy Box's default shipper and seller change between Amazon and a third party.

utils.py

Contains convenient functions used in the notebooks.

parsers.py

Contains parsers for search results and product pages.

Data

This directory is where inputs, intermediaries, and outputs are saved.

data
├── output
│   ├── figures
│   ├── tables
│   └── datasets
│       ├── amazon_private_label.csv.xz
│       ├── products.csv.xz
│       ├── searches.csv.xz
│       ├── training_set.csv.gz
│       ├── pairwise_training_set.csv.gz
│       └── trademarks
└── input
    ├── combined_queries_with_source.csv
    ├── best_sellers
    ├── generic_search_terms
    ├── search-private-label
    ├── search-selenium
    ├── search-selenium-our-brands-filter_
    ├── selenium-products
    ├── seller_central
    └── spotcheck

data/output/ contains tables, figures, and datasets used in our methodology.

data/output/datasets/amazon_private_label.csv.xz is our dataset of Amazon brands, exclusives, and proprietary electronics (N=137,428 products). We use each product's unique ID (called an ASIN) to identify Amazon's own products in our methodology.

data/output/datasets/trademarks contains a dataset of trademarked brands registered by Amazon. The data was collected from USPTO.gov and Amazon. We included an additional README with the exact steps we took to build this dataset in the directory.

data/output/datasets/searches.csv.xz parsed search result pages from top and generic searches (N=187,534 product positions). You can filter this by search_term for each of these subsets from data/input/combined_queries_with_source.csv.

data/output/datasets/products.csv.xz parsed product pages from the searches above (N=157,405 product pages).

data/output/training_set.csv.gz metadata used to train and evaluate the random forest. Additionally, feature engineering is conducted in notebooks/2-random-forest-analysis.ipynb, which produces pairwise_training_set.csv.gz.

Every file in data/input except combined_queries_with_source.csv is stored in AWS s3. Those are not hosted in this repository.

Download Data

You can find the raw inputs in data/input in s3://markup-public-data/amazon-brands/.

If you trust us, you can download the HTML and JSON files in data/input using this script: sh data/download_input_data.sh

Note this is not necessary to run notebooks and see full results.

data/input/search-selenium/ (12 GB uncompressed)

First page of search results collected in January 2021. Download the HTML files search-selenium.tar.xz (238 MB compressed) here.

data/input/selenium-products/ (220 GB uncompressed)

Product pages collected in February 2021. Download the HTML files selenium-products.tar.xz (9 GB compressed) here.

data/input/search-selenium-our-brands-filter_/ (35 GB uncompressed)

Search results filtered by "our brands". Contains every page of search results. Download search-selenium-our-brands-filter_.tar.xz (403 MB compressed) here.

data/input/search-private-label/ (25 GB uncompressed)

API responses for search results filtered down to products Amazon identifies as "our brands". Contains paginated API results. Download the JSON files search-private-label.tar.xz (402 MB uncompressed) here.

data/input/seller_central/ (105 MB)

Seller central data for Q4 2020. Download the CSV file All_Q4_2020.csv.xz (105 MB compressioned) here.

data/input/best_sellers/ (4 GB)

Amazon's best sellers under the category "Amazon Devices & Accessories". Download the HTML files best_sellers.tar.xz (60MB compressed) here.

data/input/spotcheck/ (4 GB)

A sub-sample of product pages for spot-checking Buy Box changes. Download the HTML files spotcheck.tar.xz (159 MB compressed) here.

GroupMenter : New Telegram Group Manager Bot🔸Fast 🔸Python🔸Pyrogram 🔸

GroupMenter An PowerFull Group Manager Bot. Written In Pytelethon. Info • A modular Telegram Python bot running on python3. • Can be found on telegram

Group Menter 24 Jun 28, 2022
Seth Userbot with python

SETH-USERBOT DEPLOY TO HEROKU Group Support: String Session : Stay Support 🚀 ❁ LonamiWebs and Telethon © Credits ⚡ THANK YOU VERY MUCH FOR zeinzo Zei

seth 4 Jan 10, 2022
Бот для мини-игры "Рабы" ("Рабство") ВКонтакте.

vk-slaves-bot Бот для мини-игры "Рабы" ("Рабство") ВК Группа в ВК, в ней публикуются новости и другая полезная информация. У группы есть беседа, в кот

Almaz 80 Dec 17, 2022
Add members to unlimited telegram channels and groups

Program Features 📌 Coded with Python version 10. 📌 without the need for a proxy. 📌 without the need for a Telegram ID. 📌 Ability to add infinite p

hack4lx 10 Nov 25, 2022
Discord Multitool made in python 3.9

XTool Discord Multitool 24 Features: Webhook Delete VC Lagger Fast Token Checker Mass Report [Not Done] Token rape 2K Characters Bypass Block bypass M

Tiie 50 Dec 20, 2022
Discord-Mass-Mention - Yup the title says it all

Protocol - Mass Mention (i havent tested this with any token other than my own t

Mallowies 14 Nov 06, 2022
Command-line program to download videos from YouTube.com and other video sites

youtube-dl - download videos from youtube.com or other video platforms INSTALLATION DESCRIPTION OPTIONS CONFIGURATION OUTPUT TEMPLATE FORMAT SELECTION

youtube-dl 116.4k Jan 07, 2023
Python bot for send videos of a Youtube channel to a telegram group , channel or chat

py_youtube_to_telegram Usage: If you want to install ytt and use it, run this command: sudo sh -c "$(curl -fsSL https://raw.githubusercontent.com/nima

Nima Fanniasl 8 Nov 22, 2022
unofficial source of the discord bot, “haunting.” created by: vorqz, vert, & Veltz

hauntingSRC unofficial source of the discord bot, “haunting.” created by: vorqz, vert, & Veltz reasoning: creators skidded the most of this bot and do

Vast 11 Nov 04, 2022
Docker image for epicseven gvg qq chatbot based on Xunbot

XUN_Langskip XUN 是一个基于 NoneBot 和 酷Q 的功能型QQ机器人,目前提供了音乐点播、音乐推荐、天气查询、RSSHub订阅、使用帮助、识图、识番、搜番、上车、磁力搜索、地震速报、计算、日语词典、翻译、自我检查,权限等级功能,由于是为了完成自己在群里的承诺,一时兴起才做的,所

Xavier Xiong 2 Jun 08, 2022
A Script to automate fowarding all new messages from one/many channel(s) to another channel(s), without the forwarded tag.

Channel Auto Message Forward A script to automate fowarding all new messages from one/many channel(s) to another channel(s), without the forwarded tag

16 Oct 21, 2022
Get random jokes bapack2 on telegram

Jokes Bapack2 Telegram Bot Get random jokes bapack2 from jokes-bapack2-api on telegram bot Screenshot Requirements python pip pipenv python-telegram-b

Miftah Afina 2 Nov 17, 2021
a Disqus alternative

Isso – a commenting server similar to Disqus Isso – Ich schrei sonst – is a lightweight commenting server written in Python and JavaScript. It aims to

Martin Zimmermann 4.7k Jan 02, 2023
Send song lyrics to iMessage users using the Genius lyrics API

pyMessage Send song lyrics to iMessage users using the Genius lyrics API. Setup 1.) Open the main.py file, and add your API key on line 7. 2.) Install

therealkingnull 1 Jan 23, 2022
Tools to download and aggregate feeds of vaccination clinic location information in the United States.

vaccine-feed-ingest Pipeline for ingesting nationwide feeds of vaccine facilities. Contributing How to Configure your environment (instructions on the

Call the Shots 26 Aug 05, 2022
A simple Spamming software made in python

Spam-qlk Warning!!! 'I' am not responsible for the 'damage or harm' caused by this 'Software'!!! Use at your own risk!!! Input the message. After you

Aditya kumar 1 Nov 30, 2021
gBasic - The easy multiplatform bot

gBasic The easy multiplatform bot gBasic is the module at the core of @GianpiertoldaBot, maintained with 3 for the entire community by the Stockdroid

Stockdroid Fans 5 Nov 03, 2021
Azure DevOps Extension for Azure CLI

Azure DevOps Extension for Azure CLI The Azure DevOps Extension for Azure CLI adds Pipelines, Boards, Repos, Artifacts and DevOps commands to the Azur

1 Nov 03, 2021
A Python script that wraps the gitleaks tool to enable scanning of multiple repositories in parallel

mpgitleaks A Python script that wraps the gitleaks tool to enable scanning of multiple repositories in parallel. The motivation behind writing this sc

Emilio Reyes 7 Dec 29, 2022
Documentation and Samples for the Official HN API

Hacker News API Overview In partnership with Firebase, we're making the public Hacker News data available in near real time. Firebase enables easy acc

Y Combinator Hacker News 9.6k Jan 03, 2023