A happy and lightweight Python package that searches the Google News RSS feed and returns a usable JSON response. It can also scrape complete articles, so there is no need to write scrapers for fetching articles anymore.

Overview

GNews

🚩 A happy and lightweight Python package that searches the Google News RSS feed and returns a usable JSON response
🚩 You can also fetch the full article (no need to write scrapers for fetching articles anymore)

Installation

pip install gnews

Usage

from gnews import GNews

google_news = GNews()
json_resp = google_news.get_news('Pakistan')
print(json_resp[0])
{
'publisher': 'Aljazeera.com',
 'description': 'Pakistan accuses India of stoking conflict in Indian Ocean  '
                'Aljazeera.com',
 'published date': 'Tue, 16 Feb 2021 11:50:43 GMT',
 'title': 'Pakistan accuses India of stoking conflict in Indian Ocean - '
          'Aljazeera.com',
 'url': 'https://www.aljazeera.com/news/2021/2/16/pakistan-accuses-india-of-nuclearizing-indian-ocean'
 }

  • get_news returns a list of dictionaries, one per article: [{'title': '...', 'published date': '...', 'description': '...', 'url': '...', 'publisher': '...'}]
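
Because the result is a plain list of dictionaries, it can be post-processed with the standard library alone. The sketch below assumes the entry shape shown above. `sample_results` is stand-in data (the second and third entries are invented for illustration), and `titles_by_publisher` is a hypothetical helper, not part of GNews.

```python
from collections import defaultdict

# Stand-in for google_news.get_news('Pakistan'); same keys as the response above.
sample_results = [
    {
        "title": "Pakistan accuses India of stoking conflict in Indian Ocean - Aljazeera.com",
        "published date": "Tue, 16 Feb 2021 11:50:43 GMT",
        "url": "https://www.aljazeera.com/news/2021/2/16/pakistan-accuses-india-of-nuclearizing-indian-ocean",
        "publisher": "Aljazeera.com",
    },
    {
        "title": "Placeholder headline one - Example.com",  # invented entry
        "published date": "Wed, 17 Feb 2021 08:00:00 GMT",
        "url": "https://example.com/one",
        "publisher": "Example.com",
    },
    {
        "title": "Placeholder headline two - Aljazeera.com",  # invented entry
        "published date": "Thu, 18 Feb 2021 09:30:00 GMT",
        "url": "https://example.com/two",
        "publisher": "Aljazeera.com",
    },
]


def titles_by_publisher(results):
    """Group article titles under the publisher that released them."""
    grouped = defaultdict(list)
    for item in results:
        grouped[item["publisher"]].append(item["title"])
    return dict(grouped)


for publisher, titles in titles_by_publisher(sample_results).items():
    print(publisher, len(titles))
```

With the real client, `sample_results` would simply be replaced by the list returned from `get_news`.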

Available locations and languages

print(google_news.countries)

'Australia', 'Botswana', 'Canada ', 'Ethiopia', 'Ghana', 'India ', 'Indonesia', 'Ireland', 'Israel ', 'Kenya', 'Latvia',
'Malaysia', 'Namibia', 'New Zealand', 'Nigeria', 'Pakistan', 'Philippines', 'Singapore', 'South Africa', 'Tanzania',
'Uganda', 'United Kingdom', 'United States', 'Zimbabwe', 'Czech Republic', 'Germany', 'Austria', 'Switzerland', 'Argentina',
'Chile', 'Colombia', 'Cuba', 'Mexico', 'Peru', 'Venezuela', 'Belgium ', 'France', 'Morocco', 'Senegal', 'Italy', 'Lithuania',
'Hungary', 'Netherlands', 'Norway', 'Poland', 'Brazil', 'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Sweden',
'Vietnam', 'Turkey', 'Greece', 'Bulgaria', 'Russia', 'Ukraine ', 'Serbia', 'United Arab Emirates', 'Saudi Arabia', 'Lebanon',
'Egypt', 'Bangladesh', 'Thailand', 'China', 'Taiwan', 'Hong Kong', 'Japan', 'Republic of Korea'
print(google_news.languages)

'english', 'indonesian', 'czech', 'german', 'spanish', 'french', 'italian', 'latvian', 'lithuanian', 'hungarian',
'dutch', 'norwegian', 'polish', 'portuguese brasil', 'portuguese portugal', 'romanian', 'slovak', 'slovenian', 'swedish',
'vietnamese', 'turkish', 'greek', 'bulgarian', 'russian', 'serbian', 'ukrainian', 'hebrew', 'arabic', 'marathi', 'hindi', 'bengali',
'tamil', 'telugu', 'malyalam', 'thai', 'chinese simplified', 'chinese traditional', 'japanese', 'korean'

We can set country, language, period and size during initialization

google_news = GNews(language='english', country='United States', period='7d', max_results=10)

Other methods to set country, language, period and size

set_period('7d') # News from the last 7 days
max_results(10) # Number of results per keyword
set_country('United States') # News from a specific country
set_language('english') # News in a specific language

Google News covers 141+ countries and 41+ languages. At the bottom left of the Google News page there is a Language & region section that lists all of the supported combinations.

Article Properties

| Properties | Description | Example |
|---|---|---|
| title | Title of the article | IMF Staff and Pakistan Reach Staff-Level Agreement on the Pending Reviews Under the Extended Fund Facility |
| url | Google News link to the article | Article Link |
| published date | Published date | Wed, 07 Jun 2017 07:01:30 GMT |
| description | Short description of the article | IMF Staff and Pakistan Reach Staff-Level Agreement on the Pending Reviews Under the Extended Fund Facility ... |
| publisher | Publisher of the article | The Guardian |
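
The `published date` key contains a space, so it cannot be used as an attribute name directly. One option is to map each dictionary onto a small dataclass. `NewsItem` and `from_dict` below are hypothetical names for illustration, not part of GNews. The sketch assumes every entry carries the five properties listed above and that the date string follows the RFC 2822 style shown in the example column.

```python
from dataclasses import dataclass
from datetime import datetime
from email.utils import parsedate_to_datetime


@dataclass(frozen=True)
class NewsItem:
    """Typed view of one get_news entry (hypothetical helper)."""

    title: str
    url: str
    published: datetime
    description: str
    publisher: str

    @classmethod
    def from_dict(cls, raw):
        # Parse strings such as 'Wed, 07 Jun 2017 07:01:30 GMT' into a datetime.
        return cls(
            title=raw["title"],
            url=raw["url"],
            published=parsedate_to_datetime(raw["published date"]),
            description=raw["description"],
            publisher=raw["publisher"],
        )


raw_entry = {
    "title": "IMF Staff and Pakistan Reach Staff-Level Agreement on the Pending Reviews Under the Extended Fund Facility",
    "url": "https://example.com/article",  # placeholder URL
    "published date": "Wed, 07 Jun 2017 07:01:30 GMT",
    "description": "IMF Staff and Pakistan Reach Staff-Level Agreement ...",
    "publisher": "The Guardian",
}

item = NewsItem.from_dict(raw_entry)
print(item.publisher, item.published.year)
```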

Getting full article

You can use newspaper3k to scrape the full article. GNews exposes this through get_full_article, which takes the article URL.

Make sure newspaper3k is installed first:

pip3 install newspaper3k

from gnews import GNews

google_news = GNews()
json_resp = google_news.get_news('Pakistan')
article = google_news.get_full_article(json_resp[0]['url'])  # newspaper3k article object; all of its attributes are available
article.title 

'IMF Staff and Pakistan Reach Staff-Level Agreement on the Pending Reviews Under the Extended Fund Facility'

article.text 

End-of-Mission press releases include statements of IMF staff teams that convey preliminary findings after a mission. The views expressed are those of the IMF staff and do not necessarily represent the views of the IMF’s Executive Board.\n\nIMF staff and the Pakistani authorities have reached an agreement on a package of measures to complete second to fifth reviews of the authorities’ reform program supported by the IMF Extended Fund Facility (EFF) ..... (full article)

article.images

{'https://www.imf.org/~/media/Images/IMF/Live-Page/imf-live-rgb-h.ashx?la=en', 'https://www.imf.org/-/media/Images/IMF/Data/imf-logo-eng-sep2019-update.ashx', 'https://www.imf.org/-/media/Images/IMF/Data/imf-seal-shadow-sep2019-update.ashx', 'https://www.imf.org/-/media/Images/IMF/Social/TW-Thumb/twitter-seal.ashx', 'https://www.imf.org/assets/imf/images/footer/IMF_seal.png'}

article.authors

[]

Read the full newspaper3k documentation for more details.

Todo

  • Save to MongoDB
  • Save to MySQL

License

MIT ©

Comments
  • Google News URL format update

    Hello,

    Thanks for providing this piece of code.

    I have recently come across weird behavior with the period parameter (e.g. with 7d, you can get news from weeks prior). More importantly, the number of news items returned has dropped dramatically recently when combining countries and languages, or even when just providing a language and leaving the country parameter set to None (for English).

    Setting the language parameter to any other language (e.g. French ['fr']) systematically returns 0 articles, even for popular searches.

    I suspect Google has changed or updated its URL format and/or the available countries/languages!

    opened by sif-gondy 6
  • get_news stopped working

    I have been working on some code for the past week, and it had been working fine with get_news("topic"). It stopped working earlier; get_top_news() still works. I tried other keywords for the topic, but it still returns nothing.

    Any debug help?

    opened by shorenewsbeacon 5
  • [Questions] Hello author. Is possible to make Gnews get news from multiple topics?

    This is my test code. It works with a single keyword. Now I am trying to make it work with multiple keywords. Is it possible to do that? Example:

    google_news = GNews(language='vi', country='Vietnam',
                        period='1h', max_results=20)
    json_resp = google_news.get_news('Covid', 'Apple')
    print(json_resp)
    
    opened by ghost 3
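
The README only shows `get_news` being called with a single keyword, so one way to cover several topics is to call it once per keyword and merge the results. The sketch below is a hypothetical helper, not part of GNews: `get_news_for_topics` takes any search callable, and `fake_search` is a stub with invented data so the example runs offline. It assumes each entry has a `url` key, as in the response format above.

```python
def get_news_for_topics(search, topics):
    """Call `search` once per topic and merge the results, dropping repeated URLs."""
    seen_urls = set()
    merged = []
    for topic in topics:
        # `or []` guards against a search that returns None instead of a list.
        for item in search(topic) or []:
            if item["url"] not in seen_urls:
                seen_urls.add(item["url"])
                merged.append(item)
    return merged


# Stub standing in for google_news.get_news; the data is invented.
def fake_search(topic):
    canned = {
        "Covid": [{"title": "Covid story", "url": "https://example.com/a"}],
        "Apple": [
            {"title": "Apple story", "url": "https://example.com/b"},
            {"title": "Covid story", "url": "https://example.com/a"},
        ],
    }
    return canned.get(topic, [])


print(get_news_for_topics(fake_search, ["Covid", "Apple"]))
```

With the real client, `search` would be `google_news.get_news`, called as `get_news_for_topics(google_news.get_news, ['Covid', 'Apple'])`.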
  • Feature/results in date range

    get_news('key') can search within a date range, if provided. Other functions return warnings if a date range has been provided as they do not support searching in this way. A workaround for each other function is suggested, but will provide slightly different results

    opened by tigsinthetrees 2
  • Top headlines?

    Nice!

    There doesn't currently appear to be a way to get news stories without specifying a topic (key). Could I modify the function so if the user uses get_news() without a key, or passing None or the empty string, it just grabs the top stories from the main feed for your locale?

    Do you support Category or Location based searches?

    Thanks!

    opened by aaronchantrill 2
  • Streamlined workflows by minimizing clutter.

    ⚔️ Things changed:

    This PR primarily focuses on .github/workflows/python-publish.yml.

    • The workflow now only triggers upon manual dispatch / a successful published release (which the comments at the beginning said that it did but actually didn't).

    • The PyPI publish workflow doesn't require multiple dependencies / commands to be set up anymore. The build job uses the build Python package and the publishing workflow has been changed to the official one provided by the Python Packaging Authority.

    • Bumped dependency version for checking out source code.

    • Bumped dependency version for setting up Python.

    🔖 To make this work:

    Since the publishing workflow has been changed, you will need to remove these secrets from the repository:

    • PYPI_USERNAME
    • PYPI_PASSWORD

    ... and replace them with PYPI_API_TOKEN. This secret will contain a token provided by PyPI itself, which you can get from the Manage page of your project by clicking on "Create a token for project-name".

    I hope this helps :D

    opened by hitblast 1
  • Stop unauthorized and redundant installs of the "newspaper" library

    The call to utils.import_or_install() introduced several issues:

    • It was called with the parameter "newspaper3k". Since the correct import name is "newspaper" ("newspaper3k" is used for installation purposes), the __import__ call always failed, which means every call to the get_full_article() method would result in a redundant "pip install" process starting.

    • Just to reiterate: Every user of this library right now sees a long and cryptic pip install output message WITH EACH AND EVERY CALL they make to get_full_article().

    • Installing pip packages and/or modifying a user's environment without permission or any indication of such behavior (nothing in the docs) is unacceptable.

    • Installing packages using direct calls to pip module internals is NOT the way to install packages and also yields warnings regarding the usage of incorrect pip wrappers.

    opened by valorien 1
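
A common alternative to installing on the user's behalf is to attempt the import and, if it fails, raise an error that tells the user what to install. The sketch below illustrates that pattern with the standard library only. `require_module` is a hypothetical helper name, not the code from this pull request or from GNews.

```python
import importlib


def require_module(import_name, pip_name):
    """Import a module or raise ImportError with an install hint; never runs pip."""
    try:
        return importlib.import_module(import_name)
    except ImportError as exc:
        raise ImportError(
            f"'{import_name}' is required for this feature. "
            f"Install it with: python -m pip install {pip_name}"
        ) from exc


# Works for anything importable, e.g. a standard-library module:
json_module = require_module("json", "json")
print(json_module.__name__)

# For the case described above, the import name and the pip name differ:
# newspaper = require_module("newspaper", "newspaper3k")
```

Keeping the import name and the install name as separate arguments avoids the mix-up described in the first bullet.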
  • Problems about get full article in a docker container

    Hello. I have some doubts about this line https://github.com/ranahaani/GNews/blob/master/gnews/gnews.py#L86. I need to run GNews in a docker container by using Airflow in order to get information about articles. I got the following message:

    WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
    Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
    To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
    

    And then Airflow sent me a negative signal to fail my task.

    opened by aoelvp94 1
  • results are limited to 100

    Hi, it seems that I can't get more than 100 results despite changing max_results. Why is that, and would I get different results if I repeat the search?

    opened by Alloooshe 1
  • [Snyk] Security upgrade certifi from 2021.10.8 to 2022.12.7

    Snyk has created this PR to fix one or more vulnerable packages in the `pip` dependencies of this project.

    Changes included in this PR

    • Changes to the following files to upgrade the vulnerable dependencies to a fixed version:
      • requirements.txt
    ⚠️ Warning
    requests 2.26.0 requires certifi, which is not installed.
    
    

    Vulnerabilities that will be fixed

    By pinning:

    | Severity | Priority Score (*) | Issue | Upgrade | Breaking Change | Exploit Maturity |
    |:---:|---|:---|:---|:---|:---|
    | medium severity | 626/1000 (Why? Recently disclosed, Has a fix available, CVSS 6.8) | Insufficient Verification of Data Authenticity SNYK-PYTHON-CERTIFI-3164749 | certifi: 2021.10.8 -> 2022.12.7 | No | No Known Exploit |

    (*) Note that the real score may have changed since the PR was raised.

    Some vulnerabilities couldn't be fully fixed and so Snyk will still find them when the project is tested again. This may be because the vulnerability existed within more than one direct dependency, but not all of the affected dependencies could be upgraded.

    Check the changes in this PR to ensure they won't cause issues with your project.


    opened by snyk-bot 0
  • [Snyk] Security upgrade python from 3.10.0 to 3.12.0a3

    This PR was automatically created by Snyk using the credentials of a real user.


    Keeping your Docker base image up-to-date means you’ll benefit from security fixes in the latest version of your chosen image.

    Changes included in this PR

    • Dockerfile

    We recommend upgrading to python:3.12.0a3, as this image has only 272 known vulnerabilities. To do this, merge this pull request, then verify your application still works as expected.

    Some of the most important vulnerabilities in your base image include:

    | Severity | Priority Score / 1000 | Issue | Exploit Maturity |
    | :------: | :-------------------- | :---- | :--------------- |
    | critical severity | 714 | Directory Traversal SNYK-DEBIAN11-DPKG-2847942 | No Known Exploit |
    | critical severity | 714 | Out-of-bounds Read SNYK-DEBIAN11-LIBTASN16-3061097 | No Known Exploit |
    | critical severity | 714 | OS Command Injection SNYK-DEBIAN11-OPENSSL-2807596 | No Known Exploit |
    | critical severity | 714 | OS Command Injection SNYK-DEBIAN11-OPENSSL-2933518 | No Known Exploit |
    | high severity | 614 | Improper Input Validation SNYK-DEBIAN11-XZUTILS-2444276 | No Known Exploit |


    opened by ranahaani 0
  • Unable to obtain news Reports within a specified Date range

    I am trying to obtain news reports within a specific date range using the start_date and end_date parameters, but the range doesn't seem to be applied: it fetches the top news reports from the current date only. I attached the code and a results image to the original issue. I have tried both the tuple approach and the datetime object approach, but neither seems to work. I have also pointed to the particular piece of code that may not be setting the end date parameter.

    [Screenshots from the original issue: 2022-11-23 at 11:48:52 AM, 11:51:00 AM, 11:56:16 AM]
    opened by AryanKapadia 0
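
Until the date parameters behave as expected, results can at least be narrowed on the client side by parsing each entry's `published date`. This is only a workaround sketch: it filters what the feed already returned and cannot recover older articles the feed left out. `filter_by_date` is a hypothetical helper, not part of GNews, and the sample entries are invented.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def filter_by_date(results, start, end):
    """Keep entries whose 'published date' falls within [start, end], inclusive."""
    kept = []
    for item in results:
        published = parsedate_to_datetime(item["published date"])
        if start <= published <= end:
            kept.append(item)
    return kept


# Invented sample entries in the shape returned by get_news.
sample = [
    {"title": "Older story", "published date": "Tue, 01 Nov 2022 09:00:00 GMT"},
    {"title": "In-range story", "published date": "Tue, 15 Nov 2022 12:30:00 GMT"},
    {"title": "Newer story", "published date": "Wed, 23 Nov 2022 11:48:00 GMT"},
]

# The parsed dates are timezone-aware, so the bounds must be aware as well.
start = datetime(2022, 11, 10, tzinfo=timezone.utc)
end = datetime(2022, 11, 20, tzinfo=timezone.utc)
print([item["title"] for item in filter_by_date(sample, start, end)])
```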
  • Nothing is fetched anymore

    For some reason, I cannot seem to fetch any news anymore, not even with the README example. Could it be an IP issue? It seems to be working when using VPN.

    opened by rolandgvc 2
  • Allow config parameter in the gnews.get_full_article()

    I am using the GNews get_full_article() function to extract the top_image from the article. However, when I run it on my production server, it throws the error below:

    ERROR: Article download() failed with HTTPSConnectionPool(host='indianexpress.com', port=443): Max retries exceeded with url: /article/idea-exchange/gautam-gambhir-idea-exchange-first-challenge-mcd-polls-change-narrative-bjp-doesnt-do-anything-8158944/ (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden'))) on URL https://indianexpress.com/article/idea-exchange/gautam-gambhir-idea-exchange-first-challenge-mcd-polls-change-narrative-bjp-doesnt-do-anything-8158944/

    I searched through Google and ended up with this solution:

    from newspaper import Article
    from newspaper import Config
    
    user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
    config = Config()
    
    config.browser_user_agent = user_agent
    
    url = "https://www.chicagotribune.com/nation-world/ct-florida-school-shooter-nikolas-cruz-20180217-story.html"
    
    page = Article(url, config=config)
    
    
    page.download()
    page.parse()
    print(page.text)
    

    As per the code above, I need to set a user agent and assign it to config.browser_user_agent to keep the server from getting banned. However, gnews.get_full_article() does not let me pass the config parameter. Is there any provision for this parameter? Am I missing something?

    opened by sohaibrahman64 1
  • I got a lot less links than earlier in the last month / 2 months?

    Hi,

    I use GNews a lot and lately I've been having some trouble with it. The code I use is: google_news.get_news("News I want to get"). Normally I get several hundred links a day, but now only a few. It seems that it can't find anything anymore: even when I run the code once for a specific search term, it finds far from all of the news.

    I've been using the same code for a long time and nothing in it has changed. I also did not change the GNews version. I just suddenly get a lot fewer links.

    opened by Colder347 2
  • Non Issue - Just a Suggestion

    First off, thanks for creating and releasing this very helpful Package, it saved me a lot of time from coding it on my own for my quick project.

    The only suggestion I have concerns the formatting of the return value for 'description'. What gets returned to me is not a description of the article but a series of short titles from other news sites, without links. I know you are running it through BeautifulSoup, which removes the links and the list structure, and what is left is a confusing mess.

    I modified the code so that I get back everything I want, but for other users you may want to add an option to switch that on and off. I added this to my base and control it during initialization of GNews, now I switch formatting by BeautifulSoup on/off with a simple option I pass once anytime it's needed. Considering most consumers of your API are technical this will not be confusing.

    opened by RaulEstaka 1
Releases(0.2.3)
Owner
Muhammad Abdullah
Python/Django