Scraping News with newspaper3k

We will use newspaper3k to extract structured information automatically instead of spending a long time writing scraping code for each website.

First, install newspaper3k. Open a terminal and type:
$ pip install newspaper3k
After the installation completes, open your editor and start coding:
from newspaper import Article

url = 'https://www.nytimes.com/live/2020/09/04/business/stock-market-today-coronavirus?action=click&module=Top%20Stories&pgtype=Homepage'

article = Article(url)
Then download and parse the article with the following code:
article.download()
article.parse()
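
The steps so far can be folded into one small helper. This is only a minimal sketch: the names `fetch_article` and `article_cls` are hypothetical and not part of newspaper3k. The `article_cls` parameter is an assumption added so the helper can be tried with a stand-in class when the package or a network connection is not available.

```python
def fetch_article(url, article_cls=None):
    """Create, download, and parse an article for the given URL.

    `article_cls` is a hypothetical hook for illustration; by default
    the helper uses newspaper3k's Article class.
    """
    if article_cls is None:
        # Imported lazily so the helper can be defined without newspaper3k.
        from newspaper import Article
        article_cls = Article

    article = article_cls(url)
    article.download()  # fetch the raw HTML of the page
    article.parse()     # extract the structured fields from that HTML
    return article
```

With newspaper3k installed, calling `fetch_article(url)` with the `url` defined earlier should give back an article object that is ready to be queried.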
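Once the article is parsed, we can get data from the page such as the authors, text, image, and publish date.
Get author name:
print(article.authors)
Get publish date:
print(article.publish_date)
Get text:
print(article.text)
Get image:
print(article.top_image)
Get keywords (these are only filled in after calling nlp()):
article.nlp()
print(article.keywords)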
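
The fields listed above can also be gathered into a single dictionary, which is convenient when saving results. The sketch below assumes an article that has already been downloaded and parsed. `article_summary` and `with_keywords` are hypothetical names chosen for illustration. Keywords are only requested on demand because `nlp()` typically needs NLTK's tokenizer data to be installed.

```python
def article_summary(article, with_keywords=False):
    """Collect the basic fields of a parsed article into a dict.

    `article` is expected to expose the newspaper3k Article attributes
    used below; any object with the same attributes also works.
    """
    if with_keywords:
        # keywords stay empty until nlp() has been run on the article
        article.nlp()

    return {
        "authors": list(article.authors),
        "publish_date": article.publish_date,  # may be None if not detected
        "text": article.text,
        "top_image": article.top_image,
        "keywords": list(article.keywords) if with_keywords else [],
    }
```

Calling `article_summary(article)` on the parsed `article` from the earlier steps would return the authors, publish date, text, and top image in one place.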
In conclusion, the newspaper3k package makes it quite easy to scrape news from various sources in Python, compared with spending a long time writing scraping code for each website. With this package, we can retrieve the basic information we need from a news article, such as the authors, publish date, text, and images.