Scrapy history (No. 2) - PukiWiki

Revision history
- 1 (2021-01-10 (Sun) 12:52:51)
- 2 (2021-01-10 (Sun) 20:42:58)
- 3 (2021-01-21 (Thu) 11:24:58)
- 4 (2021-02-07 (Sun) 14:26:56)
- 5 (2021-04-14 (Wed) 17:32:04)

Information
Related

Tag: scraping Python

Information

Scrapy 2.4 documentation — Scrapy 2.4.1 documentation

Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

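Scrapy 1.7 documentation (Japanese translation, Scrapy 1.7.3)

The Japanese edition gives the same description: Scrapy is a fast, high-level framework for web crawling and web scraping, used to crawl websites and extract structured data from their pages.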

Python: How to use Scrapy (web crawling, scraping) | note.nkmk.me

Scrapy handles crawling as well as scraping. When the target spans multiple pages, Scrapy is the more convenient choice.

Understanding Scrapy in 10 Minutes - Qiita

Scrapy sits at a different layer from these libraries: it is a framework for implementing an entire crawler application.

python - How can I get all the plain text from a website with Scrapy? - Stack Overflow
```
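# The easiest option would be to extract //body//text() and join everything found.
# (The original answer used the legacy sel.select(...).extract() API;
# current Scrapy selectors expose .xpath() and .getall().)
''.join(sel.xpath("//body//text()").getall()).strip()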
```
- Comment: the alternative approach, BeautifulSoup, extracted the text more accurately. (2021/01/10)
```
Another option is to use BeautifulSoup's get_text():
```
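
The quoted answer's code is not reproduced here, so the following is only a sketch of how `get_text()` could be used. Dropping `<script>` and `<style>` elements first and separating strings with a space are assumptions of this sketch, not part of the quoted answer, and `visible_text` is a hypothetical name.

```python
from bs4 import BeautifulSoup


def visible_text(html: str) -> str:
    """Return the page text with script and style contents removed."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: script and style contents are not wanted in the output.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # strip=True trims each string and skips empty ones;
    # separator=" " keeps words from adjacent elements apart.
    return soup.get_text(separator=" ", strip=True)
```

Inside a Scrapy callback, the same function could be given `response.text`.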

Related

Python