About This Book
- A hands-on guide to web scraping with real-life problems and solutions
- Techniques to download and extract data from complex websites
- Create a number of different web scrapers to extract information
Who This Book Is For
This book is aimed at developers who want to build reliable solutions to scrape data from websites. It is assumed that the reader has prior programming experience with Python. Anyone with general knowledge of programming languages should be able to pick up the book and understand the principles involved.
What You Will Learn
- Follow links to crawl a website
- Extract data from web pages with lxml
- Build a threaded crawler to process web pages in parallel
- Cache downloads to reduce bandwidth
- Interact with forms and sessions
- Solve CAPTCHAs on protected web pages
- Reverse engineer AJAX calls
- Create high-level scrapers with Scrapy
Web scraping is becoming increasingly useful as a way to easily gather and make sense of the plethora of information available online. Using a simple language such as Python, you can scrape complex websites with little programming.
This book is the ultimate guide to using Python to scrape data from websites. It covers how to extract data from static web pages and how to use caching to manage the load on servers. Learn how to use AJAX URLs and employ the Firebug extension. Discover further scraping details, such as using a browser renderer, managing cookies, and submitting forms to extract data from complex websites protected by CAPTCHA. Finally, create high-level scrapers with Scrapy and apply what you have learned to real websites.
To view this DRM-protected ebook on a desktop or laptop, you need to have Adobe Digital Editions installed; it is free software. We also strongly recommend that you sign up for an Adobe ID at the Adobe website. To view this ebook on an iPhone, iPad, or Android mobile device, you need the Adobe Digital Editions app, Bluefire Reader, or the txtr app, all of which are also free.
| Size: | 5.4 MB |
| Publisher: | Packt Publishing |
| Date published: | 2015 |
| ISBN: | 9781782164371 (DRM-EPUB) |
| Read Aloud: | not allowed |