
How to Create a Web Scraper/Crawler using Python and Scrapy

  • English
  • Programming
  • Python
  • Project length: 2h 57m

This tutorial will cover Scrapy, a widely used Python framework for web scraping. You will learn how to use this great tool to create your own web scrapers/crawlers. We are going to create a couple of different spiders, including a simple Wikipedia scraper.

Overview

In this tutorial, we are going to learn about Scrapy, a widely used Python web scraping framework. Using our new knowledge of Scrapy, we are going to create a few 'spiders', including a Wikipedia scraper. I will guide you through each step of the process, including debugging common issues that come up when working with Scrapy and with web scrapers/crawlers in general.

What are the requirements?

  • Basic Python knowledge is required as we are not going to go over how Python works in this tutorial
  • HTML and CSS knowledge would be very beneficial to understand how the spiders work

What is the target audience?

  • Everyone interested in creating a web scraper/crawler
  • Everyone who wants to polish their Python skills

Project Outline

Session 1: Setting up the Environment

  • In this video, we are going to perform a full install of our editor, VS Code, in a Linux environment (Lubuntu).

Session 2: Scrapy Installation

  • This video will guide you through the Scrapy installation, with a demonstration in our Linux environment (a quick sanity check is sketched below).
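Scrapy is typically installed with pip (`pip install scrapy`). A minimal sanity check, assuming the installation succeeded, is to import the package and print its version:

```python
# Minimal post-install check: import Scrapy and print the installed version.
import scrapy

print(scrapy.__version__)  # e.g. "2.11.2", depending on the release you installed
```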

Session 3: Our first Scrapy project

  • Introduction to Scrapy. How to create your first Scrapy project (a minimal spider skeleton is sketched below).
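A new project is generated on the command line with `scrapy startproject <name>`, which creates the settings, items, and spiders modules for you. The sketch below is a minimal spider skeleton as it might be saved in the project's `spiders/` directory; the project name, spider name, and URL are placeholders, not taken from the course.

```python
# Minimal spider skeleton, e.g. tutorial/spiders/example_spider.py inside a project
# created with `scrapy startproject tutorial` (all names here are placeholders).
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"                      # run with: scrapy crawl example
    start_urls = ["https://example.com"]  # page(s) the spider starts from

    def parse(self, response):
        # Callback invoked for every downloaded page; extraction logic goes here.
        self.log(f"Visited {response.url}")
```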

Session 4: Extracting website data

  • In this video, you will scrape your first website data using a Scrapy spider (see the example below)
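As a rough illustration of what the extraction step looks like, the spider below pulls the page title and all link targets with XPath; the URL and fields are illustrative, not necessarily the ones used in the video.

```python
import scrapy


class TitleSpider(scrapy.Spider):
    name = "titles"
    start_urls = ["https://example.com"]  # illustrative target page

    def parse(self, response):
        # .get() returns the first match, .getall() returns every match as a list.
        yield {
            "title": response.xpath("//title/text()").get(),
            "links": response.xpath("//a/@href").getall(),
        }
```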

Session 5: Scrapy shell

  • Here, you will learn how to use the very powerful Scrapy shell (example commands are sketched below)
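The shell is started from the terminal with `scrapy shell <url>` and drops you into an interactive Python session with the downloaded page bound to `response`. The lines below are the kind of commands you would type there (the URL is illustrative):

```python
# Launched with:  scrapy shell "https://example.com"
# Inside the shell, a `response` object for that page is already available:
response.status                          # HTTP status code, e.g. 200
response.xpath("//title/text()").get()   # first <title> text via XPath
response.css("a::attr(href)").getall()   # all link hrefs via CSS selectors
view(response)                           # open the downloaded page in your browser
```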

Session 6: About web crawling/scraping

  • A short video discussing the legal aspects of web crawling/scraping

Session 7: Creating a Scrapy spider

  • In this video, we are going to create our first fully functional spider, which will scrape quotes from multiple web pages (a sketch follows below).
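Assuming a quotes practice site such as quotes.toscrape.com (a common Scrapy exercise target; the course may use a different site), a quotes spider that follows pagination could look roughly like this:

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    # quotes.toscrape.com is an assumption; swap in whatever site the video uses.
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Each quote block carries its text and author.
        for quote in response.xpath("//div[@class='quote']"):
            yield {
                "text": quote.xpath(".//span[@class='text']/text()").get(),
                "author": quote.xpath(".//small[@class='author']/text()").get(),
            }

        # Follow the "Next" link, if present, so multiple pages get scraped.
        next_page = response.xpath("//li[@class='next']/a/@href").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```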

Session 8: Wiki Scraper Intro

  • In this short video, you will learn about the main project of this tutorial: the Wikipedia scraper.

Session 9.1: Scraping Wikipedia Part 1

  • In this session, we are going to start scraping data from Wikipedia using our new Wikipedia spider (sketched below)
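A rough sketch of what such a spider might look like, extracting the article heading and first paragraph; the article URL and the XPath expressions are assumptions about Wikipedia's markup, not the exact selectors from the course:

```python
import scrapy


class WikiSpider(scrapy.Spider):
    name = "wiki"
    start_urls = ["https://en.wikipedia.org/wiki/Web_scraping"]  # example article

    def parse(self, response):
        yield {
            # Heading and first body paragraph; selectors are assumptions about
            # Wikipedia's current markup, not necessarily those used in the video.
            "title": response.xpath("//h1[@id='firstHeading']//text()").get(),
            "first_paragraph": " ".join(
                response.xpath("(//div[@id='mw-content-text']//p)[1]//text()").getall()
            ).strip(),
        }
```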

Session 9.2: Scraping Wikipedia Part 2

  • In this session, we are going to solve a few problems that came up in the previous session and test our new spider

Session 10: Scraping Multiple pages

  • We are going to take our app one step further and scrape multiple Wikipedia pages at the same time (see the sketch below)
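One simple way to crawl several articles concurrently is to list them all in `start_urls` and let Scrapy schedule the requests; the specific pages below are placeholders:

```python
import scrapy


class MultiWikiSpider(scrapy.Spider):
    name = "multiwiki"
    # Scrapy downloads these concurrently; the specific articles are placeholders.
    start_urls = [
        "https://en.wikipedia.org/wiki/Web_scraping",
        "https://en.wikipedia.org/wiki/Web_crawler",
        "https://en.wikipedia.org/wiki/Python_(programming_language)",
    ]

    def parse(self, response):
        yield {
            "url": response.url,
            "title": response.xpath("//h1[@id='firstHeading']//text()").get(),
        }
```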

Session 11: Scrapy Items

  • We will polish our program by changing the way we store our items, using Scrapy Items (an example follows below)
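Scrapy Items declare the fields a scraped record may contain, which catches typos and gives pipelines a predictable structure. A minimal sketch, with hypothetical field names (in a real project the Item class usually lives in `items.py`):

```python
import scrapy


class WikiPageItem(scrapy.Item):
    # Declared fields; assigning an undeclared key raises KeyError, catching typos early.
    url = scrapy.Field()
    title = scrapy.Field()


class WikiItemSpider(scrapy.Spider):
    name = "wiki_items"
    start_urls = ["https://en.wikipedia.org/wiki/Web_scraping"]  # example article

    def parse(self, response):
        item = WikiPageItem()
        item["url"] = response.url
        item["title"] = response.xpath("//h1[@id='firstHeading']//text()").get()
        yield item
```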

Session 12: Scrapy selection using CSS

  • In this final part of the project, we are going to create another spider that uses CSS selectors instead of XPath to scrape our web page elements (sketched below)
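The same kind of extraction written with CSS selectors instead of XPath might look like this (the URL and selectors are illustrative assumptions):

```python
import scrapy


class WikiCssSpider(scrapy.Spider):
    name = "wiki_css"
    start_urls = ["https://en.wikipedia.org/wiki/Web_scraping"]  # example article

    def parse(self, response):
        # Same idea as the XPath version, but using CSS selector syntax.
        yield {
            "title": response.css("h1#firstHeading ::text").get(),
            "links": response.css("div#mw-content-text a::attr(href)").getall(),
        }
```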

Reviews

Average rating: 5 (2,437 reviews)