How to Create a Web Scraper/Crawler using Python and Scrapy

This tutorial covers Scrapy, a widely used Python framework. You will learn how to use this tool to create your own web scrapers/crawlers. We are going to create a couple of different spiders, including a simple Wikipedia scraper.


In this tutorial, we are going to learn about Scrapy, a widely used Python framework. Using our new knowledge of Scrapy, we are going to create a few 'spiders', including a Wikipedia scraper. I will guide you through each step of the process, including debugging common issues when working with Scrapy and web scrapers/crawlers in general.

What are the requirements?

  • Basic Python knowledge is required, as this tutorial does not cover how Python works
  • HTML and CSS knowledge is very helpful for understanding how the spiders work

What is the target audience?

  • Everyone interested in creating a web scraper/crawler
  • Everyone who wants to polish their Python skills

Project Outline

Session 1: Setting up the Environment

  • In this video, we are going to perform a full install of our editor, VS Code, in a Linux environment (Lubuntu).

Session 2: Scrapy Installation

  • This video will guide you through Scrapy installation with a demonstration in our Linux environment.

Session 3: Our first Scrapy project

  • Introduction to Scrapy. How to create your first Scrapy project.

Session 4: Extracting website data

  • In this video, you will scrape your first website data using a Scrapy spider

Session 5: Scrapy shell

  • Here, you will learn how to use the very powerful Scrapy shell

Session 6: About web crawling/scraping

  • A short video discussing the legal aspects of web crawling/scraping

Session 7: Creating a Scrapy spider

  • In this video, we are going to create our first fully functional spider, which will scrape quotes from multiple web pages.

Session 8: Wiki Scraper Intro

  • In this short video, you will learn about the main project of this tutorial: the Wikipedia scraper

Session 9.1: Scraping Wikipedia Part 1

  • In this session, we are going to start scraping data from Wikipedia using our new Wikipedia spider

Session 9.2: Scraping Wikipedia Part 2

  • In this session, we are going to solve a few problems that came up in the previous session and also test our new spider

Session 10: Scraping Multiple pages

  • We are going to take our app one step further and scrape multiple Wikipedia pages at the same time

Session 11: Scrapy Items

  • We will polish our program by changing the way we store our items using Scrapy Items

Session 12: Scrapy selection using CSS

  • In this final part of the project, we are going to create another spider that uses CSS selectors instead of XPath to scrape our web page elements


