How to Create a Web Scraper/Crawler using Python and Scrapy
- Project length: 2h 57m
This tutorial covers the widely used Python framework Scrapy. You will learn how to use this great tool to create your own web scrapers/crawlers. Using our new knowledge of Scrapy, we are going to build several 'spiders', including a Wikipedia scraper as the main project. I will guide you through each step of the process, including debugging the issues that commonly come up when working with Scrapy and with web scrapers/crawlers in general.
What are the requirements?
- Basic Python knowledge is required, as we are not going to cover how Python works in this tutorial
- HTML and CSS knowledge would be very beneficial for understanding how the spiders work
What is the target audience?
- Everyone interested in creating a web scraper/crawler
- Everyone who wants to polish their Python skills
Session 1: Setting up the Environment
- In this video, we are going to perform a full install of our editor, VS Code, in a Linux environment (Lubuntu).
Session 2: Scrapy Installation
- This video will guide you through Scrapy installation with a demonstration in our Linux environment.
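The install itself can be sketched with a few shell commands (a sketch assuming a Linux system with Python 3 and pip already available; the virtual-environment name is just a placeholder):

```shell
# Create and activate an isolated environment (name is a placeholder)
python3 -m venv scrapy-env
. scrapy-env/bin/activate

# Install Scrapy; this pulls in its dependencies (Twisted, lxml, parsel, ...)
pip install scrapy

# Verify the install by printing the installed version
scrapy version
```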
Session 3: Our first Scrapy project
- An introduction to Scrapy and how to create your first Scrapy project.
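Creating a project boils down to one command; a minimal sketch, using `quotes_demo` as a placeholder project name (not necessarily the name used in the video):

```shell
scrapy startproject quotes_demo
# startproject generates a skeleton like:
#   quotes_demo/
#     scrapy.cfg            deploy configuration
#     quotes_demo/
#       settings.py         project-wide settings
#       items.py            item (data container) definitions
#       pipelines.py        item pipelines
#       middlewares.py      spider/downloader middlewares
#       spiders/            your spiders go here
```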
Session 4: Extracting website data
- In this video, you will scrape your first website data using a Scrapy spider.
Session 5: Scrapy shell
- Here, you will learn how to use the very powerful Scrapy shell.
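The shell is started from the command line; a typical session looks roughly like this (the URL and selectors are illustrative):

```shell
# Open an interactive shell against a page (URL is illustrative)
scrapy shell "https://quotes.toscrape.com/"
# The shell drops you into a Python prompt with a live `response` object,
# so you can try selectors before putting them in a spider:
#   response.xpath("//title/text()").get()   # try an XPath selector
#   response.css("title::text").get()        # or the CSS equivalent
#   view(response)                           # open the fetched page in a browser
```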
Session 6: About web crawling/scraping
- A short video with a little discussion about the legal aspects of web crawling/scraping.
Session 7: Creating a Scrapy spider
- In this video, we are going to create our first fully functional spider, which will scrape quotes from multiple web pages.
Session 8: Wiki Scraper Intro
- In this short video, you will learn about the main project of this tutorial: the Wikipedia Scraper.
Session 9.1: Scraping Wikipedia Part 1
- In this session, we are going to start scraping data from Wikipedia using our new Wikipedia spider.
Session 9.2: Scraping Wikipedia Part 2
- In this session, we are going to solve a few problems that came up in the previous session and test our new spider.
Session 10: Scraping Multiple pages
- We are going to take our app one step further and scrape multiple Wikipedia pages at the same time.
Session 11: Scrapy Items
- We will polish our program by changing the way we store our items using Scrapy Items
Session 12: Scrapy selection using CSS
- In this final part of the project, we are going to create another spider that uses CSS selectors instead of XPath to scrape our web page elements.