Python: Download PDFs From a URL Using BeautifulSoup4 and the Requests Library


In this tutorial, I will show you how to use Python to download the PDF files linked from a web page. The complete script is given below.

We will use the Requests library to fetch the web page and the PDF files, and Beautiful Soup 4 to find the PDF links in the page's HTML.
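To see what Beautiful Soup contributes on its own, here is a minimal sketch that needs no network access. It assumes a small, hypothetical HTML snippet in place of a real page: Beautiful Soup parses the markup, `find_all("a", href=True)` returns only the anchor tags that actually have an `href`, and the list comprehension keeps the ones mentioning `.pdf`.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, used here instead of a real HTTP response
html = """
<html><body>
  <a href="https://example.com/papers/ocr-survey.pdf">OCR survey</a>
  <a href="/docs/manual.PDF">Manual</a>
  <a href="/about">About us</a>
  <a name="top">No href here</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True skips <a> tags that have no href attribute at all;
# lower() makes the ".pdf" check case-insensitive
pdf_hrefs = [a["href"] for a in soup.find_all("a", href=True)
             if ".pdf" in a["href"].lower()]

print(pdf_hrefs)
# ['https://example.com/papers/ocr-survey.pdf', '/docs/manual.PDF']
```

The second link is relative. Passing it straight to `requests.get` raises a `MissingSchema` error, so it has to be resolved against the page URL (for example with `urllib.parse.urljoin`) before it can be downloaded.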

Python Download PDF From URL

Install Dependencies

pip install requests
pip install beautifulsoup4

code.py

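# Import libraries
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# URL of the page from which PDFs will be downloaded
url = "https://nanonets.com/blog/deep-learning-ocr/"

# Request the page; stop with an error if the server did not return success
response = requests.get(url, timeout=30)
response.raise_for_status()

# Parse the HTML returned by the server
soup = BeautifulSoup(response.text, 'html.parser')

# Find all hyperlinks on the page that have an href attribute
links = soup.find_all('a', href=True)

i = 0

# Check every link for a PDF and, if found, download the file
for link in links:
    href = link['href']
    if '.pdf' in href.lower():
        # Resolve relative links such as "/files/paper.pdf" against the page URL
        pdf_url = urljoin(url, href)
        i += 1
        print("Downloading file:", i)

        # Get response object for the PDF link; skip links that fail
        pdf_response = requests.get(pdf_url, timeout=30)
        if not pdf_response.ok:
            print(f"Skipping {pdf_url} (HTTP {pdf_response.status_code})")
            continue

        # Write the binary content to a numbered PDF file;
        # "with" closes the file even if an error occurs
        with open("pdf" + str(i) + ".pdf", 'wb') as pdf:
            pdf.write(pdf_response.content)
        print("File", i, "downloaded")

print("Finished processing PDF links")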
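Two refinements you may want: a stricter test that looks only at the URL path (so a link like `/viewer?file=report.pdf`, which usually opens an HTML viewer, does not count), and saving each file under the name in its URL rather than `pdf1.pdf`, `pdf2.pdf`, and so on. Below is a minimal sketch using only the standard library. `resolve_pdf_url` and `filename_from_url` are hypothetical helper names, not part of Requests or Beautiful Soup, and the path-suffix check is a stricter swap-in for the tutorial's substring test.

```python
import os
from typing import Optional
from urllib.parse import unquote, urljoin, urlparse


def resolve_pdf_url(page_url: str, href: str) -> Optional[str]:
    """Return an absolute URL if href points to a .pdf path, else None."""
    absolute = urljoin(page_url, href)
    # Look only at the path, so query strings like "?file=x.pdf" are ignored
    path = urlparse(absolute).path
    if path.lower().endswith(".pdf"):
        return absolute
    return None


def filename_from_url(pdf_url: str, fallback: str) -> str:
    """Use the last path segment (e.g. 'report.pdf') as the local file name."""
    # unquote turns "%20" back into a space, etc.
    name = os.path.basename(unquote(urlparse(pdf_url).path))
    if name in ("", ".", ".."):
        return fallback
    return name


print(resolve_pdf_url("https://nanonets.com/blog/deep-learning-ocr/", "/files/paper.pdf"))
# https://nanonets.com/files/paper.pdf
print(filename_from_url("https://example.com/papers/Deep%20OCR.pdf", "pdf1.pdf"))
# Deep OCR.pdf
```

In the download loop, `resolve_pdf_url(url, link['href'])` would replace the substring check, and `filename_from_url(pdf_url, "pdf" + str(i) + ".pdf")` would replace the numbered name. Two different URLs can end in the same file name, so prefixing the counter is a simple way to avoid overwriting files.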

Run the Project

python code.py
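A link containing `.pdf` can still lead to an HTML login or error page, which would be saved with a `.pdf` extension. As a quick check after running the script, here is a minimal sketch, assuming the numbered `pdf*.pdf` files sit in the current directory. It flags files that do not begin with the `%PDF-` header that PDF files normally start with. Some PDF readers tolerate a few leading bytes before the header, so treat this as a heuristic rather than full validation. `looks_like_pdf` and `find_non_pdfs` are hypothetical helper names.

```python
import glob

# Header bytes at the start of a normal PDF file, e.g. b"%PDF-1.7"
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(path: str) -> bool:
    """Return True if the file starts with the PDF header bytes."""
    with open(path, "rb") as f:
        return f.read(len(PDF_SIGNATURE)) == PDF_SIGNATURE


def find_non_pdfs(pattern: str = "pdf*.pdf") -> list:
    """List files matching pattern whose content does not look like a PDF."""
    return [p for p in sorted(glob.glob(pattern)) if not looks_like_pdf(p)]


if __name__ == "__main__":
    for path in find_non_pdfs():
        print("Not a real PDF:", path)
```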