
Python & Selenium Windows & MacOS Guide + Videos [2021]
In this article, I am going to give you all the information and tools necessary to get your web testing automation or web scraping project up and running quickly using Python & Selenium.
Table of Contents
Introduction
Selenium is good for automated testing of web sites and web apps, as well as web scraping.
Web scraping involves finding patterns in web page source code that can be used to harvest useful data. Python, being a language great for list crawling, is a natural match with Selenium, an industry standard tool for browser automation.
Python/Selenium Beginners Free Course from Edureka
You can follow my guide below, but if you learn better with video, then I recommend this hour long free course on the subject by Edureka:
If you’re looking to become professional certified with Selenium or if you learn better with a live instructor and personalized help from tutors, then I recommend checking this out instead:
- Recommended course: Edureka! Selenium Certification Training
- Average annual salary for a Selenium Test Automation Engineer is $94,000 per year according to Indeed.com
- Checkout my previous Edureka comparisons and reviews against other elearning platforms.
Python + Selenium + Beautiful Soup Install Guide
At the time of writing, Python 3.8.5 is the latest version and is what I am linking to and installing. As new versions come out, you will probably want to grab those versions instead. The main Python download page is located here: https://www.python.org/downloads/
Linux Mint 19 Installation
This was in 2019, and I’m not covering Linux in the rest of the tutorial, but here are the commands that worked for me:
sudo apt install python-pip
pip install selenium
wget https://github.com/mozilla/geckodriver/releases/download/v0.23.0/geckodriver-v0.23.0-linux64.tar.gz
tar -xvzf geckodriver-v0.23.0-linux64.tar.gz
chmod +x geckodriver
sudo mv geckodriver /usr/local/bin/
pip install setuptools
pip install mysql-connector
sudo apt-get install python-bs4
pip install beautifulsoup4
1. Download and install Python 3.8.5
MacOS 64bit: https://www.python.org/ftp/python/3.8.5/python-3.8.5-macosx10.9.pkg
Windows 64bit: https://www.python.org/ftp/python/3.8.5/python-3.8.5-amd64.exe
python -V
If everything installed correctly, you should get a response similar to this:
2. Next install Selenium & Beautiful Soup 4.x
We use Python’s package manager, pip to install it.
From a Terminal prompt (MacOS) or an elevated CMD/PowerShell (Windows) execute this command:
pip install selenium
After that completes, type this command:
pip install beautifulsoup4
If you get an error, you probably need to install pip, although a default installation of Python now installs pip by default.
Install pip using these two commands in the same terminal (MacOS only)
sudo curl -O https://bootstrap.pypa.io/get-pip.py
sudo python get-pip.py
3. Download Browser Webdrivers
Next, we need to pick Firefox and/or Chrome and install their webdrivers. The webdrivers allow Selenium through Python to control the web browser as if they were a user.
The Chrome webdriver needs to be updated everytime Chrome updates, but the Firefox webdriver does not. This makes Firefox more desirable to use… except there is a known bug with the Firefox webdriver on MacOS, which makes it a little trickier to set up initially.
You must select the Chrome webdriver that corresponds to the version of the web browser you are using. Because of this requirement, I recommend disabling auto-updates on Chrome or use Firefox so that you don’t constantly need to download new webdrivers everytime Chrome releases an update.
Chrome Webdrivers Downloads
- Chrome version 86.x: Windows / Mac
- Chrome version 85.x: Windows / Mac
- Chrome version 84.x: Windows / Mac
- Other versions of Chrome: http://chromedriver.chromium.org/downloads
Firefox (geckodriver) Webdrivers Downloads
- Firefox version 60 or greater+ Windows 64-bit / MacOS
- Other versions of geckodriver: https://github.com/mozilla/geckodriver/releases
There are also drivers for Safari and Edge. You can always find the latest official drivers at this URL: https://pypi.org/project/selenium/
4. Install Browser Webdrivers
So you downloaded one or more webdrivers in the last step, now we need to install them. This means extracting the files and making sure the webdrivers are accessible by being in the system path. For Firefox on macOS 10.15 or above, it also means disabling the notarization requirement.
MacOS: I recommend downloading the webdrivers and extracting them to the ~/Downloads folder. Even if you install multiple webdrivers, you need to run these two commands once (don’t worry if the first command produces an error.)
This command creates one of the default MacOS path locations if it doesn’t exist (produces an error that you ignore if it does.
sudo mkdir -p /usr/local/bin
This command makes the default ~/Downloads folder (where ~ is a shortcut to the User’s folder) part of the path by symbolically linking the Downloads folder with the /usr/local/bin folder.
sudo ln -s ~/Downloads /usr/local/bin
sudo xattr -r -d com.apple.quarantine ~/Downloads/geckodriver
Fore more information on MacOS paths, even though this MacOS guide is out of date it is still accurate: https://coolestguidesontheplanet.com/add-shell-path-osx/ or the answers on this StackExhange topic: https://apple.stackexchange.com/questions/41542/adding-a-new-executable-to-the-path-environment-variable
Windows: I recommend extracting or moving your webdrivers to C:\Windows\ directory so that they are in a location in your path.
For Windows, this article thoroughly explains how to add to the PATH on each different version:
https://www.java.com/en/download/help/path.xml
chromedriver
geckodriver
You should get a response as pictured below:
5. Download & Customize My Free Web Scraping Script
Download my Python script to your PATH (as outlined above, so continuing with the defaults, this would be your ~\Downloads folder on macOS.)
- Download learnonlineshop.py
You launch this script by typing this at elevated terminal/CMD/PowerShell:
python learnonlineshop.py
or if that causes you issues, on MacOS you can try this elevated command instead:
sudo python learnonlineshop.py
Using this script will launch Firefox in a special mode with a new, blank profile, which means you will need to sign into TikTok, Instagram and YouTube or whatever site you want to scrape. That’s hard to do if the script/bot is controlling the browser, so I’ve given you 20 seconds to do it on the first page loaded.

Chrome looks like this when successful:
My Real-World Use-Cases
I’ve used Python + Selenium for several clients to help them take information on the web (that they had legal rights to use) and to scrape that data into databases and spreadsheets for every day business use.
For another company, I wrote Python & Selenium scripts that scraped popular social media accounts for lead generation. This project didn’t get finished because I read the TOS for the social media services and realized it was against their terms. Remember, don’t use this web scraping technology on website’s that specifically forbid it.
For a company that sells appliance parts, I wrote Python & Selenium scripts that scraped partner websites for inventory, then the script applied a price markup and used Amazon MWS to automatically list these appliance parts on Amazon if they were competitive. The example script I included in this article is lightly edited from one of those scripts.
Here is the full source code of this Python web scraper:
from selenium import webdriver # Selenium for opening browsers import mysql.connector #We need MySQL for this project from bs4 import BeautifulSoup #This make HTML parsing much easier import sys import time from selenium.webdriver.support.ui import Select #Make a db connection: mydb = mysql.connector.connect( host="localhost", user="database_user", passwd="database_password", database="database_table_name" ) mycursor = mydb.cursor() #Use this for messing around with database results browser = webdriver.Firefox() #replace with .Chrome(), or with the browser of your choice urlArray = [] # Make an array for all our pages result_brand = "BrandName" #fill out this before running per brand urlArray.append("https://www.example.com/login/") #log in to partner website first urlArray.append("https://www.example.com/category-a/") urlArray.append("https://www.example.com/category-b/") urlArray.append("https://www.example.com/category-c/") urlArray.append("https://www.example.com/category-d/") urlArray.append("https://www.example.com/category-r/") # The main stuff we're looking to extract for each product: result_title = "" # Inside of an A tag, inside of an H2, inside a TR inside a TD with class name "PD-name" inside table with class name "product-details" inside another table with class "product" /// Actually inside the IMAGE the ALT tag has a better title! result_cost = "0.00" # Inside a table class name "product-cart" in a span with class "PC-Price" result_img = "" # Inside TR in TD with class "product-image" in an A tag, in an IMG tag in the SRC result_partno = "" # Inside a TR, in a TD with class "PD-number" print("Starting Inventory Collection") i = 0 while i < len(urlArray): browser.get(urlArray[i]) try: if i != 0: #Don't do this on the first one #here we are trying to select 100 from the drop down select = Select(browser.find_element_by_id('MainContent_DDLPageSize')) select.select_by_value('100') print("Selected 100, waiting 5 seconds") time.sleep(5) except: print("Failed to change dropdown") searchResulted = browser.find_elements_by_xpath("//*[@class='product']") if i == 0: #Pause on the first one for 15 seconds so we can log in time.sleep(20) for searchResults in searchResulted: failed = 0 try: try: result_title_pre = searchResults.find_element_by_css_selector("td.PD-name h2 a") #find the title result_title = result_title_pre.text if result_title == 'NO LONGER AVAILABLE': print("This item is no longer available") failed = 1 except: print("Failed at title") failed = 1 try: result_cost_pre = searchResults.find_element_by_css_selector("span.PC-price") #get the price str(result_cost_pre).replace("Your Price:","") str(result_cost_pre).replace(" ","") result_cost = result_cost_pre.text except: print("Failed at price") failed = 1 try: result_img_pre = searchResults.find_element_by_css_selector('.product-image a img') result_img = result_img_pre.get_attribute("src") except: print("Failed at image") failed = 1 try: result_partno_pre = searchResults.find_element_by_css_selector("td.PD-number strong") result_partno = result_partno_pre.text except: print("Failed at Part Number") failed = 1 if failed == 0: val = (str(result_title), str(result_partno).replace("Part Number:",""), str(result_brand), str(result_img), str(result_cost).replace("$","")) #values in query go here print("Title: %s --- Cost: %s --- Image: %s --- PartNo: %s ---" % (result_title, result_cost, result_img, result_partno )) sql = "INSERT INTO `amazon_listings` (`title`, `description`, `part_no`, `model_no`, `brand`, `image`, `sorted`, `added_to_amazon`, `date_inserted_db`, `date_last_sorted`, `user_id_last_sorted`, `user_id_added_to_amazon`, `user_id_locking`, `az_listed_price`, `az_asin`, `az_barcode`, `az_barcode_type`, `az_sku`, `az_category`, `az_shipping_profile`, `az_qty`, `az_handling_time_days`, `az_title`, `az_brand`, `az_manufacturer`, `user_notes`, `current_supplier_id`, `current_supplier_cost`, `current_supplier_last_checked`, `current_supplier_last_checked_by_userid`, `metadata`) VALUES(%s, '', %s, '', %s, %s, 0, 0, '', '', 0, 0, 0, NULL, '', '', '', '', '', '', NULL, NULL, '', '', NULL, '', 10, %s, '', 0, '');" #query goes here mycursor.execute(sql, val) #execute the querry mydb.commit() #commit the changes to the db print("Added %s to the database!" % result_title) except: e = sys.exc_info()[0] print( "Error: %s
" % e ) i += 1 print("Finished Inventory Collection")
Editing the Script for Your Own Use
The script I’ve provided is good for a starting point, but you need your own MySQL database details and you need to be targeting the data you want to collect using the CSS selectors that the site uses.
You use urlArray.append(“https://www.example.com/category-a/”) to manually add URLs to scrape. This was useful to me because there were a small, finite number of URLs I needed to scrape. On other projects, I’ve had to use urlArray.append() as part of a multi-process of visiting a parent URL, then adding children URLs into the urlArray for later traversal in a second or third phase.
I like to employ plenty of try / except blocks so that I know what is and isn’t working and so the script keeps running even if there are small issues.
I like to use the Chrome or Firefox developer tools, especially the inspector, to find CSS elements to target with this script. I’ve used this exact script as a template to scrape several other websites, with the key being able to find CSS patterns that allowed me to target data in specific HTML elements. Since this will differ with every site, it’s imperative that you use tools like the Chrome or Firefox developer tools, especially the inspector, to locate the right CSS selectors to scrape.
I’ve also used this script to output into a CSV spreadsheet file instead of a database. The point is that you will have to modify this script regardless, so feel free to prune the MySQL content if it doesn’t fit your project.
Conclusion
Python & Selenium are a powerful combination that is relatively quick and easy to get started with. Let me know if I’ve helped you or if you have any additional questions, comments or concerns and I’ll be happy to help. Thanks for reading and happy coding!