If you are like me, you find that your scrapers are sometimes limited by the number of requests a website allows within a certain period of time. To get around this, you can set up Python scraping through NordVPN so that your IP address changes automatically. There are other programs as well; Tor, for example, does this for free. I personally use Nord because I have had better performance, and a few dollars per month is worth it not to have to deal with the headaches.
So I set up my cron jobs as follows:
On the hour (9:00, 10:00, etc.) it connects to Nord
5 minutes past every hour (9:05, 10:05, etc.) it runs my scraper
15 minutes past every hour (9:15, 10:15, etc.) it disconnects from Nord
30 minutes past every hour (9:30, 10:30, etc.) it reconnects to Nord. The 15-minute gap usually gives it enough time to get a different IP; if you log back in immediately, you will attach to the same server.
35 minutes past every hour (9:35, 10:35, etc.) it runs the scraper again
45 minutes past every hour (9:45, 10:45, etc.) it disconnects from Nord
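
The gap between disconnecting at :15 and reconnecting at :30 is meant to produce a new IP, but nothing in the schedule confirms that it did. Below is a minimal Python sketch of such a check. It is not part of the setup described here: it assumes a plain-text IP echo service (https://api.ipify.org is used as an example, and any similar service would do) and a hypothetical state file, /home/user/last_ip.txt. The function names are illustrative.

```python
import urllib.request
from pathlib import Path

# Assumption: any service that returns your public IP as plain text works here.
IP_ECHO_URL = "https://api.ipify.org"
# Hypothetical location for remembering the IP seen on the previous run.
STATE_FILE = Path("/home/user/last_ip.txt")


def fetch_public_ip(url=IP_ECHO_URL, timeout=10):
    """Return the public IP address reported by the echo service."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def ip_changed(current_ip, state_file=STATE_FILE):
    """Compare current_ip with the IP saved on the last run, then save it.

    Returns True on the first run (nothing saved yet) or when the
    address differs from the saved one, and False otherwise.
    """
    state_file = Path(state_file)
    previous_ip = state_file.read_text().strip() if state_file.exists() else None
    state_file.write_text(current_ip + "\n")
    return previous_ip != current_ip
```

Calling ip_changed(fetch_public_ip()) shortly after each connect, and printing the result, would leave a record in the cron log of whether the reconnect actually landed on a new address.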
To set this up, I run the following:
sudo bash
Enter Password
crontab -e
If asked to choose an editor, select 1 (for nano)
Scroll to the bottom of the file and add the following:
0 * * * * /usr/bin/nordvpn connect >> /home/user/cron.log 2>&1
5 * * * * python3 /home/user/scraper.py >> /home/user/cron.log 2>&1
15 * * * * /usr/bin/nordvpn disconnect >> /home/user/cron.log 2>&1
30 * * * * /usr/bin/nordvpn connect >> /home/user/cron.log 2>&1
35 * * * * python3 /home/user/scraper.py >> /home/user/cron.log 2>&1
45 * * * * /usr/bin/nordvpn disconnect >> /home/user/cron.log 2>&1
Save the file and exit (in nano, Ctrl+O, Enter, then Ctrl+X), and you are all set. Your scraper will run on the schedule above.
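
One thing the schedule does not handle is a failed connection: the scraper runs at :05 and :35 whether or not the connect step five minutes earlier worked. A rough sketch of a Python wrapper that chains the three steps and skips the scrape when the connect fails might look like the following. It uses only the nordvpn connect and nordvpn disconnect commands and the scraper path from the crontab entries. The function names, the runner parameter, and the settle delay are illustrative assumptions, and it assumes nordvpn connect exits with a non-zero status on failure.

```python
import subprocess
import sys
import time

NORDVPN = "/usr/bin/nordvpn"                        # path used in the crontab entries
SCRAPER_CMD = ["python3", "/home/user/scraper.py"]  # scraper command from the crontab entries


def run(cmd):
    """Run a command, pass its output through for the cron log, return its exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    return result.returncode


def scrape_cycle(runner=run, settle_seconds=30, sleep=time.sleep):
    """Connect, scrape, then disconnect.

    Returns the scraper's exit code, or 1 if the VPN connection
    could not be established (in which case the scraper is skipped).
    """
    # Assumption: a non-zero exit code from `nordvpn connect` means failure.
    if runner([NORDVPN, "connect"]) != 0:
        print("connect failed; skipping scrape")
        return 1
    try:
        # Assumption: a short pause lets the tunnel settle before scraping.
        sleep(settle_seconds)
        return runner(SCRAPER_CMD)
    finally:
        # Disconnect even if the scraper step raised an exception.
        runner([NORDVPN, "disconnect"])
```

Saved under a hypothetical name such as /home/user/scrape_cycle.py, with sys.exit(scrape_cycle()) as its last line, a script like this could be run from cron at minutes 0 and 30 in place of the six separate entries. The time between the end of one cycle and the start of the next then plays the role of the disconnect gap.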