Skip to content
Home » Scraping » Page 2

Scraping

SCRAPE DAILY SHORT DATA FROM FINRA

Here is a quick how-to on getting the daily short data. From this I look to see if the short volume has increased or decreased on a particular stock by leveraging a Power BI Dashboard that I had created. This guide is to show the simple command I run to initially scrape all the data

From a Linux box

Open a terminal

Type:
mkdir FINRA
cd FINRA
wget -r -np http://regsho.finra.org/regsho-Index.html

After it downloads the data I run
rm *.html

To remove the additional files that are not relevant, and then I move the data to where I need it

EXPORT TELEGRAM CHAT HISTORY FOR DATA ANALYTICS

From time to time you come across very valuable telegram channels with great information in them. You can run analytics on the chats, to identify trends and make decisions based on the data. To export the data perform the following:

Download and install the Telegram Desktop App
Login using your phone number and the code that Telegram sends you in the Phone App
Click on the Telegram Channel that you want to export
Click on the three dots at the top right of the chat window
Select Export Chat History
Place a check mark in all of the boxes, set the size limit to 2000MB and then click export
This will give you all the data.
If you want chats only, make sure you do not place checkmarks in all the extra fields
Click Export
Click Allow from your phone (yes you need to use both the desktop app and your phone app)
After 24 hours or so the Data will be available for you to download

SCRAPING YAHOO FINANCE FOR ALL STOCK DATA

I found a great little snippet of code over on Kaggle,


and needed to build a system to run it from
https://tacticalware.com/jupyter-installation-on-a-headless-raspberry-pi-4-running-ubuntu-20-10/

Once you build the system as I did in my guide above you will need to do one other thing to get it to run
Open Putty
SSH to your Jupyter Pi
Login
Type:
sudo bash
pip instal yfinance

Now on another computer you can open your Jupyter notebook at
http://Jupyter:8888

Then go back to


Click File
Click Download
It will give you a file with the extension of ipynb

You can then go back to your Jupyter notebook and upload it

Once you upload it, run the notebook and it should scrape about 2.5GB of data, direct for your viewing pleasure

Hardware that I used:
Raspberry Pi 4 (4gb)
https://amzn.to/3q551IO

Plugable USB C to M.2 NVMe Tool-free Enclosure
https://amzn.to/3lflV3L

CanaKit 3.5A Raspberry Pi 4 Power Supply (USB-C)
https://amzn.to/3fNTYPu

CanaKit Raspberry Pi 4 Micro HDMI Cable – 6 Feet
https://amzn.to/33u5hr9

Western Digital 500GB WD_Black SN750 NVMe
https://amzn.to/3nZ5pH4


JUPYTER INSTALLATION ON A HEADLESS RASPBERRY PI 4 RUNNING UBUNTU 20.10

More and More, I am finding my way to Jupyter Notebooks. And guides online are scarce to come by, especially when you are running it on a Raspberry Pi 4. Below is a mixture of a few guides that I had found, and this is what worked for me.

For my setup I am running a Raspberry Pi 4 from a NVME M.2 Drive connected via USB 3.0. I am NOT using the slow micro SD card. To have the same setup as me, follow this guide:
https://tacticalware.com/boot-raspberry-pi-4-from-m-2-usb-drive/

After your Pi recognizes the M.2 NVME Drive, I then use Raspberry Pi Imager to write Ubuntu Server 20.10 to the USB Drive. Then I plug it into the Pi, to get ready for the steps below

Now boot your Raspberry Pi 4 and Login
Then
sudo bash
apt-get update
apt-get dist-upgrade
apt-get install net-tools
sudo hostnamectl set-hostname Jupyter
reboot
sudo bash
sudo rm /usr/bin/python
sudo ln -s /usr/bin/python3 /usr/bin/python
sudo apt-get install python3-pip
sudo pip3 install –upgrade pip
sudo apt-get install npm
sudo npm install -g configurable-http-proxy
sudo -H pip3 install notebook jupyterhub
jupyterhub –generate-config
sudo mv jupyterhub_config.py /root
nano /root/jupyterhub_config.py
ctrl +w
http://:8000

change to http://:8888
uncomment and delete space on this line c.JupyterHub.bind_url = ‘http://:8888’
ctrl +x
y
jupyterhub -f /root/jupyterhub_config.py

nano /lib/systemd/system/jupyterhub.service
[Unit]
Description=JupyterHub Service
After=multi-user.target

[Service]
User=root
ExecStart=/usr/local/bin/jupyterhub –config=/root/jupyterhub_config.py
Restart=on-failure

[Install]
WantedBy=multi-user.target

ctrl + x to exit
y to save

sudo systemctl daemon-reload
sudo systemctl start jupyterhub
sudo systemctl enable jupyterhub
sudo systemctl status jupyterhub.service

on a different computer
open an internet browser
http://Jupyter:8888
login as any user

Back on the Raspberry Pi
sudo -H pip3 install testresources
sudo -H pip3 install jupyterlab
jupyter serverextension enable –py jupyterlab –system
nano /root/jupyterhub_config.py
uncomment and modicy the c.Spawner.default line to reflect the following:
c.Spawner.default_url = ‘/lab’
ctrl + x
to exit
y to save
sudo apt-get install libatlas-base-dev
y
sudo -H pip3 install numpy
pip3 install seaborn
pip install Cython numpy
pip install cmake

sudo addgroup jupyter_admin
sudo adduser tacticalware
sudo usermod -aG jupyter_admin tacticalware
nano /root/jupyterhub_config.py

ctrl +x to exit
y
to save
Add this to the end of the file
c.PAMAuthenticator.admin_groups = {‘jupyter_admin’}
ctrl + x to exit

You are now finished. On the other computer
Open an internet browser
http://Jupyter:8888
And play!

Hardware that I used:
Raspberry Pi 4 (4gb)
https://amzn.to/3q551IO

Plugable USB C to M.2 NVMe Tool-free Enclosure
https://amzn.to/3lflV3L

CanaKit 3.5A Raspberry Pi 4 Power Supply (USB-C)
https://amzn.to/3fNTYPu

CanaKit Raspberry Pi 4 Micro HDMI Cable – 6 Feet
https://amzn.to/33u5hr9

Western Digital 500GB WD_Black SN750 NVMe
https://amzn.to/3nZ5pH4