Scraping with Scrapy

01 Jan 2018


Sooooo I was going to scrape with scrapy - this was a million times more painful than i imagined!!!

likely because i had no clue what i was doing. lol

here goes:

i had a tutorial from class which i used to set up scrapy the proper way - but honestly, if you follow the commands (under "build and run your web spiders") on the first page of the scrapy website, you have your spider up and running in less than a minute with the example!

i should have tried this earlier :(

the tutorials on the herman website and of course on scrapy teach you the fundamentals which are extremely useful, but hey, you gotta make things work first before you delve into the details, right? …right?

one thing to note - remember to wrap your parsing code in try and except to catch errors! maybe it is only a noob like me that forgets… you don’t want your process to stop a few minutes after you go to sleep…
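here is a minimal sketch of what i mean, in plain python so it runs without scrapy installed. `parse_price` and the sample values are made up for illustration; in a real spider the same try/except loop would sit inside your parse callback. the idea is just: log the bad item, skip it, keep going.

```python
import logging

logger = logging.getLogger("scraper")


def parse_price(raw):
    """Hypothetical helper: turn a scraped string like '$1,299.00' into a float."""
    return float(raw.replace("$", "").replace(",", "").strip())


def parse_all(raw_values):
    """Parse every value, logging and skipping bad ones instead of crashing."""
    parsed, failed = [], []
    for raw in raw_values:
        try:
            parsed.append(parse_price(raw))
        except (ValueError, AttributeError) as exc:
            # bad or missing data: record it and move on to the next item
            logger.warning("skipping %r: %s", raw, exc)
            failed.append(raw)
    return parsed, failed


# made-up sample data: two good values, one junk string, one missing value
results, skipped = parse_all(["$10.50", "N/A", None, "1,200"])
```

catching only the exceptions you expect (instead of a bare `except:`) means real bugs still show up, while one weird page does not kill the whole run.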

in other news - my processes stop running after my ssh session is ended! faints.

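obviously, i did not realise this and wasted last night… i shut down my computer. of course the logical thing is that since this is a cloud server, it continues running, right? WRONG. the server stays up, but whatever you started from the ssh shell usually gets killed when the session closes.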

ok so how do i get this thing going?? apparently there’s this thing called tmux (haven’t tried it yet) that helps.
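since i haven’t tried it yet, treat this as a rough sketch and not a tested recipe. it builds the tmux command for starting a job in a detached session, which should keep running after you log out. the session name `scrape` and the command `scrapy crawl myspider` are placeholders i made up, and the python wrapper is only there so everything stays in one language; typing the tmux command straight into the shell does the same thing.

```python
import shutil
import subprocess


def tmux_command(session, command):
    """Build the argv for starting `command` in a detached tmux session."""
    # same as typing: tmux new-session -d -s <session> '<command>'
    return ["tmux", "new-session", "-d", "-s", session, command]


def start_detached(session, command):
    """Start the command inside tmux so it survives the ssh session closing."""
    if shutil.which("tmux") is None:
        raise RuntimeError("tmux is not installed on this machine")
    subprocess.run(tmux_command(session, command), check=True)


# "scrape" and "scrapy crawl myspider" are placeholder names
argv = tmux_command("scrape", "scrapy crawl myspider")
print(" ".join(argv))

# to actually launch it: start_detached("scrape", "scrapy crawl myspider")
# to check on it later from a new ssh login: tmux attach -t scrape
```

note that the tmux session closes by itself once the command finishes, so if `tmux attach` says there is no session, the crawl has probably already ended (or crashed, see the try/except point above).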
