HAPPY NEW YEAR!!
Sooooo I was going to scrape with Scrapy - this was a million times more painful than i imagined!!!
likely because i had no clue what i was doing. lol
i had a tutorial from class (very similar to [this one](http://mherman.org/blog/2012/11/05/scraping-web-pages-with-scrapy/)) which i used to set up scrapy the proper way - but honestly, if you follow the commands (to build and run your web spiders) on the front page of scrapy at https://scrapy.org/ you'll have your spider up and running in less than a minute with the example!
i should have tried this earlier :(
the tutorials on the herman website and of course on scrapy https://doc.scrapy.org/en/0.16/intro/tutorial.html teach you the fundamentals which are extremely useful, but hey, you gotta make things work first before you delve into the details, right? …right?
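to see the moving parts without installing anything: the core of what a spider's parse callback does - pull titles and links out of a page - can be sketched with just the python stdlib (the HTML snippet and class name below are made up for illustration; scrapy itself gives you css/xpath selectors for this instead):

```python
from html.parser import HTMLParser

# stdlib-only sketch of the "extract" step a spider's parse() does:
# collect every <a href> and every <h2> title from a page
class LinkTitleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links, self.titles = [], []
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "h2":
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2 and data.strip():
            self.titles.append(data.strip())

# made-up page content standing in for a fetched response body
page = '<h2>First post</h2><a href="/page/2">next</a>'
parser = LinkTitleParser()
parser.feed(page)
# parser.titles → ["First post"], parser.links → ["/page/2"]
```

scrapy wraps the fetching, scheduling, and following of those links for you - which is exactly the part that's painful to hand-roll.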
one thing to note - remember to wrap your scraping code in try/except to catch errors! maybe only a noob like me forgets… you don’t want your process to stop a few minutes after you go to sleep…
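here's a minimal sketch of that pattern (the parsing function and data are made up): catch per-item errors, log them, and keep going, instead of letting one bad row kill the whole overnight run:

```python
# hypothetical per-item parser - any row can be malformed
def parse_price(raw):
    return float(raw.strip().lstrip("$"))

def parse_rows(rows):
    results, errors = [], []
    for row in rows:
        try:
            results.append(parse_price(row))
        except (ValueError, AttributeError) as exc:
            # log the bad row and move on - don't crash the crawl
            errors.append((row, exc))
    return results, errors

prices, failed = parse_rows(["$3.50", None, "n/a", "$10"])
# prices → [3.5, 10.0]; failed has the two bad rows
```

in a real spider you'd do the same thing inside the parse callback, so one weird page doesn't take down the whole process.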
in other news - my processes stop running after my ssh session ends! faints.
obviously, i did not realise this and wasted a night yesterday… i shut down my computer, figuring that since this is a cloud server, everything would keep running, right? WRONG. turns out the server itself does keep running - it’s the processes tied to my ssh session that get killed when the session closes.
ok so how do i keep this thing going?? apparently there’s this thing called tmux (haven’t tried it yet) that lets your programs survive a disconnect.
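for when i do try it - the basic tmux workflow looks like this (the session name and crawl command below are made up; tmux needs to be installed on the server):

```shell
# start a named session on the server
tmux new -s scraper

# inside the session, start the long-running job, e.g.:
#   scrapy crawl myspider        # hypothetical spider name

# detach without killing it: press Ctrl-b, then d
# now you can close your laptop - the job keeps running on the server

# later, from a fresh ssh session, reattach and check on it:
tmux attach -t scraper
```

the idea is that the process belongs to the tmux session on the server, not to your ssh connection, so disconnecting no longer kills it.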