With 80,000 images, the headache begins. For convenience's sake (and my own clarity lol), here is the workflow below. Pardon the extra step of extracting just the links first #regret - it is possible to download the images directly with Scrapy.
1) Extract links from Wikipedia. The command below saves all the links into ‘items.csv’ as a tab-delimited CSV, using my spider ‘myspider.py’:
scrapy runspider myspider.py -o items.csv -t csv
2) Download the pictures from the links, crop and resize them to 50x50, and upload them to AWS.
3) Resize the images to 28x28 and resave them to another bucket in AWS.
4) Save the pictures into a single array in AWS.
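For step 4, one way to do it is to stack the 28x28 images into one numpy array (shape `(n_images, 28, 28, 3)`) and save that as a `.npy` file to upload. The bucket handling mirrors the previous step; `images_to_array` below is a hypothetical helper name:

```python
import io

import numpy as np
from PIL import Image


def images_to_array(blobs) -> np.ndarray:
    """Decode an iterable of PNG byte strings into one stacked array
    of shape (n_images, height, width, 3)."""
    return np.stack(
        [np.asarray(Image.open(io.BytesIO(b)).convert("RGB")) for b in blobs]
    )
```

The resulting array can then be written with `np.save` and uploaded like any other object.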
5) Run a test model on AWS.
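The post doesn't say what the test model is, so this last sketch is only a sanity check on my part: flatten the image array and fit a simple scikit-learn classifier. The labels are entirely an assumption here (nothing above says what is being predicted):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def run_test_model(images: np.ndarray, labels: np.ndarray) -> float:
    """Flatten (n, 28, 28, 3) images, fit a basic classifier, and
    return held-out accuracy. Just a placeholder 'test model'."""
    X = images.reshape(len(images), -1) / 255.0
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.2, random_state=0
    )
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_tr, y_tr)
    return clf.score(X_te, y_te)
```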