Fabio Rosado
My time is divided between flying and coding
Keep up with reading list
Attempt to figure out why CI isn't reporting the status to GitHub
Run script to run all the scrapy spiders and check how many items we can get (around 600)
Change export encoding to utf-8 so the JSON output is encoded correctly when the scraper runs
Check each URL and split the URL from the mail RSS scraper to get the right categories and source
Update scraper to include the article's date
Add try/except to the helper function that checks if an article is today's
Update scraper settings to handle user agent errors
Update Scrapy spiders to add categories and source
Train classifier with new dataset and get an accuracy of 77% \o/
Add 57 new rss feeds to scrape news
Test classifier on 450k dataset as a speed test - works very well
Use nltk's SentimentIntensityAnalyzer to add another layer to the classify method
Refactor old Twitter classifier to improve performance
Update the classifier code to use the old Twitter one since the new one is way too slow!
Create helper function to check if an article's date is today's date and return a bool
Test new classifier with JSON file and check how quickly it can classify compared with the current one
Shipping live on Shipstreams! https://shipstreams.com/FabioRosado
Replace classifier with old Twitter classifier - got good results and it’s blazing fast!
Create offline scene for Twitch
Improve classification speed by moving loading of the vocabulary and classifier into the classifier class
Add logic to discard articles from dates that are not today
Test classifier with the data
Create script to run all the scrapers at once and get the data into the same file
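
The date helper and the "discard articles that are not from today" step mentioned in the entries above could look roughly like the sketch below. This is not the project's actual code: the names `is_from_today` and `keep_todays_articles`, the `"date"` key, and the assumption that feeds provide RFC 822 style dates (the usual RSS `pubDate` format) are all hypothetical. The comparison also uses the date as written in the feed, without converting time zones.

```python
from datetime import date
from email.utils import parsedate_to_datetime
from typing import Optional


def is_from_today(published: str, today: Optional[date] = None) -> bool:
    """Return True if `published` (an RFC 822 style date string) falls on today.

    `today` can be passed in to make the function easy to test;
    it defaults to the local current date.
    """
    today = today or date.today()
    try:
        # Different Python versions raise TypeError or ValueError on bad input.
        return parsedate_to_datetime(published).date() == today
    except (TypeError, ValueError):
        # An unparseable date is treated as "not today".
        return False


def keep_todays_articles(articles: list, today: Optional[date] = None) -> list:
    """Drop every article whose (assumed) "date" field is not from today."""
    return [a for a in articles if is_from_today(a.get("date", ""), today)]
```

Returning False on a parse failure keeps a single malformed feed entry from stopping the whole scrape, which is presumably what the try/except was added for.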