I wanted to participate in the contest Udacity started right after CS101 finished.
Since almost everything in my life lately is electronics-related, I implemented a small news reader system.
It connects to five websites and regularly reads the news, stores it in a local database and displays it in a clear, easy-to-read fashion.
Long story short, this is a small demo video on YouTube:
—– Long story following —–
It fetches content from the news sites I usually follow and stores it in a local database.
Why is it useful?
During work days I sometimes need to take a short break and clear my mind, so I browse news sites looking for what’s new in the world. Since my interests are IT, electronics and game development, most of these websites are related to those topics. The problem is that not all sites are the same: some are white on black, some are black on whitish, and some are covered in flashy banners and advertising. I need one place where I can find the information I am interested in, displayed in a clean, tidy way that is comfortable on the eyes. I don’t want to waste 15 minutes mentally filtering the information and then another 10 or 15 minutes getting back to what I was doing before the break.
So far there are five sites in the config files: slashdot.org, hackaday.com, dangerousprototypes.com, eevblog.com and gamasutra.com.
The system also highlights news which contain specific keywords.
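The keyword highlighting can be sketched roughly like this (the keyword list and function name are illustrative, not the project's actual code):

```python
# Hypothetical sketch of the keyword-highlighting idea: flag an entry
# when any configured keyword appears in its title or body.
KEYWORDS = ["fpga", "arduino", "python"]  # example keywords, not the real config

def matched_keywords(title, body):
    """Return the configured keywords found in the entry (case-insensitive)."""
    text = (title + " " + body).lower()
    return [kw for kw in KEYWORDS if kw in text]

print(matched_keywords("New Arduino shield", "A Python-powered logger"))
# ['arduino', 'python']
```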
How it works
It parses the HTML content and extracts the information using regular expressions and standard string searches.
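As a rough illustration of the regex approach (the real patterns live in the config files; this pattern and HTML snippet are made up):

```python
import re

# Illustrative only: match a headline link and extract its URL and title.
html = '<h2 class="title"><a href="/story/42">Big news</a></h2>'

match = re.search(r'<a href="([^"]+)">([^<]+)</a>', html)
if match:
    link, title = match.groups()  # '/story/42', 'Big news'
```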
When I want to see what’s new, I call the second script, which reads all the news from the local database (implemented with Python’s pickle serialization module), sorts the entries by date and generates two webpages:
– The first contains the last # news entries (# being a configurable number)
– The second contains today’s news
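A pickle-backed store like the one described can be sketched as follows (the file name and record shape are assumptions, not the project's actual layout):

```python
import os
import pickle

DB_FILE = "news.db"  # assumed file name, not the project's actual one

def load_news():
    """Load the list of stored news entries, or an empty list if none yet."""
    if os.path.exists(DB_FILE):
        with open(DB_FILE, "rb") as f:
            return pickle.load(f)
    return []

def save_news(entries):
    """Serialize the full list of entries back to disk."""
    with open(DB_FILE, "wb") as f:
        pickle.dump(entries, f)
```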
The posted date is displayed in a pretty, human-readable format (e.g. “Yesterday”, “2 weeks ago”, “35 minutes ago”).
If a news site does not display a date for its entries, the current date is used instead so that the local sorting keeps working.
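One way to produce such relative dates looks like this; the thresholds and function name are illustrative, not the project's exact rules:

```python
from datetime import datetime, timedelta

def pretty_date(posted, now=None):
    """Render a datetime as a relative, human-friendly string (sketch)."""
    now = now or datetime.now()
    delta = now - posted
    minutes = int(delta.total_seconds() // 60)
    if minutes < 60:
        return "%d minutes ago" % minutes
    if minutes < 60 * 24:
        return "%d hours ago" % (minutes // 60)
    if posted.date() == now.date() - timedelta(days=1):
        return "Yesterday"
    if delta.days < 7:
        return "%d days ago" % delta.days
    if delta.days < 30:
        return "%d weeks ago" % (delta.days // 7)
    return posted.strftime("%Y-%m-%d")  # fall back to an absolute date

now = datetime(2024, 1, 15, 12, 0)
print(pretty_date(datetime(2024, 1, 15, 11, 25), now))  # 35 minutes ago
print(pretty_date(datetime(2024, 1, 1, 12, 0), now))    # 2 weeks ago
```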
All configuration is done through two text config files. Each news site is a section in one of them, and the section contains all the information needed to parse that site’s HTML.
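A section might look something like the fragment below; these option names are hypothetical, since the actual ones are documented inside the shipped config files:

```ini
; Hypothetical section layout -- see the real config files for the
; actual option names and values.
[slashdot]
url = http://slashdot.org
title_pattern = <a href="([^"]+)">([^<]+)</a>
picture = slashdot.png
```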
There is an output folder containing the jQuery library, a header and a footer used to generate the HTML webpages, a CSS file specifying the layout, and a picture for every site/section in the database (just to make the output a little more friendly and nice).
What does it look like?
This is a small demo video on YouTube which shows how to get it and how it works.
The string “Currently matched keywords” is added to an entry to announce that one of the configured keywords appears in its body or title.
How do I use it?
It is very easy to use, hence the title. Modify the “newsreader.cfg” and “readit.cfg” config files according to the explanations found inside them, or leave them as they are to read the news from the sites already configured.
Call newsreader.py (once, or regularly using a task scheduler or a cron-like system) to fetch the news.
Call readit.py to compile all the news and generate the webpages.
Open the webpages in a web browser of your liking. On Windows, readit.py also opens the “last # news” webpage automatically in the default browser.
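Opening the generated page can be done portably with Python's standard webbrowser module; the file name below is an assumption, not the project's actual output name:

```python
import webbrowser

# Open the generated "last # news" page in the system's default browser.
# "output/last_news.html" is a hypothetical path for illustration.
webbrowser.open("output/last_news.html")
```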
That’s it! Enjoy reading clean news.
Where can I find it?
The code is hosted on github: Easy News Reader repository
Why not just import RSS feeds?
Almost all of the sites I follow export only partial information in their RSS feeds, and some even embed advertising in them. Parsing the webpages gives a much cleaner, faster and more complete result.
Is it legal?
The system parses each site’s robots.txt file and checks that it is allowed to read the news. The purpose of the system is also purely personal reading, not republishing the content anywhere else on the web.
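The robots.txt check described above can be done with the standard library's parser; this is a minimal sketch (Python 3 module path, example rules), not necessarily how the original project does it:

```python
from urllib.robotparser import RobotFileParser

# Parse an example robots.txt and ask whether fetching a URL is allowed.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])
print(rp.can_fetch("*", "http://example.com/news"))       # True
print(rp.can_fetch("*", "http://example.com/private/x"))  # False
```

In practice the parser would be pointed at each configured site's robots.txt (e.g. via `rp.set_url(...)` and `rp.read()`) before fetching its pages.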
The source is released under the CC BY-NC-SA license and can be found on GitHub: EasyNewsReader repository.