BlogPump: Blog Post Client with Web Crawler(1) – big picture
1) Module A: Interface to the supported weblog server for posting and retrieving web pages, articles and other content;
2) Module B: Container that backs the editor or list views of the data;
3) Module C: Interface to the web crawler, which grabs the pages you want, or information relevant to an article, via popular search engines;
4) Module D: Profile management, so that sources, patterns and destinations can be combined flexibly;
5) Module E: Data persistence module for storing and reading data locally.
The actions themselves are fixed in code: “Do Crawl”, “Do Post” and “Do Save/Read”. How and where each action runs depends on the plugin and the profile.
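To make the split between fixed actions and pluggable behaviour more concrete, here is a minimal sketch of how it might look in Python. Nothing here is BlogPump's actual code: the `Profile` fields, the `PLUGINS` registry, the `register` decorator, `do_action` and the two dummy plugins are all hypothetical names chosen for illustration, and the plugins return canned strings instead of touching the network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Profile:
    """Hypothetical profile (Module D): one source/pattern/destination combination."""
    source: str
    pattern: str
    destination: str


PluginFunc = Callable[[Profile, List[str]], List[str]]

# Hypothetical registry: fixed action name -> plugin name -> implementation.
PLUGINS: Dict[str, Dict[str, PluginFunc]] = {"crawl": {}, "post": {}, "save": {}}


def register(action: str, name: str) -> Callable[[PluginFunc], PluginFunc]:
    """Register a plugin implementation under one of the fixed actions."""
    def decorator(func: PluginFunc) -> PluginFunc:
        PLUGINS[action][name] = func
        return func
    return decorator


@register("crawl", "dummy-search")
def dummy_crawl(profile: Profile, items: List[str]) -> List[str]:
    # Stand-in for a real crawler plugin; returns a canned result.
    return [f"{profile.source}: result for '{profile.pattern}'"]


@register("post", "dummy-blog")
def dummy_post(profile: Profile, items: List[str]) -> List[str]:
    # Stand-in for a real weblog plugin; only reports what it would post.
    return [f"posted to {profile.destination}: {item}" for item in items]


def do_action(action: str, plugin: str, profile: Profile,
              items: Optional[List[str]] = None) -> List[str]:
    """Fixed entry point: the action set is hard-coded, the behaviour is looked up."""
    return PLUGINS[action][plugin](profile, items or [])


profile = Profile(source="example-engine", pattern="python", destination="my-blog")
found = do_action("crawl", "dummy-search", profile)
print(do_action("post", "dummy-blog", profile, found))
```

In this sketch a new crawler or weblog target would only need another registered function, while the set of actions stays unchanged.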
Programming language: Python
First-stage target server: WordPress-hosted websites
HTML parser: Beautiful Soup or lxml
Protocol for post: XML-RPC
GUI: wxPython
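As a rough illustration of the XML-RPC choice, the sketch below builds a WordPress `wp.newPost` request with Python 3's standard `xmlrpc.client` module. The function names, the credentials and the decision to save the post as a draft are assumptions made for this example; the real client may well use a different method, such as `metaWeblog.newPost`. On a WordPress site the XML-RPC endpoint normally lives at `/xmlrpc.php`.

```python
import xmlrpc.client


def build_new_post_request(username, password, title, body, blog_id=0):
    """Serialize a wp.newPost call to XML without sending it (hypothetical helper)."""
    content = {
        "post_title": title,
        "post_content": body,
        "post_status": "draft",  # assumption: save as a draft, do not publish
    }
    return xmlrpc.client.dumps(
        (blog_id, username, password, content), methodname="wp.newPost"
    )


def send_new_post(endpoint, username, password, title, body, blog_id=0):
    """Send the same call to a live server; endpoint is e.g. https://example.com/xmlrpc.php."""
    server = xmlrpc.client.ServerProxy(endpoint)
    content = {"post_title": title, "post_content": body, "post_status": "draft"}
    return server.wp.newPost(blog_id, username, password, content)


# Inspect the request body locally; no network access is needed for this part.
request_xml = build_new_post_request("alice", "secret", "Hello", "<p>Hi</p>")
print(request_xml)
```

Building the request separately from sending it makes the payload easy to inspect, or to store through the persistence module, before anything reaches the server.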