Can I read website data (newspaper articles) through a program (any language)?
I need to write a program that gets articles from websites (mainly news websites) according to some keywords on a daily basis. For example, if my keyword is "global warming", the program would get me all the articles from the website(s) I specified that mention "global warming".
Is it feasible? If yes, how can I do this? (I prefer Java or VB.Net.)
If not, what's the alternative?
Thanks in advance!
tzumpy: I thought of RSS feeds, but I don't want full feeds! Are there readers that can get me articles depending on keywords?
Can you? Yes. The easiest way is a shell script on Unix/Linux or a batch file on Windows. All you have to do is divide your program into tasks, specify paths as well as web addresses ("MyNewspaper.com/Technology" rather than "MyNewspaper.com") and, first and foremost, make sure you have programs that will accomplish the specific tasks. In fact, on Linux I could write a script that would wget pages from specific directories, scan them for the keywords, and delete those that didn't contain them.
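Since the asker prefers Java, here is a rough sketch of the same fetch, scan, discard idea in that language rather than in a shell script. It assumes Java 11 or later (for `java.net.http.HttpClient`). The class name `KeywordFetcher` and its helper methods are made up for illustration, and the tag stripping is a crude regex, not a real HTML parser, so treat it as a starting point only.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Locale;

// Hypothetical helper class: fetches pages and keeps only those mentioning a keyword.
class KeywordFetcher {

    // Crude tag stripper: drops script/style blocks and tags, collapses whitespace.
    // Good enough for a keyword check; not a substitute for an HTML parser.
    static String stripTags(String html) {
        return html.replaceAll("(?is)<script.*?</script>", " ")
                   .replaceAll("(?is)<style.*?</style>", " ")
                   .replaceAll("<[^>]*>", " ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    // Case-insensitive check of the visible text for the keyword or phrase.
    static boolean mentions(String html, String keyword) {
        String text = stripTags(html).toLowerCase(Locale.ROOT);
        return text.contains(keyword.toLowerCase(Locale.ROOT));
    }

    // Downloads one page as a string (the step wget would do in a shell script).
    static String fetch(HttpClient client, String url)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: java KeywordFetcher.java <keyword> <url>...");
            return;
        }
        String keyword = args[0];
        HttpClient client = HttpClient.newHttpClient();
        for (int i = 1; i < args.length; i++) {
            String html = fetch(client, args[i]);
            // Report matching pages; non-matching ones are simply never saved.
            if (mentions(html, keyword)) {
                System.out.println("match: " + args[i]);
            }
        }
    }
}
```

You would call it with the keyword first and then the section pages you care about, and let cron or a scheduled task run it once a day. Check each site's terms of use and robots.txt before pointing it at anything.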
Saving pages this way takes up a lot of disk space, but the real reason I haven't done it is that I don't have any need for it, despite being a news junkie. I know batch programming has become less accessible in recent iterations of Windows, and while I do have issues with that OS, that isn't one of them. For something like this, I'm reasonably confident you could do it with only one or two non-Microsoft tools.
Should you? Again, I wouldn't. It's certainly for you to decide, but I wouldn't spend too much time on it. A shell script or batch file has another advantage: it's easier to get a job scheduler like *nix's cron or whatever Windows calls theirs (Task Scheduler) to execute it.
There is, of course, a final point. Newspapers are businesses. They make their money by selling ads, and they try to discourage simple downloading of their content of the kind you can do from SimTel.
With some thought you can work around that, but I'd rather encourage you to use the online tools out there and take the time to think about what you are reading, because normal access can sometimes provide context you miss when text just suddenly appears.
Suppose you're convinced reporter X is a liar, and you're two or three articles into a multi-part series before you realize reporter X is the author?



RSS readers do that for you
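If your reader doesn't filter by keyword, doing it yourself on a feed is not much work. Below is a minimal Java sketch, assuming an RSS 2.0 feed whose `<item>` elements carry `<title>`, `<description>` and `<link>`. The class name `FeedFilter` and the sample feed in `main` are invented for illustration; it only looks at the title and description, not the full article.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: picks feed items whose title or description mention a keyword.
class FeedFilter {

    // Text of the first child element with the given tag, or "" if it is missing.
    static String childText(Element item, String tag) {
        NodeList nodes = item.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
    }

    // Returns the <link> of every <item> that mentions the keyword (case-insensitive).
    static List<String> matchingLinks(String rssXml, String keyword) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Feeds come from outside, so refuse DOCTYPE declarations (guards against XXE).
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> links = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String haystack = (childText(item, "title") + " "
                    + childText(item, "description")).toLowerCase(Locale.ROOT);
            if (haystack.contains(needle)) {
                links.add(childText(item, "link").trim());
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample feed; in practice the XML would be downloaded from the site.
        String sample = "<rss version=\"2.0\"><channel>"
                + "<item><title>Global warming summit</title>"
                + "<description>Talks open.</description>"
                + "<link>http://example.com/a</link></item>"
                + "<item><title>Football results</title>"
                + "<description>Weekend scores.</description>"
                + "<link>http://example.com/b</link></item>"
                + "</channel></rss>";
        for (String link : matchingLinks(sample, "global warming")) {
            System.out.println(link);
        }
    }
}
```

Working from the feed rather than scraped pages also means you are only pulling what the site chose to publish for this purpose.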
No, you can't; I have tried. And it sounds dumb too.