Ian ([personal profile] lovingboth) wrote 2009-12-12 09:44 pm

lazyweb: filtering rss feeds

There is an RSS feed I am interested in, but it carries a high level of crap I am not interested in. I would like to avoid having to look at the crap.

Does anyone use POPFile (email / newsgroup Bayes classifier, usually used to detect spam) and nntp//rss to do this? In theory, the latter should take the RSS feed and present it as a newsgroup, for the former to look at and classify according to the 'buckets' I set up. As you may guess :) it's not working for me at the moment...
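
To make the 'buckets' idea concrete, here is a minimal sketch of a multi-bucket naive Bayes classifier over item text. It is not POPFile's code, and it does not talk to nntp//rss; the class name, the bucket names and the training snippets are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class BucketClassifier:
    """Tiny multinomial naive Bayes over feed-item text (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.item_counts = Counter()             # bucket -> items trained

    @staticmethod
    def tokens(text):
        # Lowercase and split into simple word tokens.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, bucket, text):
        self.word_counts[bucket].update(self.tokens(text))
        self.item_counts[bucket] += 1

    def classify(self, text):
        """Return the most likely bucket, or None if nothing was trained."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_items = sum(self.item_counts.values())
        best_bucket, best_score = None, -math.inf
        for bucket, counts in self.word_counts.items():
            # Log prior: share of training items that went into this bucket.
            score = math.log(self.item_counts[bucket] / total_items)
            # Add-one smoothing so unseen words do not zero out a bucket.
            denom = sum(counts.values()) + len(vocab)
            for word in self.tokens(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


# Hypothetical buckets and training text, just to show the shape of it.
clf = BucketClassifier()
clf.train("crap", "win a free prize click now")
clf.train("crap", "free offer click here")
clf.train("work", "python release notes and bug fixes")
clf.train("music", "new album tour dates announced")
clf.train("news", "election results announced tonight")
print(clf.classify("free prize offer"))
```

The point is that nothing limits this to 'yes/no': any number of buckets can be trained, and each new item goes to whichever one scores highest.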

Are there any other ways to do it?

Edit: Oh, yes, I know about Sux0r.org, but public implementations of that want to concentrate on 'more academic' feeds than the one I am looking at. I might try their software though... Commercial services exist (FeedZero and FeedScrub) but I want, as ever, to do this for free.

Edit2: Ah, it looks like you can at least try the commercial services. Let's see how they do. Annoyingly, both only categorise things into 'yes/no' rather than an arbitrary number of categories (I'd like at least four...)

Edit3: It looks like FeedScrub cannot cope with long URLs for feeds. In a busy feed with lots of crap - exactly the sort of feed you need this for - it also has a habit of a) not looking at it often enough (I cannot find a setting for how often to check it) and b) deciding everything in it is crap, and leaving you to look through the 'I'm not going to show you these, they are crap' feed. Which, if you do it via their website, only shows you about ten items (I cannot find an 'ok, show me more' button). And when you try to train it via this, it asks for your password regardless of whether you are logged in, and then apparently doesn't do anything.

Let's see how easy it is to install Sux0r...

[identity profile] drdoug.livejournal.com 2009-12-12 09:58 pm (UTC)
Yahoo Pipes is probably what you want for any plumbing/manipulation of RSS feeds. Reportedly a little unresponsive on occasion of late, tho'.

[identity profile] topbit.livejournal.com 2009-12-12 09:59 pm (UTC)
Minds thinking alike, etc.

Pipes have recently been banned from Craigslist for example - they are a little heavy on the site apparently, as they can fetch the feeds too often.

[identity profile] topbit.livejournal.com 2009-12-12 09:58 pm (UTC)
I've not run my own Bayes filters for some time, and not on an RSS feed. My first thought for something that could do it, though, is to wire it up through http://pipes.yahoo.com/. You can add various regexes and other filters to remove what you don't want to see.
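
If you would rather do the regex part yourself than through Pipes, something like the sketch below would be a starting point. It assumes a plain RSS 2.0 feed with no namespaces on the item elements; the function name, the sample feed and the patterns are invented for the example, and fetching the real feed is left out.

```python
import re
import xml.etree.ElementTree as ET


def filter_rss(xml_text, unwanted_patterns):
    """Return the feed XML with items matching any unwanted pattern removed."""
    regexes = [re.compile(p, re.IGNORECASE) for p in unwanted_patterns]
    root = ET.fromstring(xml_text)
    for channel in root.iter("channel"):
        # Copy the list first, since items are removed while looping.
        for item in list(channel.findall("item")):
            text = " ".join(
                item.findtext(tag, default="") for tag in ("title", "description")
            )
            if any(rx.search(text) for rx in regexes):
                channel.remove(item)
    return ET.tostring(root, encoding="unicode")


# Made-up two-item feed: one item to drop, one to keep.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>Win a FREE prize</title><description>Click now</description></item>
<item><title>Release notes</title><description>Bug fixes</description></item>
</channel></rss>"""

print(filter_rss(SAMPLE, [r"\bfree\b", r"click now"]))
```

The filtered XML can then be saved somewhere your feed reader can reach. Unlike a Bayes filter this only removes what you explicitly list, so it suits crap that follows a predictable pattern.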