Monthly Archives: May 2003

Unexpected Opinions

To me, the most valuable opinion in a debate is often the unexpected one. I tend to post those here whenever I see them. Well, Ted Turner has an editorial in yesterday’s Washington Post condemning the FCC’s proposed rule change. Highly recommended. [via IP]

Bayesian Filtering

I installed SpamBayes a week or so ago, and I have started using it to filter all of my mail. (Before, I was using SpamAssassin on one account and nothing on the other.) So far, I am quite impressed with its accuracy and effectiveness. It is doing a much better job than SpamAssassin did, though in fairness we didn’t have the newest version of SpamAssassin installed (the newest version includes Bayesian filtering along with the other rules).

This puts me in something of a dilemma, as I am supposed to be one of the developers working on a SpamAssassin plugin for Outlook. On the plus side, SpamBayes gives me lots of good ideas for the interface. On the minus side, though, it takes away some of my incentives to even work on SA for Outlook since SB works so well already.

This success with SpamBayes got me thinking. I’ve been using a news aggregator to read weblogs for a while. It’s almost replaced my other random web browsing, in fact. Unfortunately, my number of feeds is quite high and becomes particularly overwhelming when I travel. (Compared to some other people who have mentioned the number of feeds they read, my absolute number of feeds is quite low. Relative to the amount of time I have to surf the web, though, it’s pretty high.)

I’d really like a newsreader that could learn my preferences. What I want it to do, really, is to separate the wheat from the chaff. It would learn, for instance, that I always read everything on Medley and Screenshot, largely because I am acquainted with the authors in meatspace and respect their opinions. I also never miss a chance to read Whatever, because the writing there is so fresh. At the other end of the spectrum, though, are high-volume feeds that I tend to troll very quickly for the occasional good link, and that’s where I really want help. What I want is for the classifier to recognize words that commonly appear in stories of interest to me and to score those stories by relevance to me.
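To make the idea concrete, here is a minimal sketch of what such a word-based classifier might look like. It is a plain naive Bayes model with add-one smoothing, not SpamBayes’s actual algorithm, and the class name, method names, and the "read"/"skip" labels are all made up for illustration.

```python
import math
import re
from collections import Counter


class InterestClassifier:
    """Toy naive Bayes model over story words (hypothetical, illustration only)."""

    def __init__(self):
        # Word frequencies and story counts for each of the two labels.
        self.word_counts = {"read": Counter(), "skip": Counter()}
        self.story_counts = {"read": 0, "skip": 0}

    @staticmethod
    def tokenize(text):
        # Lowercase and keep runs of letters and apostrophes.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        """Record one story I either read ("read") or passed over ("skip")."""
        self.word_counts[label].update(self.tokenize(text))
        self.story_counts[label] += 1

    def score(self, text):
        """Log-odds of "read" versus "skip"; higher means more relevant."""
        vocab = set(self.word_counts["read"]) | set(self.word_counts["skip"])
        total_stories = sum(self.story_counts.values())
        words = self.tokenize(text)
        log_odds = 0.0
        for label, sign in (("read", 1), ("skip", -1)):
            # Smoothed prior, so an untrained label never gives log(0).
            prior = (self.story_counts[label] + 1) / (total_stories + 2)
            log_prob = math.log(prior)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                # Add-one smoothing; the extra +1 leaves room for unseen words.
                log_prob += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_odds += sign * log_prob
        return log_odds


clf = InterestClassifier()
clf.train("spam filter bayesian classifier python", "read")
clf.train("celebrity gossip fashion", "skip")
print(clf.score("new bayesian spam filter"))          # positive: looks interesting
print(clf.score("celebrity fashion gossip roundup"))  # negative: probably chaff
```

Because the score is a log-odds value rather than a yes/no answer, it could in principle be used to sort stories as well as to filter them.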

Since I want the filter to rank stories by relevance, so that I can start with the most interesting stories and then work my way down, I guess I’m really looking at something more sophisticated than a Bayesian classifier. Still, I think that the general approach to learning preferences is a good one. So, let’s say that it asks me to rank each story from 1 to 5. Based on those rankings, it starts to learn and score stories itself. After reading a story, though, I can still manually reassign the story a different score.
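Here is one rough way the 1-to-5 scheme could work, as a sketch rather than a worked-out design. Instead of Bayesian probabilities it uses a much cruder stand-in: each word gets the average rating of the stories it appeared in, and a new story is scored by averaging over its known words. The StoryRanker class, its method names, and the neutral default of 3.0 are all assumptions of mine.

```python
import re
from collections import defaultdict


class StoryRanker:
    """Toy 1-to-5 relevance ranker (hypothetical, illustration only)."""

    def __init__(self):
        # story_id -> (text, rating); rating a story again overwrites the old score.
        self.ratings = {}

    @staticmethod
    def tokenize(text):
        # A set, so a word counts once per story however often it repeats.
        return set(re.findall(r"[a-z']+", text.lower()))

    def rate(self, story_id, text, rating):
        """Record my rating, or manually reassign a story I rated before."""
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.ratings[story_id] = (text, rating)

    def _word_averages(self):
        # For each word: [sum of ratings, number of rated stories containing it].
        totals = defaultdict(lambda: [0, 0])
        for text, rating in self.ratings.values():
            for word in self.tokenize(text):
                totals[word][0] += rating
                totals[word][1] += 1
        return {word: s / n for word, (s, n) in totals.items()}

    def predict(self, text):
        """Average the learned word scores; fall back to a neutral 3.0."""
        averages = self._word_averages()
        known = [averages[w] for w in self.tokenize(text) if w in averages]
        return sum(known) / len(known) if known else 3.0

    def rank(self, stories):
        """Given {story_id: text}, return the ids, most interesting first."""
        return sorted(stories, key=lambda sid: self.predict(stories[sid]),
                      reverse=True)


ranker = StoryRanker()
ranker.rate("a", "spam filtering with bayesian statistics", 5)
ranker.rate("b", "celebrity gossip weekly", 1)
unread = {"x": "more celebrity gossip", "y": "bayesian spam tricks"}
print(ranker.rank(unread))  # "y" comes before "x"
```

Keeping the raw ratings and recomputing the word averages on demand is slow, but it makes the manual override trivial: changing a score just replaces one entry.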

What would be even cooler is if it could then communicate with some sort of web-based service that implemented an Amazon-like feature for weblogs. “If you liked these stories, then you might also like this weblog.”

Cool stuff. I’ll try to work out some of the theory in the next couple of weeks, but I probably don’t have the time or inclination to implement it myself. Any interest out there?

Replace the Email Infrastructure

I am deeply suspicious of proposals to completely destroy an existing structure and replace it with something new. It’s a process that is riddled with perils and unlikely to succeed. For instance, we haven’t destroyed our automobiles and highways and replaced them with high-speed, automatic, personal transportation systems where we climb into pods, punch in our destination, and get routed through a transportation grid. Much of the technology exists, or could be created, but we have too much invested in the current infrastructure to throw it out. So, we have to do the much more difficult task of evolving our current system into a better one…

That having been said, here is a proposal to replace the current email system, which is based on the Simple Mail Transfer Protocol (SMTP), with a completely new system. There are lots of unanswered questions here, but a number of experts seem to be in favor of replacing SMTP. It’s definitely broken (spam, lack of a reliable authentication mechanism, lack of encryption), but I’m not sure this is the way to fix it.

Image

I believe that image is important in its own right, and that substance often doesn’t work unless it’s well presented. Nevertheless, the Bush White House seems to be taking it a bit far. It’s like they’ve turned the whole world into a giant made-for-television drama.