A little bit ago, I wrote a piece about how a new start-up, Bit.ly, was ignoring the wishes of web content producers by creating cached copies of pages that are explicitly marked (by those same content producers) with headers directing that they not be cached. So here we are, three weeks later, and it crossed my mind that maybe Bit.ly had fixed the problem… and disappointingly, they appear to still not give a flying crap. (That’s their cached version of this page, a page that couldn’t make itself any clearer that it’s not to be cached.)
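For reference, the headers a page sends to opt out of caching look something like this; the exact header set below is a minimal sketch of common practice, an assumption on my part rather than a copy of this site's actual configuration:

```python
# Minimal sketch of the response headers a page can send to tell any
# cache, proxy, or archiving service not to keep a copy. The exact set
# here is an assumption, not lifted from this site's real config.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate",
    "Pragma": "no-cache",  # HTTP/1.0 fallback directive
    "Expires": "0",        # an already-expired date; caches treat it as stale
}

def apply_no_cache(headers: dict) -> dict:
    """Merge the no-cache directives into an existing header map."""
    merged = dict(headers)
    merged.update(NO_CACHE_HEADERS)
    return merged
```

A well-behaved fetcher sees these directives and declines to store the page; the complaint above is that Bit.ly stores it anyway.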

I hate to push this to the next level, but is it time to drop Amazon a DMCA notice saying that the page is copyrighted (as all works are, once they’re fixed in a tangible medium) and is being hosted on Amazon’s network?

(And one other thing: how annoying is it that when Bit.ly’s caching engine makes its page requests, it doesn’t send any user agent string, so it’s literally impossible for a website owner to identify the Bit.ly bot programmatically? They appear to be running the caching engine off of an Amazon EC2 instance, as well, so there’s not even a way to watch for a known IP address — it’ll change as they move around the EC2 cloud. Never mind pissing in the pool; the Bit.ly folks are out-and-out taking a dump in the pool.)
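About the only programmatic tell left is the absence of a User-Agent header itself. A hedged sketch of a server-side check (the headers-as-dict shape is my assumption, since no particular framework is in play here):

```python
# Sketch: flag incoming requests that arrive with no User-Agent header
# at all -- the only programmatic tell described above. The request is
# modeled as a plain dict of headers, an assumption for illustration.
def is_anonymous_fetcher(headers: dict) -> bool:
    """True when the request carries no (or an empty) User-Agent string."""
    ua = headers.get("User-Agent", "").strip()
    return ua == ""
```

Of course, plenty of legitimate-but-sloppy clients also omit the header, so this identifies candidates rather than proving the request came from Bit.ly.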


There’s been quite a bit of press today about bit.ly, a new service from the folks at switchAbit; it’s a service that adds page caching, click-through counting, and a bunch of semantic data analysis atop a URL-shortening service that’s very much like TinyURL and others (and others!). Reading the unveiling announcement, the part that interested me most was the page caching part — they bill it as a service to help prevent link rot (i.e., when a page you’ve linked to or bookmarked then goes away), which would be a great service to those folks who rely on linked content remaining available. (And since they store their cached content on Amazon’s S3 network, robustness and uptime should be great as well.)

That being said, having worked with (and on) a bunch of caching services in the past, I also know that caching is a feature that many developers implement haphazardly, and in a way that isn’t exactly adherent to either specs or the wishes of the page authors. So I set out to test how bit.ly handles page caching, and I can report here that the service does a great job of caching the text of pages, a bad job of caching the non-text contents of pages, and a disappointingly abhorrent job of respecting the wishes of web authors who ask for services like this to not cache their pages.

Living in the nation’s capital, I find that using Google to search for service providers is nigh-impossible: rather than companies and their websites, most of the search returns are lobbying groups, regulatory agencies, and sites documenting the laws associated with whatever service it is I’m looking into. It’s sort of maddening.

It’s amazing to me that CNN.com is being redesigned and still defaults the search engine to perform a general web search rather than a search of CNN’s own news content. (The current CNN site has been this way since the beginning of 2003.) Are the folks at CNN’s interactive bureau really under the impression that there are people who go to their site in order to search the web? Do they think that nobody wants to search their own content? It’s just weird.

Oh, hallelujah — Google has finally released a version of Google Calendar that works on mobile phones! Shannon and I rely a lot on the online calendaring app, using it for everything from our own schedules to posting community and fundraising events for our Save Eastern Market website, and by far the #1 complaint I’ve had is that mobile access to the calendar has been damn near impossible to date. Now, though, I can both see my calendar and add events to it via the “Quick Add” functionality… looks like I’ll have to familiarize myself with the Quick Add syntax to get the most out of it. (And it’s sort of a bummer that there’s no ability to add an event to anything but the default calendar.)

Today, I figured I’d do a one-month check-in on the fact that Google Maps is lost when it comes to mapping Washington, DC, and the verdict is: still totally, completely horked! It’s horked in a different way now, though; the link from my original post works, but other ones don’t work worth a damn at all. (And while neither of those is a link to our house’s address, our house is one of the addresses that’s unmappable… meaning that all the various bookmarks for directions we’ve sent people over the past year still don’t work at all.) There’ve been no further replies from the folks at Google, either; Matt Cutts replied to that prior post of mine in the comments and followed up with me by email a few days later, but he now appears to be going on a one-month work hiatus and doesn’t look to be receiving email.

I seriously can’t believe that the folks at Google don’t care about the bug in their address parsing routines, but the truth appears evident in the fact that the routines remain broken.

Update: I just got an email reply from Matt Cutts (too quickly for it to be due to this post!), and in working through some examples, it looks like the breakage might be specific only to the various C Streets in Washington, DC — addresses on C Street SE (and the other four quadrants) don’t work, and addresses on all other one-letter streets appear to map fine. He’s going to bug the mapping folks again, so we’ll see what happens!

If you live or spend any amount of time in Washington, DC, you might have noticed a problem recently: Google Maps essentially no longer works here. Sometime in mid-February, it appears that the folks behind the previously-amazing mapping service updated the address parser that it uses, and at this point the parser doesn’t have any clue how to understand the one-letter streets and quadrant system that’s used throughout the District of Columbia.

Take this map link, which is supposed to show 500 E Street SE (the address of our local police station). You don’t have to be eagle-eyed to see that that’s not the address the map shows; here’s a MapQuest view of the distance between Google’s mapped location and the true one, nearly five miles away. Try to use Google Maps to locate any address on a lettered street in the District, and you’ll get the same result.

I’ve avoided posting about this for a little bit in the hope that Google would get around to fixing it… but there are a half-dozen posts or threads in the Maps troubleshooting group, dating as far back as the last week in February, that have gone completely unanswered by Google. Similarly, I’ve personally had email correspondence with “The Google Team” which reassures me that “they’re aware of the issue” but neglects to mention anything about whether they care about the issue, despite me pressing the question and getting a similarly cookie-cutter reply. Since our house is on an essentially-unmappable street, none of the map links I’ve sent people over the past year work anymore, and Shannon and I have pretty much stopped using Google Maps for any of our regular direction-finding for trips out and about on weekends.

I know Google is a huge company now, and that it’s hard for them to reply to the concerns of individual users, but when a change they made causes one of their larger products to stop working entirely in a reasonably large and well-traveled city, you’d think that they’d hop right onto fixing that. So far as I can tell, though, you’d be thinking wrong.

Wow — how long do you think it’ll take the folks at Google to realize that their custom Valentine’s Day logo is missing the letter “l”?

Google Missing L


Dear Yahoo:

As requested, this week I decided to merge my Flickr old-skool login with my Yahoo account. The process was painless and trivial to do, as advertised, and despite the massive how-dare-you-make-us-merge freakout that’s been flowing across the web, no part of my soul died in the process.

Once I was back in the folds of my Yahoo account, I decided to check my email and found that my account had been deactivated due to disuse. (This is not too surprising, seeing as how that account became a spam vacuum within moments of me opening it however long ago I did so.) What was odd to me was the way in which you offered me the various reactivation options — you did so without warning me in any way, shape, or form that one of the options costs money, and you provided me with no links to pages which might help me discover this fact. In many ways, this felt purposeful, as if you might want people to be lacking this bit of information while making what otherwise would be an obvious choice.

Yahoo's deceptive reactivation options

(Wily as I am, I managed to defeat your Jedi mind tricks by opening another browser tab and using Google to search for the truth before making my choice. And yes, the use of Google rather than your own search engine was purposeful; after all, I figured that not providing the information right there in the context of asking me to make the choice was a clear indication that Yahoo might not have the information to begin with, and thus it was unlikely to show up in your own search engine.) And therein I learned that opting for the first of the two choices would cost me 20 smackeroos, a fact that definitely shifted the balance a bit.

So I guess my point in all this is: while I was certainly glad to give you all the benefit of the doubt on the whole Flickr account merge issue, it didn’t help when you betrayed that trust by trying to trick me into a premium email service by withholding information at the precise moment I needed it to make an informed choice. You were this close to having a customer who was solidly baffled by the folks who question whether they can trust Yahoo with their Flickr accounts; instead, you managed to make me question whether it’s reasonable to trust you as a company. If you notice me keeping you at arm’s length for the next little while, even as you release cool new services I’m sure I’d love to play with, I hope you understand…


Interesting: Google Apps for Your Domain. Veeerrrrrryyyy interesting. It’s hosted Gmail, chat, calendaring, and web design all under the banner of your own domain name, all currently in beta-test mode. Unsurprisingly, Anil does a better job of reviewing the landscape than I’d ever be able to do.

Holy crap: Amazon is now doing groceries! It’s (obviously) limited to non-perishable items, but everything’s eligible for Amazon Prime (and Super Saver) shipping, and they seem to have a pretty good selection. Shannon and I have been devout Peapod users here in Boston, but we’ll have to change with our move to DC, so it’s a nice option for us. Key will be for Amazon to get the interface right — Peapod allows you to assemble an order using prior orders as templates, has a nice interface for adding things to your cart, and does a good job of showing you options when you’re just browsing. Right now, Amazon’s using its standard ordering interface, which probably will get in the way if we become regular users, but we’ll see.

There’s been the tiniest bit of preview press given to Sphere, which bills itself as a weblog search engine and has been in soft-launch mode for a little while now. Today, the service actually went live, so I figured a little exploration might be in order. Alas, after spending a little time with it, I concluded that the folks in charge of Sphere might want to change its billing to reflect that it’s more a splog search engine — the sheer number of spam weblogs in the search returns is pretty amazing. That, combined with Sphere’s apparent indexing of quite a few non-weblogs, makes its usefulness dwindle quite a bit.

Here are a few example searches, looking at the first page of ten hits that Sphere returns:

  • “razr v3c”: returns five spam weblogs, two questionable spam weblogs, one overt non-weblog, and two legitimate sites.
  • “honda accord”: three spam weblogs, one non-weblog, six legitimate sites.
  • “bluetooth headset”: four spam weblogs, three legitimate sites.
  • “dual core intel”: three spam weblogs, one questionable spam weblog, six legitimate sites.

I don’t claim that these results are rigorously scientific, only that they’re representative of the experience that’s led me to relegate Sphere to the bin of sites that seem to have gone live without addressing all the issues inherent in their areas of focus, and as such, aren’t really all that useful.

I noticed a bit of weirdness in the Google index today, weirdness that I don’t really understand.

For a few years, if you searched Google for the phrase “incredible day,” (or Yahoo!, or MSN), I was the top hit, linked to the bit I wrote about going on a visiting nurse service call and getting to be a part of saving an infant’s life back during my pediatrics internship. This always amused me (if only because I’ve never considered myself the caliber of weblogger to merit a top-Google-hit on pretty much anything); from time to time, I’ve checked in to see if the search results have changed, and they’ve stayed pretty consistent. Sometime over the past few weeks, though, Google’s results did change — I’m still the top hit, but now Google oddly links the phrase to my page about copyright infringement complaints. I’m comfortable admitting that I don’t know a ton about how the search engine giant builds its indices, but from what I do know, I can’t figure out what caused this difference.

Alas, I can only fix that which I can control. So in the meantime, I’ve changed the script on my site that associates phrase-based URLs (like “copyrightInfringement”) with actual posts, and now it looks for page requests referred by Google searches for “incredible day” and does the right thing with them. At this point, I don’t have to do anything to handle the same searches over at Yahoo! or MSN — they’re still pointing to the right place.
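The workaround can be sketched like so; the slugs and post paths below are hypothetical placeholders for illustration, not the site’s real ones:

```python
from urllib.parse import urlparse, parse_qs

# Sketch of the referrer-based workaround: the normal phrase-to-post
# lookup runs as usual, except that requests referred by a Google
# search for "incredible day" get rerouted to the intended post.
# Slugs and paths here are hypothetical, not the site's real ones.
PHRASE_TO_POST = {
    "copyrightInfringement": "/posts/copyright-complaints",
    "incredibleDay": "/posts/incredible-day",
}

def resolve(phrase: str, referrer: str = "") -> str:
    ref = urlparse(referrer)
    if ref.netloc.endswith("google.com"):
        query = parse_qs(ref.query).get("q", [""])[0].lower()
        if "incredible day" in query:
            return PHRASE_TO_POST["incredibleDay"]
    return PHRASE_TO_POST.get(phrase, "/")
```

The nice part of keying off the referrer is that Yahoo! and MSN searches, which still point to the right place, fall through to the normal lookup untouched.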

Fun fun fun with web scripting…

Update: Between the time I posted this entry and now, Google’s index has moved an Arianna Huffington post above mine on the page of “incredible day” results. Interesting!

Wow — I never realized that Google uses different sources for the maps it displays on its own Google Maps app and the ones it serves to developers who use the Google Maps API.

Looking around for a Google Maps mashup of the 2006 Boston Marathon route, I found this little application over at Running Ahead. I meandered my way to the intersection at which I usually plant myself to watch the runners go by, but I realized that the Running Ahead application didn’t allow me to bookmark that specific view so I headed over to the normal Google Maps site to find the intersection and bookmark it. Imagine my surprise, then, when I zoomed in and noticed that there was an entire street missing — Danforth Road was present on the Running Ahead map, but just wasn’t there on the map being served up on the “regular” Google Maps site. Looking more closely at the two maps, I noticed that the one at Running Ahead had a copyright notice for Tele Atlas, and the one at Google Maps was copyright NAVTEQ; noticing that Danforth Road is also missing on the NAVTEQ-based Yahoo Maps beta, I figure the source of the map data is what explains the difference.

Apparently, the differences aren’t just limited to the roads on the maps; the NAVTEQ sources provide satellite imagery at a higher resolution than Tele Atlas, meaning that Google Maps can zoom in closer than anything that’s user-generated via the API. I’m sure there are other differences, as well; now, I know enough to pay attention and find them!

In a combination of what seems to be a weird quirk and a slew of not-so-bright internet users, MSN’s most recent search engine update has brought a little bit of unanticipated fun to my email inbox.

About two weeks ago, I noticed that I was starting to get quite a few odd emails sent my way via my send-me-an-email webpage. I couldn’t really find a common thread running through the emails; the topics were diverse, the people sending the emails were spread all over the map, and none of it looked very spam-like. (Oddly, I do get the occasional spam manually submitted through that webpage, something that always both confuses and amuses me.) I figured that the page had ended up linked somewhere and that things would die down, but the frequency of the emails just accelerated over the past week. Finally, I modified the script that runs behind that webpage so that it passes along to me the webpage that referred the sender to my contact form, and learned something interesting: all of the senders were coming from an MSN Search result page for the phrase “send mail”. Going to that page, I see that my contact form is the sixth hit, and is the first hit with a title that might imply that it’s a generic email interface. Apparently, users of MSN’s search engine are following those clues, clicking through to my contact form, and sending their emails straight to my inbox. Today alone, I’ve received two resumes, best wishes and prayers on my upcoming exams, and an attempt to submit a late sociology assignment, and the day’s only half-over.
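The tweak to the script was small; a hedged sketch of it, with field names and the mail-sending hook assumed for illustration rather than taken from the real page:

```python
# Sketch of the contact-form tweak described above: before forwarding a
# submitted message, append the page that referred the sender to the
# form. Field names and the send_mail hook are assumptions.
def handle_contact_form(form: dict, headers: dict, send_mail) -> None:
    referrer = headers.get("Referer", "(no referrer)")  # note HTTP's own misspelling
    body = form.get("message", "")
    send_mail(
        subject=form.get("subject", "(no subject)"),
        body=f"{body}\n\n-- referred by: {referrer}",
    )
```

One run of that, and the MSN Search results page showed up in the referrer of every misdirected note.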

What’s totally baffling to me is why the senders don’t notice that they’re never asked for the recipient of their email, and in the case of that last sender today, how the obvious lack of a facility to upload documents failed to warn him that his “attached” late assignment wasn’t really going to work out so well. I started figuring out how I should modify the page to convey to viewers that it’s only a way to contact me, but then decided that I’m having way too much fun seeing all these emails pop in. We’ll see how long I keep getting misdirected notes, and how long the page stays in that first tier of MSN Search hits.

Holy crap, is the new Yahoo! Maps beta sweet. It looks quite a bit like Google Maps — which makes sense, since they both get their data from the same vendor — but it has a few features that go well beyond Google’s offering. My favorite is the interactive zooming and positioning tool that floats above the upper righthand corner of the map, which reminds me of a hopped-up-on-steroids version of the Navigator window in Photoshop and makes it a cinch to drill down to specific areas of a city you’re viewing. There’s also a checkbox for overlaying local traffic conditions onto the map, and the local search function is implemented incredibly well, with reasonably unintrusive tags overlying the map that can be opened and closed for a few different levels of information detail. The web programmer geek in me also loves that it’s an Ajax application implemented with full back-button and bookmarking support, and that there’s what appears to be a well-documented API available for people to use in integrating maps into their own websites. The only issues I can see are that there’s no satellite view (which Google Maps has made me come to rely on), and the “Printable Version” link throws a Javascript error for me in Firefox.

Update: I appear to have been a bit quick to assume; the new Yahoo! Maps beta is a Flash application, not an Ajax one, which makes a little more sense to this web geek. Note that this doesn’t lessen my respect for its designers, but rather explains how they were able to integrate all the cool features into one display interface… it’s still nicely done.

It looks like Google has finally launched a weblog-specific search engine, a move that I’d imagine is reasonably sure to doom Technorati and its smaller cousin, Daypop. (It also doesn’t help Technorati compete when its service has become unreliable enough to inspire both annoyed rants and sites of outright mockery.) It’s not like people didn’t see this coming; for a while now, Google has done a fairly good job of quickly indexing weblogs and liberally returning hits to the sites in its search results, and a specialized search site is the logical extension to that. The new search engine sits behind the blogsearch.google.com address, the search.blogger.com address, and the navigation bar at the top of every Blogspot website, and it looks like it sustains itself on a steady diet of sites that alert one of the weblog update notification services (although the folks at Google don’t share a comprehensive list of the services that are used). As with all things new from Google, it’s labeled “beta”, so I imagine we’ll see a bunch of improvements in the coming weeks and months.

Interesting — it looks like a few inquisitive folks have figured out a way to get onto Google’s soon-to-be-announced talk service using any ol’ Jabber instant messaging client. I’ve been hanging out online for a little bit now using Adium (others are doing just fine using iChat and Trillian), and shockingly, it’s just like instant messaging! This isn’t entirely fair, though, since Google is also developing a dedicated client that’s rumored to place audio and video chat on par with text instant messaging; it’ll be interesting to see what they use for the audio and video components.

(Note that if you follow the instructions and hop onto Google’s IM servers, you might find that your connection occasionally dies; I’m assuming that they’re readying the service for tomorrow’s rumored release. Just so you know!)

I got a happy email today from the people behind my new favorite magazine, Make, saying that the magazine is now available online in its entirety! It’s only for current subscribers, but the site provides high-quality proofs of every page of content, proofs that can be viewed, printed, and (cooler still) shared with non-subscribers via email. The archives are also available and searchable, which is a cool bonus. Just when I thought the magazine couldn’t get better, it did!

google moon

In honor of the 36th anniversary of the landing of Apollo 11 on the moon, Google has created Google Moon, an extension of the Google Maps interface to allow exploration of the surface of the moon. Unfortunately, it includes only the region in which the manned Apollo missions landed, rather than the entire visible hemisphere of the moon; apparently, NASA only gave them high-resolution images of that region, so they couldn’t provide anything beyond that. (Though according to Larry Schwimmer, the engineer for Google who is behind the moon project, we might just get to see more coverage sometime!) And while the resolution is certainly not good enough to see the flag we left behind at Tranquility Base or Edwin Aldrin’s famous bootprint in the dust, Google engineers appear to have tweaked the detail sufficiently to show what the moon is really made of!

It’s a testament to how well Shannon knows me that she sent me this link to a post on Defense Tech about cool Area 51-related satellite images in Google Maps. This is my favorite, mostly because it’s fun to think about the work that went into making it…

For even more fun with Google’s new satellite imagery, take a look at this MetaFilter thread — people have posted links to a slew of awesome pictures. Some of my favorites: Death Valley, CA; the Grand Canyon; Niagara Falls; the Air Force boneyard in Tucson, AZ; Black Rock City, NV (where Burning Man is held).


James Bennett published a nice op-ed piece over at Kuro5hin on Friday that tries to get to the bottom of the Google Toolbar shitstorm, and concludes that the large push against it has nothing to do with copyrights, derivative works, or content publishers’ rights, and instead is all about fear and loathing of Google. It’s worth the read, as is the resultant comment thread.

It’s so much fun watching someone like Cory Doctorow completely disassemble the Google Toolbar nonsense. In a post on Saturday, he outlined the reasons why services that let users decide how to display content are the very reason for the innovation that’s driven the web since the day it appeared; yesterday, he followed that up with a glimpse of other projects out there that provide similar services for users using Google’s own data. And then this morning brought the latest salvo, a two-fer that included a good real-world analogy and a smackdown to a pathetic attempt to raise Cory’s ire.

Every time the total hacks of the weblog world start to annoy me by simultaneously attempting to dictate the terms of the debate and ignoring the incredibly nuanced discussion that’s occurring all around them (including in their very own comment threads!), it’s refreshing to see people like Cory step into the breach and provide a voice of reason.

Whoa — Google Maps. When did this appear? (Oh, it looks like it’s new as of today!) The implementation looks awesome (as you’d expect), right down to the three-dimensional shadow beneath the locators. I’ll definitely have to play with this a bit more later today! (Update: Rafe says everything I’ve thought of so far, and then some, so all I’ll add is that I’m equally impressed.)

After Anil won the staying-power portion of the first search engine optimization contest (the “nigritude ultramarine” one), a followup contest was announced, this time for the phrase “seraphim proudleduck” and for a considerably higher-value award (£1000, or over $1900 at the current exchange rate). Pairing the second contest’s surprisingly lucrative prize with its horrid, make-your-eyes-bleed website (what is that, teddy bear porn in the lower righthand corner?!?), a few people surmised early on that this was all a scam — and unsurprisingly, it appears that it was.

(Hilariously, while the first contest was not a hoax, the website set up by the people running it appears to have been hacked; I wonder how long it’s been like this!)

Clive Thompson’s Wired article on Bram Cohen and BitTorrent is a good read, explaining a bit about how the technology works (pretty simple, but completely unknown to me before reading the article) and a bit more about how Cohen himself works.

I tend to shy away from the filesharing networks, but it’s been pretty much impossible to avoid noticing the excitement that BitTorrent has generated over the past year. Now that I understand the technology a bit better, it’s clear why it is so exciting — BitTorrent solves a problem that has plagued filesharing, that of asymmetric bandwidth (or the fact that most people’s internet connections allow them to download much faster than they can upload). One problem that BitTorrent did not deal with, though, has turned out to be its Achilles heel: the need for centralized sites which help organize the network, sites like SuprNova that are ripe for the picking by organizations that are threatened by (mostly) decentralized content distribution networks. The next generation of these networks looks to eliminate the need for any centralization whatsoever; the next version of SuprNova (eXeem) is rumored to do exactly this. It will be interesting to see how the technology works, and whether it will still remain reliable and reasonably easy to use.
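The piece-based scheme at the heart of the protocol is what makes asymmetric connections add up: a file is split into fixed-size pieces, each identified by a SHA-1 hash, so a peer can fetch different pieces from different peers and verify each one independently while uploading the pieces it already has. A toy sketch of just the hashing step (real torrents use much larger pieces than the tiny size used here for illustration):

```python
import hashlib

# Toy sketch of BitTorrent's piece scheme: split data into fixed-size
# pieces and hash each with SHA-1, so every piece can be fetched from
# a different peer and verified on its own. Real torrents use much
# larger pieces; the 16-byte size here is purely for illustration.
def piece_hashes(data: bytes, piece_size: int = 16) -> list:
    return [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]

def verify_piece(piece: bytes, expected_hex: str) -> bool:
    """Check a downloaded piece against its published hash."""
    return hashlib.sha1(piece).hexdigest() == expected_hex
```

The tracker’s job is only to introduce peers to each other; the piece hashes live in the .torrent file itself, which is why the centralized index sites, not the verification scheme, turned out to be the weak point.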

Looks like Google’s making a play for dominance in searching printed text, as well — adding to its Google Print service, the company is piloting a program that will digitize a small chunk of the library holdings at Harvard, Oxford, Stanford, Michigan, and the New York Public Library. It looks like the pilot will just include public-domain texts for now, but there’s a glimmer of information in the Harvard Library FAQ that at some point, the index may also include copyrighted works. Interesting!