I was always amazed by how popular usernamecheck was when we launched it. The first rule of internet success is: can you list the barriers to entry? What will stop someone else from creating what you’ve already created and making it faster, better or shinier? Many people seem to rely on an “I was first” mentality, which I never thought made business sense and which doesn’t seem to be backed up by historical evidence (see Friendster getting eaten by Facebook for an example). The thing that really amazed me was that over the 4 month period when we were getting 20-30,000 visitors a day, no one created a clone. A couple of sites tried to harvest the site’s functionality from their own UI for AdSense gain, but it would probably have been easier for them to build the site from scratch than to risk getting cut off at the neck when I found out about the piggybacking and locked them out.
Now that I’ve taken the site down, I thought I’d share how the site worked, and maybe others might want to make clones or improve the system. Hopefully this is for the greater good. So here we go behind the curtain.
Usernamecheck was super simple and basically made up of 3 parts.
- The UI – I won’t talk much about that; it’s an HTML front end driven by Dojo Ajax calls
- Curl – a way for one server to fetch pages from another server whilst pretending to be a browser
- List of sites and unique content on certain pages
The UI
We made the UI Ajax driven because each check takes between 2 and 7 seconds and there are 60 of them, so running them all on page load and waiting for the responses would have left the user with minute+ long page loads, which would have resulted in people leaving. Ajax gave us the opportunity to show people we were doing something and to update them as we progressed. We used Dojo for the JavaScript library because we use it extensively at work, and I personally love it for rapid prototyping. We could just as easily have used jQuery, Prototype or something else.
Curl and the Sites
The back end used CakePHP, for no other reason than that it’s my PHP rapid prototyping tool of choice. The site barely needs a framework like this, but when I started I didn’t know what the site would become, so I started building on it. We wrote vanilla curl calls, not using any wrapper libraries. The idea was to have a list of URLs for websites where we knew that the URL contained a username string…
- http://myspace.com/username
- http://username.jaiku.com/
- http://digg.com/users/username
- http://ma.gnolia.com/people/username
- http://username.stumbleupon.com/
- http://www.youtube.com/user/username
- http://www.virb.com/username
… we manually went to every one of these sites and looked at what the server returned to the browser for a username that we knew was registered and for one that wasn’t (i.e. a “that user is not found” page). We looked for things that were unique to the “not found” page, so our test was for the negative. Once we’d done that we had a list that looked like this…
- Invalid Friend ID.
- <title>Jaiku | Your Conversation</title>
- <title>Digg – Digg / Error</title>
- Sorry, but we can’t find that person.
- <span class=bold>No such username</span>
- not found!
- <title>VIRB° – Page Not Found</title>
We did that for all 60+ sites. That left us with a rule stored in our database for each site. For example: use curl to fetch the HTML returned by requesting “http://myspace.com/username”; if the returned content contains the text string “Invalid Friend ID.” then we know that username has not been registered, and if it doesn’t, we treat it as registered. You can do that string check very easily using something like a regex; we used the native PHP function stristr (I think I was trying to reduce server load).
I saved all the URLs for the social networking sites in the format http://socialnetwork.com/<username> and then did a str_replace on <username> with whatever the user had typed into the form field.
This approach was why we couldn’t add a lot of the sites that people asked us about: they either didn’t use the username in the URL or they returned content in which we couldn’t tell available from taken. It’s also the reason the site was a bit of a nightmare to maintain. If MySpace pushes an update and suddenly their “not found” string is “No Friend ID”, the test string no longer matches and usernamecheck will start reporting every username on that site as taken. It’s also why people were often told that usernames they couldn’t actually have were available: we never checked availability as such, we only checked whether that username, when used in the URL, produced the error message.
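The template substitution and the negative test described above can be sketched as a pair of small functions. This is a minimal sketch rather than the site’s actual code: the function names buildCheckUrl and looksAvailable are made up for illustration, the sample pages are invented, and it uses rawurlencode and stripos where the original used urlencode and stristr.

```php
<?php
// Hypothetical helper: fill the <username> placeholder in a stored URL template.
function buildCheckUrl(string $template, string $username): string
{
    // rawurlencode stops spaces and other odd characters from breaking the URL
    return str_replace('<username>', rawurlencode($username), $template);
}

// Hypothetical helper: true when the page contains the site's "not found" marker.
function looksAvailable(string $html, string $marker): bool
{
    // stripos is a case-insensitive substring search that returns a position or false
    return stripos($html, $marker) !== false;
}

// Invented sample pages standing in for what curl would fetch
$notFoundPage = '<html><body>Invalid Friend ID.</body></html>';
$profilePage  = '<html><body>Welcome to my profile</body></html>';

echo buildCheckUrl('http://digg.com/users/<username>', 'jon sykes'), "\n";
var_dump(looksAvailable($notFoundPage, 'Invalid Friend ID.'));
var_dump(looksAvailable($profilePage, 'Invalid Friend ID.'));
```

Because the test only looks for the marker, any page without it (including a redesigned “not found” page) is treated as a taken username, which is exactly the maintenance problem described above.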
One thing I’d change if this was done properly again (by someone like Google – if anyone from Google wants to add this as a “username:someName” type of advanced search like “link:usernamecheck” just drop me an email) would be to use APIs. I’d started reaching out to social networks about using their APIs to reduce the amount of data I needed to request, and thus speed things up and reduce server load. I’d say we’d got maybe 40% of the sites we checked working on this system, and it was much easier. Some I cheated on, like the Gmail one (which never worked well): I just hijacked the URL that Google uses for its Ajax username availability check when you try to register….
https://www.google.com/accounts/CheckAvailability?service=mail&continue=http://www.google.com&Email=<username>&FirstName=&LastName=&formId=createaccount&inputId=Email
… I’d like to take a second to thank every one of those social networks who offered or coded usernamecheck APIs for us to use.
One of the things I considered early on was pushing and evangelizing the idea of a common usernamecheck API, one that all social networks could share, so that the checking could be decentralized and more seamless. If I sign up at Twitter, then when I type in my username it could tell me that the name is already registered in 4 other locations and ask whether I’m sure I want to take that name if those locations aren’t me.
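To make the idea a little more concrete, here is a purely hypothetical sketch of what such a shared check could look like. No such standard exists as far as I know: the JSON shape, the function names usernameStatus and takenElsewhere, and the network names are all invented for illustration.

```php
<?php
// Hypothetical: the JSON a participating network might return for one username.
function usernameStatus(string $username, array $registered): string
{
    // Compare case-insensitively against that network's registered names
    $taken = in_array(strtolower($username), array_map('strtolower', $registered), true);
    return json_encode(['username' => $username, 'available' => !$taken]);
}

// Hypothetical: collect the networks that report the username as already taken.
function takenElsewhere(array $responses): array
{
    $taken = [];
    foreach ($responses as $network => $json) {
        $data = json_decode($json, true);
        if (is_array($data) && isset($data['available']) && $data['available'] === false) {
            $taken[] = $network;
        }
    }
    return $taken;
}

// Invented data: two made-up networks answering for the name "jon"
$responses = [
    'networkA' => usernameStatus('jon', ['Jon', 'alice']),
    'networkB' => usernameStatus('jon', ['bob']),
];
print_r(takenElsewhere($responses));
```

A signup form could use a list like this to warn that the name is already in use elsewhere before the user commits to it.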
Time for some code examples
This is the function we used in the Cake system, that actually did the usernamechecks:
function sitecheck($site = null, $username = null) {
    $this->layout = 'ajax'; // turn off the layout

    // Both parameters are required; stop if either is missing
    if (!$username || !$site) {
        exit;
    }

    $this->data = array(
        'Check' => array(
            'username' => $username
        )
    );

    // Load the site's URL template and its "not found" test string
    $this->Sites = $this->Site->findByName($site);

    $this->resultdata = array(
        'Result' => array(
            // NOTE: $userId is not set anywhere in this excerpt, so check_id ends up empty here
            'check_id' => $userId,
            'site_id' => $this->Sites['Site']['id']
        )
    );

    // Pretend to be a browser
    $userAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";

    // Drop the username into the stored URL template
    $url = $this->Sites['Site']['url'];
    $url = str_replace("<username>", urlencode($username), $url);

    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 20);
    $html = curl_exec($curl);

    if (!$html) {
        // Outcome 3: the request failed (or came back empty)
        $this->set('outcome', "- <a target='_blank' href='".$url."' class='error'>".curl_error($curl).", sorry !</a>");
        $this->resultdata['Result']['outcome'] = 3;
    } else {
        if (stristr($html, $this->Sites['Site']['test'])) {
            // Outcome 1: the "not found" text is present, so the name looks available
            $this->set('outcome', "- <a target='_blank' href='".$this->Sites['Site']['register_url']."' class='available'>available</a>");
            $this->resultdata['Result']['outcome'] = 1;
        } else {
            // Outcome 2: no "not found" text, so the name looks taken
            $availableURL = str_replace("<username>", urlencode($username), $this->Sites['Site']['user_url']);
            $this->set('outcome', "- <a target='_blank' href='".$availableURL."' class='taken'>taken</a>");
            $this->resultdata['Result']['outcome'] = 2;
        }
    }
    curl_close($curl);
}
The above code is pretty much all you need to replicate the usernamecheck functionality. You will of course need to hook it up to a backend, which could be a database or even just a hardcoded array in the code. You’ll also need to hook up the front end to the backend, make sure no one can hack into your system, make sure no one can leech off your system, ensure that any usernames you store are secure, etc. But at the end of the day, the guts of the system are outlined in the code above.
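As a rough illustration of the “hardcoded array” option, the sketch below keeps the site list in a plain PHP array and runs the same negative test over it. Everything named here ($sites, checkAll, the $fetch callback) is hypothetical. The two entries reuse URLs and test strings from earlier in this post, which may no longer match the live sites. The fetcher is passed in as a callback so the example runs without touching the network; in practice it would wrap the curl calls shown above.

```php
<?php
// Hypothetical site list: URL template plus the text unique to the "not found" page.
$sites = [
    'myspace' => ['url' => 'http://myspace.com/<username>', 'test' => 'Invalid Friend ID.'],
    'youtube' => ['url' => 'http://www.youtube.com/user/<username>', 'test' => 'not found!'],
];

// Run the negative test for every site; $fetch returns the page HTML or false.
function checkAll(array $sites, string $username, callable $fetch): array
{
    $results = [];
    foreach ($sites as $name => $site) {
        $url = str_replace('<username>', urlencode($username), $site['url']);
        $html = $fetch($url);
        if ($html === false || $html === '') {
            $results[$name] = 'error';      // request failed or came back empty
        } elseif (stripos($html, $site['test']) !== false) {
            $results[$name] = 'available';  // "not found" marker present
        } else {
            $results[$name] = 'taken';      // no marker, assume a real profile
        }
    }
    return $results;
}

// Stand-in fetcher with canned, invented pages instead of real HTTP requests
$fakeFetch = function (string $url) {
    $pages = [
        'http://myspace.com/jon' => '<html><body>Invalid Friend ID.</body></html>',
        'http://www.youtube.com/user/jon' => '<html><body>Jon\'s channel</body></html>',
    ];
    return $pages[$url] ?? false;
};

print_r(checkAll($sites, 'jon', $fakeFetch));
```

Swapping the array for a database query only changes where $sites comes from; the loop stays the same.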
Update – Some SQL for you
I’m including a link to a small SQL file that contains all of the site check URLs and test strings we used on the site. Some of them no longer work (like Pownce, since it died), so you’ll need to QA them first, but hopefully it will help you all out. Some are the super secret API checks we had in place.
Why am I sharing this code?
I’m removing a barrier to entry. I would like nothing more than for there to be 20 clones of usernamecheck, all competing to do username checks the best way or carving out a niche in a specific market segment.
One idea that I’ve considered, and that I’ve been emailed about, is making a desktop app. This is a great idea, and probably the future of something like usernamecheck. It would give you decentralized checking, and it’s the checking that is the tough bit, the server intensive bit, the bit that gives you big hosting bills. An app written in AIR or something similar would be able to do all the fetching and comparing locally with no server overhead at all.
I hope this has all been of interest to at least a few people. If I have time I’ll try to put together a downloadable plug and play demo page that actually demonstrates a check in action. But anyone with even rudimentary web development skills should be able to replicate usernamecheck pretty quickly. If you do, drop me an email and I’ll even link to you from this site. Thanks for all the interest, have a great day !! Jon Sykes