“Mining of Massive Datasets”

My earlier work with Social Book Club, and current work with Kirkus Reviews, have me spending a fair amount of time exploring and developing recommendation systems. There are a variety of good books and papers on the subject, but I recently finished reading “Mining of Massive Datasets” (a free ebook that accompanies a Stanford CS course on data mining), and it was a surprisingly good read.

The book covers a number of topics that come up frequently in data mining: reworking algorithms into a map-reduce paradigm, finding similar items, mining streams of data, finding frequent items, clustering, and recommending items. Unlike many texts on the subject, you won’t find source code in this book, but rather extensive explanations of multiple techniques and algorithms for each topic. This lends itself to a better understanding of the theory, so that you understand the trade-offs you might be making when implementing your own systems.
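To make the “finding similar items” topic concrete, here’s a toy sketch in JavaScript of Jaccard similarity, one of the measures the book builds on (the data here is made up, and at the book’s scale you’d approximate this with minhash signatures and locality-sensitive hashing rather than comparing sets directly):

```javascript
// Jaccard similarity between two sets: |A ∩ B| / |A ∪ B|.
// Illustrative sketch only; not a large-scale implementation.
function jaccard(a, b) {
  var setA = new Set(a);
  var setB = new Set(b);
  var intersection = 0;
  setA.forEach(function (item) {
    if (setB.has(item)) intersection++;
  });
  var union = setA.size + setB.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

// Two readers’ book lists, as IDs: 2 shared out of 4 distinct items.
var alice = ['mmds', 'sicp', 'taocp'];
var bob = ['mmds', 'sicp', 'gof'];
console.log(jaccard(alice, bob)); // 0.5
```

The same measure applies to users-who-liked-an-item just as well as items-liked-by-a-user, which is why it shows up in both similarity search and recommendation chapters.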

There are easier texts to get through, but if you’re getting started with recommendation or data-mining systems, and haven’t read this book, I’d encourage you to do so.

JavaScript on the Server, and conversations at TXJS

We’ve seen various attempts at using JavaScript on the server over the last decade. Mozilla’s Rhino (Java) engine fueled most of it. However, with the release of Google’s V8 (C++) engine (and the networking performance example set by Node.js), the conversation is gaining traction.

The motivation for a 100% JavaScript stack, per conversations at the Texas JavaScript Conference (TXJS) last weekend, is the desire to use a single programming language when developing web applications, rather than the mix of technologies we use today. It’s not so much that JavaScript is the best language for application development (contrary to the JS fanboys), but since it’s what we’re stuck with on the client side, it’s worth considering on the server side. With a single language, business logic can be reused on the client and the server (think form validation), and you avoid bugs caused by frequent language switching (e.g., using or forgetting semicolons, putting a comma after the last item in an array, using the wrong comment delimiter).
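As a sketch of that reuse argument, here’s a minimal, hypothetical validation function that could run unchanged in the browser and on a JavaScript server (the name and the rule are mine, not from any particular framework):

```javascript
// A hypothetical shared validation rule: the same function runs in the
// browser before submit, and again on the server as the real check.
// (Deliberately simple: exactly one '@', non-empty local part, a dot
// somewhere inside the domain.)
function isValidEmail(value) {
  var parts = String(value).split('@');
  if (parts.length !== 2) return false;
  var local = parts[0];
  var domain = parts[1];
  return local.length > 0 && domain.indexOf('.') > 0;
}

// Client side: wired to a form’s submit handler.
// Server side: the same file validates the POSTed value before it
// touches the database.
console.log(isValidEmail('reader@example.com')); // true
console.log(isValidEmail('not-an-email'));       // false
```

One rule, one file, no chance of the client and server validators drifting apart.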

The wrinkle in the 100% JavaScript argument is whether JavaScript is actually the language you want to write your back end in. The language lacks package management standards (though CommonJS is working to change that); it lacks the standard libraries and tools that the incumbents offer (i.e., no batteries included); perhaps the people who use it don’t actually know the language very well; and it suffers from the multitude of bad examples and advice freely available online.
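For the curious, the CommonJS module convention amounts to assigning to an `exports` object and pulling code in with `require`. A hypothetical example (the function itself is made up; only the exports/require convention comes from CommonJS):

```javascript
// math-utils.js — a module written in the CommonJS style mentioned
// above. (The function is a made-up example; CommonJS defines the
// exports/require convention, not these particular APIs.)
var mathUtils = {};

mathUtils.mean = function (numbers) {
  if (numbers.length === 0) return 0;
  var sum = 0;
  for (var i = 0; i < numbers.length; i++) sum += numbers[i];
  return sum / numbers.length;
};

// In a CommonJS environment, each public function is exported:
if (typeof exports !== 'undefined') {
  exports.mean = mathUtils.mean;
}

// ...and a consumer pulls the module in with:
// var utils = require('./math-utils');
// utils.mean([1, 2, 3]); // 2
```

That one convention is what lets server-side code share libraries the way the incumbents’ package systems do.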

There have been some interesting Node-based applications developed already (e.g., Hummingbird), and the JavaScript on App Engine efforts (e.g., AppEngineJS) will be interesting to watch as well. (I expect both to foster more mature development patterns for large applications written in JavaScript.) However, in the near term, the 100% JavaScript stack will likely remain as niche as the Erlang, Haskell, and Lisp web frameworks (as interesting as they may be.)

The question for you (Mr./Mrs. web developer/web-savvy business person) is whether JavaScript on the back end offers a competitive advantage. Can you execute on an idea faster/better/cheaper than your competition because of your technology stack?

“Coders at Work”

[Image: “Coders at Work” book cover]

I finished reading “Coders at Work” last night. In it, author Peter Seibel interviews 15 legendary programmers, discussing how they got started with computers, how they learned to program, how they read and debug code, etc. The interviews cover a wide range of opinions and approaches, and offer a fascinating look at “computer science” history.

The format of the book is a little unusual, in that it’s entirely interview transcripts. No analysis. No author interpretation. Just recorded conversations. At first it’s a little surprising that one can publish a book like this, but then you get into the content and it’s wonderfully engaging. Analysis and interpretation would just get in the way of letting these folks talk. Reading direct quotes makes the content all the more exciting.

The book isn’t for everyone (obviously), but I rather enjoyed it. There are some great stories about the history of our profession, and many topics raised that inspired additional research. (I went out and found a number of research papers referenced in the interviews, and bookmarked a lot of content for further exploration.) There’s also a fair amount on the history of different programming languages, and since I have a fascination with programming languages, it was a great fit.

A few take-away themes and ideas:

  • While programming was no easy task in the early days, at least it was possible to fully understand the hardware and all the software running on it (as opposed to modern computers.) The modern computing environment presents very different challenges to present-day programmers, especially those new to the field.
  • Even some of the best programmers debug with print statements.
  • Passion and enthusiasm separate good programmers from great ones.
  • In academia, you have time to think about the “best” solution, without the deadlines imposed on commercial developers.
  • There’s certainly a component of “doing great work” that requires being in the right place at the right time — sometimes it’s just a matter of getting staffed on the right project.
  • There’s some negativity towards C/C++ in here, mostly due to its negative impact on compiler and high-level language development. (One school of thought is that you give people a high-level language and make the compiler smart. The other is that you give people a low-level language and let them do the work. Unfortunately, humans aren’t very good at hand-writing code optimized for concurrency, but once you have a language that lets them try, it’s hard to fund compiler research.)

Here are a few of the quotes I highlighted while reading:

“One of the most important things for having a successful project is having people that have enough experience that they build the right thing. And barring that, if it’s something that you haven’t built before, that you don’t know how to do, then the next best thing you can do is to be flexible enough that if you build the wrong thing you can adjust.” — Peter Norvig

“…there are user-interface things where you just don’t know until you build it. You think this interaction will be great but then you show it to the user and half the users just can’t get it.” — Peter Norvig

“I get so much of a thrill bringing things to life that it doesn’t even matter if it’s wrong at first. The point is, that as soon as it comes to life it starts telling you what it is.” — Dan Ingalls

“…a complex algorithm requires complex code. And I’d much rather have a simple algorithm and simple code…” — Ken Thompson

“If you can really work hard and get some little piece of a big program to run twice as fast, then you could have gotten the whole program to run twice as fast if you had just waited a year or two.” — Ken Thompson

“if they’d have asked, ‘How did you fix the bug?’ my answer would have been, ‘I couldn’t understand the code well enough to figure out what it was doing, so I rewrote it.'” — Bernie Cosell

“You have to supplement what your job is asking you to do. If your job requires that you do a Tcl thing, just learning enough Tcl to build the interface for the job is barely adequate. The right thing is, that weekend start hacking up some Tcl things so that by Monday morning you’re pretty well versed in the mechanics of it.” — Bernie Cosell

“…computer-program source code is for people, not for computers. Computers don’t care.” — Bernie Cosell

“if you rewrite a hundred lines of code, you may well have fixed the one bug and introduced six new ones.” — Bernie Cosell

“I had two convictions, which actually served me well: that programs ought to make sense and there are very, very few inherently hard problems. Anything that looks really hard or tricky is probably more the product of the programmer not fully understanding what they needed to do” — Bernie Cosell

“You never, ever fix the bug in the place where you find it. My rule is, ‘If you knew then what you know now about the fact that this piece of code is broken, how would you have organized this piece of the routine?'” — Bernie Cosell

“Part of what I call the artistry of the computer program is how easy it is for future people to be able to change it without breaking it.” — Bernie Cosell

Recovering deleted images from a Nokia N90 (Symbian OS)

Over the holidays we had an accidental deletion of every image on one of our phones (a Nokia N90, a Symbian OS device.) Mild panic was quickly replaced with a gentle pondering of the difference between what a normal person would do in this situation vs. what a geek would do. The geek process goes something like this:

Step 1: Get the memory card out of the phone as quickly as possible

Either shut the phone down and pull the card, or use the super-secret combo hidden within the profile-switching shortcut to have the phone unmount the card.

Step 2: Obtain a USB memory card reader

I’ve needed a reason to buy one of these for a long time. Good thing I had a gift card left from the holidays. I went with a Dynex gazillion-to-one card reader, not for its technical superiority, but because it was the only thing the shop nearby had.

Step 3: Stick the memory card into the reader, and plug the reader into your Linux box

Mine happens to run Ubuntu at the moment, but the results will likely be similar on other distros.

Step 4: sudo apt-get install testdisk

TestDisk “was primarily designed to help recover lost data storage partitions…” and includes a utility called “PhotoRec”, which is what you want.

Step 5: Run photorec

PhotoRec is a data recovery tool designed specifically for recovering files from digital camera media. It supports a number of file-system formats, including the FAT format that Symbian OS uses on its memory cards. PhotoRec is a text-based terminal application, but it does the job perfectly.
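For a sense of how PhotoRec can pull files off a trashed file system at all: it scans the raw bytes for known file signatures (a JPEG, for instance, begins with the bytes 0xFF 0xD8 0xFF). Here’s a toy version of that idea in JavaScript (purely illustrative; not how you’d do real recovery):

```javascript
// Toy illustration of signature-based file carving, the technique
// PhotoRec uses: scan raw bytes for a known file header. This only
// detects JPEG start-of-image markers (0xFF 0xD8 0xFF); the real tool
// understands dozens of formats and reassembles whole files.
function findJpegOffsets(bytes) {
  var offsets = [];
  for (var i = 0; i + 2 < bytes.length; i++) {
    if (bytes[i] === 0xff && bytes[i + 1] === 0xd8 && bytes[i + 2] === 0xff) {
      offsets.push(i);
    }
  }
  return offsets;
}

// A fake “disk image” with one JPEG header starting at offset 4:
var image = [0x00, 0x11, 0x22, 0x33, 0xff, 0xd8, 0xff, 0xe0, 0x00];
console.log(findJpegOffsets(image)); // [ 4 ]
```

Because the scan ignores the file system entirely, it works even after the directory entries pointing at your photos are gone.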

Select the mounted memory card from the list of drives (which should be easy to spot, given how small memory cards are relative to modern hard drives), and send it scanning. PhotoRec can be told to look for specific file types (you want JPGs, in this case), but by default it will look for just about any media file format that you’re likely to have on your phone. Files will be recovered and written to a local directory.

Step 6: Sigh in relief when you see your beloved cat pictures returned to you

PhotoRec isn’t going to restore the images to the memory card’s file system such that the phone can see them again, but you’ll have the pictures on your Linux box now, and can copy them back over if you choose to. The naming scheme will be different, but that’s an acceptable compromise.

Lily: Visual programming in JavaScript

I have an odd fascination with visual programming languages, and while I’ve gotten as far as sketching out some UI concepts and object models for a text-processing-focused, web-mashing, visual programming environment, I’m a long way from having anything that works. Imagine my surprise, then, when David Ascher dropped a link to the Lily project on his blog today. Holy cow, this is sweet. Think PD or Max/MSP written in JavaScript, running in a browser, with modules for popular Web APIs and JavaScript frameworks (e.g., “Amazon, Flickr, Wikipedia, Yahoo; UI modules that wrap widgets from YUI, Scriptaculous, JQuery, Google Maps….”)

Check out one of the demos here:

(Via: Lily: JavaScript, visual programming, fun.)

Ubuntu + Hildon UI = in-Car PC UI

A while back, Ubuntu announced a mobile and embedded edition of its popular Linux distribution. The buzz was around the possibility of Ubuntu Mobile showing up on future UMPCs. The news caught my eye, but didn’t really get my attention until the plans for Ubuntu 7.10 (Gutsy Gibbon) were announced:

“Ubuntu 7.10 will be the first Ubuntu release to offer a complete mobile and embedded edition built with the Hildon user interface components” (developed by Nokia for the Maemo platform.)

Now that’s interesting. Could it be that we’ll see Ubuntu Mobile booting on Nokia N800s? It’s certainly a possibility — and one that could bring a larger breadth of software to Nokia’s mobile Linux tablets.

However, as interesting as it may be if Nokia adopts Ubuntu, the possibilities for wider Hildon support didn’t hit me until my drive home today. It was one of those obvious moments. I had been using my Nokia N800 while walking to my car, so the touch- and small-screen friendly UI was fresh in my mind. Then I started thinking about my Car PC. It uses a 7″ touch screen and runs Ubuntu (a full distribution, with a UI designed for full-size monitors.) Running Gnome on my cheap, in-car 7″ monitor makes for a pretty lousy experience. Text is hard to read, and everything is too small to click on. However, if this news is right, Ubuntu 7.10 will change all of that. I’ll be able to run Hildon on my Car PC! That’s killer. Imagine having Canola running in-car, sitting on 100GB of multimedia…