“Mining of Massive Datasets”

My earlier work with Social Book Club and my current work with Kirkus Reviews have me spending a fair amount of time exploring and developing recommendation systems. There are a variety of good books and papers on the subject, but I recently finished reading “Mining of Massive Datasets” (a free ebook that accompanies a Stanford CS course on data mining), and it was a surprisingly good read.

The book covers a number of topics that come up frequently in data mining: reworking algorithms into a map-reduce paradigm, finding similar items, mining streams of data, finding frequent items, clustering, and recommending items. Unlike many texts on the subject, you won’t find source code in this book. Instead, it offers extensive explanations of multiple techniques and algorithms for each topic. This lends itself to a better understanding of the theory, so that you understand the trade-offs you might be making when implementing your own systems.

There are easier texts to get through, but if you’re getting started with recommendation or data-mining systems, and haven’t read this book, I’d encourage you to do so.

Book: “Reflections on Management”

Reflections on Management: How to Manage Your Software Projects, Your Teams, Your Boss, and Yourself isn’t the best written or edited book, but it has some tasty bits scattered among the random acronyms. It reads like the stories of a retiring, experienced software manager at a large corporation: someone telling the inside story in a blunt, matter-of-fact way. Personally, I like that style. It gets to the point without dancing around the subject. The only caveat is that some of the advice is a little too specific to the author’s previous corporate environments. Still, if you’re stuck at an airport and this is what the local bookstore has, it’s not a bad choice.

A few quotes:

“Quality work is not done by mistake.”

“When developers are simultaneously assigned to several projects, they have split loyalties and their teammates cannot rely on them for support and assistance.”

“It is hard for someone to feel committed to a project when management is unwilling to make it their principal job.”

“Discipline, in fact, is what separates the experts from the amateurs in any professional field.”

“The team leader must motivate, coach, drive, and urge the members to perform to the best of their abilities.”

“If you don’t change the engineers’ working practices, you can change the organizational structure and all its procedures, but nothing much will really change. Thus, to have a substantial impact on an organization’s performance, you must change the way the engineers actually work.”

“Even when the result is a total business disaster, if the team provided a rewarding personal experience, the team members will view the project as a success.”

“When people say they are working harder, they actually mean they are working longer hours.”

“Designing, coding, reviewing, inspecting, and testing are intensely difficult tasks. To have any hope of producing quality products, we must occasionally take breaks.”

“Often, teams respond to this pressure by taking shortcuts, using poor methods, or gambling on a new (to them) language, tool, or technique.”

“Every day that you wait to act is a day that you can’t use to solve the problem.”

“The most important single asset a software engineer can have is a reputation for meeting commitments.”

“The most successful teams have energetic, enthusiastic, confident, and hard-driving leaders. If you don’t have the required energy and drive, figure out what to change so that you do. If you can’t see how to do that, either your team has a hopeless job or it needs a new leader.”

“A significant part of your leadership job is to keep the team’s goals clear and well defined and to ensure that every team member knows how his or her current tasks contribute to meeting that goal.”

“It is impossible to be an effective leader without being committed to a cause that animates you and motivates your followers.”

“Coders at Work”

I finished reading “Coders at Work” last night. In it, author Peter Seibel interviews 15 legendary programmers, discussing how they got started with computers, how they learned to program, how they read and debug code, etc. The interviews cover a wide range of opinions and approaches, and offer a fascinating look at “computer science” history.

The format of the book is a little unusual, in that it’s entirely interview transcripts. No analysis. No author interpretation. Just recorded conversations. At first it’s a little surprising that one can publish a book like this, but then you get into the content and it’s wonderfully engaging. Analysis and interpretation would just get in the way of letting these folks talk. Reading direct quotes makes the content all the more exciting.

The book isn’t for everyone (obviously), but I rather enjoyed it. There are some great stories about the history of our profession, and many of the topics raised inspired additional research. (I went out and found a number of research papers referenced in the interviews, and bookmarked a lot of content for further exploration.) There’s also a fair amount on the history of different programming languages, and I have a fascination with programming languages, so it was a great fit.

A few take-away themes and ideas:

  • While programming was no easy task in the early days, at least it was possible to fully understand the hardware and all the software running on it (as opposed to modern computers). The modern computing environment presents very different challenges to present-day programmers, especially those new to the field.
  • Even some of the best use print statements.
  • Passion and enthusiasm separate good programmers from great ones.
  • In academia, you have time to think about the “best” solution, without the deadlines imposed on commercial developers.
  • There’s certainly a component of “doing great work” that requires being in the right place at the right time; sometimes it’s just a matter of getting staffed on the right project.
  • There’s some negativity towards C/C++ in here, mostly due to its negative impact on compiler and high-level language development. One school of thought is that you give people a high-level language and make the compiler smart. The other is that you give people a low-level language and let them do the work. Unfortunately, humans aren’t so good at hand-writing code optimized for concurrency, but once you have a language that lets them try, it’s hard to fund compiler research.

Here are a few of the quotes I highlighted while reading:

“One of the most important things for having a successful project is having people that have enough experience that they build the right thing. And barring that, if it’s something that you haven’t built before, that you don’t know how to do, then the next best thing you can do is to be flexible enough that if you build the wrong thing you can adjust.” — Peter Norvig

“…there are user-interface things where you just don’t know until you build it. You think this interaction will be great but then you show it to the user and half the users just can’t get it.” — Peter Norvig

“I get so much of a thrill bringing things to life that it doesn’t even matter if it’s wrong at first. The point is, that as soon as it comes to life it starts telling you what it is.” — Dan Ingalls

“…a complex algorithm requires complex code. And I’d much rather have a simple algorithm and simple code…” — Ken Thompson

“If you can really work hard and get some little piece of a big program to run twice as fast, then you could have gotten the whole program to run twice as fast if you had just waited a year or two.” — Ken Thompson

“if they’d have asked, ‘How did you fix the bug?’ my answer would have been, ‘I couldn’t understand the code well enough to figure out what it was doing, so I rewrote it.’” — Bernie Cosell

“You have to supplement what your job is asking you to do. If your job requires that you do a Tcl thing, just learning enough Tcl to build the interface for the job is barely adequate. The right thing is, that weekend start hacking up some Tcl things so that by Monday morning you’re pretty well versed in the mechanics of it.” — Bernie Cosell

“…computer-program source code is for people, not for computers. Computers don’t care.” — Bernie Cosell

“if you rewrite a hundred lines of code, you may well have fixed the one bug and introduced six new ones.” — Bernie Cosell

“I had two convictions, which actually served me well: that programs ought to make sense and there are very, very few inherently hard problems. Anything that looks really hard or tricky is probably more the product of the programmer not fully understanding what they needed to do” — Bernie Cosell

“You never, ever fix the bug in the place where you find it. My rule is, ‘If you knew then what you know now about the fact that this piece of code is broken, how would you have organized this piece of the routine?’” — Bernie Cosell

“Part of what I call the artistry of the computer program is how easy it is for future people to be able to change it without breaking it.” — Bernie Cosell

Finished reading “Even Faster Web Sites”

I just finished reading “Even Faster Web Sites: Performance Best Practices for Web Developers”, by Steve Souders. It’s technical, and definitely for a limited audience, but it’s certainly relevant for web developers trying to squeeze a few extra milliseconds out of page render times with older browsers. (Yes, many of the techniques are just as applicable for modern browsers, but the performance competition between Firefox, Safari, and Chrome has the latest builds addressing, and solving, some of the common bottlenecks.)

What I liked best about the book were the tests and test results. Souders runs each browser through numerous test scenarios to demonstrate the (sometimes huge) impact that small authoring decisions can have, such as the surprising relationship between CSS files and inline JavaScript. Souders also provides implementation details and decision trees for choosing and implementing as much asynchronous loading as possible.

All in all, it was a nice exploration of how different browser implementations approach page loading and painting, and how to exploit this knowledge for speed.

I was out on a business trip again last…

I was out on a business trip again last week, and took with me “Emergency: This Book Will Save Your Life”, by Neil Strauss. The book is a first-person account of Strauss’ transformation from a “soft”, urban writer, into a trained survivalist. It’s a wonderfully engaging story.

Here’s the link that put the book on my wishlist:
How to Be Jason Bourne: Multiple Passports, Swiss Banking, and Crossing Borders