One More Time, From The Top

Last time I demonstrated a way to create mosaics from images. I wanted to do a little bit of image processing using JavaScript and HTML5’s new canvas abilities. As I mentioned, this was only the first step of an idea I had about turning social experience into a beautiful, expressive and personal graphic.

Mosaics, Revisited

Canvas is an extremely powerful concept, allowing web developers to create more expressive web pages. The web will soon be filled with rich user experience and we will be dancing like it’s 1995. This has been a long haul, but we are finally WINNING. I just hope we have better designers on the web than we did on the desktop. I never (ever!) want to see the Word 6.0 toolbar sprawl on a web page.

My favorite web sites focus on content. They try to let everything else blend into the background. On the best sites content is king. Google, YouTube and Facebook all use a clean, white background, but there are other examples of beautiful design. TheSixtyOne is artfully designed, and its functionality just melts away into an incredible user experience.

So, I took the mosaic idea one step further, creating a (very simple) tiling jQuery plugin that recreates a large image from small images. Here’s a demo.

Original Image

Canvas (mouse over to see the small images)

The code is available here. It is by no means complete. You will notice I cheat terribly, as I overlay the miniature images on the colors of the background photo. I’m doing this for two reasons: first, getting a good color metric working for images of significant size isn’t as easy as I hoped it would be (and performance is atrocious), and second, I expect the selection of photos I will have to use in a real-life scenario to be too small for a good match to be found for every pixel in a photo. All in all, this looks good enough, and was much simpler to code.
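For comparison, the matching step that the overlay sidesteps might look something like the sketch below. It is not what the plugin does. It assumes each small image has already been reduced to a single average color, and it uses plain squared Euclidean distance in RGB as the color metric, which is crude. The names `colorDistance`, `closestTile` and the `{ name, r, g, b }` tile shape are made up for illustration.

```javascript
// Squared Euclidean distance between two { r, g, b } colors.
// (An assumed metric; perceptual color spaces would give better matches.)
function colorDistance(a, b) {
  const dr = a.r - b.r;
  const dg = a.g - b.g;
  const db = a.b - b.b;
  return dr * dr + dg * dg + db * db;
}

// Return the tile whose average color is closest to `target`.
// `tiles` is a hypothetical array of { name, r, g, b } entries.
function closestTile(target, tiles) {
  let best = null;
  let bestDistance = Infinity;
  for (const tile of tiles) {
    const d = colorDistance(target, tile);
    if (d < bestDistance) {
      best = tile;
      bestDistance = d;
    }
  }
  return best; // null when `tiles` is empty
}
```

Every block of the large image needs one pass over all the tiles, so the cost grows with blocks times tiles, and with only a handful of tiles many blocks end up with the same poor match.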

During my work, I’ve discovered a few issues I’ll go into in the next sections. If you have advice, I’d love to hear it. I’d much prefer to avoid the server-client route and keep this a client-only project.

Performance, To Heck With Thou

JavaScript is slow. I didn’t expect it to be so. I knew it was considered relatively slow, but there has been a lot of improvement in JavaScript performance in the browsers over the past few years. It is still not enough.

I don’t know if you noticed, but while creating the mosaic above, your browser gobbled up a large chunk of memory. My laptop’s cooling fan whirred loudly as I waited for the picture to be created. The experience sucks. For serious number crunching and data manipulation, I suppose we still need to fall back to good old server side coding.
Or maybe I write bad code. Care to take a look and throw me a line?

Not that it matters. As much as I could hope to fix the performance issues, there is still a limitation I can see no way around.

Hacker, To Hack With Thou

For some silly reason, security limitations on web developers are much stricter than they are for any other type of development. This could be because client-server architectures are by nature more vulnerable, or maybe it’s because it is so darn easy to steal data using a rogue <script> tag. The architects of the internet over at the W3C and ECMA, who set up the specs for HTML5 and JS, decided to put precautions against every obscure scenario of possible security risk. Thank you guys. I love you, but you are bringing me down.

While I’m all for security as a concept, in this particular case it has been quite limiting. The only way to analyze images in JS is by turning them into giant bitmaps using the canvas. Basically, I draw the image on the canvas, then I read it out as an array of RGB tuples. Then I analyze the photo and decide where to place it against the larger image. The problem arises when I try to analyze images that are not from my domain (or my JS domain).

On many sites we use resources from other sites. We just embed them using an img tag, and never think about it. Apparently, in order to prevent a hacker from taking a snapshot of your password on a canvas and sending the canvas somewhere else, a canvas can only be read from programmatically if it is safe, where safe is defined as: it has only been drawn on using objects from the same domain.
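In code, the restriction shows up at the moment of reading, not drawing: the image draws fine, and then `getImageData` throws a `SecurityError`. A minimal sketch of guarding against that, assuming `ctx` is a regular 2D canvas context (the helper name `readPixelsOrNull` is made up for illustration):

```javascript
// Try to read the pixels of a 2D canvas context.
// Returns the pixel array, or null when the browser refuses because
// the canvas has been drawn on with content from another origin.
function readPixelsOrNull(ctx, width, height) {
  try {
    return ctx.getImageData(0, 0, width, height).data;
  } catch (err) {
    if (err && err.name === 'SecurityError') {
      return null; // the canvas is no longer "safe" to read
    }
    throw err; // anything else is a genuine bug
  }
}
```

Browsers do make one exception: if the other server sends CORS headers and the image is requested with `img.crossOrigin = 'anonymous'`, the canvas stays readable. That only helps when the other site opts in.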

Short version: this technique could never work with photos that are not served from the same domain, which makes it impossible for me to use them in any social networking context. It may still be possible to use this technique in a browser extension, but I don’t think I want to go down that path. I want a write-once-run-anywhere experience, if possible.

This may still be a learning experience, though. Can anyone recommend a good image processing library for Ruby or Scala? If not, I’ll probably default back to Python. Or maybe I’ll do the extension after all.

Messing Around With Images

The Side Dish

If you know me at all, you know that I always have a side project. I mean, I love the main course as it is full of challenges, but sometimes I need to do something else. I need a Gedankenvacation.

Dude, you may ask, why do your vacations include hours of code in front of a small screen? Why not go out, get some fresh air and enjoy the sun? That’s actually a good question. Maybe it’s because I love to code. Maybe I’m just a bit lazy. Maybe I’m a vampire. Who knows?

All I know is that when I work too hard on one problem, I can tap out of resources pretty quickly and then I turn into a Facebook Zombie. Learning a new skill or working on a different problem usually helps me get out of the rut, and allows me to go on for much longer stretches.

My side projects usually have some sort of visual or artistic nature. I’m hardly any good at it, but I love doing it, and I don’t mind failing. One year I wrote detective stories in a blog format (think of what a private investigator’s blog would look like). One year I founded (and terminated) an online T-shirt store. One year I worked on a massive multi-player online game (that we abandoned unfinished). And one year I helped organize reality-hacking events, where I got to meet interesting people and edit lots of videos. This year I want to work on a combination of visualization and social themes, simply because it’s the complete opposite of what I’m working on during my “day job”.

Messing Around With Images

It’s been almost a decade since I last did anything related to image processing. I was in the second year of my computer science degree, and I was looking for any practical course I could find. I never felt engaged by highly abstract mathematical courses such as Computational Complexity Theory. Call me a heretic, but in my mind Turing machines aren’t very practical mental models for computers, and I don’t understand why they are still being taught.

Anyway, digital image processing was exceptionally suited for my practical needs. First, you could actually see the results of your work. Second, it included just enough math so it wasn’t completely technical and boring (unlike, unfortunately, most of the programming tasks in Data Structures) and just enough code as to be useful.  Third, did I mention you could actually see results with your own eyes?

It’s A Boy!

This weekend I spent some time messing with HTML5’s canvas and images. I have created a small jQuery plugin that creates a nice mosaic effect on an image. You can see a few examples here:

Original Image

5x pixel size

10x pixel size

15x pixel size

Note that all the processing is done by the browser, no server side code at all. If you are using Internet Explorer, chances are that your browser doesn’t support this feature, so I don’t know if it displays correctly. I hear IE9 is out and is very good. You can upgrade here.

As I mentioned, this is a very simple example of how images can be manipulated quickly and in the browser. With devices getting stronger, we may be able to push more computation to the clients (and find a better balance of server-client load).

In this demo script I calculate the average color of each section of the image, and then fill a rectangle with this color on a canvas. The JS can be downloaded from here. This is practically the first script I have written from scratch, so it is probably full of newbie mistakes. Please, let me know. I want to get better.
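The averaging step can be sketched roughly as follows. This is not the plugin’s actual code, just a minimal version under the assumption that the pixels arrive in the flat RGBA layout that a canvas `getImageData` call returns (four bytes per pixel, row by row). The name `averageBlocks` and the shape of the returned objects are made up for illustration.

```javascript
// Average the color of each blockSize x blockSize section of an image.
// `data` is a flat RGBA array (4 values per pixel, row by row).
function averageBlocks(data, width, height, blockSize) {
  const blocks = [];
  for (let top = 0; top < height; top += blockSize) {
    for (let left = 0; left < width; left += blockSize) {
      // Sections on the right and bottom edges may be smaller than blockSize.
      const bottom = Math.min(top + blockSize, height);
      const right = Math.min(left + blockSize, width);
      let r = 0, g = 0, b = 0;
      for (let y = top; y < bottom; y++) {
        for (let x = left; x < right; x++) {
          const i = (y * width + x) * 4; // index of this pixel's red byte
          r += data[i];
          g += data[i + 1];
          b += data[i + 2];
        }
      }
      const count = (bottom - top) * (right - left);
      blocks.push({
        x: left, y: top, width: right - left, height: bottom - top,
        r: Math.round(r / count),
        g: Math.round(g / count),
        b: Math.round(b / count),
      });
    }
  }
  return blocks;
}

// In a browser, each block would then be painted with something like:
//   ctx.fillStyle = `rgb(${blk.r}, ${blk.g}, ${blk.b})`;
//   ctx.fillRect(blk.x, blk.y, blk.width, blk.height);
```

Every pixel is visited exactly once, so the work grows with the image area regardless of the block size.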

I have several thoughts about what I want to use it for, but there are still some features I need to add first. Stay tuned.

A Tale of Two Databases

I’ve been working with databases for almost a decade. The databases I’ve worked with varied in size and brand. For more than half of this time, I worked at a big in-house software provider, working with big Oracle deployments. In my spare time, I developed web applications on top of MySQL, which has turned into my database of choice. So, when I left the big company, I started consulting for small (and growing) startups on ways to improve their MySQL deployments.

Big companies and small companies are very different in the way they approach infrastructure, but the difference is structural, not technological. Let me explain:

BigCo

In BigCo, there are many employees. Titles and responsibilities are not only well defined, but are also well enforced.

For example, when our team needed to change the database schema, we had to ask a DBA (not our DBA, mind you. The DBAs were not part of our team. They sat in a different part of the building, and used a sort of round-robin mechanism for task assignment I still don’t fully understand).  If we needed more memory, we had to go to the system administrator. If we needed a new database server, we had to go to both.

Developing a new version of our software was full of pain. Development and test environments were always last on the task queue for the infrastructure personnel. It took them from days to weeks to set up a new development environment with a subset of data from production. We never had enough storage, so we had to try and subset parts of the database that we thought were relevant for this version – but we could never get a working copy on the first try, not with the many tables we had. It took us weeks to get a server working. And then more weeks to get another for QA.

All this, before the first line of code was even written. That’s when it really got sticky.

Bugs? Sure. It happens. A wrongly placed parenthesis, and all of the user data in the development database is deleted. Wait a week to get the database refreshed.

Automated testing? I’d love to, but unless there was an easy way to refresh the database, it was impossible to get repeatable results that meant much.

Unrepeatable issues? Plenty. Development’s database and code are up to date with the latest version. QA is running last week’s drop. There’s no way to repeat and debug issues in development. Code and data are out of sync. We just marked them as wontfix, cantrepeat, and moved on. Then we got the same bug in production.

It was a mess, but I thought it was because I was working in a big company, with too many people involved. I was sure that once I moved to a smaller place, where I could manage my own database, I’d be fine. Right? Wrong.

SmallCo

Getting things done in a smaller company is easier, in theory. I could install a new server, a new database on my own. Now, I thought, I wouldn’t have to wait a week to get my database ready so I could start developing the next version. An entire copy of the operational database could fit, quite snugly, on my personal laptop. If something happened, I could just copy the database again, and start over. Easy!

Except it isn’t.

It took me a while to figure it out. After a week of work, my development database and the one I had deployed to QA the week before didn’t look the same anymore. I still couldn’t reproduce the bugs they found on my own system. I found myself refreshing my development environment over and over, each time to a different version of the application, before I realized what I was missing.

Source Control is Not Enough

In the olden days, when most companies developed desktop applications, someone very smart realized that code had to be managed in a centralized repository. When the code on the developer’s desktop was different from the code in QA, source control tools were used to momentarily shelve the recent changes and reload an older version of the code. Then the developer could debug and reproduce the error, and deploy a fix.

This is not the world we live in today. Almost every new service on the internet, and many internal enterprise systems, are built around a client-server architecture. Somewhere, deep down below the layers of code, sits a centralized data store that holds the state of the application. Source control is no longer sufficient to manage the development life cycle of the software. We need data management tools as well.

This is the pain that we try and solve at OffScale. If you have encountered these problems, go to our site, where you can read more and join our beta program.

* This post was also posted on my company’s blog.

Honing my Craft

Baby Steps

I’ve been programming for the better part of my life. I started with BASIC, in which I wrote a small piece of software that allowed me to manually paint a series of bitmaps and animate them. My home computer had a 4-color CGA screen, and the computer at my school was a blue-black Commodore 64, but I knew I was hooked.

In high school I wrote my first big project as part of a computer graphics course. I ended up creating a (very bad) car racing simulator in Pascal, complete with track editing and several car models. I created all the graphics from scratch, from line rendering to lighting gradients on surfaces, camera tracking and even font display. The year ended just when I was about to add the simplest AI for an opponent’s car. Pascal was the language I used when I learned the basic concepts of Object-Oriented Design and the use of pointers.

University is just a blur of C/C++, Java and Matlab, but mostly it was about theory. There’s nothing in the university that prepares you to become a better programmer. I’m not saying that it should, either, just stating a fact. In the basic curriculum, even though many classes require that you program, almost none of them teach best-practices, processes of developing as part of a team, source control, bug tracking, unit testing, etc. Even design patterns are hard to come by. When I left the university, I wasn’t a much better programmer. I was just a little more experienced, and I had a much better resume (Java? Sure, I’m very experienced with Java!).

What did I learn?

Bits of Useful Knowledge

Surprisingly enough, linear algebra rises to the top. Most of the math classes are about learning how to learn and learning how to think. This is how you approach a new subject. This is how you dissect it into pieces, and build knowledge on top of other knowledge. This is how you read a scientific article. Linear algebra was the only subject I used in later classes, and in my career later on, in subjects such as machine learning and image processing.

Machine learning algorithms, which I later used for data mining. This one course, which I took on a whim, was broad enough that I could navigate my way around the subject during my years in BigCo. I have come back to this course again and again when devising classification and clustering algorithms.

Data structures and basic concepts of complexity. This is the single most important thing that a programmer could get out of the BSc in computer science as it is built today. Even though these data structures are already implemented in most languages, knowing how they work and when to use them is very useful.

Bytes of Outdated Knowledge

Turing machines. Calculus. Outdated models of database design and concepts for relational databases. Classical mechanics.

Things I’ve learned in The Real World and I Wish Someone Would’ve Told Me

I really started honing my craft when I arrived at BigCo, fresh out of the University. The first thing I had to do was read a long and boring book about the workings of relational databases and schema design. Most programmers hate the database. They hate the interfaces to the databases and they wish they didn’t have to deal with it. They don’t like the fact that it has its own language called SQL. Most of all, they don’t like the fact that there’s a lot of black magic in dealing with a database, because the thing is so damn smart, you can hardly predict how it will behave.

At this point I realized that most of what I had learned in the University was outdated and wrong. Complexity models are a terrible way to anticipate behavior. In the real world, you are not plagued by O(NlogN) problems. You are troubled by the time it takes the spindle in your hard drive to spin, the seek time, the read speed, the round trip between servers. You are interested in cache behaviors and what to do when not everything fits in memory.

The basic concepts of schema design don’t work. Normalization? Huh. No wonder programmers are flocking to NoSQL solutions. Everything they think they know about databases is wrong, because Universities teach bad practices.

There is no Spoon

Once I was free of the notions of academia, I was free to start reading on my own. Google, blogs and Wikipedia really came to the rescue here. I’m not big on books (teach yourself nothing in 21 days), and would rather have a constant stream of news, tips and relevant knowledge as I need it. My reading lists change often. C# and .Net gave way to Python. Centralized source control was replaced by decentralized source control. Model-View-Controller (MVC) is now Model-Template-View (MTV). Waterfall became agile.

Until I left BigCo last year, I was always learning new things, and I had the privilege to test new ideas both inside and outside of BigCo. I’ve learned that the most important thing is not taught in schools: It’s about the people. A small, dedicated group can outmaneuver a bigger group by adopting better processes, not better technologies. Human interactions and communications are more important to a team’s success than anything you can learn in computer science.

I haven’t learned any new technology this year. Instead, I’m now spending part of my time honing my craft and becoming a better programmer by learning about processes. I think you should too.

The Right Hand Side of The Network Effect

We all know about network effects. We understand them intuitively.

In economics and business, a network effect (also called network externality) is the effect that one user of a good or service has on the value of that product to other people. When network effect is present, the value of a product or service increases as more people use it.

The classic example is the telephone. The more people own telephones, the more valuable the telephone is to each owner. This creates a positive externality because a user may purchase their phone without intending to create value for other users, but does so in any case.

- Network Effect, Wikipedia

The Four Stages of Product Adoption

I bet most of you know this image. Dividing customers into early adopters, majority and laggards is Business of Software 101. What we usually don’t discuss is the effect of the accumulated graph, which is the level of technology saturation.

The stages of a product need to fit not only the type of people in the segment, but also the level of market share and saturation in the population. Getting a repeat customer is very different than selling for the first time. Expectations are different and so the product must be different.

Network effects are paramount to the adoption of new technologies, and not only because early adopters talk about them and evangelize them. Network effects have everything to do with usability, and this is something you need to understand when you think about strategy.

1. The Runway (The Chicken and Egg Problem)

Let’s take HDTV as an example. We don’t usually think of TV as a business with network effects, but in fact it has very strong network effects. The more people buy TVs that support HD, the more incentive there is for media producers to provide HD content. It also works the other way around. The more HD content is out there, the more people will buy HD television sets. The basic rule of network effects applies: value is proportional to adoption rates.

Early adopters eat eggs for breakfast, if you show them a chicken. The rest, not so much. The runway can be very long. Fax machines took almost twenty years to catch on. Color TV took ten years, etc.

2. The Hockey Stick (Ramping Up)

Network effects are getting stronger. Value of adoption is rising as the product gets more popular. This is what every investor is looking for.

The Hockey Stick growth phase is a wild ride. You hang on for dear life, as you turn (faster and faster) into the next big thing. You will see very steep growth in revenue. You are happy, your investors are happy, your users are happy. This is when DVD becomes available at every Blockbuster, the USB port is suddenly available on every computer or the internet makes newspapers look archaic.

The image below displays adoption rates of some of the most popular technologies of the past century. The X axis is time and the Y axis is percent of penetration. You can see the hockey stick on most products. Then, there are those that seem to rise more slowly, like cable TV. That also has to do with network effects (or lack thereof). Click on it to see a larger version.

3. Saturation (Super Size Me)

Everyone has a TV, a DVD player, a cellphone. Revenue streams steady, as there are no more new customers buying the product. Stockholders are anxious. They expect the company’s revenue to continue to grow. Strategy changes drastically.

There are no more new customers. Now you need old customers to replace the product they already have. I replace my cellphone every two years, my DVD player breaks every six months, the TV goes out every five. How do they manage to get me to spend, spend, spend? There are several strategies:

  • Sell more to each customer. A household of six should have four TVs, six cellphones and three cars. Wait. What? When I was your age, we only had the one horse!
  • Sell better features, minor changes within the same technology. This is the age of bigger TVs, super sized McDonald’s meals, smaller cellphones, shiny cars.
  • Sell services and not products. Flickr for your pictures, YouTube for your videos, Facebook for your friends, and very little ownership.
  • Make it to break. Price goes down, and so does quality.
  • Make minor changes to the technology. Faster networks (and the always-connected-bandwidth-guzzling iPhone charged by the kilobit).

4. Decline (So 90’s)

Ye Olde Gramophone

VHS was replaced by the DVD. Newspapers are replaced by news websites. Books will be replaced by e-readers. Each of these newer technologies had or will have a long runway, but change is inevitable. The old product declines inversely to the hockey stick growth of its replacement.

Some companies predict the change, and move from one technology to the next. Others die. It’s the circle of life.

Technical Addendum

The Bass diffusion model helps you calculate the adoption rates of your technology:

So if, at a given point in time t, N(t) consumers have adopted the new technology, and a total of x(t) consumers have adopted it so far, 0<p<1 is the internal influence and 0<q<1 is the external influence (the network effect), then dx/dt is the rate of adoption of the technology by the market.
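The Bass model is usually written as dx/dt = (p + q·x/M)·(M − x), where M is the total number of potential adopters. In that form q multiplies the share of people who have already adopted, which is why it plays the role of the network effect. A minimal sketch of that standard form, stepped forward with plain Euler integration, might look like this. The symbol M (here `marketSize`), the function name and the sample coefficients are assumptions for illustration, not values from the course materials.

```javascript
// Simulate cumulative adoption x(t) under the standard Bass model:
//   dx/dt = (p + q * x / M) * (M - x)
// using simple Euler steps of size dt (an approximation, not an exact solution).
function simulateBass({ marketSize, p, q, steps, dt = 1 }) {
  const adopters = [0]; // x(0) = 0: nobody has adopted yet
  let x = 0;
  for (let i = 0; i < steps; i++) {
    const rate = (p + (q * x) / marketSize) * (marketSize - x);
    x = Math.min(marketSize, x + rate * dt); // never exceed the market size
    adopters.push(x);
  }
  return adopters;
}

// Hypothetical coefficients, chosen only to show the shape of the curve.
const curve = simulateBass({ marketSize: 1000, p: 0.03, q: 0.38, steps: 30 });
console.log(curve.map((x) => Math.round(x)).join(' '));
```

With a small p and a larger q, the per-step gains start slowly, speed up as x grows, and fade as the market fills: the runway, the hockey stick and saturation in one curve.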

For more on this, you can read the materials from a course I’ve taken about diffusion of new technologies, available here.

So can you now predict the growth of your product?
