A Tale of Two Databases

I’ve been working with databases for almost a decade, and the databases I’ve worked with varied in size and brand. For more than half of this time, I worked at a big in-house software provider with large Oracle deployments. In my spare time, I developed web applications on top of MySQL, which became my database of choice. So, when I left the big company, I started consulting for small (and growing) startups on ways to improve their MySQL deployments.

Big companies and small companies are very different in the way they approach infrastructure, but the difference is structural, not technological. Let me explain:


In BigCo, there are many employees. Titles and responsibilities are not only well defined, but are also well enforced.

For example, when our team needed to change the database schema, we had to ask a DBA (not our DBA, mind you: the DBAs were not part of our team. They sat in a different part of the building and used a sort of round-robin mechanism for task assignment that I still don’t fully understand). If we needed more memory, we had to go to the system administrator. If we needed a new database server, we had to go to both.

Developing a new version of our software was full of pain. Development and test environments were always last on the infrastructure team’s task queue. It took them anywhere from days to weeks to set up a new development environment with a subset of the production data. We never had enough storage, so we had to carve out the parts of the database we thought were relevant to the new version, but with as many tables as we had, we never got a working copy on the first try. It took us weeks to get a server working, and then more weeks to get another one for QA.

All this, before the first line of code was even written. That’s when it really got sticky.

Bugs? Sure, they happen. A wrongly placed parenthesis, and all of the user data in the development database is deleted. Wait a week to get the database refreshed.

Automated testing? I’d love to, but without an easy way to refresh the database, it was impossible to get repeatable results that meant much.
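On a small database, at least, that refresh can be nearly instant. Below is a minimal sketch of the snapshot-and-restore step that repeatable tests need, using SQLite from Python's standard library as a stand-in for the Oracle databases in the story. The `users` table and the helper names (`snapshot`, `restore`, `run_demo`) are hypothetical, and a production Oracle or MySQL setup would need its own backup or snapshot tooling.

```python
import sqlite3


def snapshot(db: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the entire database into a fresh in-memory connection (the known-good baseline)."""
    baseline = sqlite3.connect(":memory:")
    db.backup(baseline)
    return baseline


def restore(baseline: sqlite3.Connection, target: sqlite3.Connection) -> None:
    """Overwrite everything in `target` with the contents of `baseline`."""
    # SQLite refuses to back up into a connection with an open transaction,
    # so discard any uncommitted work first; it is about to be overwritten anyway.
    target.rollback()
    baseline.backup(target)


def user_count(db: sqlite3.Connection) -> int:
    return db.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def run_demo() -> tuple[int, int, int]:
    # Hypothetical development database with a little seed data.
    dev = sqlite3.connect(":memory:")
    dev.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    dev.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    dev.commit()
    baseline = snapshot(dev)
    before = user_count(dev)

    # The "wrongly placed parenthesis": a buggy test wipes all user data.
    dev.execute("DELETE FROM users")
    dev.commit()
    after_bug = user_count(dev)

    # Restore the baseline before the next test instead of waiting a week.
    restore(baseline, dev)
    return before, after_bug, user_count(dev)


if __name__ == "__main__":
    print(run_demo())  # (2, 0, 2)
```

The design choice is to snapshot once, right after seeding, and restore before every test, so each test starts from the same known state no matter what the previous one broke.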

Unrepeatable issues? Plenty. Development’s database and code are up to date with the latest version, while QA is running last week’s drop. There’s no way to reproduce and debug their issues in development, because code and data are out of sync. We just marked them as wontfix or cantrepeat and moved on. Then we got the same bug in production.

It was a mess, but I thought it was because I was working in a big company with too many people involved. I was sure that once I moved to a smaller place, where I could manage my own database, I’d be fine. Right? Wrong.


Getting things done in a smaller company is easier, in theory. I could install a new server or a new database on my own. Now, I thought, I wouldn’t have to wait a week for my database to be ready before I could start developing the next version. An entire copy of the operational database could fit, quite snugly, on my personal laptop. If something went wrong, I could just copy the database again and start over. Easy!

Except it isn’t.

It took me a while to figure it out. After a week of work, my development database and the one I had deployed to QA the week before no longer looked the same. I still couldn’t reproduce the bugs QA found on my own system. I found myself refreshing my development environment over and over, each time to a different version of the application, before I realized what I was missing.

Source Control is Not Enough

In the olden days, when most companies developed desktop applications, someone very smart realized that code had to be managed in a centralized repository. When the code on the developer’s desktop was different from the code in QA, source control tools were used to momentarily shelve the recent changes and reload an older version of the code. Then the developer could debug and reproduce the error, and deploy a fix.

This is not the world we live in today. Almost every new service on the internet, and many internal enterprise systems, are built around a client-server architecture. Somewhere deep below the layers of code sits a centralized data store that holds the state of the application. Source control alone is no longer sufficient to manage the development life cycle of the software: reverting the code does not revert the data it runs against. We need data management tools as well.

This is the pain we are trying to solve at OffScale. If you have encountered these problems, go to our site, where you can read more and join our beta program.

* This post was also posted on my company’s blog.

Honing my Craft

Baby Steps

I’ve been programming for the better part of my life. I started with BASIC, in which I wrote a small piece of software that let me paint a series of bitmaps by hand and animate them. My home computer had a 4-color CGA screen, and the computer at my school was a blue-black Commodore 64, but I knew I was hooked.

In high school I wrote my first big project as part of a computer graphics course. I ended up creating a (very bad) car racing simulator in Pascal, complete with track editing and several car models. I created all the graphics from scratch, from line rendering to lighting gradients on surfaces, camera tracking and even font display. The year ended just as I was about to add the simplest AI for an opponent’s car. Pascal was also the language in which I learned the basic concepts of Object Oriented Design and the use of pointers.

University is just a blur of C/C++, Java and Matlab, but mostly it was about theory. There’s nothing in the university that prepares you to become a better programmer. I’m not saying that it should, either, just stating a fact. In the basic curriculum, even though many classes require that you program, almost none of them teach best-practices, processes of developing as part of a team, source control, bug tracking, unit testing, etc. Even design patterns are hard to come by. When I left the university, I wasn’t a much better programmer. I was just a little more experienced, and I had a much better resume (Java? Sure, I’m very experienced with Java!).

What did I learn?

Bits of Useful Knowledge

Surprisingly enough, linear algebra rises to the top. Most of the math classes are about learning how to learn and learning how to think: this is how you approach a new subject, this is how you dissect it into pieces and build knowledge on top of other knowledge, this is how you read a scientific article. Linear algebra was the only subject I actually used in later classes, and later in my career, in fields such as machine learning and image processing.

Machine learning algorithms, which I later used for data mining. This one course, which I took on a whim, was broad enough that I could navigate my way around the subject during my years at BigCo. I have come back to it again and again when devising classification and clustering algorithms.

Data structures and basic concepts of complexity. This is the single most important thing that a programmer could get out of the BSc in computer science as it is built today. Even though these data structures are already implemented in most languages, knowing how they work and when to use them is very useful.

Bytes of Outdated Knowledge

Turing machines. Calculus. Outdated models of database design and concepts for relational databases. Classical mechanics.

Things I’ve learned in The Real World and I Wish Someone Would’ve Told Me

I really started honing my craft when I arrived at BigCo, fresh out of university. The first thing I had to do was read a long and boring book about the workings of relational databases and schema design. Most programmers hate the database. They hate its interfaces, and they wish they didn’t have to deal with it. They don’t like the fact that it has its own language, SQL. Most of all, they don’t like the fact that there’s a lot of black magic in dealing with a database, because the thing is so damn smart you can hardly predict how it will behave.

At this point I realized that most of what I had learned at university was outdated and wrong. Asymptotic complexity is a poor model for anticipating real-world behavior. In the real world, you are not plagued by O(N log N) problems. You are troubled by the time it takes for the spindle in your hard drive to spin, the seek time, the read speed, the round trip between servers. You care about cache behavior and what to do when not everything fits in memory.

The basic concepts of schema design don’t work. Normalization? Huh. No wonder programmers are flocking to NoSQL solutions. Everything they think they know about databases is wrong, because universities teach bad practices.

There is no Spoon

Once I was free of the notions of academia, I was free to start reading on my own. Google, blogs and Wikipedia really came to the rescue here. I’m not big on books (teach yourself nothing in 21 days); I would rather have a constant stream of news, tips and relevant knowledge as I need it. My reading lists change often. C# and .Net gave way to Python. Centralized source control was replaced by decentralized source control. Model-View-Controller (MVC) is now Model-View-Template (MTV). Waterfall became agile.

Until I left BigCo last year, I was always learning new things, and I had the privilege of testing new ideas both inside and outside of BigCo. I learned that the most important thing is not taught in school: it’s about the people. A small, dedicated group can outmaneuver a bigger group by adopting better processes, not better technologies. Human interaction and communication matter more to a team’s success than anything you can learn in computer science.

I haven’t learned any new technology this year. Instead, I’m now spending part of my time honing my craft and becoming a better programmer by learning about processes. I think you should too.

The Right Hand Side of The Network Effect

We all know about network effects. We understand them intuitively.

In economics and business, a network effect (also called network externality) is the effect that one user of a good or service has on the value of that product to other people. When network effect is present, the value of a product or service increases as more people use it.

The classic example is the telephone. The more people own telephones, the more valuable the telephone is to each owner. This creates a positive externality because a user may purchase their phone without intending to create value for other users, but does so in any case.

Network Effect, Wikipedia
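The telephone example is easy to put in numbers. If every pair of owners can call each other, n telephones allow n(n−1)/2 distinct connections, so each new phone adds a connection to every phone already out there. Counting pairs this way (the idea behind Metcalfe's law) is only one rough model of network value, not a measurement; a minimal sketch:

```python
def possible_connections(n: int) -> int:
    """Distinct pairs of telephones that can call each other: n choose 2."""
    return n * (n - 1) // 2


def added_by_next_phone(n: int) -> int:
    """New connections created when one more phone joins a network of n phones."""
    return possible_connections(n + 1) - possible_connections(n)


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        print(n, possible_connections(n), added_by_next_phone(n))
```

Ten phones give 45 possible connections; a thousand give 499,500, and the next phone alone adds 1,000 more. That is the positive externality in the quote: the new owner buys a phone for themselves, but every existing owner gains someone new to call.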

The Four Stages of Product Adoption

I bet most of you know this image. Dividing customers into early adopters, majority and laggards is Business of Software 101. What we usually don’t discuss is the effect of the accumulated graph, which is the level of technology saturation.

The stages of a product need to fit not only the type of people in the segment, but also the level of market share and saturation in the population. Getting a repeat customer is very different than selling for the first time. Expectations are different and so the product must be different.

Network effects are paramount to the adoption of new technologies, and not only because early adopters talk about them and evangelize them. Network effects have everything to do with usability, and this is something you need to understand when you think about strategy.

1. The Runway (The Chicken and Egg Problem)

Let’s take HDTV as an example. We don’t usually think of TV as a business with network effects, but in fact it has very strong ones. The more people buy TVs that support HD, the more incentive there is for media producers to provide HD content. It also works backwards: the more HD content is out there, the more people will buy HD television sets. The basic rule of network effects applies: value is proportional to adoption.

Early adopters eat eggs for breakfast if you show them a chicken. The rest, not so much. The runway can be very long: fax machines took almost twenty years to catch on, color TV took ten years, and so on.

2. The Hockey Stick (Ramping Up)

Network effects grow stronger. The value of adopting the product rises as it gets more popular. This is what every investor is looking for.

The Hockey Stick growth phase is a wild ride. You hang on for dear life as you turn, faster and faster, into the next big thing. You see very steep growth in revenue. You are happy, your investors are happy, your users are happy. This is when DVDs become available at every Blockbuster, the USB port suddenly shows up on every computer, or the internet makes newspapers look arcane.

The image below displays adoption rates of some of the most popular technologies of the past century. The X axis is time and the Y axis is percent of penetration. You can see the hockey stick for most products. Then there are those that seem to rise more slowly, like cable TV. That also has to do with network effects (or the lack thereof).

3. Saturation (Super Size Me)

Everyone has a TV, a DVD player, a cellphone. Revenue streams level off, as there are no more new customers buying the product. Stockholders are anxious: they expect the company’s revenue to keep growing. Strategy changes drastically.

There are no more new customers. Now you need old customers to replace the product they already have. I replace my cellphone every two years, my DVD player breaks every six months, the TV goes out every five. How do they manage to get me to spend, spend, spend? There are several strategies:

  • Sell more to each customer. A household of six should have four TVs, six cellphones and three cars. Wait. What? When I was your age, we only had the one horse!
  • Sell better features, minor changes within the same technology. This is the age of bigger TVs, super sized McDonald’s meals, smaller cellphones, shiny cars.
  • Sell services and not products. Flickr for your pictures, YouTube for your videos, Facebook for your friends, and very little ownership.
  • Build it to break. Price goes down, and so does quality.
  • Make minor changes to the technology. Faster networks (and the always-connected-bandwidth-guzzling iPhone charged by the kilobit).

4. Decline (So 90’s)

VHS was replaced by the DVD. Newspapers are being replaced by news websites. Books will be replaced by e-readers. Each of these newer technologies had or will have a long runway, but change is inevitable. The old product declines inversely to the hockey stick growth of its replacement.

Some companies predict the change, and move from one technology to the next. Others die. It’s the circle of life.

Technical Addendum

The Bass diffusion model helps you calculate the adoption rate of your technology:

dx/dt = (p + q·x(t)/N)·(N − x(t))

Here N is the total number of potential adopters (the market size), x(t) is the number of consumers who have adopted the technology by time t, 0<p<1 is the coefficient of innovation (the external influence, such as advertising) and 0<q<1 is the coefficient of imitation (the internal influence of existing adopters on new ones: the network effect). dx/dt is then the rate at which the market adopts the technology.

For more on this, you can read the materials from a course I took on the diffusion of new technologies, available here.

So can you now predict the growth of your product?
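One rough way to try is to run the model forward. This is a minimal sketch, assuming the standard form of the Bass model, dx/dt = (p + q·x/N)(N − x), with N the market size, x(t) the adopters so far, p the coefficient of innovation and q the coefficient of imitation. The parameter values are hypothetical, picked for illustration and not estimated from any real product; in practice p, q and N have to be fitted from early sales data, and the forecast is only as good as that fit.

```python
import math


def bass_cumulative(t: float, p: float, q: float, market_size: float) -> float:
    """Closed-form Bass solution: adopters by time t, starting from zero adopters at t = 0."""
    decay = math.exp(-(p + q) * t)
    return market_size * (1 - decay) / (1 + (q / p) * decay)


def simulate_bass(p: float, q: float, market_size: float, years: int,
                  steps_per_year: int = 1000) -> list[float]:
    """Integrate dx/dt = (p + q*x/N)(N - x) with forward Euler; return x at each whole year."""
    dt = 1.0 / steps_per_year
    x = 0.0
    yearly = [x]
    for _ in range(years):
        for _ in range(steps_per_year):
            x += (p + q * x / market_size) * (market_size - x) * dt
        yearly.append(x)
    return yearly


def peak_adoption_time(p: float, q: float) -> float:
    """Time of the highest adoption rate (steepest part of the hockey stick).

    Assumes q > p; otherwise adoption is fastest right at launch.
    """
    return math.log(q / p) / (p + q)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    p, q, market = 0.03, 0.38, 1_000_000
    for year, adopters in enumerate(simulate_bass(p, q, market, years=15)):
        print(f"year {year:2d}: {adopters:>9,.0f} adopters")
    print(f"adoption peaks after about {peak_adoption_time(p, q):.1f} years")
```

The closed-form `bass_cumulative` is there as a cross-check on the simulation. The numeric loop is what you would keep if you wanted to experiment with variations the closed form does not cover, such as a market size that changes over time.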

Do You Own Your iPhone?

There were some dark times a few years back, when big studios sued old ladies for downloading music illegally. When this failed, they tried blocking us with DRM technologies. However, DRM was ineffective and drove customers away, so music stores mostly abandoned it.

I was quite hopeful. I thought customers and copyright owners had reached a new working equilibrium. I’m not so sure anymore. Consumers are in a much worse position than we were twenty years ago. Once, when you bought a book, you owned it. When you bought software, you owned it. When you bought hardware, you owned it. Well, this may not be the case anymore.

You Do Not Own Your Books

Sometime in July 2009, buyers of Orwell’s literary masterpiece 1984 found that the e-book they had purchased had been erased from their Kindle devices. Is this legal? Well, it depends:

Amazon’s published terms of service agreement for the Kindle does not appear to give the company the right to delete purchases after they have been made. It says Amazon grants customers the right to keep a “permanent copy of the applicable digital content.”

However, in a lawsuit settled not two months after the incident:

In the settlement, Amazon promises never to repeat its actions, under a few conditions. The retailer will still wipe an e-book if a court or regulatory body orders it, if doing so is necessary to protect consumers from malicious code, if the consumer agrees for any reason to have the e-book removed, or if the consumer fails to pay (for instance, if the credit card issuer doesn’t remit payment).

So, the answer is still “no,” you don’t own the digital books you download. Though I can understand the reasoning behind some of the exceptions Amazon lays out, Amazon still maintains control over your e-books. It is not the same as having a book all to yourself once you leave the bookstore.

This means you do not own your books. You are licensed to read them, of course, but you do not own them. Not in the sense you were used to with old paper books, the ones you lent to your friends.

You Do Not Own Your Software

I’ve been foaming at the mouth for the past few weeks, ever since I read this little piece on Slashdot:

A federal appeals court ruled today that the first sale doctrine is “unavailable to those who are only licensed to use their copies of copyrighted works.” This reverses a 2008 decision from the Autodesk case, in which a man was selling used copies of AutoCAD that were not currently installed on any computers. Autodesk objected to the sales because their license agreement did not permit the transfer of ownership. Today’s ruling (PDF) upholds Autodesk’s claims: “We hold today that a software user is a licensee rather than an owner of a copy where the copyright owner (1) specifies that the user is granted a license; (2) significantly restricts the user’s ability to transfer the software; and (3) imposes notable use restrictions. Applying our holding to Autodesk’s [software license agreement], we conclude that CTA was a licensee rather than an owner of copies of Release 14 and thus was not entitled to invoke the first sale doctrine or the essential step defense. “

In short: you are licensed to use the software, but it is not yours. Even if this ruling only holds for AutoCAD today, I’m sure the rest of the industry (and gaming in particular) will promptly follow suit with “improved” licenses.

You Do Not Own Your Hardware

Microsoft banned 1 million users from Xbox Live. These users were banned because their consoles had been modified to let them play pirated games. While it is legitimate to fight pirates, is modifying the Xbox an act of copyright infringement? That depends:

“With hardware, you can do pretty much anything you want with it. There are very few rules that apply. You buy it, you own, you can take it apart, and that’s perfectly fine,” she explained. The problem is that no one simply modifies the hardware. “It becomes complicated with modern hardware because it’s combined with firmware, the embedded software.”

So, our ownership of the Xbox is partial at best. This gives Microsoft the right to do more than ban us from the online service they provide after purchase. It gives them the right to cripple our Xboxes, or even remove software remotely, rendering the hardware useless:

Your Banned 360:

  • Cannot go on Xbox Live
  • Cannot install games to the HDD
  • Cannot use Windows Media Centre extender
  • Cannot be used to get achievements from backups without corrupting your profile

Next in stores: Apple kills jail-broken iPhones. LG shuts down TVs that played pirated movies. Toyota pulls the plug on car modifications.

Copyright as in The Right To Copy

Somewhere along the way we have lost the sense of ownership. In the digital world, every use produces a copy. Thus, every use requires the consent of the copyright holder.

I understand the need for copyright protection by law. Piracy, distributing the work as is and wholesale, possibly for profit, is not to be condoned. I just want to point out the price we pay to keep the rights of the few protected. Everything we “buy” is actually lent to us under stern limitations (must remain connected to the Internet, must enter credit card, must use this hardware USB device, must not sell, must not modify).

Copy rights are important, but what about the rights of ownership and property?

I’ll leave you with Larry Lessig, advocate for copyright reform and the world’s coolest lawyer:

Democratizing Taxes

There’s an old proverb:

Nothing in this world is certain except for death and taxes, but scientists are still trying to prevent death.

Government and taxes

Governments are a way for people to provide certain basic needs at scale. Most commonly, the services best provided by the government are those of natural monopoly: control over natural resources, electricity and water supply, roads and railroads, safety (police and army), health and education. These are services with economies of scale, where competition is ill advised and privatization will not benefit the people.

Let’s analyze the army as an example. The army provides security: it protects the people of this country from threats by the people of other countries. Let’s say, for the sake of example, that the army is privatized. My neighbors pay for protection and I don’t. Can the army protect their country without protecting mine? No. Therefore, I’m incentivized not to pay and not to enlist. I leech off others. Governments are built to prevent such behavior, and they do it through taxes.

A tax is a way of collecting money from the people, for the benefit of the people, through these natural monopolies. To make things easier, all tax money is pooled into one big pool and then spread back to the different services. Through centuries of slow information and transaction flow, this provided a certain fairness. Collecting taxes was complicated, and so was spreading the money. Asking people what they wanted was complicated, even in relatively modern democracies.

Over the years, technology and changing moral standards may shift what is considered a public service and what is left to private service, as the return of privatized firefighting in the US shows: they will save your life, but will only save your property if you pay extra. However, the amount of control you had over how your tax money was spent was extremely limited.

Charity and taxes

There seems to be a charity boom over the past few years. I think there are two reasons for this:

  1. Increase in available income: there has been a tremendous improvement in the average quality of life over the past hundred years. Over the last decade, a lot of technologies have reached a level of maturity where we get so much bang for the buck that we have spare bucks in the bank. Computers are dirt cheap, cars are smaller and cheaper, overall health is better, etc. This leaves more income for charity, because giving makes us feel good about ourselves.
  2. Internet makes charity direct: take Kiva, for example. Kiva provides you with an interface to control exactly what happens with every dollar you donate. You know who gets it, and what they plan to do with it. This is not an organization governed by a group of old people in suits. This is you helping one other person.

I think that for these two reasons, we will see more and more charity in the coming years.

There are claims that charity is the sign of bad policy. That is, if governments did a better job, we wouldn’t need charity. I think it’s the other way around. Governments were the only way we could manage the basic services in the old days. There just wasn’t any other way. Taxes are built to fit the old days.

Charity is people voting with their money on what services are important and what aren’t. What they believe should be the basic rights of every human being and what isn’t. I believe that we will see more charity-like models enter governments over the next fifty years, as democracy becomes more direct and more open than ever before.

Charitable Taxation

Don’t get me wrong: I think taxes are extremely important. It is crucial that we force everyone to take part in paying for the public services we consume. What I suggest is a concept of Charitable Taxation, under which you get to direct some of your income tax to a charity or government service of your choice. For example, I think most of my income tax should go to health and education services. Others may think the police are more important.

Services that are underfunded will have to change. They will be forced to become private, since it seems nobody feels they are necessary as government services. Others will bloom, as people feel they are more important. The set of services provided by the government will be in constant change, always adapting to the times.

There are a lot of details that need to be ironed out, obviously. I’m not completely certain that this mechanism is, in fact, stable. However, I think we are best served when we are more involved in how our governments act. After all, they are the government of the people, by the people and for the people.