Automated Testing with MongoDB and OffScale

Last Friday, I attended MongoSF 2012 as a speaker. It was a great event, with about a thousand programmers showing up, and some fantastic and deeply technical talks about advanced topics in MongoDB. And in the midst of all that, 10gen were gracious enough to let me have my own thirty minute slot at the EcoSystem Track to talk about OffScale, and how it can help with automated testing.

As a first time speaker, it was a challenging and rewarding experience. It forced me to step out of my comfort zone and practice my public speaking skills. For OffScale it was a great opportunity to get in touch with more programmers and hear some feedback. As we are in beta, and working on improving our product, listening to people’s questions is sometimes more important than having good answers, as this helps us crystallize our priorities. To that end, I want to give special thanks to the MongoLab guys who attended the talk and asked some great questions about use cases we will definitely address in the future. This was exactly why I wanted to speak.

Here are the slides for the talk I gave. I’ll probably go into more detail in future posts, but the basic theme is: do more tests, use real data, make your database a first class citizen in your environment & have lots of jokes in the slides if you are not a funny person.

Go download our beta and try it out.

Schema Management Under Source Control

A long time ago, I wrote a post for Kaltura, the best video management platform out there, where I worked as a data warehouse consultant and lead developer. I spent a lot of my time there scaling the system as the company grew, writing ETL processes and code, migrating large databases from one schema to another, and building tools and processes to help make testing and deployment easier for such large databases.

My good friend, Roni Cohen, community manager extraordinaire, has brought to my attention that the post has finally been published:

Schema Management Under Source Control

Deploying new features that require changes to the database is a technical challenge in any application. To quote Jeff Atwood from Coding Horror: “When I ask development teams whether their database is under version control, I usually get blank stares.”

The database is a critical part of your application. If you deploy version 2.0 of your application against version 1.0 of your database, what do you get? A broken application. And that’s why your database should always be under source control right next to your application code. You deploy the app, and you deploy the database. Like peanut butter and chocolate, they are two great tastes that taste great together.

Not coincidentally, it is very relevant to what OffScale is doing today, trying to make the database an integral part of the development cycle and a first class citizen in a source controlled environment.
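One common way to keep the database "right next to your application code" is to store numbered migrations in the repository and record the applied version inside the database itself. What follows is only a minimal sketch of that idea, not the approach from the Kaltura post or anything OffScale ships. It uses SQLite from the Python standard library, and the `MIGRATIONS` list, the `schema_version` table and the `users` table are all made up for illustration.

```python
import sqlite3

# Hypothetical migrations, kept in source control next to the application code.
# Each entry is (version, SQL); table and column names are illustrative only.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
]


def current_version(conn):
    """Return the schema version recorded in the database (0 if none)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn, target=None):
    """Apply, in order, every migration newer than the recorded version."""
    version = current_version(conn)
    for number, sql in MIGRATIONS:
        if number <= version or (target is not None and number > target):
            continue
        conn.execute(sql)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (number,))
        conn.commit()
    return current_version(conn)


conn = sqlite3.connect(":memory:")
migrate(conn, target=1)  # the database as version 1.0 of the app expects it
migrate(conn)            # deploying version 2.0 brings the schema up to date
```

With something like this, checking out an older revision of the code also gives you the list of migrations that revision expects, so the application and the schema can be deployed together.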

A Tale of Two Databases

I’ve been working with databases for almost a decade. The databases I’ve worked with varied in size and brand. For more than half of this time, I worked at a big in-house software provider, working with big Oracle deployments. In my spare time, I developed web applications on top of MySQL, which has turned into my database of choice. So, when I left the big company, I started consulting for small (and growing) startups on ways to improve their MySQL deployments.

Big companies and small companies are very different in the way they approach infrastructure, but the difference is structural, not technological. Let me explain:

BigCo

In BigCo, there are many employees. Titles and responsibilities are not only well defined, but are also well enforced.

For example, when our team needed to change the database schema, we had to ask a DBA (not our DBA, mind you. The DBAs were not part of our team. They sat in a different part of the building, and used a sort of round-robin mechanism for task assignment I still don’t fully understand). If we needed more memory, we had to go to the system administrator. If we needed a new database server, we had to go to both.

Developing a new version of our software was full of pain. Development and test environments were always last on the task queue for the infrastructure personnel. It took them from days to weeks to set up a new development environment with a subset of data from production. We never had enough storage, so we had to try and subset parts of the database that we thought were relevant for this version – but we could never get a working copy on the first try, not with the many tables we had. It took us weeks to get a server working. And then more weeks to get another for QA.

All this, before the first line of code was even written. That’s when it really got sticky.

Bugs? Sure. It happens. A wrongly placed parenthesis, and all of the user data in the development database is deleted. Wait a week to get the database refreshed.

Automated testing? I’d love to, but unless there was an easy way to refresh the database, it was impossible to get repeatable results that meant much.
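To make "an easy way to refresh the database" concrete, here is a small sketch of what repeatable tests look like once a refresh is cheap. It is an illustration under simplifying assumptions, not a description of the setup at BigCo: the database is an in-memory SQLite one, and the `SNAPSHOT` script and `users` table are hypothetical stand-ins for a dump of a known database state.

```python
import sqlite3
import unittest

# Hypothetical snapshot; in practice this would be a dump of a known state.
SNAPSHOT = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO users (name) VALUES ('alice'), ('bob');
"""


def fresh_database():
    """Build a new in-memory database from the snapshot."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SNAPSHOT)
    return conn


class UserTests(unittest.TestCase):
    def setUp(self):
        # Every test starts from exactly the same data.
        self.conn = fresh_database()

    def tearDown(self):
        self.conn.close()

    def test_delete_all_users(self):
        self.conn.execute("DELETE FROM users")
        count = self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        self.assertEqual(count, 0)

    def test_users_are_still_there(self):
        # Unaffected by the destructive test above, whatever order they run in.
        count = self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        self.assertEqual(count, 2)
```

Saved to a file, this can be run with `python -m unittest`. The destructive test cannot leak into the other one, because each test gets its own copy of the data. That is trivial with a two-row table and much harder with a production-sized database, which is the gap this post is about.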

Unrepeatable issues? Plenty. Development’s database and code are up-to-date with the latest version. QA are running last week’s drop. There’s no way to repeat and debug issues in development. Code and data are out of sync. We just marked them as wontfix, cantrepeat, and moved on. Then we got the same bug in production.

It was a mess, but I thought it was because I was working in a big company, and there were too many people involved. I was sure that once I moved to a smaller place, where I could manage my own database, I’d be fine. Right? Wrong.

SmallCo

Getting things done in a smaller company is easier, in theory. I could install a new server, a new database on my own. Now, I thought, I wouldn’t have to wait a week to get my database ready so I could start developing the next version. An entire copy of the operational database could fit, quite snugly, on my personal laptop. If something happened, I could just copy the database again, and start over. Easy!

Except it isn’t.

It took me a while to figure it out. After a week of work, my development database and the one I deployed to QA last week didn’t look the same anymore. I still couldn’t reproduce the bugs they found on my own system. I found myself refreshing my development environment over and over, each time to a different version of the application, before I realized what I was missing.

Source Control is Not Enough

In the olden days, when most companies developed desktop applications, someone very smart realized that code had to be managed in a centralized repository. When the code on the developer’s desktop was different from the code in QA, source control tools were used to momentarily shelve the recent changes and reload an older version of the code. Then the developer could debug and reproduce the error, and deploy a fix.

This is not the world we live in today. Almost every new service on the internet, and many internal enterprise systems, are built around a client-server architecture. Somewhere, deep down below the layers of code, sits a centralized data store that holds the state of the application. Source control is no longer sufficient to manage the development life cycle of the software. We need data management tools as well.

This is the pain that we are trying to solve at OffScale. If you have encountered these problems, go to our site, where you can read more and join our beta program.

* This post was also posted on my company’s blog.

On Websites & APIs

A few weeks back, I attended Startup Weekend in Israel. Startup Weekend is a gathering of people of all sorts – coders, designers, marketers and the like – that join forces for one intensive weekend to create something out of nothing. While most groups spent most of their time discussing business plans and polishing presentations (which was a disappointment for some of the more talented developers in the bunch), our team spent almost all of our time developing a new internet service.

What’s In A Service?

Developing a new service in 48 hours is not a simple task, especially for a group of ten people who have only just met and have widely varying skill sets. We wanted to take one commercial area, which we felt was badly served by existing sites, and revamp it, creating in two days the open source seed that could later be used to take over the category. In Israel, the worst served segment is classified ads, so that is what we were aiming for.

Here’s a list of what we had in mind:

  • Insert a new item (for example, a car or a cellphone)
  • Different displays and properties for each kind of classified ad (cars are different than cellphones)
  • Filter many items (according to properties of the item)
  • Search
  • User registration and login (via existing services, such as Facebook Connect)

To make things more difficult, we had ambitious goals about creating the front end of the service as well:

  • Develop not only a website, but also mobile applications (mainly, iPhone and Android).
  • Translate to several languages.
  • Test several, completely different, design concepts and user interfaces. We really wanted to make a radically better website. Aside from the graphic design, we had several ideas about comments and Facebook integration that were pretty cool. We also had ideas about mixing UI concepts from price comparison sites as well as classified ads sites.
  • Use real ads, taken from competing sites (which, in Israel, is probably legal since classified ads are considered too utilitarian to be protected by copyright law).

The Open Website API

Building several clients to the service required extreme separation of responsibilities. Most existing frameworks (it’s slightly better in Ruby on Rails than in Django) are built on three layers: the data model, the controller or view, and the template.

The data model defines what data objects are used in the system. We’ve built a simple generic model for a classified ad, and a set of meta-models that define the properties expected for each type of item.

The controller or view is responsible for fetching the data required for an action. For example, in order to create a new ad, I need the list of expected properties for this type of item (a car). The controller is responsible for all the heavy lifting (such as fetching data, validating input correctness, etc.). We created the controllers for our main actions quite simply.
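The split between a generic ad model, per-type meta-models and a validating controller can be sketched roughly as follows. This is a reconstruction for illustration only, not the code we wrote that weekend: the `META_MODELS` table, the `Ad` class and the `create_ad` helper are hypothetical names, and the properties listed for each item type are invented.

```python
from dataclasses import dataclass, field

# Hypothetical meta-models: the properties expected for each type of item.
META_MODELS = {
    "car": {"make": str, "year": int, "price": int},
    "cellphone": {"brand": str, "price": int},
}


@dataclass
class Ad:
    """A generic classified ad; type-specific data lives in `properties`."""
    item_type: str
    title: str
    properties: dict = field(default_factory=dict)


def create_ad(item_type, title, properties):
    """Controller-style helper: validate input against the meta-model."""
    expected = META_MODELS.get(item_type)
    if expected is None:
        raise ValueError(f"unknown item type: {item_type}")
    missing = sorted(set(expected) - set(properties))
    if missing:
        raise ValueError(f"missing properties: {', '.join(missing)}")
    for name, kind in expected.items():
        if not isinstance(properties[name], kind):
            raise ValueError(f"{name} must be {kind.__name__}")
    return Ad(item_type, title, dict(properties))


ad = create_ad("car", "Family car", {"make": "Mazda", "year": 2008, "price": 40000})
```

Adding a new kind of ad then means adding one entry to the meta-model table rather than writing a new model class.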

The template defines the page layout. It is responsible for rendering the way the website looks (the HTML). This is what we found most cumbersome. First, each template is tied to a URL. This meant that we couldn’t have two HTML clients without splitting a lot of code. Then, the mobile client needed an API, which required a third branch of controllers and templates to render the objects for the iPhone.

This just wouldn’t do. Instead, we decided to create the API once, and use it for everything else. That meant that on the server side we only had to develop the API, and not even a single HTML page. Then, we could write as many clients as we wanted, each with a completely different flow, look and feel. The web clients were just HTML with some Ajax, and didn’t have to be on the server. The iPhone app was just as simple to develop. No code was duplicated. I was surprised at how fast we iterated on ideas.
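In code, "create the API once" boils down to something like the sketch below: one server-side function that returns plain data, and any number of clients that decide how to present it. Everything here is hypothetical and simplified (the `ADS` list stands in for the data store, and the two "clients" are plain functions rather than a browser page and an iPhone app), but the shape is the same.

```python
import json
from html import escape

# Hypothetical in-memory data standing in for the real data store.
ADS = [
    {"id": 1, "type": "car", "title": "Family car", "price": 40000},
    {"id": 2, "type": "cellphone", "title": "Used phone", "price": 900},
]


def api_list_ads(item_type=None):
    """The only server-side code: return matching ads as a JSON string."""
    ads = [ad for ad in ADS if item_type is None or ad["type"] == item_type]
    return json.dumps({"ads": ads})


def web_client(response):
    """One client: render the API response as an HTML list."""
    items = "".join(
        f"<li>{escape(ad['title'])}: {ad['price']}</li>"
        for ad in json.loads(response)["ads"]
    )
    return f"<ul>{items}</ul>"


def mobile_client(response):
    """Another client: same response, completely different presentation."""
    return [f"{ad['title']} ({ad['price']})" for ad in json.loads(response)["ads"]]


response = api_list_ads("car")
print(web_client(response))
print(mobile_client(response))
```

Neither client knows anything about the other, and the server knows nothing about HTML.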

What’s it good for:

  • Complete API – by default, there’s an API for everything. There isn’t anything on the website that can’t be easily implemented on any other client. It makes the website very open for developers, without requiring any special treatment.
  • Complete separation of functionality from design – the server side was responsible for authentication, data validation and simple access paths to the data. The client is responsible for the flow, and user interface. Each client can be completely different, tailored for the device it is used on (think: web vs. mobile). There were hardly any limitations on what a client could do, because the API was so basic.
  • DRY code – write once, use everywhere
  • Third party friendly – having an API is important for another reason. Think of the Twitter API and how it helped create the Twitter ecosystem (something they are fighting today). For an open source website, this is only an advantage. You want as many people as possible plugging in and creating something new on top of it. Over time, the best ideas will merge, and the community will benefit from the competition, while not wasting resources duplicating the data layer.

Cons:

  • Slower loading times – pre-rendered HTML will always be faster to load. Do people care that gmail takes a few seconds to load, every time you open it? Not really, because it’s so useful. And with smart client side caching and some clever ajax pre-loading, you can cut this time down significantly.
  • Command & Control issues – does the project include a client? If so, which one? How do you choose which client is “official”, or best? If not, does it mean you need two packages to install the website? How do you manage a list of clients? Where’s the data? Is it free and portable?
  • Security and tampering – when you have a very open API, you are vulnerable. There’s a fine line between being open, and being so open that you endanger the data integrity.

Rapid Development, Distributed Development

There are two special cases where this method can show itself to be especially useful.

The Lean Startup:

A startup in its most early stages is an organization trying to build an unknown solution for an unknown problem. It is a team of people, trying to find both a business problem and a solution to such a problem. This is called the product/market fit. The lean startup mentality dictates that the early stages of the startup should focus on learning.

Having an open website API is a great way to learn. First, it’s a great way for A/B testing and iteration of ideas. Second, there’s a real chance for serendipity. Your users will create clients for themselves, thus telling you what they need, and why they love your service.

The Open Source Website:

We’ve all heard about open source code projects, but an open source website is a much rarer creature. Last year, I talked a bit about the reasons why it’s so difficult. The open website API liberates the project in several ways. The biggest pain point of the open source website is the data. A website without data is useless. By having the data in one central place, there’s great opportunity for innovation on the client side, with several open source clients developed with ease. Otherwise, you wouldn’t be able to fork the website’s look and feel without copying all the data as well (think: Wikipedia).

I’d love to hear what you think. What other pros and cons are there? Would you want to see more open source websites?

Suit Up!

Getting legal

Legal advice is good for you. It’s even better for your lawyer, so you really need to know what you are looking for, if you don’t want to pay too much for too little. That’s why I’m going to dispense what little advice I can give about getting legal and your new startup company.

Do it professionally

Software companies should hire lawyers for QA positions. Honestly, if I could afford it, I’d have a lawyer write all my unit tests. They are experts at finding and defining all the little one-of-a-kind cases that you just wouldn’t bother with normally. One of the founders wants to work part-time? Gee, I wish we’d thought of that before we started working! Good thing your lawyer forced you to discuss these things when you were going over the Founder’s Agreement.

Do it early

The Founder’s Agreement is the first legal paper you need for your company. It covers the basics of the relationship between the founders. It’s a good idea to talk these things over, and a written agreement is a really good way to get them settled. Dharmesh Shah has a great list of things that should be covered in this agreement:

  1. How should we divide the shares?
  2. How will decisions get made?
  3. What happens if one of us leaves the company?
  4. Can any of us be fired? By whom? For what reasons?
  5. What are our personal goals for the startup?
  6. Will this be the primary activity for each of us?
  7. What part of our plan are we each unwilling to change?
  8. What contractual terms will each of us sign with the company?
  9. Will any of us be investing cash in the company? If so, how is this treated?
  10. What will we pay ourselves? Who gets to change this in the future?

Also, your lawyer can help you incorporate your business. I personally believe in binding contracts between founders at an early stage. This creates a kind of commitment that just isn’t possible in any other way. By creating your shared company, you now own a part of something that isn’t just talk and ideas. Best of all, it is a vessel for money. Having a company is the best way to legally get money from customers and it’s also a prerequisite for any funding round (though, truthfully, when you get funded you have plenty of time to deal with legalities). However, never underestimate the power of paying customers. They are the blood and life of your company. If you can’t get people to pay you, chances are you are building something nobody wants.

If you go to a lawyer who has worked with a few startups, he’ll probably have some templates ready. After all, if I can be totally frank, you guys basically want the same things as the guys he saw yesterday. And it’s a good thing, too, because that means you can get it for free.

Do it for free

Free is the new business model these days, so why can’t your lawyer cut you some slack? Well, apparently, he can. Shop around. Don’t ever pay in advance for legal advice, and don’t give away stock options up front.

Here’s the deal we’ve got, so keep in mind that it’s possible:

The law firm we work with deals with many startups, but they are big and they can afford dealing with us without taking any money up front. They allow us to run a tab for the hours we required for setting up the company. Since the papers are based on templates they’ve had ready, this was no big deal for them (or for us). If and when we manage to raise our round of seed money, they’ll do the legals for the funding agreement and out of that funding round we will pay our tab. We’ve already put down the theoretical money in our theoretical financial sheets.

Also, there’s the possibility of paying them some of the sum in stock options. Options are great, because they align your lawyer’s interests with yours; however, not everybody is comfortable with giving away equity. They let us choose, at a later time, what is best for our company. And they have to earn our trust before we commit.

Founding a company (here, in Israel, at least) is easy and fairly cheap. I recommend you get your papers straight before you regret it.

Suit Up!

It’s going to be legendary.