How To Open-Source Data

Open-source has changed the world.

I’ve looked for a good definition of open-source and all I could find were references to open-source software. I like to define open-source in a broader manner. In this definition I refer to “a construct” as the target product of an open-source project. Today, most constructs are pieces of software, but they can also be toasters, printers or even cars. Also, note that I will use a GNU-like approach to open-source. Obviously, there can be many flavors to it.

“Open-source” is an approach to the design, development and distribution of a construct. Anything can be defined as open-source if it has the following attributes:

  1. The design of the construct is available with any distribution of the construct. The design can be changed by anyone, provided that the new design is also available with any distribution of the new construct.
  2. Development methods are available with the distribution of the construct. The development methods can be changed by anyone, provided that the new methods are also available with any distribution of the new construct.
  3. Anyone can distribute a construct based on the design and development methods stated above.
  4. Use of patents is prohibited, as patents conflict with the notions of free distribution (free as in free speech, not as in free beer).

Software is the most open-sourced field because it is the easiest to open. Software is nothing more than information, and information is easily copied, shared, changed and redeployed. Software developers needed a way to share their software, so they developed one. We lack the frameworks to open-source pretty much anything else, because software developers don’t need them and those who need them can’t build them.

We need open-source websites

A friend of mine started an open-source project to data-mine government data. The group of people attracted to this type of project is extremely diverse. Only a small minority of the people are real developers. Some of them will code a little, if they absolutely have to. Most of them just want to help organize the data, tag it, sort it, describe use cases and influence the development. But they can’t.

Here’s what you need to do to use the software: you need to download it, then download and install Python, Django, MySQL, and a bunch of other open-source prerequisites. Then you need to configure everything to work on your computer. Then you need to run the program once to build the initial data sets. Once you have the data in your database, you can either query it directly or open a web server locally and browse it. Oh, and it only works on Linux, of course.

Without a centralized website where the data resides, changes to the data cannot be shared. Data cannot be tagged and it cannot be sorted. Observations about the data (oh, look, there are more environmental actions taken this year than the previous year. How interesting!) are not shared and are therefore lost.

The closest example I have to an open-source website is Wikipedia. The code is free, the site is online, you can change the data, you can change the code, you can download and distribute copies of the data and the code. Also, the community dictates the development of the site. The prominent users are not the developers.

Do it yourself (here’s a start-up idea for free):

An open-source website, unlike open-source software, is the best way to create a project that focuses on a combination of data and functionality.

  1. The project has a community. The community needs tools to collaborate. They need to discuss the current status of the project and describe its future. Today, the best tools we have are wikis. Open one.
  2. Host the data online. Have a centralized database that anyone can reach, take, query and change. The last part is tricky. How can you allow change while preventing damage to the data? Well, I never said it was easy (but nothing worthwhile ever is). The most important part is querying the data. The community will share information they found in the data on the wiki, but they’ll also share the queries (and improve them over time).
  3. Build a website. Most likely, this website will be similar to many crowd-sourcing websites out there, and for good reasons. This is where most people will look, and some will touch the data. However, your site will differ from others in one important aspect. It will be open. Developers will be able to add new features (a new button, a new page). Designers will be able to change the look and feel of the site, to make it better. People will add new queries to the data. I can’t think of any process or framework that can make this effortless. Can you?
  4. Open it up. Allow anyone to take, change and redeploy the site and the data.
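
One way to approach the tricky part of step 2 (allowing change while preventing damage) is to do what wikis do: never overwrite a record, only append a new revision, so that any bad edit can be undone. Here is a minimal sketch in Python using the standard-library sqlite3 module. The table layout and the function names (open_store, edit, current, history, revert) are my own assumptions for illustration and do not come from any existing project. A real site would still need user accounts, moderation and a shared server instead of a local database.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) an append-only revision store. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS revisions (
               rev_id     INTEGER PRIMARY KEY AUTOINCREMENT,
               record_key TEXT NOT NULL,
               value      TEXT NOT NULL,
               author     TEXT NOT NULL
           )"""
    )
    return conn


def edit(conn, record_key, value, author):
    """Record a change by appending a revision; nothing is overwritten."""
    cur = conn.execute(
        "INSERT INTO revisions (record_key, value, author) VALUES (?, ?, ?)",
        (record_key, value, author),
    )
    conn.commit()
    return cur.lastrowid


def current(conn, record_key):
    """Return the latest value for a record, or None if it has no revisions."""
    row = conn.execute(
        "SELECT value FROM revisions WHERE record_key = ? "
        "ORDER BY rev_id DESC LIMIT 1",
        (record_key,),
    ).fetchone()
    return row[0] if row else None


def history(conn, record_key):
    """Return every revision of a record, oldest first."""
    return conn.execute(
        "SELECT rev_id, value, author FROM revisions "
        "WHERE record_key = ? ORDER BY rev_id",
        (record_key,),
    ).fetchall()


def revert(conn, record_key, rev_id, author):
    """Undo damage by re-appending an older revision as the newest one."""
    row = conn.execute(
        "SELECT value FROM revisions WHERE record_key = ? AND rev_id = ?",
        (record_key, rev_id),
    ).fetchone()
    if row is None:
        raise KeyError(f"no revision {rev_id} for {record_key!r}")
    return edit(conn, record_key, row[0], author)
```

With this layout, a careless or malicious change is just one more row: someone calls revert with the id of the last good revision, and the full history, including the bad edit and who made it, stays available to the community.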

I have found one company that provides a platform for crowd-sourcing. As far as I can see, the platform is not open-source, which, in my book, puts them under FAIL. Not that I’m against making a buck here and there, it’s just that you have to eat your own dog food.

Does anyone want to start an open-source project for building open-source projects?