Terminology mapping
- What you and I (and CouchDB) would call a database, Amazon SimpleDB calls a domain.
- CouchDB documents and SimpleDB items will be referred to in this post as records.
- The JSON name:value pairs used in CouchDB documents and the attribute-value pairs in SimpleDB items will be called simply attributes.
A brief explanation: The developer documentation for SimpleDB states that attributes may have multiple values, but that attributes are uniquely identified in an item by their name/value combination. In the same paragraph, the docs give this as an example of an item’s attributes:
{ 'a', '1' }, { 'b', '2'}, { 'b', '3' }
By Amazon terminology, the ‘b’ attribute has two values. I think it clearer to regard this item as having three attributes, two of which have ‘b’ as their key.
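To make the two readings concrete, here is a minimal Python sketch of the example item above. The function names are my own, purely for illustration; nothing here is part of any SimpleDB client library.

```python
from collections import defaultdict

# The example item from the SimpleDB docs, written as (name, value) pairs.
item = [("a", "1"), ("b", "2"), ("b", "3")]


def as_multivalued(pairs):
    """Amazon's reading: each attribute name maps to a set of values."""
    grouped = defaultdict(set)
    for name, value in pairs:
        grouped[name].add(value)
    return dict(grouped)


def as_attributes(pairs):
    """This post's reading: every distinct (name, value) pair is one attribute."""
    return set(pairs)


print(as_multivalued(item))      # two names; 'b' carries two values
print(len(as_attributes(item)))  # 3
```

Either way the stored data is the same; the difference is only in how you count.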
What SimpleDB and CouchDB have in common
- Not relational databases
- Schemaless
- CouchDB is built with Erlang. SimpleDB may be, as well.
- support for data replication (this is a very sloppy generalization)
- accessed via HTTP
How SimpleDB and CouchDB Differ
SimpleDB:
- provides SOAP and (what passes at Amazon for) REST interfaces to the API
- REST requests all use HTTP GET, specifying the API method with a query param
- requests specify the database, record, attributes, and modifiers with query params
- record creation, updating, and deletion are atomic, at the level of individual attributes
- all data is considered to be UTF-8 strings
- automatically indexes data, details unknown
- queries
  - limited to 5 seconds running time. Queries that take longer “will likely” return a time-out error.
  - defined with HTTP query parameters
  - composed of Boolean and set operations with some obvious comparison operators (=, !=, >, >=, etc.)
  - as all values are UTF-8 strings, there are no sorting options.
- responses are XML
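As a rough illustration of the "everything goes in the query string" style described in the SimpleDB list above, the sketch below builds a GET URL for writing one record. Treat the details as assumptions: the parameter names (Action, DomainName, ItemName, Attribute.N.Name, Attribute.N.Value) and the 1-based numbering are modeled on Amazon's published API and should be checked against the current docs, the endpoint is a placeholder, and the authentication and signature parameters that a real request needs are left out entirely.

```python
from urllib.parse import urlencode


def put_attributes_url(endpoint, domain, item_name, pairs):
    """Build an unsigned, illustrative SimpleDB-style GET URL.

    The API method, domain, record, and attributes all travel as
    query parameters. Parameter names here are assumptions.
    """
    params = {
        "Action": "PutAttributes",  # the API method is itself a query param
        "DomainName": domain,       # the "database"
        "ItemName": item_name,      # the record
    }
    for i, (name, value) in enumerate(pairs, start=1):
        params[f"Attribute.{i}.Name"] = name
        params[f"Attribute.{i}.Value"] = value  # always a string
    return f"{endpoint}/?{urlencode(params)}"


url = put_attributes_url(
    "https://sdb.example.invalid",  # placeholder endpoint, not a real host
    "mydomain",
    "item1",
    [("a", "1"), ("b", "2"), ("b", "3")],
)
print(url)
```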
CouchDB:
- all REST, all the time
- requests use HTTP GET, PUT, POST, and DELETE with their usual RESTful semantics
- requests specify the database and record in the URL, with query params used for modifiers
- record creation, updating, and deletion are atomic
- supports all JSON data types (string, number, object, array, true, false, null)
- indexing is under user control, by means of “views”
  - defined with arbitrary JavaScript functions
  - can be stored as documents
  - can be run ad hoc, as “temporary views”
- queries are basically views, with the addition of modifiers (start_key, end_key, count, descending) supplied as HTTP query parameters
- sorting is flexible and arbitrarily complex, as it is based on the JSON keys defined in the views. See here for more information
- responses are JSON
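The view model in the CouchDB list above can be simulated in a few lines of Python. This is only a sketch of the idea, not CouchDB itself: real views are JavaScript functions run by the server, and CouchDB has its own collation rules for mixed JSON types. The documents and function names below are invented for the example.

```python
docs = [
    {"_id": "a", "type": "reading", "temp": 9},
    {"_id": "b", "type": "reading", "temp": 10},
    {"_id": "c", "type": "note", "text": "ignored by the view"},
    {"_id": "d", "type": "reading", "temp": 2},
]


def map_by_temp(doc):
    """Stand-in for a JavaScript map function: emit (key, value) rows."""
    if doc.get("type") == "reading":
        yield doc["temp"], doc["_id"]


def build_view(documents, map_fn):
    """Run the map function over every document and sort rows by key."""
    rows = [row for doc in documents for row in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


def query(view, startkey=None, endkey=None):
    """Slice the sorted rows, in the spirit of the start/end key modifiers."""
    return [
        (key, value)
        for key, value in view
        if (startkey is None or key >= startkey)
        and (endkey is None or key <= endkey)
    ]


view = build_view(docs, map_by_temp)
print(view)                     # [(2, 'd'), (9, 'a'), (10, 'b')]
print(query(view, startkey=9))  # [(9, 'a'), (10, 'b')]

# For contrast, the same values stored as strings sort lexicographically:
print(sorted(["9", "10", "2"]))  # ['10', '2', '9']
```

The last line is why typed keys matter: with numbers as keys the rows come back in numeric order, while strings compare character by character.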

24 comments:
Another interesting "feature" of SimpleDB is Eventual Consistency: http://www.satine.org/archives/2007/12/13/amazon-simpledb/
CouchDB's equivalent to that seems to be adding a query param of update=false.
"The update option can be used for higher performance at the cost of possibly not seeing all the latest data. If you set the update option to “false”, CouchDB will not perform any refreshing on the view that may be necessary."
from http://www.couchdbwiki.com/index.php?title=HTTP_View_API
Good article
You should put the differences side by side in a table, so that it's more clear.
The main difference is that CouchDB you have to administer yourself, whereas SimpleDB is a utility computing service. And isn't CouchDB built on top of DETS, with its known size limitations? But SimpleDB also has size limitations per user and per "domain" during the beta phase.
what about dabbledb? This seems like an interesting db on par with these other two options
Ummm.. perhaps there are more things that differ:
1. Amazon has a widely used, web-scale storage & computing architecture
2. Amazon gets rid of the need for custom hardware and maintenance. It turns the hardware + software scalability problem into just software.
3. Amazon has some massive customers using the system, so you know it's going to do the job
4. Couch has to be installed on your own hardware
5. Like "storage" systems before S3, simpledb will take the front spot in no time flat.
Long live amazon, king of the web
Heya Matthew,
thanks for the nice writeup!
Cheers,
Jan
--
anonymous 1: tabular format would be better. I started out doing that, but the blogspot interface seemed to be ignoring the table, and I didn't feel like troubleshooting.
roberto saccon: I don't think it's fair to say the main difference is that you have to manage CouchDB, whereas SimpleDB is run by Amazon. That may be an important difference for the problem domains where either CouchDB or SimpleDB are good solutions. In such cases, the process cost for getting started with SimpleDB is close to nil.
bark madly: DabbleDB is hot, but it belongs to a different phylum of datastore. I had remembered it as being an extra-nifty interface to a relational database. That is incorrect, according to http://news.squeak.org/2006/10/31/ocean-waves-the-applications-built-on-seaside/
The magic of DabbleDB is largely the user interface, and I mean that as a compliment, not a slur. SimpleDB and CouchDB are fundamentally concerned with simplifying and streamlining data storage and retrieval.
anonymous 2: I completely neglected to cover any of the differences you mention. That was partly the Purloined Letter effect; they're so obvious that it didn't occur to me to write about it. The neglect may also be attributed to my motive for writing the post, which was that I wanted to highlight the different ways that SimpleDB and CouchDB attempt to solve similar problems. This post focused more on the API differences than on anything else, to the complete exclusion of the deployment differences.
anonymous said:
"3. Amazon has some massive customers using the system, so you know its going to do the job"
Can you name one? Just curious, because I think your idea of 'massive' probably differs from mine.
I know of some folks using as many as 500 EC2 instances and hundreds of TB of storage. Smugmug said 300TB in April, maybe they are at 500TB now. They are probably the top S3 customer. Assuming a power law distribution, there probably aren't many other customers using >100TB on S3.
For me 'massive' is >5,000 machines, >500 TB. By that count Amazon has one 'massive' customer.
What about the size of data? In Amazon it is 1024 bytes. In CouchDB?
For CouchDB documents, "Elements can be of varying types (text, number, date, time), and there is no set limit to text size or element count."
CouchDB also accepts attachments, so you can store files along with the documents (a.k.a. records). When using SimpleDB, you would store files in S3 and refer to them via URL in a SimpleDB item attribute.
Does anyone know how one would do "joins" using SimpleDB?
Here's how you do it using CouchDB:
http://www.cmlenz.net/blog/2007/10/couchdb-joins.html
That CouchDB pattern is not so much a join as it is an effective use of the arbitrary JSON keys. The view prepares a list of records sorted so that all of the comments follow the blog post they belong to. Then the "slicing" parameters, start_key and end_key, narrow down the results to one specific post plus its comments.
The result is effectively the same as a join across posts and comments tables, but the path taken to get there is quite different.
I don't see any features in the published docs for SimpleDB that would allow for similar patterns of use. It is clearly intended for simple uses, and that does not include relational queries such as joins.
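The CouchDB pattern described in this comment can be sketched as a small Python simulation. It is a hedged illustration only: the documents, field names, and the [post id, 0 or 1] key layout are my own choices, real views are written in JavaScript, and the tie-break on document id only approximates how CouchDB orders rows with equal keys.

```python
documents = [
    {"_id": "c2", "type": "comment", "post": "p1", "text": "Second!"},
    {"_id": "p2", "type": "post", "title": "Other post"},
    {"_id": "p1", "type": "post", "title": "Hello"},
    {"_id": "c1", "type": "comment", "post": "p1", "text": "First!"},
    {"_id": "c3", "type": "comment", "post": "p2", "text": "Nice"},
]


def map_post_with_comments(doc):
    """Emit posts under [post_id, 0] and their comments under [post_id, 1]."""
    if doc["type"] == "post":
        yield [doc["_id"], 0], doc
    elif doc["type"] == "comment":
        yield [doc["post"], 1], doc


def build_view(docs, map_fn):
    """Sort rows by key, so each post is followed by its own comments."""
    rows = [row for doc in docs for row in map_fn(doc)]
    return sorted(rows, key=lambda row: (row[0], row[1]["_id"]))


def post_with_comments(view, post_id):
    """Slice the view down to one post plus its comments."""
    startkey, endkey = [post_id, 0], [post_id, 1]
    return [doc for key, doc in view if startkey <= key <= endkey]


view = build_view(documents, map_post_with_comments)
for doc in post_with_comments(view, "p1"):
    print(doc["_id"], doc["type"])
```

No join is computed anywhere; the sort order of the compound keys does all the work, and the key range picks out the slice.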
Roberto Saccon:
Damien Katz addressed the DETS issue on the mailing list today:
CouchDB doesn't use DETS, it has its own storage engine and that
storage engine has no storage limit.
The original issue was that the file IO driver in Erlang didn't
support large files, though either it was fixed or I may have been
mistaken.
Is it true that CouchDB doesn't support passing parameters to views, effectively making it extremely difficult to use in the real world? (Imagine having to define a view for each value of the parameter you would pass to WHERE some_column = some_value, because you can't pass it as a view param.) :)
Karol, have you stopped beating your wife yet?
Also, was the baptism of John from heaven or from man?
Check out sdb manager at www.sdbmanager.com; it's a neat way to manage your SimpleDB databases.
It seems that relational databases are really taking a hit with the increasing popularity of cloud computing.
Another non-relational database to take a look at is MongoDB from 10gen. It is open source, and integrated into 10gen's cloud platform.
Matthew, I invite you to check it out (SDK is at www.10gen.com). I would really like to hear your thoughts and impressions about what's right and wrong with it.
Shane Brauner, thanks for the info and link. Happy to see Mongo is on GitHub.
M/DB (http://www.mgateway.com/mdb.html), a SimpleDB clone, provides an interesting local or other-cloud instance of SimpleDB
SimpleDB *is* written in Erlang.
Something I have been wondering about these new distributed, schema-less databases which have begun to surface all over the place: is there a generic API yet?
It would suck to write your whole application to one, and then find out you need to switch to another one, and have to rewrite a whole bunch of your code.
This happens all the time with SQL databases, but because we have JDBC, ODBC, DBI and so forth, it isn't such a big issue.
Karol, you can pass a key parameter to the view (or a range of keys, or a range of corresponding document IDs).
See http://wiki.apache.org/couchdb/HTTP_view_API
Also I lost 10 minutes or so, trying to apply code from this post to query my current CouchDB v0.10. Specifically, parameters are called "startkey"/"endkey" and not "start_key"/"end_key".
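For anyone else tripping over this, here is a small Python sketch of building such a view URL with the "startkey"/"endkey" spelling. The host, database, design document, and view names are made up, and the /_design/.../_view/... path layout is the one I believe recent CouchDB versions use, so check it against your version. Key values are JSON-encoded before being URL-encoded.

```python
import json
from urllib.parse import urlencode


def view_url(base, db, design_doc, view, **params):
    """Build a CouchDB view URL; every parameter value is JSON-encoded."""
    query = urlencode(
        {
            name: json.dumps(value, separators=(",", ":"))
            for name, value in params.items()
        }
    )
    path = f"{base}/{db}/_design/{design_doc}/_view/{view}"
    return f"{path}?{query}" if query else path


url = view_url(
    "http://localhost:5984",  # hypothetical local server
    "blog",                   # hypothetical database name
    "posts",                  # hypothetical design document
    "by_post",                # hypothetical view name
    startkey=["p1", 0],
    endkey=["p1", 1],
)
print(url)
```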
FYI - SimpleDB now supports sorting/ordering.