I was looking for a solution for both log entries and also a simple file-hosting setup that would not require logging into the server (an HTTP interface). A while ago I looked into distributed DBs, and one of the ones I read about was CouchDB. It is not a "truly" distributed DB, but it does have automatic sharding, which I'm still not really sure what that means (jk) but it's apparently important. The one thing that made this DB stand out was its HTTP API: you can run a web site with only the DB, no Tomcat/Nginx/whatever required. This appealed to me as a concept of simplicity, if nothing else, and I assumed that combining your server with your DB would be faster than having them separate. So I looked into CouchDB in depth, and this is what I found:
- Objects are JSON docs, or attachments to those docs (attachments can be anything, but generally an image file).
- Objects can be served directly as-is, or through a "view", a "list", or a "show".
- Direct document access returns JSON objects of the format {"id": . . . , "rev": . . . , "doc": ..the actual doc..}. No flexibility if you don't like that.
- "Views" are essentially query results. In SQL terms, a view is the result set of a query. A view is cached and updated automatically as needed.
- "Shows" are ways to reference a single document and format the response in a custom manner.
- A "list" is a way to custom-format, combine docs, apply templates, etc. Basically a "show" for a "view" instead of a single doc.
- Documents can have "binary" attachments (images, raw JS, etc.). This is what you would use for "fast" script serving, where the script is runnable in a browser.
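To make those terms concrete, here's a sketch of a design document holding a view, a show, and a list. `emit` and `getRow` are CouchDB's real server-side API, but the document fields (`type`, `name`, `size`) are invented for illustration, and this hasn't been run against a live Couch instance. The bottom half simulates the map step locally so you can see the shape of a view row:

```javascript
// Map function for a "view": CouchDB runs it once per doc, keeps the
// emitted rows in an index, and serves queries straight from that index.
function mapByName(doc) {
  if (doc.type === "icon") emit(doc.name, { size: doc.size });
}

// Sketch of a design document; the functions are stored as source strings.
const designDoc = {
  _id: "_design/icons",
  views: { by_name: { map: mapByName.toString() } },
  shows: {
    // "show": custom-formats ONE doc per request
    as_html: "function (doc, req) { return '<b>' + doc.name + '</b>'; }"
  },
  lists: {
    // "list": custom-formats a whole view's rows per request -- the
    // per-query parse/format step that costs time
    as_csv: "function (head, req) { var r, out = ''; " +
            "while ((r = getRow())) { out += r.key + ',' + r.value.size + '\\n'; } return out; }"
  }
};

// Simulate the map step locally to show what a view row looks like.
const rows = [];
globalThis.emit = (key, value) => rows.push({ key, value });
[{ type: "icon", name: "save", size: 16 }].forEach((d) => mapByName(d));
console.log(JSON.stringify(rows)); // → [{"key":"save","value":{"size":16}}]
```

The point of the simulation: a view row is tiny precomputed JSON, which is why views can be served so cheaply.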
After some tests, it appears that:
- "Shows" are semi-fast. Views are faster, I think because they are served directly from the cache.
- "Lists" are not very fast. The views feeding a list are cached, but the list has to JSON.parse the docs and then stringify them again for EVERY query. That takes time: not disk-access time, but process time, I believe. And because this is actually three steps (get the view, get the JSON objects, process the objects), there are three places to bottleneck, and apparently they DO bottleneck: response times vary by an order of magnitude.
- "Views" are very fast. It appears everything is cached and ready to serve. They're still slower than a memory cache, but I think that's because I use "include_docs", which makes the DB do a lookup for each included doc.
- I didn't test binary serving in depth, but it's fast too.
1000 queries, using 5 connections (times in parentheses are with the connection count upped to 20):
- "show" = 1300ms (1180ms)
- "list" = 23 seconds; fastest single request: 14ms, slowest single request: 220ms (same)
- "view" = 450ms (430ms)
- direct access = 450ms (430ms)
- direct access as attachment (binary) = 380ms (same)
- TOMCAT simple file serve = 450ms (190ms)
- ICON SERVER from memory = 200-280ms (same)
The number of connections is a grey area for me: I'm not sure of the technical ability of my OS, Tomcat, or CouchDB to actually open and process N connections. The Tomcat numbers say that opening a few connections is no slower than opening 20, so I assume I don't have the whole picture. Or maybe I do: Tomcat FILE serving improved greatly with more connections, so maybe the Icon Server's Tomcat instance is already at a bottleneck somewhere, and adding connections just doesn't do anything. Opening more connections showed negligible improvement in CouchDB, so either there is a bottleneck elsewhere or something completely unknown to me is going on; opening fewer connections, however, made CouchDB vastly slower. Given the strange result of the ICON SERVER response times being the same for a few connections as for 20, that result has to be thrown out. And since it is the ONLY result that is meaningful to me, that's a big hit to the value of this test.
But in my tests of the live icon servers, I find serving simple files to ALWAYS be slower than serving from memory, so we can use the simple-file-serve time as (an upper bound on) the real icon server time. Any way you swing it, the icon server will be twice as fast as CouchDB. That's to be expected, though: no normal DB will be as fast as information cached in memory. However, there are vast differences between best case and worst case.
So, since local tests aren't going to work for me, let's try elsewhere.
Tests on the QA Amazon server: fetching 12 KB from CouchDB, 9 KB from the Icon Server (the Icon Server serving from its memory cache).
100 queries, 1 connection:
1000 queries, 10 connections
1000 queries, 30 connections
These results confirm more definitively what the local tests suggested: at a high number of connections, CouchDB's performance degrades to the point where adding connections actually makes things slower. The Icon Server is fetching a smaller file, 25% smaller than CouchDB's, so the numbers are a little off, but the overall trend is clear; adding a second or two to the IS times doesn't change the result.
Summary
This is not a high-performance DB; it is a feature-specific DB. It would be great for repos or small projects.
These results are actually very positive for CouchDB in certain circumstances. It's half the speed of the current Icon Servers, but the Icon Servers are speed demons, and half their speed is actually very good for a non-memory-based DB. CouchDB clusters work like Git repos: there is no master or slave. The nodes update each other and guarantee eventual consistency. This means ANY one DB can be updated and the update will spread across the cluster automatically, with every DB an exact duplicate of the others. If one fails, you lose nothing. If all but one fail, you still lose nothing.
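Setting up that replication is itself just an HTTP call: you POST a document like the following to a node's `_replicate` endpoint (the hostnames and DB name here are placeholders):

```json
{
  "source": "http://db1.example.com:5984/icons",
  "target": "http://db2.example.com:5984/icons",
  "continuous": true
}
```

With `"continuous": true` the node keeps pushing changes as they arrive; pairing nodes in both directions gives the masterless, Git-like behavior described above.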
The BEST thing about CouchDB is its independence from an app server, via its HTTP API. Development time on CouchDB, **if it suffices for the purpose**, can be DRASTICALLY reduced compared to the normal Java EE stack.
What CouchDB is great for:
Dynamic content. Transactions, or dynamically bound content. Low-to-medium bandwidth. Version control: it would be great for a Git-repo-type service, chat, transcripts, documents, a cloud drive.
What CouchDB would be bad for:
Logging DB. High-bandwidth. Static content.
I'd recommend using this DB for independent (standalone) projects, proof-of-concept or "testing the waters" projects, or the cases in the "great for" section above. As for my particular reason for testing: I wanted a system for dynamically binding content as it's served. Specifically, I wanted to apply a system of imports and modularization to JavaScript files. What I found is that the only practical ways to do this would be either to serve a generic bootloader JS from the Icon Server and have it make secondary requests to this DB, OR to bolt a secondary system onto the CouchDB server itself: I would have to dedicate a DB connection to a listener which, when updates are applied to key files (modules), would regenerate the resultant files (the served JS). That is exactly what I was trying to avoid by using CouchDB: extra effort using an extra system.
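For what it's worth, the rebuild step that listener would perform can be sketched in a few lines. The listener itself would sit on CouchDB's `_changes` feed and call something like this when a module doc updates; the doc shape (`{name, source}`) is invented for illustration:

```javascript
// Given the current module docs, produce the single combined JS file
// to serve. Each module is wrapped in an IIFE to keep its scope.
function bundle(moduleDocs) {
  return moduleDocs
    .map((m) => "// module: " + m.name + "\n(function () {\n" + m.source + "\n})();")
    .join("\n");
}

const out = bundle([
  { name: "util", source: "var util = {};" },
  { name: "app",  source: "/* uses util */" }
]);
console.log(out.split("\n")[0]); // → // module: util
```

Simple as it is, it's still an extra moving part outside the DB, which is the objection above.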