PostgresKeyValue
Key-Value storage for PostgreSQL
Performant and simple key-value storage in PostgreSQL, with a Hash-like
interface. The only dependency is the pg gem.
PostgresKeyValue tries to get out of your way by being unopinionated, small, and
simple.
It works similarly to, but is not compatible with, Hash. Some features from Hash are implemented; others
are deliberately omitted when they don't make sense or would create leaky abstractions.
PostgresKeyValue depends on the pg gem, but
doesn't declare it as a dependency, so that you can provide your own version,
fork, or compatible gem instead.
Configuration and usage are done through dependency injection, which makes it
easy for you to test and to replace with mocks. The design aims to decouple as
much as possible, allowing you to integrate it in the right place (and only there).
A few tools are included to prepare and optimize the database, usable in e.g.
your migrations or a deploy script.
PostgresKeyValue is not finished!
Work in progress. Here are some evident TODOs (they will be moved into GitHub issues later):
Fix glaring SQL injection holes. Use prepared statements or params to ensure clean input.
Determine locking and transactional behaviour: who wins on a conflict?
Add a proper index to the key column. Introduce some benchmark tests.
Allow read-only setup so that e.g. workers can read but never write.
Allow the "connection" to be passed in from ActiveRecord (and Sequel?) so that users can re-use it.
Add tools to use in migrations or deploy scripts to set up the database like we do in tests.
Add a key?() API to check if a key exists.
Add a fetch() API to provide a default and/or raise an exception, similar to ENV and Hash.
Add a default to the initializer for the entire store. Maybe with a block, to mimic the Hash.new signature?
Add sanitizers and protection for the JSON (de)serializers, e.g. storage size or formats.
Allow the JSON (de)serializers to be dependency-injected instead of using JSON.parse and x.to_json.
Use prepared statements or params to improve performance.
Installation
Add this line to your application's Gemfile:
gem 'postgres_key_value'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install postgres_key_value
We don't install the pg gem for you as a dependency, so ensure you add it yourself.
For example:
gem 'pg'
Usage
Steps are as follows:
Make a connection to a PostgreSQL database.
Instantiate a PostgresKeyValue::Store object by passing in this connection.
Write to, and read from, this database.
require 'pg'
require 'postgres_key_value'

connection = PG::Connection.open(:dbname => 'test')
greetings  = PostgresKeyValue::Store.new(connection)

greetings[:en] = "Hello World"
greetings[:nl] = "Hallo Wereld"

greetings[:en]                          #=> Hello World
greetings['DE-de']                      #=> nil
greetings.fetch('DE-de', 'No greeting') #=> No greeting
greetings.key?(:nl)                     #=> true

# Can be another process on another machine entirely.
Thread.new do
  other_greetings = PostgresKeyValue::Store.new(connection)
  other_greetings[:en] = "Hello Mars!"
end.join

greetings[:en]                          #=> Hello Mars!
Utils
Utils to create and prepare the table are provided. For example, in your migrations or in a hypothetical deployment or provisioning tool.
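The gem's own helper calls aren't reproduced here; as a minimal sketch of what "preparing the table" amounts to (the table name key_value is an assumption for illustration, and the actual util may differ), the equivalent plain SQL through the pg connection looks like this:

require 'pg'

connection = PG::Connection.open(:dbname => 'test')

# Roughly what a "create and prepare the table" util does: a two-column table
# where key is the (string) primary key and value is stored as json.
connection.exec(<<~SQL)
  CREATE TABLE IF NOT EXISTS key_value (
    key   character varying PRIMARY KEY,
    value json
  )
SQL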
Technical details
Keys can only be strings or symbols, so be sure to convert your object to a string explicitly before using it as a key.
The database is configured to store key/value pairs in two columns: key is the primary key,
and value is of type json. The primary key is of a string type, so PostgreSQL limitations on keys and string
storage apply.
PostgresKeyValue deliberately does not try to be fully compatible with Hash, although it
does offer a similar interface. Mainly because such an opaque abstraction would be leaky:
A table with KV storage can, by design, grow very large, whereas a hash is
memory-bound. So features like the iterators store.each {|k,v| ... } or
store.to_a would require the underlying limits to leak through. We'd then need
logic, config, etc. to handle the case where the database becomes too big for memory to hold.
We only allow keys to be strings (or symbols), and not "anything" as Hash does. The database
stores keys as strings, so if we allowed "anything" as a key, the marshalling or
serializing would not only become complex, it would put a performance hit on all
usage: those using strings as keys would become slower too.
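As a small illustration, continuing with the store from the Usage example above (the values here are made up for the example), convert non-string keys explicitly:

user_id = 42
greetings[user_id.to_s] = "Ada"   # convert explicitly; an Integer key is not supported
greetings[user_id.to_s]           #=> "Ada"
greetings[:en] = "Hello World"    # symbols work as well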
The values are serialized using JSON. This is lossy. That is by design, for
security reasons. Marshalling an object (with Marshal.dump) retains its entire state,
and loading marshalled data from an untrusted source can even allow the provider of
that data to monkeypatch your Ruby codebase. We chose JSON, as that is simplest, and
therefore protected from these attacks (unless JSON.parse is vulnerable, which is
not unthinkable).
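A sketch of the kind of loss to expect, assuming values round-trip through x.to_json and JSON.parse (as the TODOs above suggest); exact behaviour may differ per type:

greetings[:config] = { retries: 3, timeout: 1.5 }
greetings[:config]   #=> {"retries"=>3, "timeout"=>1.5}  (symbol keys come back as strings)

greetings[:when] = Time.now
greetings[:when]     #=> a String like "2024-05-01 10:00:00 +0200", not a Time object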
Many methods on Hash don't make a lot of sense here either. E.g. most methods that
operate on the entire hash, like transform_keys! or compact, have little use
in a pure KV lookup system. When in need of such operations, you probably need
an actual database table (which, not by coincidence, the connection already offers!)
Another reason for not wanting feature-parity with Hash is that it
would grow this gem far beyond "simple", without there being a clear need for
all the added features. Hash is really large! Rather, if there are features you need, raise an issue
(or write a patch) so we can determine whether it fits the scope and is worth the
extra code.
The keys are indexed; this is the default in PostgreSQL for primary keys, using
a btree. This causes some slowdown on insert but fast reads.
In cases where you have a small dataset and many writes but fewer reads,
this causes notable performance penalties. Removing the index may help, but
only for small datasets, as the upsert behaviour causes every write to perform
a lookup anyway: on large datasets, writing will become slow without this index.
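For the curious, that index comes from the primary key constraint itself. A small sketch for inspecting it, reusing the connection from the Usage example (the table name key_value is again an assumption):

# List the indexes PostgreSQL created for the table, including the
# btree that backs the primary key.
connection.exec(<<~SQL).each { |row| puts "#{row['indexname']}: #{row['indexdef']}" }
  SELECT indexname, indexdef
  FROM pg_indexes
  WHERE tablename = 'key_value'
SQL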
Alternatives
Ruby core has Hash and DBM. Redis is another alternative that may be better
suited, too.
PostgresKeyValue is most helpful when:
You are using Postgres for other storage already and want to avoid extra services (=complexity, cost, sysadmin)
You have your service spread out over multiple servers OR
You cannot write files on disk
Your KV database is too big to fit in memory
An in-memory database is, by far, the fastest and easiest. Just a Hash.new
and be done with it. Nothing beats this! Except when other processes, threads
or servers need to access it. A singleton pattern with Hash might help,
but that comes with added complexity and potential race conditions.
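A minimal sketch of that in-process alternative (the class and names are made up for the example): a singleton wrapping a Hash, guarded by a Mutex to avoid the most obvious race conditions.

require 'singleton'

class InMemoryStore
  include Singleton

  def initialize
    @data  = {}
    @mutex = Mutex.new
  end

  def [](key)
    @mutex.synchronize { @data[key] }
  end

  def []=(key, value)
    @mutex.synchronize { @data[key] = value }
  end
end

InMemoryStore.instance[:en] = "Hello World"
InMemoryStore.instance[:en]  #=> "Hello World"

# Still limited to a single process: another server or worker process
# cannot see this data.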
Storing the KV on disk is very easy too, especially since Ruby comes with
DBM in its
stdlib. This is a dedicated key-value database stored in a file (a Berkeley
DB). The downside is that the speed of the disk matters (much slower than
memory!) and that all services wanting to read and write need access to the
disk. Race conditions and locking issues must be solved too in a setup where
multiple processes use one such database file.
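For comparison, a minimal DBM example (it requires a Ruby compiled with DBM support, as noted under Development):

require 'dbm'

# Opens (or creates) the file-backed key-value database "greetings".
DBM.open('greetings', 0666, DBM::WRCREAT) do |db|
  db['en'] = 'Hello World'
  db['en']  #=> "Hello World"
end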
Redis is another obvious solution.
Depending on the network, this is often faster than DBM on disk. But Redis
comes with its own swath of issues. Most of them are logical tradeoffs to keep
things fast and simple. The biggest downside is that it requires you to
manage (and/or pay for) an extra service, when you quite often have a Postgres
at hand already. Another downside is that Redis isn't fully ACID-compliant. It
can be configured to write backups to disk, but it really isn't a good fit for
storing data that cannot be re-created. It's perfect for things like caches,
temporary tokens, or projections (which can be rebuilt from primary storage or
other services), but not very solid for data that doesn't live elsewhere, like a
queue of to-be-sent emails, events or commands: if the database crashes,
there's no way to re-send those emails, re-emit the events or re-schedule the
commands; the data is gone forever. Postgres has this solved with its WAL.
We have a benchmark that compares Hash, DBM, Redis and PostgresKeyValue. You
can run it with RUN_ALTERNATIVES_BENCHMARKS=true b rake benchmark.
It is clear that PostgresKeyValue is the slowest, especially when writing.
This was run with PostgreSQL and Redis in a throttled (1 CPU, standard
settings) Docker container on localhost. So network overhead can be neglected, but CPU is
a limiting factor. And writing to the PostgresKeyValue database requires indexing,
which is CPU-heavy. This is also the reason why the real time is so much higher
for Postgres: the Ruby side spends a lot of time waiting for the PostgreSQL server to finish
writing.
(Please note that for this benchmark you need a Redis server running and must
have compiled Ruby with DBM support.)
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.
In order to run the comparison benchmark, which compares this gem to Ruby Hash,
YAML/DBM (both stdlib) and Redis, you need a Ruby compiled with dbm, i.e. with
the Berkeley DB installed. For Ubuntu, this means installing libdb-dev (sudo apt install libdb-dev) and then recompiling Ruby (rbenv install --force if
you use rbenv). And you need a Redis server running.
Contributing
Bug reports and pull requests are welcome on GitHub at
https://github.com/berkes/postgres_key_value. This project is intended to be a
safe, welcoming space for collaboration, and contributors are expected to
adhere to the code of conduct.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the PostgresKeyValue project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.