Have you ever lost critical data due to overwrite? Such a silly question… of course you have. This is the nasty side effect of the U in CRUD. It is not even a design flaw since an update operation is meant to replace you existing data. Hence, it is a very valid common use case.
Most common attempts to mitigate said problem might be limiting. A naive but brave suggestion suggests not to allow access to the DB.
Problem: Prevent accidental updates/deletes on any table
Easy solution: DON'T ALLOW DEVS ACCESS TO PROD!!! We are evil breakers of all things production.
source: https://stackoverflow.com/questions/14438055/prevent-accidental-updates-deletes-on-any-table
You read it right, it is easy peasy, just don't do it!
. It is not a super bad advice. Infact, you probably received such advice as well. It might be useful for some, very limited, use cases. Anyways, let our hero, MGOwen
speak for all of us: If "just don't ever make mistakes" was a valid argument, we wouldn't need seatbelts and firearm safety switches
.
What if preventing update is not an option? Let’s think even more extreme and embrace updating sensitive data everywhere including in production. Imagine that your system requires applications to perform an update but still need to keep track of every goddamn change ever happened to data.
Welcome to my world, ladies, gentlemen and other gender members. I spent most of my career dealing with fintechs and finance companies in general. There is one capability you must have as a financial service provider. Every action must be logged alongside with corresponding data. Otherwise, you are liable for any financial crime occuring at your platform. This is the way of fintech life, you have to suck it up.
“Versioning data? So what? It is not rocket science”, I hear you say and you are right. I mean, let me be honest, I have found many different ways to mitigate this problem in the past. There are myriad of ways to implement a platform with a versioned data model. That is not the point. The cost (both time and money) that such implementations often come attached with give start-ups a big smack in the face especially when they have to be agile. You see your friends enjoying building shiny new APIs backed by simple CRUD systems and start their start-up journey months ahead while you fight with tons of quirky data models to offer a brand new finance app to the world while regulators are stalking you.
Hasura is a perfect tool to add API on top of such data models with ease.
I will be way quicker with this one. I want to, very quickly, show you that what I am capable of doing with this bad boy. You can do way more than what I promised in the begining of the post but I’m keeping it simple. We are going to build a GraphQL API that:
Run these commands and go to http://localhost:8080 in your browser.
$ wget https://raw.githubusercontent.com/hasura/graphql-engine/stable/install-manifests/docker-compose/docker-compose.yaml
$ docker-compose up -d
Have questions? Not my problem. I’m here to tell my story. For Installation related questions follow this link
Here are features of this table that is crucial to me(regardless of actual content):
versioned
Example table has id
(uuid) and v_id
(int) as identifiers. v_id
specificly holds version indentifier within objects with the same id
. id
is generated by default using gen_random_uuid()
and v_id
has auto increment nextval('schema."<table><column>_seq"'::regclass)
created_at
(timestampz) field that has now()
(current time) as default value.deleted
field is added to mark record deletion.A data
(jsonb) field is added to represent the versioned content. There is no limit(except DB column limit) for how many columns you can add to the table. We don’t care. All of these fields will be versioned.
Here is a gif for you. Hasura Console is enough to create this table without any direct access to our beloved DB.
Final table has following DDL:
CREATE TABLE IF NOT EXISTS versioned.example
(
id UUID DEFAULT gen_random_uuid() NOT NULL,
v_id SERIAL NOT NULL
CONSTRAINT example_pkey
PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL ,
data JSONB,
deleted BOOLEAN
);
COMMENT ON TABLE versioned.example IS 'an example table';
This part is the most tricky part of all. We need to create a SQL View
to represent the most current version of the example record sharing the same id
value. Additionally, we need to access the created_at
value of our very first version of a example record with the same id
. If the most recent record has deleted
with true
as value, then we need to hide it so it doesnt appear in this layer.
Bear with me and take a look at this ugly query.
CREATE OR REPLACE VIEW public.example as
WITH last_version AS (
SELECT
v_e.v_id,
v_e.id,
v_e.created_at AS updated_at,
v_e.data
FROM versioned.example v_e
LEFT JOIN versioned.example v_e2 ON v_e.v_id < v_e2.v_id AND v_e.id = v_e2.id
WHERE v_e2.v_id IS NULL AND v_e.deleted IS NULL
), first_version AS (
SELECT
v_e.id,
v_e.created_at
FROM versioned.example v_e
LEFT JOIN versioned.example v_e2 ON v_e.v_id > v_e2.v_id AND v_e.id = v_e2.id
WHERE v_e2.v_id IS NULL
)
SELECT lv.id,
lv.v_id,
lv.updated_at,
lv.data,
fv.created_at
FROM last_version lv
LEFT JOIN first_version fv ON fv.id = lv.id;
COMMENT ON VIEW example is 'an example view:)';
Let’s break this down:
last_version
and first_version
hold first and last version of a record with same id
...ON v_e.v_id < v_e2.v_id AND v_e.id = v_e2.id
we perform a self join with records with same id
and greaterv_id
(or smaller when v_e._id > v_e2.v_id
). If particular record fails to join itself with given conditions, it implies that it is the latest record (or first one when ` v_e.v_id > v_e2.v_id). Finally, we only select that edge record (first or last) with
WHERE v_e2.v_id IS NULL`v_e.deleted IS NULL
eliminates the record with given id
, if the last version is marked as deletedlast_version
’s created_at
value is actually the last update date for a record. So we alias this part using v_e.created_at AS updated_at,
SELECT lv._id,
lv.id,
lv.updated_at,
lv.data,
fv.created_at
FROM last_version lv
LEFT JOIN first_version fv ON fv.id = lv.id;
This is one of many ways of retrieving first/last value within group
. In real life I use more eloborate but more performant ways of doing same job. I just wanted to showcase what I meant with a simple example. Check out these links to get better idea about these patterns(1, 2, 3). Internet is full of them.
Don’t forget to add it to Hasura:
As I promised, our approach will let you see all historical versions along side with current version.
We will connect versioned.example
table with public.example
view on id
column. As each example record potentially has many versions, we need one to many, aka array, relationship. Hasura Relationships comes to our resque:
Now we are able to reference each version of our record alongside with most recent one.
No more steps. We are done and let’s enjoy our new API. Here is a glimpse of joyful use of real-time versioned API we have just created without wasted sprints by a team of engineers. I’m going to add/update/delete/undelete an example record using Hasura GraphiQL UI. In the meantime, all of my actions will be tracked in real-time using a GraphQL subscription query.
Let’s have a look at our query:
// $id -> id of the record we are watching
subscription watchExample($id: uuid) {
// we place our variable to filter example records in true SQL fashion
example(where: {id: {_eq: $id}}) {
id
v_id
updated_at
created_at
data
// Hasura Let's you run aggregate queries using GraphQL.
// Here we are aggregating number of versions per each example record!
versions_aggregate {
aggregate {
count(columns: v_id)
}
}
// We are also getting versions of each record.
versions {
v_id
created_at
deleted
data
}
}
}
insert_versioned_example
mutation automatically created by Hasura. Note that, our schema naming convention comes handy here.mutation addExample($data: jsonb) {
insert_versioned_example(objects: {data: $data}) {
returning {
id
v_id
created_at
data
deleted
}
}
}
Note that as, initially, we have no example recods, our subscription query has no filter value. After first insert, we update our subscription query to watch one particular example record.
id
. But thanks to the way our example view is setup, we only will see last record.mutation updateExample($id:uuid,$data: jsonb) {
insert_versioned_example(objects: {id:$id, data: $data}) {
returning {
id
v_id
created_at
data
deleted
}
}
}
id
and deleted
(with value set to true
).mutation deleteExample($id:uuid) {
insert_versioned_example(objects: {id:$id, deleted: true}) {
returning {
id
v_id
created_at
data
deleted
}
}
}
delete
column is not set to true
for last version of a particular example record, we will see it back with all recious versions retained!mutation updateExample($id:uuid,$data: jsonb) {
insert_versioned_example(objects: {id:$id, data: $data}) {
returning {
id
v_id
created_at
data
deleted
}
}
}
Pretty neat isn’t it?
Consider Hasura as a SQL generator engine for your GraphQL API. It is a very powerful single layer for your application. Investigating generated SQL queries for given operation is mostly enough to understand what is happening under the hood. This transparency is very rare to find for a product that provides such capabilities. However, for high stake tasks, such as finance, utilizing a very trivial use of actions to control and enhance your API layer can be crucial.
With this approach, you are only inserting new records to your DB.
We must take away any UPDATE
, DELETE
, TRUNCATE
privilages from the operating DB user.
In addition to that, Hasura must limit the access to corresponding mutations using various techniques such as Whitelisting,
Authorization & Access Control.
I haven’t touched such aspects within the scope of this tutorial.
However, they are very powerful and useful abstraction mechanisms.
db_schema_dump.sql
and load the metadata using hasura_metadata.json