How to implement schema-less entities in Datomic?

Discussion:

Pinku Surana

2017-04-05 17:58:41 UTC

I need a schema-less immutable database for an application. In a nutshell,
I'm building a platform that runs customer scripts. Those scripts can
create arbitrary data structures. So I need to store them such that they
can be queried. For some weird reason, I also need to show them the state
of their app data in the past.

My current idea is to model it as a multi-tenant database similar to salesforce.com's
application database
<http://www.developerforce.com/media/ForcedotcomBookLibrary/Force.com_Multitenancy_WP_101508.pdf>.
I will define 20-50 attributes for each Datomic type (13, I think). When a
customer defines a datatype, I'll create a mapping from his type to an
implementation in Datomic.

type Person = {
fname : string;
lname : string;
age : int;
}

Could be implemented as:

{:record/type "Person" :record/string1 "Bob" :record/string2 "Dylan"
:record/int1 75}

The mappings will be stored and compiled for execution.
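A minimal sketch of that mapping step in plain Clojure, with no Datomic dependency. Everything here is hypothetical: the slot naming scheme (:record/string1, :record/long1, ...), the [field-name value-type] input shape, and the names compile-mapping and ->slot-entity. It uses :long rather than :int because Datomic's integer value type is :db.type/long.

```clojure
(ns example.slot-mapping)

(defn compile-mapping
  "Hypothetical: takes [field-name value-type] pairs in declaration order and
  returns field-name -> generic slot attribute. The nth field of a given
  value type gets slot n, e.g. the second :string field -> :record/string2."
  [fields]
  (first
   (reduce (fn [[mapping counters] [field-name value-type]]
             (let [n (inc (get counters value-type 0))]
               [(assoc mapping field-name
                       (keyword "record" (str (name value-type) n)))
                (assoc counters value-type n)]))
           [{} {}]
           fields)))

(defn ->slot-entity
  "Rewrites a customer record (field-name -> value) as an entity map using
  the slot attributes, tagged with the customer's type name."
  [type-name mapping record]
  (into {:record/type type-name}
        (map (fn [[field-name v]]
               [(or (get mapping field-name)
                    (throw (ex-info "Field not in mapping" {:field field-name})))
                v]))
        record))

;; Example: the Person type from above.
(def person-mapping
  (compile-mapping [["fname" :string] ["lname" :string] ["age" :long]]))
;; person-mapping => {"fname" :record/string1, "lname" :record/string2, "age" :record/long1}
```

The stored mapping is what every query would also have to go through, which is part of why this model feels complicated.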

Not sure how to handle deeply nested records. I need to use refs somehow.

I'd really appreciate any different ideas to achieve this goal. I don't
like having such a complicated model.

Thanks.

--
You received this message because you are subscribed to the Google Groups "Datomic" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datomic+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alan Thompson

2017-04-05 18:17:12 UTC

Your outline shows that you are on the right path. You can simplify it
more by leaving off the keyword namespaces and naming the attributes more
like your original record:

{:type :person :first-name "Bob" :last-name "Dylan" :age 42} ; a regular Clojure map

The James Bond example <https://github.com/cloojure/tupelo-datomic> from
the Tupelo Datomic library shows some helper functions that make this
easier. Or, you could use native Datomic functions/structures.
Note that the :weapon/type attribute in the Bond example is a "ref" type,
which is how one points to "nested" entries (more like pointers than
nesting, actually).
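As a rough illustration of "refs as pointers", here is a hypothetical helper (not part of Tupelo or Datomic) that turns a nested map into flat entity maps linked by string tempids. It assumes the linking attributes (here :address) are declared as :db.type/ref and that your Datomic version accepts string tempids in transaction data. Collections of nested maps (cardinality-many refs) are not handled.

```clojure
(ns example.nested-refs)

(defn flatten-entity
  "Walks a nested map and returns a vector of flat entity maps. Each map gets
  a string :db/id (a tempid) and every nested map is replaced in its parent by
  that tempid, so the parent attribute must be a ref."
  [entity]
  (let [counter (atom 0)
        out     (atom [])]
    (letfn [(walk [m]
              (let [id   (str "e" (swap! counter inc))
                    flat (into {:db/id id}
                               (map (fn [[k v]] [k (if (map? v) (walk v) v)]))
                               m)]
                ;; children are added before their parent
                (swap! out conj flat)
                id))]
      (walk entity)
      @out)))

;; (flatten-entity {:type :person :first-name "Bob"
;;                  :address {:type :address :city "Duluth"}})
;; => [{:db/id "e2" :type :address :city "Duluth"}
;;     {:db/id "e1" :type :person :first-name "Bob" :address "e2"}]
```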

Enjoy!
Alan

Francis Avila

2017-04-06 19:11:30 UTC

We have successfully used one entity per field entry. Something like this:

{:entity/type :entity.type/record
 :record/type :record.type/Person
 :record/fields #{{:entity/type :entity.type/field
                   :field/key "fname"
                   :field.value/type :field.value/string
                   :field.value/string "Bob"}
                  {:entity/type :entity.type/field
                   :field/key "lname"
                   :field.value/type :field.value/string
                   :field.value/string "Dylan"}
                  {:entity/type :entity.type/field
                   :field/key "age"
                   :field.value/type :field.value/long
                   :field.value/long 75}}}
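A sketch of producing that shape from an ordinary map of field names to values, as plain Clojure data that could then be transacted. The class-to-attribute table and the function names are hypothetical, only cardinality-one values are covered, and :record/fields is assumed to be a cardinality-many, isComponent ref. The keyword given for :field.value/type relies on Datomic resolving a keyword value of a ref attribute as an ident.

```clojure
(ns example.field-entities)

;; Hypothetical mapping from Clojure value classes to the generic
;; cardinality-one :field.value/* attributes.
(def value-attr
  {String               :field.value/string
   Long                 :field.value/long
   Double               :field.value/double
   Boolean              :field.value/boolean
   clojure.lang.Keyword :field.value/keyword})

(defn field-entity
  "Builds one field entity for a key/value pair; :field.value/type names the
  attribute that actually holds the value."
  [k v]
  (let [attr (or (value-attr (class v))
                 (throw (ex-info "Unsupported field value" {:key k :value v})))]
    {:entity/type      :entity.type/field
     :field/key        k
     :field.value/type attr
     attr              v}))

(defn record->tx
  "Converts a record (string keys -> values) into the nested entity map."
  [record-type record]
  {:entity/type   :entity.type/record
   :record/type   record-type
   :record/fields (into #{} (map (fn [[k v]] (field-entity k v))) record)})

;; (record->tx :record.type/Person {"fname" "Bob" "lname" "Dylan" "age" 75})
;; yields the same map as the example above.
```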

The essential features of this approach are:

1. Each field is a separate entity holding a key and value pair (the ref
from the record to its fields is most likely isComponent=true). This avoids
the explosion of :record/string1, :record/string2, etc. attributes and their
manual enumeration. (Think about what the queries would look like!)

2. There are pre-created generic :field.value/* attributes for every Datomic
type and cardinality (e.g. :field.value/string, :field.value/strings,
:field.value/long, :field.value/longs, etc.).

3. There is a known, constant attribute that you join through to get the
value; in this example it is :field.value/type, and its value is one of
the value attributes you created (i.e. it is of type ref, not keyword! Its
value is an attribute entity). This allows you to have a known field
reference an unknown value field. Example query clauses:

[?field :field/key ?field-name]
[?field :field.value/type ?value-attr]
[?field ?value-attr ?field-value]
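The pre-created attributes from point 2, plus the fixed ones used in the example, can be generated rather than written by hand. This sketch assumes the plain map form of schema transactions accepted by recent Datomic releases (older releases also need :db/id and :db.install/_attribute); the list of value types is a chosen subset, and the plural naming for cardinality-many follows the post.

```clojure
(ns example.field-schema)

;; A subset of Datomic value types to generate value attributes for.
(def value-types [:string :long :double :boolean :instant :keyword :uuid :ref])

(defn value-attributes
  "One cardinality-one attribute (:field.value/string) and one
  cardinality-many attribute (:field.value/strings) per value type."
  []
  (for [t value-types
        [suffix card] [["" :db.cardinality/one] ["s" :db.cardinality/many]]]
    {:db/ident       (keyword "field.value" (str (name t) suffix))
     :db/valueType   (keyword "db.type" (name t))
     :db/cardinality card}))

;; Fixed attributes used by the record/field model.
(def core-attributes
  [{:db/ident :entity/type :db/valueType :db.type/keyword
    :db/cardinality :db.cardinality/one}
   {:db/ident :record/type :db/valueType :db.type/keyword
    :db/cardinality :db.cardinality/one}
   ;; Component ref: :db.fn/retractEntity on a record also retracts its fields.
   {:db/ident :record/fields :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/many :db/isComponent true}
   {:db/ident :field/key :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one}
   ;; Points at one of the :field.value/* attribute entities.
   {:db/ident :field.value/type :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one}])

(def schema-tx (concat core-attributes (value-attributes)))
```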

Some tweaks to this basic approach revolve around how much higher-level
schematization you want to do. For example:

1. :entity.type/record, :entity.type/field, etc. could be entities (enums)
instead of just plain strings/keywords. The entity could have more attrs on
it which describe the attributes present on entities with that type. This
allows the db to describe the types of its own entities.

2. :record.type/Person could also be an actual (user-created) entity (in
which case it will lack an ident, so reference it with a lookup ref such as
[:record/type-id "Person"]). You could use this to store the field names and
types, for example.

3. Going further, instead of each field entity repeating the field name and
value type, you could have a type entity for each field. E.g.:

;; Field type
{:entity/type :entity.type/field-type
 :field-type/id "Person/fname"
 :field-type/value-type :field.type/string
 :field-type/optional? false
 ,,,}

;; Field instance
{:entity/type :entity.type/field
 :field/type [:field-type/id "Person/fname"]
 :field.type/string "Bob"}

;; You would get the value of a field instance like this:
[?field :field/type ?ftype]
[?ftype :field-type/value-type ?fval-attr]
[?field ?fval-attr ?field-value]
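To make the two-hop join concrete without a database, here is a toy in-memory version: [entity attribute value] triples stand in for datoms and keywords such as :ft1 stand in for entity ids. It is purely illustrative; in Datomic the three query clauses do this work.

```clojure
(ns example.field-join)

;; Toy datoms: [entity attribute value]. Ids like :ft1 are placeholders.
(def datoms
  [[:ft1 :field-type/id "Person/fname"]
   [:ft1 :field-type/value-type :field.type/string]
   [:f1  :entity/type :entity.type/field]
   [:f1  :field/type :ft1]
   [:f1  :field.type/string "Bob"]])

(defn attr-value
  "Value of attribute a on entity e, or nil if there is none."
  [db e a]
  (some-> (first (filter (fn [[e' a' _]] (and (= e e') (= a a'))) db))
          (nth 2)))

(defn field-value
  "Follows the same steps as the query clauses:
  field -> field type -> value attribute -> value."
  [db field]
  (let [ftype     (attr-value db field :field/type)            ; [?field :field/type ?ftype]
        fval-attr (attr-value db ftype :field-type/value-type)] ; [?ftype :field-type/value-type ?fval-attr]
    (attr-value db field fval-attr)))                           ; [?field ?fval-attr ?field-value]

;; (field-value datoms :f1) => "Bob"
```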

Of course, when you change the schema of Person or its fields, you need to
validate/migrate/reconcile all existing field instances manually. This could
be a feature (it keeps the db data type-safe) or a misfeature (if you really
want dynamic, truly schema-less key-value field pairs). Whether this is
desirable or not will dictate how far down the application-level metaschema
route you go.
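If you take the typed route, a migration check might look roughly like the following. After a field type's value type changes, existing instances still hold their value under the old value attribute (Datomic enforces each attribute's type, so the stored value is not reinterpreted), so the check looks for the expected attribute. The data shapes mirror the field-type example above with field types indexed by :field-type/id; stale-fields and the problem keywords are hypothetical.

```clojure
(ns example.field-migration)

(defn stale-fields
  "Returns [field problem] pairs for field instances that no longer fit
  their field type. Each field refers to its type with a lookup ref
  [:field-type/id id]; field-types maps id -> field-type entity."
  [field-types fields]
  (for [field fields
        :let [[_ id]  (:field/type field)
              ftype   (get field-types id)
              attr    (:field-type/value-type ftype)
              problem (cond
                        (nil? ftype) :unknown-field-type
                        (and (not (contains? field attr))
                             (not (:field-type/optional? ftype)))
                        :missing-value)]
        :when problem]
    [field problem]))

;; Example data: Person/age was changed from string to long.
(def field-types
  {"Person/fname" {:field-type/value-type :field.type/string
                   :field-type/optional? false}
   "Person/age"   {:field-type/value-type :field.type/long
                   :field-type/optional? false}})

(def fields
  [{:field/type [:field-type/id "Person/fname"] :field.type/string "Bob"}
   {:field/type [:field-type/id "Person/age"]   :field.type/string "75"}
   {:field/type [:field-type/id "Person/email"] :field.type/string "bob@example.com"}])

;; (stale-fields field-types fields) flags the age field as :missing-value
;; and the email field as :unknown-field-type.
```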

Alan Moore

2017-04-07 08:02:29 UTC

Nice summary, thanks!

The data model used by NuBank did something similar, but it has been a few
years since I watched the video:

Their approach might be overkill for this application (not a bank...).
Francis' approach is my favorite; my prior designs went way too meta, which
made queries harder to write. It might be good to add refs for data-access
authorization and/or auditing metadata:

;; Field type
{:entity/type :entity.type/field-type
 :field-type/id "Person/fname"
 :field-type/predicate :db/fn
 :field-type/authorization :field.type/ref
 ,,,}

I keep thinking that clojure.spec could be used in some capacity, like
(spec/keys :req [...] :opt [...]); it seems you always want some
well-known/required fields plus an open set of others. (spec/multi-spec?)
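One way the spec idea might look, assuming Clojure 1.9+ with clojure.spec.alpha: a multimethod keyed on :record/type picks an s/keys spec per record type. Because s/keys specs are open, keys without a registered spec (the open set of customer fields) pass through unchecked. The :person/* attribute names are hypothetical.

```clojure
(ns example.record-spec
  (:require [clojure.spec.alpha :as s]))

;; Well-known fields.
(s/def :record/type keyword?)
(s/def :person/fname string?)
(s/def :person/lname string?)
(s/def :person/age int?)

;; Dispatch on :record/type; each method returns the spec for that type.
(defmulti record-type :record/type)

(defmethod record-type :record.type/Person [_]
  (s/keys :req [:record/type :person/fname :person/lname]
          :opt [:person/age]))

(s/def ::record (s/multi-spec record-type :record/type))

;; (s/valid? ::record {:record/type :record.type/Person
;;                     :person/fname "Bob" :person/lname "Dylan"
;;                     :custom/anything 1})
;; is true: :custom/anything has no spec, so it is allowed through.
```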

Good luck!

Alan
