Exposing datomic.api/q via an API

Discussion:

Max Weber

2016-05-09 10:14:52 UTC

Hi,

I would like to expose datomic.api/q via an API. I'm trying to find out the
steps to make this approach "secure".

Let's assume that the database only contains public data or is
appropriately filtered by datomic.api/filter.

The most obvious security flaw of this approach is that a Datalog query is
allowed to execute arbitrary Clojure functions and Java methods:

(datomic.api/q
  '[:find ?r .
    :where [(System/getenv) ?r]]
  nil)

However "safe" functions like clojure.string/starts-with? are very useful
for Datomic queries and should be allowed. A first action to make the
approach more "secure" would be to define a whitelist of all allowed
functions and check fn-expr and pred-expr against this whitelist.
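A minimal sketch of the whitelist idea, assuming the query arrives as Clojure data rather than as a string. The names allowed-fns, called-fns and safe-query? are hypothetical, and the check is deliberately strict: it looks at the head of every list anywhere in the query, so aggregates such as (count ?e) in :find are rejected too unless they are added to the set.

```clojure
(def allowed-fns
  "Hypothetical allowlist; extend it with the functions your clients need."
  '#{clojure.string/starts-with?
     clojure.string/includes?
     clojure.string/lower-case
     < > <= >= = not=})

(defn called-fns
  "Returns the first element of every list form found anywhere in the query.
  In query data, function and predicate calls are written as lists."
  [query]
  (->> (tree-seq coll? seq query)
       (filter seq?)
       (map first)))

(defn safe-query?
  "True when the query is data (not a string) and every list head is allowed.
  The string check matters: a string has no nested lists to inspect,
  so without it a string query would pass vacuously."
  [query]
  (and (not (string? query))
       (every? allowed-fns (called-fns query))))
```

With this, the System/getenv query above is rejected while a query that only calls clojure.string/starts-with? passes. It says nothing about the rest of the grammar (data sources, rules, pull expressions), so it is a first filter rather than a complete check.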

Besides the whitelist I would prefer to write some kind of parser for
Datomic's query grammar (http://docs.datomic.com/query.html#sec-4) and only
allow the elements that are necessary for the corresponding API clients.
This would be quite a lot of effort, therefore I'm looking for alternatives.

Another issue is to constrain the query execution time. But I assume the
:timeout functionality of datomic.api/query should do the job.
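As a rough sketch of the timeout idea: datomic.api/query takes a single map with :query, :args and :timeout (in milliseconds). The helper below only builds that map as plain data and caps whatever timeout the client asks for. query-request and max-timeout-ms are hypothetical names, and the 2000 ms limit is an arbitrary assumption.

```clojure
(def max-timeout-ms
  "Hypothetical upper bound for any client-supplied timeout."
  2000)

(defn query-request
  "Builds the argument map for datomic.api/query, capping the timeout.
  args are the query inputs (the db value first, then any other inputs)."
  [query args requested-timeout-ms]
  {:query   query
   :args    (vec args)
   :timeout (min max-timeout-ms
                 (or requested-timeout-ms max-timeout-ms))})

;; On the peer, with datomic.api required as d, this would be used as:
;; (d/query (query-request q [db] 500))
```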

Furthermore the arguments for datomic.api/q are quite flexible. The query
can be a map, list, or string. Here it would be necessary to make sure that
a query passed as a string is blocked. A query string is harder to check and
it could open up other security flaws, since Datomic might read it with
clojure.core/read-string.
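One way to sidestep the string problem, sketched under the assumption that clients POST the query as EDN text: parse the body yourself with clojure.edn/read-string, which only produces data and does not support the #= reader-eval form, and hand Datomic the resulting data structure, never the raw string. read-query is a hypothetical name.

```clojure
(require '[clojure.edn :as edn])

(defn read-query
  "Parses the request body as EDN and accepts only list, vector or map queries.
  A body that parses to a string (or any other scalar) is rejected."
  [body]
  (let [form (edn/read-string body)]
    (if (or (sequential? form) (map? form))
      form
      (throw (ex-info "Query must be a list, vector or map"
                      {:type (type form)})))))
```

The parsed form still needs the function whitelist check; this step only guarantees that what reaches datomic.api/q is plain data.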

I guess I am overlooking some other security flaws. What would be your
recommendations to make the described approach feasible?

Best regards

Max

P.S. I'm aware of other developments in this area like Relay / GraphQL,
Falcor or Datomic DataScript synchronization
(https://github.com/metasoarous/datsync). My motivation is just to expose
the full power of Datomic's datalog to my ClojureScript app and to avoid
wrapping each query requirement in some clumsy API endpoint.

--
You received this message because you are subscribed to the Google Groups "Datomic" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datomic+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

a***@mail.yu.edu

2016-05-09 16:13:02 UTC

This can be done with Datomic's REST API.

http://docs.datomic.com/rest.html

Sorry if you have already seen this, but you didn't mention it so I thought
I would point you to it.

Max Weber

2016-05-09 16:53:19 UTC

Thanks for the addition. As far as I know the Datomic REST API also has to
be called from a trusted environment (like the Datomic peer library). I
would like to call datomic.api/q via an API from an untrusted environment
(in my case a browser which runs a ClojureScript application).

Casper Clausen

2016-05-11 22:07:20 UTC

Just chiming in here, that we are doing something very similar, so I would
like to see some discussion here as well on how to secure or limit the
querying capabilities to some defined subset.

/Casper

Linus Ericsson

2016-05-12 07:24:54 UTC

There are some aspects of the query API that are hard to make use of via a
REST API (or similar) while keeping it secure, short of jailing a JVM
somehow. One is the quite common use case of querying several data sources,
either dbs or plain data structures.

Another approach is to use similar principles as used in om.next - using
some kind of decorated pull expressions. We use a similar approach in a
somewhat complicated application (although with reagent/posh rather than
om.next) and for a lot of use cases an "encoded queries" approach works
quite well.

I guess a good approach to make more free-form queries would be to extend
the pull expressions in a structured way, for instance with encoded
implicit queries for getting all items in a certain time range. I guess
it's easier to "lock down" such a query engine than a real datalog one.
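To illustrate why a pull-based interface is easier to lock down, here is a small sketch that checks a pull pattern against a set of permitted attributes. The attribute names and allowed-pattern? are hypothetical, and the sketch accepts only plain keywords and {attr sub-pattern} maps; wildcards, limits, defaults and recursion are all rejected.

```clojure
(def allowed-attrs
  "Hypothetical set of attributes that clients may pull."
  #{:db/id :item/name :item/created-at :item/tags})

(defn allowed-pattern?
  "True when the pull pattern mentions only allowed attributes."
  [pattern]
  (and (vector? pattern)
       (every? (fn [spec]
                 (cond
                   ;; plain attribute
                   (keyword? spec) (contains? allowed-attrs spec)
                   ;; nested map spec: check the attribute and its sub-pattern
                   (map? spec)     (every? (fn [[attr sub]]
                                             (and (contains? allowed-attrs attr)
                                                  (allowed-pattern? sub)))
                                           spec)
                   ;; anything else (wildcard, attribute options) is refused
                   :else           false))
               pattern)))
```

Because a pull pattern has no function calls at all, the whole check fits in one recursive function, which is the point made above.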

(Using datascript to query the fetched data would maybe make it possible to
make free-form queries to some extent as well, where the pull expression
gives the data subset, which needs some size constraints etc.)

/Linus

Alan Moore

2016-05-13 07:08:21 UTC

You might also look at leveraging tools.analyzer.jvm to do the query parsing. You will still have to walk the generated AST but at least a lot of the heavy lifting will be done for you.

Good luck! Let us know what you end up doing.

Alan

Max Weber

2016-05-13 16:50:07 UTC

Thanks everyone for joining the discussion. I'm glad to hear that other
people try to accomplish similar things with Datomic. This is what I use
right now to check the datalog queries:

https://gist.github.com/maxweber/e11ed25ec46ba59c12c05f8052d06ba5

It is far from being complete, but should be sufficient for a first alpha
version / testing of our new product. Would be happy to receive feedback.

Best regards

Max

Matthieu Béteille

2016-07-20 15:42:25 UTC

Hi Max,

I am currently trying to do something really close to what you are doing,
and I am facing the same issue.

Just wanted to know if you are still using the same approach (using the
code snippet provided in your gist), or if you found any other solution
since this?

If you are still using this approach, how has it been working for you so
far?

Thanks!

Matthieu

Max Weber

2016-07-21 14:09:34 UTC

Hi Matthieu,

yes, I'm still using the approach from the gist. It works like a charm so
far; directly exposing datalog via the API removes a ton of complexity. But
I still don't know whether the gist prevents every possible injection attack.

Best regards

Max

z***@gmail.com

2017-07-17 01:31:15 UTC

Good to know I'm not the only one who wondered about this! Could anyone
knowledgeable comment on whether the code in the gist is safe? I have only
experimented with Datomic for a few days but already began wondering if I
could safely send datalog queries directly from my frontend.

Wesley Hall

2016-05-12 13:38:28 UTC

Doing away with the 'middle-tier' seems to be all the rage at the moment.

I've tried it a few times myself, always with the experience that the layer
you have to build to implement this kind of security blacklisting becomes
massively more "clumsy" than the middle-tier would have been. Blacklisting
by itself is a dangerous approach. Can you write tests to ensure that you
haven't forgotten to strip out some query that might compromise your
system? Donald Rumsfeld's "unknown unknowns" seems, once again, to rear
its ugly head.

My solution at this point is to build a simple RPC remoting system that
takes POSTed EDN vectors (you can obviously use transit if you prefer) and
applies them to remote functions that have the correct 'expose' metadata
tagged (you can use prismatic/schema etc too). It's really simple to
implement, and your middle tier becomes pure clojure functions. Small piece
of clojurescript magic and you are "just calling functions". You still need
to check and sanitise the arguments coming from the client of course, but
so much easier within the restricted domain of what the function will do
with it.
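A minimal sketch of such an RPC dispatcher, assuming requests arrive as EDN text of the form [fn-name arg ...]. The functions greet and drop-everything, the registry exposed-fns and dispatch itself are hypothetical names used only for illustration.

```clojure
(require '[clojure.edn :as edn])

(defn ^:expose greet
  "Hypothetical remote function; tagged so that clients may call it."
  [name]
  (str "Hello, " name))

(defn drop-everything
  "Not tagged with :expose, so the dispatcher refuses to call it."
  []
  :boom)

(def exposed-fns
  "Symbol -> var for every public var in this namespace tagged ^:expose.
  Computed once at load time, after the functions above are defined."
  (into {}
        (filter (fn [[_ v]] (:expose (meta v))))
        (ns-publics *ns*)))

(defn dispatch
  "Reads a POSTed EDN vector and applies the named exposed function
  to the remaining elements. Anything else is rejected."
  [body]
  (let [form (edn/read-string body)
        f    (when (vector? form) (get exposed-fns (first form)))]
    (if f
      (apply f (rest form))
      (throw (ex-info "Unknown or unexposed function" {:request form})))))
```

Only functions carrying the metadata are reachable, so the set of things a client can trigger is an explicit whitelist. The arguments still need to be checked inside each function, as noted above.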

I know that it is pretty bad form to answer a person's question with a
proposal that they should do something different, so I can only ask that
you excuse me in that respect. I'd just say that it is really worth rethinking
whether or not it would be possible to use all the magic of this wonderful
language to prevent your API endpoints from being clumsy (it doesn't have
to be REST for example). There are still (imo) huge advantages to having a
middle tier, not least of which is a whitelist approach to what a client
can and cannot pull out of the remote database.

Cheers

Wes

Max Weber

2016-05-16 16:40:02 UTC

I also considered implementing a more classic "middle-tier". I'm
implementing the client and the server anyway, so why should I bother with
building some fancy "demand-driven" client API (Om Next, GraphQL / Relay,
Falcor or whatever)? I even have an RPC-style library, which only adds
minimal overhead for exposing Clojure functions to the client.

I've tried to use the "classic" approach for the current project and ended
up with some bloated API endpoints. You just keep reinventing the wheel,
trying to express your (datalog) query with some way less powerful API
call. This is an example directly from the ClojureScript client:

(rpc/call
  :query/q
  '[:find ?e ?label
    :in $ [?type ...] ?label-attr ?substr
    :where
    [?e ?label-attr ?label]
    [?e :type ?type]
    [(clojure.string/lower-case ?label) ?lower-case-label]
    [(clojure.string/includes? ?lower-case-label ?substr)]]
  :current
  type
  label-attr
  (str/lower-case substr))

It just searches for entities with the given :type(s) whose label-attr(ibute)
values contain a certain substring (for an autocomplete control). It is
perfectly fine to move such functions into the "middle-tier", but then you
have to come up with some other specification to declare what you would like
to have from the server, or rather the database. The Datomic datalog query is
also "plain data" and it is already a very powerful specification, so why
invent something new?

I agree with you that blacklisting is a dangerous approach. Therefore I
would prefer a good "parser" for the Datomic query grammar to exactly
define (whitelist), which parts of a Datalog query I like to allow in the
"demand-driven" API for the clients.

However I would also like to emphasize that my initial question is really
about securing the execution of a foreign datalog query, not about how to
define who is allowed to read what (authorization if you like). The latter
can be solved elegantly with the help of datomic.api/filter (as described in
this talk, http://youtu.be/7lm3K8zVOdY, which is also whitelist-based). Even
if you build a more classic "middle-tier" it is advisable to implement the
"authorization" concerns with the help of datomic.api/filter to keep your
Datalog queries clean from all this additional authorization logic.

As with everything there are different tradeoffs. I don't know if the
"demand-driven" API approach leads to better systems, but at least it feels
more powerful at the moment. I'm going to write about my experiences here,
especially if I have to switch back to a classic "middle-tier".

Best regards

Max

Wesley Hall

2016-05-16 20:34:02 UTC

Hi Max,

I am not really advocating doing away with datalog or doing anything fancy
with the comms.

Assuming the example query that you provide here was to be wrapped in a
function with the arguments to the datalog query provided as function
arguments, you could simply place this function in a clj rather than cljs
file and invoke it from the clojurescript side with a little minimal RPC
plumbing. The data returned would be the same clojure data structure you'd
get from sending the query along with.

The initial problem you describe is the reason we have clojure.edn/read and
use that rather than clojure.core/read. There are risks associated with
accepting forms that will be evaluated as function invocations that come
from untrusted sources. It's certainly possible to knock up an algorithm
that goes peeking through the query as submitted to ensure that any
functions included are in the allowed list. You probably just traverse
looking for lists, and checking the first element when you find one. Might
be some additional details. Maybe you want to validate the arguments too in
some cases.

It's a problem I would tend to avoid by just placing the queries on the clj
side and just letting the client pass just the arguments over the wire
rather than anything that could be processed as executable code.
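A sketch of that alternative, reusing the autocomplete query from earlier in the thread: the query text lives in a server-side registry and the client sends only a query name plus arguments. The registry queries, the key :entities/by-label, the :arg-checks predicates and prepare are all hypothetical, and the assumption that types and the label attribute are keywords is mine.

```clojure
(def queries
  "Hypothetical registry of queries kept on the server (clj) side."
  {:entities/by-label
   {:query '[:find ?e ?label
             :in $ [?type ...] ?label-attr ?substr
             :where
             [?e ?label-attr ?label]
             [?e :type ?type]
             [(clojure.string/lower-case ?label) ?lower-case-label]
             [(clojure.string/includes? ?lower-case-label ?substr)]]
    ;; One predicate per client-supplied input, in :in order (after $).
    :arg-checks [#(and (vector? %) (every? keyword? %))
                 keyword?
                 string?]}})

(defn prepare
  "Looks up a named query and validates the client's arguments.
  Returns the query and arguments, ready to be run against a db on the peer."
  [query-name args]
  (let [{:keys [query arg-checks]} (get queries query-name)]
    (cond
      (nil? query)
      (throw (ex-info "Unknown query" {:name query-name}))

      (not (and (= (count args) (count arg-checks))
                (every? identity (map #(%1 %2) arg-checks args))))
      (throw (ex-info "Invalid arguments" {:name query-name}))

      :else
      {:query query :args (vec args)})))
```

Nothing executable crosses the wire here; the cost is the one mentioned below, that each new query requires a server-side change.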

There is an advantage with your approach that you don't need to update the
server-side part to support new queries. I tend to develop my applications
together and deploy them as a single unit so I don't have this problem
really, but I can appreciate that it could well be a factor under different
models / approaches.

I would be very interested to read your experiences.

For now, I have a date with the missus in Westeros :).
