Discussion:
Pipelining transactions for higher throughput question
Matthew Gretton
2017-03-29 08:13:20 UTC
Hi,

I have a channel of transactions that I want to transact in order, due to
dependencies between transactions. I'm trying to work out whether I can use
the pipelining example in the best practices guide to do this. I can't quite
see how it would work, however, as there are no guarantees about the order in
which transact-async will be called when running with parallelism. The only
guarantee is that the results will go onto the output channel in the same
order the inputs came in, but this does not ensure that the transactions are
executed in order, as I require.

Hopefully I'm misunderstanding how pipeline-blocking or transact-async
work here, but any advice would be greatly appreciated.

Thanks,

Matt.
--
You received this message because you are subscribed to the Google Groups "Datomic" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datomic+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Leon Grapenthin
2017-03-31 16:48:32 UTC
Your assertions are correct.

That is, only transactions that are independent of each other can be sent in
non-deterministic order.

If you are able to create independent batches (sequences) of transactions,
you can modify the pipeline code so that it executes the batches in
parallel while transacting each batch in order, e.g.

...
(doseq [tx data]
  @(d/transact-async conn tx))
...
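
To make the batching idea concrete, here is a minimal sketch (not the code from the best practices guide). It assumes the work has already been split into independent batches. The names run-batches! and transact! are hypothetical; transact! stands in for something like #(deref (d/transact-async conn %)), so the sketch runs without a Datomic connection.

```clojure
(require '[clojure.core.async :as a])

(defn run-batches!
  "Runs up to n batches concurrently. Within a batch, transact! is applied
  to each transaction in order, and each call returns before the next starts.
  transact! is a hypothetical stand-in for a blocking transact call,
  e.g. #(deref (d/transact-async conn %)).
  Returns a vector of per-batch result vectors, in input order."
  [n transact! batches]
  (let [from (a/to-chan batches)
        to   (a/chan)]
    (a/pipeline-blocking n to
                         ;; mapv is eager, so the whole batch is processed
                         ;; sequentially inside one worker thread.
                         (map (fn [batch] (mapv transact! batch)))
                         from)
    ;; Block until the pipeline closes `to`, collecting every batch result.
    (a/<!! (a/into [] to))))
```

Ordering is only preserved inside a batch here. Transactions in different batches may interleave in any order, which is why the batches have to be independent of each other.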
Francis Avila
2017-03-31 21:02:55 UTC
I think it is safe to use the tx-pipeline function from the documentation
<http://docs.datomic.com/best-practices.html#pipeline-transactions> to run
transactions which need to be run in the order given. pipeline-blocking
executes the transformation on each input in the order the input is given.
The transformation here is a call to transact-async, which for a given peer
will submit transactions to the transactor in the order that transact-async
is called in the peer process. The end result is the transactions in your
input vector will not arrive at the transactor out-of-order relative to one
another, meaning it is safe for a later transaction to assume an earlier
one completed.

Errors complicate this, though. If an earlier transaction fails, a
later transaction may still be in flight and run on the transactor
afterwards. (Not-yet-submitted transactions will not be submitted after the
error.) Your transactions still need to be resilient to this possibility,
i.e. ensure they don't leave the database in an invalid state, for whatever
your domain's notion of "valid data" might be. However, this is true of any
pipelining scheme, not just this tx-pipeline implementation.
Matthew Gretton
2017-03-31 22:56:02 UTC
Hi Francis,

Thanks for the reply, but this is not correct (as useful as it would be to
me!). To prove it, see the code below, which shows that the order in which
the transducer is called in pipeline differs from the order in which the
results are returned when using pipeline with parallelism (at least on my
machine).

(require '[clojure.core.async :as a])

(def call-order (atom []))

(def result-order (atom []))

(def from (a/to-chan (range 0 100)))

(def to (a/chan))

;; The transducer records each item in call-order, then returns the item.
(a/pipeline 100 to (map (fn [x] (swap! call-order conj x) x)) from)

;; Drain the output channel, recording each result in result-order.
(loop [x (a/<!! to)]
  (when-not (nil? x)
    (swap! result-order conj x)
    (recur (a/<!! to))))

;; This fails for me, as the call order is different to the result order
(assert (= @call-order @result-order))

So in my original use case there is no guarantee that transact-async will be
called in order. Please take a look and see if I've made any mistakes in
the above code, but I'm confident the logic is sound.
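
For contrast, a small sketch of the same experiment with the parallelism as a parameter. The helper name call-and-result-order is hypothetical. With a parallelism of 1 there is only one worker, so the transducer is called in input order; this sketch only checks that case and makes no claim about what any particular run with higher parallelism will show.

```clojure
(require '[clojure.core.async :as a])

(defn call-and-result-order
  "Runs (range 100) through a/pipeline with parallelism n.
  Returns [call-order result-order] as two vectors."
  [n]
  (let [calls (atom [])
        from  (a/to-chan (range 100))
        to    (a/chan)]
    ;; The transducer records each item as it is called, then passes it on.
    (a/pipeline n to (map (fn [x] (swap! calls conj x) x)) from)
    ;; Drain the output channel until the pipeline closes it.
    (let [results (a/<!! (a/into [] to))]
      [@calls results])))

;; With a single worker, call order and result order both match the input.
(let [[calls results] (call-and-result-order 1)]
  (assert (= calls results (vec (range 100)))))
```

Of course, a parallelism of 1 gives up the concurrency the pipeline was meant to provide, so it is equivalent to transacting one item after another.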

Thanks again for the reply,

Matt.
Matthew Gretton
2017-03-31 23:00:07 UTC
Hi Leon,

Thanks for the reply. Unfortunately, there is no way I can identify the
dependencies in general, so I think I'll have to transact each item one after
another in one process to ensure ordering.
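
A rough sketch of what that sequential approach could look like, under some assumptions: transact-in-order! and transact! are hypothetical names, and transact! stands in for a blocking call such as (fn [tx] @(d/transact-async conn tx)). Stopping at the first failure is a choice made for this sketch so that no dependent transaction is sent after one has failed.

```clojure
(require '[clojure.core.async :as a])

(defn transact-in-order!
  "Takes transactions from ch one at a time and applies transact! to each,
  waiting for the call to return before taking the next item.
  Stops at the first exception, leaving any remaining items on ch.
  transact! is a hypothetical stand-in for a blocking transact call,
  e.g. (fn [tx] @(d/transact-async conn tx)).
  Returns {:completed n, :error e-or-nil}."
  [transact! ch]
  (loop [n 0]
    (if-some [tx (a/<!! ch)]
      ;; The try yields nil on success, or the exception on failure.
      (let [err (try (transact! tx) nil
                     (catch Exception e e))]
        (if err
          {:completed n :error err}
          (recur (inc n))))
      ;; The channel is closed and drained.
      {:completed n :error nil})))
```

Because only one transaction is ever in flight, throughput is bounded by the round trip of each call, which is the price of the ordering guarantee.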

Thanks,

Matt.