Discussion: Transactor not available
'Matt Bossenbroek' via Datomic
2015-10-13 15:18:09 UTC
We've been seeing the following error crop up semi-frequently of late. Restarting the peer fixes it, but that's obviously not a long-term solution.

This is the error we see on the peer:

clojure.lang.ExceptionInfo: :db.error/transactor-unavailable Transactor not available {:db/error :db.error/transactor-unavailable}
at datomic.peer$transactor_unavailable.invoke(peer.clj:186)
at datomic.peer.Connection.transactAsync(peer.clj:349)
at datomic.peer.Connection.transact(peer.clj:332)
at datomic.api$transact.invoke(api.clj:90)


And this is the error we see on the transactor (IP address changed, but it's the peer):

2015-10-12 02:06:55.505 WARN default org.hornetq.core.client - HQ212037: Connection failure has been detected: HQ119014: Did not receive data from /10.0.0.0:59725. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. You also might have configured connection-ttl and client-failure-check-period incorrectly. Please check user manual for more information. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
2015-10-12 02:06:55.505 WARN default org.hornetq.core.server - HQ222061: Client connection failed, clearing up resources for session 4520159d-7085-11e5-8b24-1fc2cdcb56a5
2015-10-12 02:06:55.505 WARN default org.hornetq.core.server - HQ222107: Cleared up resources for session 4520159d-7085-11e5-8b24-1fc2cdcb56a5
2015-10-12 02:06:55.506 WARN default org.hornetq.core.server - HQ222061: Client connection failed, clearing up resources for session 4524f79e-7085-11e5-8b24-1fc2cdcb56a5
2015-10-12 02:06:55.507 WARN default org.hornetq.core.server - HQ222107: Cleared up resources for session 4524f79e-7085-11e5-8b24-1fc2cdcb56a5
2015-10-12 02:06:55.507 WARN default org.hornetq.core.server - HQ222061: Client connection failed, clearing up resources for session 465decd0-7085-11e5-8b24-1fc2cdcb56a5
2015-10-12 02:06:55.507 WARN default org.hornetq.core.server - HQ222107: Cleared up resources for session 465decd0-7085-11e5-8b24-1fc2cdcb56a5
2015-10-12 02:06:55.508 WARN default org.hornetq.core.server - HQ222061: Client connection failed, clearing up resources for session 466084e1-7085-11e5-8b24-1fc2cdcb56a5
2015-10-12 02:06:55.508 WARN default org.hornetq.core.server - HQ222107: Cleared up resources for session 466084e1-7085-11e5-8b24-1fc2cdcb56a5



Some googling of the error turned up two possibilities: 1) the laptop-sleep problem (not applicable here, since these are both AWS instances), and 2) the transactor being under some pressure.

Looking into the second, here's a graph of AvailableMB over the past week, where you can see it oscillating between 15 and 20 GB: [graph image not preserved in the archive]

This looks fine to me, and the only errors I see on the transactor are the client warnings above; literally nothing is logged at ERROR.

Also weird: a peer restart fixes the issue. If the transactor were truly under pressure, I wouldn't expect a peer restart to remedy it.

Any thoughts on where to look next? This is with Datomic Pro 0.9.5130.

Thanks,
Matt
Ben Kamphaus
2015-10-13 15:34:18 UTC
Hi Matt,

It looks like it may actually be the peer that's running out of memory. Can you provide peer and transactor logs to me off-list? bkamphaus at cognitect

Best,
Ben
'Matt Bossenbroek' via Datomic
2015-10-13 16:19:41 UTC
Thanks Ben - sent full logs off-list.

Looking closer at the peer logs, I do see a lot of these also:

2015-10-13 16:01:15,391 WARN datomic.slf4j$caused_by:67 [clojure-agent-send-off-pool-29527] [invoke] ... caused by ...
HornetQException[errorType=INTERNAL_ERROR message=HQ119001: Failed to create session]
at org.hornetq.core.client.impl.ClientSessionFactoryImpl.createSessionInternal(ClientSessionFactoryImpl.java:962)
at org.hornetq.core.client.impl.ClientSessionFactoryImpl.createSessionInternal(ClientSessionFactoryImpl.java:787)
at org.hornetq.core.client.impl.ClientSessionFactoryImpl.createSession(ClientSessionFactoryImpl.java:324)
at datomic.hornet.SessionFactoryBundle.start_session_STAR_(hornet.clj:193)
at datomic.hornet$start_session.doInvoke(hornet.clj:165)
at clojure.lang.RestFn.invoke(RestFn.java:464)
at datomic.connector.TransactorHornetConnector$fn__8588.invoke(connector.clj:228)
at datomic.connector.TransactorHornetConnector.admin_request_STAR_(connector.clj:226)
at datomic.peer.Connection$fn__8890.invoke(peer.clj:237)
at datomic.peer.Connection.create_connection_state(peer.clj:221)
at datomic.peer$create_connection$reconnect_fn__8951.invoke(peer.clj:488)
at clojure.lang.AFn.applyToHelper(AFn.java:154)
at clojure.lang.AFn.applyTo(AFn.java:144)
at clojure.core$apply.invoke(core.clj:626)
at clojure.core$partial$fn__4228.doInvoke(core.clj:2468)
at clojure.lang.RestFn.invoke(RestFn.java:397)
at datomic.common$retry_fn$fn__250.invoke(common.clj:461)
at datomic.common$retry_fn.doInvoke(common.clj:461)
at clojure.lang.RestFn.invoke(RestFn.java:713)
at datomic.peer$create_connection$fn__8953.invoke(peer.clj:492)
at datomic.reconnector2.Reconnector$fn__8312.invoke(reconnector2.clj:57)
at clojure.core$binding_conveyor_fn$fn__4145.invoke(core.clj:1910)
at clojure.lang.AFn.call(AFn.java:18)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Connection is null
at org.hornetq.core.client.impl.ClientSessionFactoryImpl.createSessionInternal(ClientSessionFactoryImpl.java:836)
... 26 more


This is the AvailableMB for the peer: [graph image not preserved in the archive]

These are the Java options we run the peer with:

JAVA_OPTS=" \
-server \
-verbose:gc \
-XX:+PrintGCDetails \
-XX:+PrintGCDateStamps \
-XX:+PrintAdaptiveSizePolicy \
-XX:+PrintTenuringDistribution \
-XX:+PrintCommandLineFlags \
-Xloggc:$GCLOG \
-Xms27101m \
-Xmx27101m \
-Xss512k \
-XX:ReservedCodeCacheSize=240m \
-XX:MetaspaceSize=256m \
-XX:MaxMetaspaceSize=256m \
-XX:+UseParNewGC \
-XX:+UseConcMarkSweepGC \
-XX:+CMSConcurrentMTEnabled \
-XX:+CMSScavengeBeforeRemark \
-XX:+CMSClassUnloadingEnabled \
-XX:+ExplicitGCInvokesConcurrent \
-Dsun.rmi.dgc.client.gcInterval=28800000 \
-Dsun.rmi.dgc.server.gcInterval=28800000 \
-Djava.net.preferIPv4Stack=true \
-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false \
-Dnetflix.datacenter=cloud \
$JAVA_ENV_ARGS \
-Ddatomic.metricsCallback=com.netflix.dagobah.model.datomic.nflx-atlas-publish/update-metrics \
"


And both machines are m3.2xlarge instances.

Thanks,
Matt
'Matt Bossenbroek' via Datomic
2015-12-11 22:21:41 UTC
Just wanted to circle back on this one in case anyone with the same issue googles it in the future...

Turns out there were a couple of different things going on here. First, we had allocated way too much memory to the transactor. Even if you run it on a big machine, don't give it all the memory unless you really need it: the excess memory just means longer GC pauses when a collection does happen.
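For anyone hitting the same thing: as I understand the capacity docs, the transactor heap mostly needs to cover memory-index-max plus object-cache-max plus some headroom, so a right-sized setup might look roughly like this (illustrative values only, not a recommendation):

    # transactor.properties excerpt (illustrative values)
    memory-index-threshold=32m
    memory-index-max=512m
    object-cache-max=1g

    # launch with a correspondingly modest heap rather than most of the box
    bin/transactor -Xms4g -Xmx4g config/transactor.properties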


The HornetQException on the peer was a red herring. I've been told that it's innocuous and actually causes no harm. It continues to be reported periodically, accompanied by an AlarmUnhandledException metric, until the peer is restarted.


And the real issue was large transactions. In our case, not large individual datoms, but lots of them. We had some individual transactions with hundreds of thousands of datoms in them, some getting into the tens of MB in size. They occurred infrequently, but were enough to put the transactor into a bad state while it chewed through the whole thing. If you suspect this might be an issue for you, look at the max of the TransactionBytes and TransactionDatoms metrics produced by the transactor for spikes.
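Since we already publish metrics through the datomic.metricsCallback hook shown in the JAVA_OPTS earlier in this thread, flagging those spikes takes only a few lines. A rough, untested sketch (the namespace is made up, and the {:lo :hi :sum :count} value shape is my reading of the monitoring docs):

    (ns dagobah.tx-watch)  ; hypothetical ns; wired up via
                           ; -Ddatomic.metricsCallback=dagobah.tx-watch/update-metrics

    ;; flag any transaction over ~10MB (threshold is arbitrary)
    (def tx-bytes-alarm (* 10 1024 1024))

    (defn update-metrics
      "Datomic calls this roughly once per minute with a map of
      metric name -> {:lo .. :hi .. :sum .. :count ..}."
      [metrics]
      (let [{:keys [hi]} (:TransactionBytes metrics)]
        (when (and hi (> hi tx-bytes-alarm))
          (println "WARN: TransactionBytes max over last interval:" hi))))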


The fix (and we're still working on part of this) was to reduce the size of the transactions and, where possible, break them up into a series of smaller ones. We still occasionally see these errors, but far less frequently now.
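The mechanical part is simple, something along these lines (untested sketch; pick a batch size that fits your data):

    (require '[datomic.api :as d])

    (defn transact-in-batches!
      "Split one huge tx-data seq into a series of smaller transactions.
      Note that this gives up atomicity across the batches."
      [conn tx-data batch-size]
      (doseq [batch (partition-all batch-size tx-data)]
        @(d/transact conn (vec batch))))

    ;; e.g. (transact-in-batches! conn huge-tx-data 5000)

The part we're still working through is deciding where giving up atomicity across those batches is actually acceptable.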


Thanks to the Datomic team for helping to debug this one.

-Matt
pete windle
2016-01-07 21:57:16 UTC
Do you have a feeling for how big your reduced transactions are? We still need to do some testing around this, but anecdotes are nice...
Ben Kamphaus
2016-01-12 14:23:16 UTC
Hi Pete,

Without getting into the specifics of Matt B's problems, transactions in the high hundreds of K or into the MBs will be problems, though I want to stress that these are practical limits rather than hard boundaries. That said, the ideal size for e.g. breaking up imports is ~40K or so. If you're looking at really large transactional boundaries, handling them with something like a saga is one of the use cases that Tim Ewald runs through in his talk on reified transactions: http://www.datomic.com/videos.html
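
If it helps to make that concrete, here's one rough way to chunk an import against a datom budget rather than a fixed map count. This is just a sketch, not part of the Datomic API, and it only estimates one datom per attribute/value pair:

    (defn chunk-by-datom-count
      "Greedily pack entity maps into chunks of roughly max-datoms."
      [max-datoms entity-maps]
      (loop [chunks [] cur [] n 0 [m & more :as ms] (seq entity-maps)]
        (if-not ms
          (if (seq cur) (conj chunks cur) chunks)
          (let [d (max 1 (count m))] ; ~one datom per attr/value pair
            (if (and (seq cur) (> (+ n d) max-datoms))
              (recur (conj chunks cur) [m] d more)
              (recur chunks (conj cur m) (+ n d) more))))))

    ;; e.g. (doseq [chunk (chunk-by-datom-count 40000 tx-maps)]
    ;;        @(d/transact conn chunk))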

Best,
Ben
'Matt Bossenbroek' via Datomic
2016-01-22 18:01:37 UTC
For the last week here are the numbers we're seeing:

Avg TransactionBytes: 88.333k
Max TransactionBytes: 52.611M

Avg TransactionDatoms: 809.135
Max TransactionDatoms: 791.074k

We still see the occasional transactor-unavailable error when these spike, but they're far fewer these days.

Ben, when you say these are practical limits rather than hard boundaries, what do you mean? Does that imply that if we gave the transactor a beefier machine it would perform better? What system specs would need to improve to allow for this?

Thanks,
Matt
Ben Kamphaus
2016-01-27 14:30:41 UTC
Hi Matt,

We're currently investigating this further. I'll get back to you with more
detailed answers to your concerns.

Best,
Ben
Ben Kamphaus
2016-02-09 21:55:34 UTC
Hi Matt,

Just want to let you know that despite the silence, this is still a
priority for us. We're still exploring the issue and solutions.

Best,
Ben
'Matt Bossenbroek' via Datomic
2016-02-10 18:39:16 UTC
Thanks for the update!

-Matt
Kenneth Kalmer
2017-04-13 13:40:27 UTC
Hi Ben et al

Apologies for resurrecting such an old thread, but I have a question
directly related to this ~40K practical limit.

For completeness, I'm using Clojure 1.9.0-alpha14 with datomic-free-0.9.5544 on macOS Sierra and Java 8.

So the 40K: is this the entire payload bytesize, the bytesize of all the values only, or the number of datoms in the transaction?

The reason I ask is that I'm seeing this a lot during development, especially when sending multiple transactions in quick succession (doseq or loop). I'm already batching transactions down to 10,000 maps (with a variable number of attributes, so that could still be well over 40K datoms), and I've set -Ddatomic.txTimeoutMsec=60000 in project.clj as well. If the 40K applies to datoms, I can spend the time to split the transactions apart at more or less 40K datoms; if it applies to the bytesize, it might take a little more effort.

I also make heavy use of indexed attributes (:db.index/value), probably more than I'm comfortable with, but that is what the domain requires... Should I be using d/sync or d/sync-index during the batching process? I've wondered before whether the indexing could be to blame for the errors I'm seeing.
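
Concretely, I'm wondering whether something like this between batches is the right pattern (untested sketch, keying off the basis-t of each transaction's db-after):

    (require '[datomic.api :as d])

    (defn transact-throttled!
      "Transact batches sequentially, blocking on d/sync-index every
      sync-every batches so indexing can catch up."
      [conn batches sync-every]
      (doseq [[i batch] (map-indexed vector batches)]
        (let [{:keys [db-after]} @(d/transact conn batch)]
          (when (zero? (mod (inc i) sync-every))
            @(d/sync-index conn (d/basis-t db-after))))))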

I've optimized to generate as few transactions as possible during
ingestion, and then break them up in batches, rather than just firing off a
gazillion small transactions (I like to keep the logs clean and together).

Another important aside: most of the ingestion is done on my MacBook before the data is sent to prod (via backup/restore). In my case prod simply views the data and never changes it. I know this isn't optimal, but it is extremely practical for what I'm doing now. I can imagine other developers eager to play with Datomic doing the same thing and getting a huge fright when their first "massive" ETL attempt blows up in a similar way.
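
(For anyone copying this workflow, the hand-off is just the stock backup-db/restore-db commands; the URIs and paths below are made up:)

    bin/datomic backup-db datomic:free://localhost:4334/mydb file:/tmp/mydb-backup
    bin/datomic restore-db file:/tmp/mydb-backup datomic:free://prod-host:4334/mydb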

Kind regards