Stu's 10 Billion Datoms

Benjamin Yu

2015-01-21 23:10:13 UTC

I watched all the Datomic training videos, and found the operational
capacity planning part interesting. The slide said "up to 10 billion
datoms" in a single Datomic database. Stu made mention of some places going
up to 20-30 billion datoms.

As I understand it, that's all datoms including transaction datoms and
historical datoms (log)?

Not that I'll reach that limit anytime soon, but I am still curious about
it. This is just a purely hypothetical question that popped into my mind
while watching the video.

What strategies exist for being able to handle even more datoms? Vertical
scaling with just more powerful machines (peers)? Partitioning/sharding?
Garbage collecting logs (like Kafka)? I'm still coming from a world of lots
and lots of MySQL.

Thanks,
Ben

--
You received this message because you are subscribed to the Google Groups "Datomic" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datomic+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mike Haney

2015-01-22 03:58:47 UTC

I'm sure Stu can give some better answers, but one option that comes to mind is multiple databases. Probably not ideal, but maybe not as bad as it first sounds. At least Datomic can query across multiple databases, which could make this manageable.

I remember in the Conj talk from the Nubank guys, they said they had designed their app to be partitioned across multiple databases in case they ever needed it, but they didn't think it would ever actually come to that. I'm paraphrasing from memory - the actual talk is here

Robin Heggelund Hansen

2015-01-22 08:37:41 UTC

I think the reason Stu said "10 Billion Datoms" has to do with the peer's
cache. At 10 billion datoms you will have data spread all over memory, and
so much of it that the chances are pretty big that a given dataset won't
be in cache. At that point Datomic will have to retrieve a lot from
storage, and it will be slower than a traditional database. Your mileage
may vary of course, as it depends on your dataset and workload, which I
guess is why there are some who do just fine with 20-30 billion datoms.

When it comes to strategies, I guess the obvious thing to do is garbage
collection every once in a while. I personally garbage collect anything
older than a year.

Gary Verhaegen

2015-01-22 10:45:57 UTC

I was at a Datomic training in November and that question came up. Here is
my understanding of what Stu explained.

The 10 billion datoms is a hard limit, and it includes all datoms in the
database: current values, history, transactions, everything.

It's an architectural limitation: the structure of the Datomic index (or at
least one of them) is a tree with a 100 000 branching factor and two
levels. This makes the tree able to store 10 billion datoms, no more.
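
For reference, the arithmetic behind that figure is just the branching
factor raised to the number of levels. The sketch below only checks the
numbers quoted in this post; the function name is made up for illustration
and it is not a model of Datomic's actual index layout.

```python
# Figures as quoted in the post above (a recollection, not a specification).
BRANCHING_FACTOR = 100_000
LEVELS = 2


def max_leaf_slots(branching_factor: int, levels: int) -> int:
    """Leaf slots in a complete tree with the given branching factor and depth."""
    return branching_factor ** levels


capacity = max_leaf_slots(BRANCHING_FACTOR, LEVELS)
print(f"{capacity:,}")  # 10,000,000,000
```

Under the same assumptions, adding a third level would multiply the figure
by another 100 000.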

Stu did briefly talk about how they could lift the limitation from Datomic
itself, but increasing the branching factor or adding a level (or
conditionally changing the number of levels depending on the amount of
data) would reduce performance for every use case. At the time, he
mentioned that Datomic does not really try to fit every single possible use
case, and in particular they did not target so-called Big Data
applications. They were therefore reluctant to slow Datomic down,
especially since this is apparently not a problem (i.e. not an interesting
use-case) for the vast majority of their customers.

I think I do remember him saying that they kept an eye out for new
ideas that could help them lift that limitation without negatively
impacting their more usual use-cases, but it was not a priority at the
moment.

Concerning what to do if you do indeed reach 10 billion datoms, he said the
best option would be to have multiple databases. This is really not as bad
as it sounds. I think all you lose there is some granularity in how the
peer fetches parts of the index; i.e. for constraints that span databases,
it will probably fetch a bit more data than would be needed had it been a
single database.

Please remember that this is just what I remember; it might be all wrong.

Benjamin Yu

2015-01-23 21:59:41 UTC

Hi All,

Thanks for the answers.

So, basically, I'll try to be smart about what **structured** data
(and how it's modeled) I put into an individual database. That way
the data would be condensed and have a more manageable growth profile,
avoiding the massive write characteristics of certain types of big
data apps. If I ever start approaching the limit, then (at that point
in time) I should hopefully be able to more easily separate the domain
model's data (via exporting) from the single massive database and
import it into different databases. The strategy for that would depend
on the problem domain (modeling): partitioning the data by type, or
maybe sharding by account/user/client. The querying would have to
change accordingly, but cross-DB querying would be very doable
depending on how the schemas are designed.
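
A minimal sketch of the "sharding by account/user/client" idea, assuming
each account's data lives entirely in one database. The database names and
the helper functions are hypothetical; nothing in this thread prescribes
them, and the sketch only does the routing, not any Datomic call.

```python
import hashlib
from typing import Dict, Iterable, List, Sequence

# Hypothetical database names, purely for illustration.
SHARD_DBS = ("accounts-0", "accounts-1", "accounts-2", "accounts-3")


def shard_for(account_id: str, shards: Sequence[str] = SHARD_DBS) -> str:
    """Map an account id to exactly one database name, deterministically."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(account_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group account ids by the database that would hold them."""
    groups: Dict[str, List[str]] = {}
    for account_id in account_ids:
        groups.setdefault(shard_for(account_id), []).append(account_id)
    return groups


groups = group_by_shard(["acct-1", "acct-2", "acct-3", "acct-4", "acct-5"])
print(groups)
```

Note that with plain modulo hashing, changing the number of databases
remaps most accounts, so the shard count is something to decide before the
export/import step rather than after.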

Thanks!
Ben

On Fri, Jan 23, 2015 at 11:51 AM, Stuart Halloway wrote:

1. The Datomic team currently tests up to 10 billion datoms.
2. At ten billion datoms, the root nodes of Datomic data structures start to
be significantly oversized. This puts memory pressure on every process.
3. A ten billion datom database takes a long time (hours) to fully back up
or restore.
4. Robin's concern about performance is (a) partially correct: with a bigger
database, you will probably spend more time worrying about partitions,
caches, etc -- but also (b) partially wrong: nothing about this is
particular to Datomic, which is designed to work well with datasets
substantially larger than memory.
10 billion datoms is *not* a hard limit, but for all of the reasons above you
should think twice before putting significantly more than 10 billion datoms
in a single database.
Stu

Timothy Pote

2015-04-02 12:29:22 UTC

Stu,

So, to be clear, breaking a single database into multiple databases is a
valid strategy for dealing with the 10 billion datom limit?

Tim

Eric Lavigne

2015-04-04 19:08:24 UTC

Regarding memory pressure around 10 billion datoms... Is that an issue that
arises fairly suddenly around 10 billion datoms? Or is this memory pressure
fairly linear with respect to the number of datoms in the database? In
other words, if I take a large database with 18 billion datoms that needs 9
GB of active memory per peer, and divide it into 3 databases with 6 billion
datoms each, do I still need 9GB/3 + 9GB/3 + 9GB/3 = 9 GB of active memory
or did I save a lot of peer memory by performing that division?

A related question regarding memory pressure... I imagine that the amount
of active memory would be much higher on a peer that is performing queries
that involve a large portion of a database's datoms, but that some minimal
amount of active memory will be required just to connect to the database
and fetch a few entities. In the context of memory pressure caused by large
databases, are we talking primarily about the memory pressure associated
with connecting and fetching a few entities?

Of course it would be great if someone could provide ballpark numbers for
the memory needs of peers and transactors at various database sizes. The
capacity doc talks about using 1 GB for development and 4 GB for production
on both the transactor and peers. If 4 GB turns out to be the expected
memory usage for a 10 billion datom database, then 200 billion datoms could
be split across 20 databases, requiring 20*4GB = 80 GB of memory which
would require an r3.4xlarge instance for $1k per month (per peer or
transactor). Yes, that's a lot to pay per peer, but considering the extreme
scale we're discussing... that price seems reasonable enough.
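
The estimate above can be written out as a small calculation. This is only
a sketch of the post's own arithmetic: the 4 GB per 10-billion-datom
database figure is the post's assumption, and the sketch also assumes
memory needs simply add up across databases, which is exactly the open
question raised earlier in this message.

```python
DATOMS_PER_DB = 10_000_000_000  # per-database size discussed in the thread
GB_PER_DB = 4                   # assumption from the post, not a measured figure


def estimate_memory(total_datoms: int,
                    datoms_per_db: int = DATOMS_PER_DB,
                    gb_per_db: int = GB_PER_DB) -> tuple:
    """Return (number of databases, total GB), assuming memory adds linearly."""
    # Integer ceiling division, so a partly filled database still counts as one.
    databases = (total_datoms + datoms_per_db - 1) // datoms_per_db
    return databases, databases * gb_per_db


databases, total_gb = estimate_memory(200_000_000_000)
print(databases, total_gb)  # 20 80
```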

I watched the NuBank video that Mike Haney linked. It is overall a very
glowing review of Datomic, a small part of which is about how well Datomic
handles sharding. The comments about sharding are in two places, around
33:43 (superpowers 8 and 9) and a short comment at the end. The video gives
me a lot of hope that Datomic is useable at scale, but also raises some
questions.

http://youtu.be/7lm3K8zVOdY

NuBank's reason for sharding was to overcome the write throughput
limitations of DynamoDB, which they used as Datomic's storage backend. They
sharded into multiple Datomic databases, each using a separate DynamoDB on
a separate EC2 instance as a backend. All of these databases shared the
same transactor. There was also a claim about "inter-shard ACID
transactions" which seems to be partially explained by using one transactor
across all of these databases. They also had multiple transactors for high
availability, so I guess one of those transactors served all of those
databases while another just sat idle to take over if needed.

How can a transactor be configured to serve multiple databases?

Since the usual way to achieve high availability is to point two
transactors at the same database, how can we ensure that one of the
transactors handles all of the databases, rather than the two transactors
dividing up the databases randomly?

Since the transactor is only performing one transaction at a time, across
all those databases, I expect that only one of the databases would actually
be writing to DynamoDB at any given time. So how does this configuration
lead to higher write throughput if only one DynamoDB is actually being
written to at a time?

Transaction functions typically take "the database" as an argument. How
does this work in the context of "inter-shard ACID transactions"? Do they
end up taking multiple databases as arguments but only inserting into one
of them? Take multiple databases as arguments and somehow choose which of
those databases will receive various inserted datoms? Take one database as
argument, and explicitly connect to a few others just to peek?

Followup possibility, inspired by the video... If I have five Datomic
databases, can I assign one transactor to three of them for inter-shard
transactions between those shards, and give the other two their own
transactors for increased throughput? Can I also change my mind later about
which shards should be served by a single transactor? That would be an
amazing option to have available if that is really possible.

Lucas Cavalcanti

2015-04-06 04:09:56 UTC

To explain a little bit more about the Nubank use case for sharding:

- Our services don't have high write volume (relative to reads), so
inter-shard ACID transactions mean that we would still have one transactor
(the second is just for HA; it stays idle until promoted to active) and
N peers, where N is the number of shards. In other words, we would only be
sharding reads, both for load balancing and to benefit the object
cache, so that a whole shard could fit in the memory of a peer
instance. Also, it's a single DynamoDB table, so the read and write
throughputs should be proportional to the number of shards.

Maybe sharding is not the best word to describe it. Datomic doesn't work
like most databases, where all reads and writes are centralized in one node
or a db cluster. In Datomic's Peer/Transactor model, all writes are
centralized on the transactor, but the reads are made by the Peers. And the
peer can run on the application instance, which is our case. Because of
this, sharding only reads makes sense if your application is read bound.

Post by Eric Lavigne
How can a transactor be configured to serve multiple databases?

You point it to a storage (e.g. DynamoDB). Then you can create as many
databases as you want; no need to do anything special in the transactor
config.

Post by Eric Lavigne
Since the usual way to achieve high availability is to point two
transactors at the same database, how can we ensure that one of the
transactors handles all of the databases, rather than the two transactors
dividing up the databases randomly?

AFAIK only the active transactor will handle transactions, even if you have
multiple databases running on it.

Post by Eric Lavigne
Since the transactor is only performing one transaction at a time, across
all those databases, I expect that only one of the databases would actually
be writing to DynamoDB at any given time. So how does this configuration
lead to higher write throughput if only one DynamoDB is actually being
written to at a time?

It doesn't. To get more write throughput you'd have to have one DynamoDB
table and one transactor pair per database. Several databases on the same
DynamoDB table/transactor will share the write throughput.

Post by Eric Lavigne
Transaction functions typically take "the database" as an argument. How
does this work in the context of "inter-shard ACID transactions"? Do they
end up taking multiple databases as arguments but only inserting into one
of them? Take multiple databases as arguments and somehow choose which of
those databases will receive various inserted datoms? Take one database as
argument, and explicitly connect to a few others just to peek?

Only one database as an argument. Again, the sharding we mentioned in the
video is just for read scalability; it's still a single database. If you
need to have multiple databases, you wouldn't have inter-shard ACID
transactions. You'd have to select the proper db connection for each shard
and do one transaction per shard.
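
The "one transaction per shard" pattern can be sketched as follows. This
is an illustration only: `FakeConnection` is a hypothetical stand-in that
just records what it is asked to transact, not a Datomic API, and the
shard names and write maps are made up.

```python
from collections import defaultdict


class FakeConnection:
    """Hypothetical stand-in for a per-database connection."""

    def __init__(self, name):
        self.name = name
        self.transactions = []

    def transact(self, tx_data):
        # Record the batch; a real connection would submit it to its database.
        self.transactions.append(list(tx_data))


def transact_per_shard(connections, shard_of, writes):
    """Group writes by shard, then submit one transaction per shard.

    Each transaction is atomic only within its own database; nothing here
    makes the set of transactions atomic across shards.
    """
    by_shard = defaultdict(list)
    for write in writes:
        by_shard[shard_of(write)].append(write)
    for shard, tx_data in by_shard.items():
        connections[shard].transact(tx_data)
    return dict(by_shard)


connections = {name: FakeConnection(name) for name in ("shard-a", "shard-b")}
writes = [
    {"shard": "shard-a", "entity": "customer-1", "attr": "name", "value": "Ann"},
    {"shard": "shard-b", "entity": "customer-2", "attr": "name", "value": "Bob"},
    {"shard": "shard-a", "entity": "customer-1", "attr": "tier", "value": "gold"},
]
submitted = transact_per_shard(connections, lambda w: w["shard"], writes)
```

If one shard's transaction fails after another has succeeded, the
application has to deal with the partial result itself.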

Post by Eric Lavigne
Followup possibility, inspired by the video... If I have five Datomic
databases, can I assign one transactor to three of them for inter-shard
transactions between those shards, and give the other two their own
transactors for increased throughput? Can I also change my mind later about
which shards should be served by a single transactor? That would be an
amazing option to have available if that is really possible.

Only one database per transaction; you wouldn't get multi-database ACID
transactions.

[]'s
Lucas Cavalcanti

Eric Lavigne

2015-04-14 02:15:05 UTC

Stu and Lucas,

Thanks for clarifying. In summary, it sounds like...

1) Sharding with around 10 billion datoms per database is a good approach
to dealing with the per-database size limit.

2) The main problem caused by sharding is that transactions are limited to
one shard. This issue should be fairly rare (try to keep related entities
in the same shard), and can be handled with application-level locking.

Overall, my impression is that Datomic will work well in the 100-200
billion datom range, while still requiring fewer gymnastics than the "big
data" alternatives.

e***@aviso.io

2017-05-19 15:05:54 UTC

Hi,

Sorry to drag up this old thread, but a quick clarification would help me.
Is that an American billion (a thousand million) or a European one (a
million million)?

Many thanks,

Ed

Alan Thompson

2017-05-19 16:07:21 UTC

While I don't work at Cognitect I'm quite certain it is 10^9. Few people
here have even heard of the alternate terminology system.
Alan

Alan Thompson

2017-05-19 16:09:24 UTC

A good overview & historical comparison for anyone interested:
http://youtu.be/C-52AI_ojyQ

Ed Bowler

2017-05-19 16:12:42 UTC

I figured it would be, but best to ask. It's the difference between 1 and
1000 years in our capacity planning.

Thanks,

Ed
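
To make the stakes of that clarification concrete, here is a rough sketch of the arithmetic. The growth rate is a made-up number (not from the thread), chosen so that the short-scale reading works out to one year; `datoms-per-year`, `short-scale-limit`, `long-scale-limit`, and `years-until` are hypothetical names used only for illustration.

```clojure
;; Hypothetical growth rate, chosen only for illustration.
(def datoms-per-year 10000000000)        ; 10^10 new datoms per year (assumed)

;; "10 billion" under the two naming systems.
(def short-scale-limit (* 10 1000000000))     ; 10 * 10^9  (a thousand million)
(def long-scale-limit  (* 10 1000000000000))  ; 10 * 10^12 (a million million)

(defn years-until
  "Years of growth at datoms-per-year before reaching limit."
  [limit]
  (/ limit datoms-per-year))

(years-until short-scale-limit) ; => 1
(years-until long-scale-limit)  ; => 1000
```

Whatever the real growth rate is, the two readings differ by the same factor of 1000.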

Post by Alan Thompson
http://youtu.be/C-52AI_ojyQ

Matan Safriel

2017-07-21 22:37:30 UTC

The alternate terminology system is actually well known outside the colony
gone awry, but anyway...
I find 10 billion datoms to be not that much for an application. Sharding
reads in application code sounds a little off to me, not that I've
developed a better distributed database myself. Is there any work or plan
to provide a better path for scalability?

Daniel Compton

2017-07-21 23:48:20 UTC

There is a feature request open for archiving out slices of databases by
time: https://receptive.io/app/#/case/17710. I'm not sure if that would
work for your application domain or not, but you can vote for it if it
would be useful.

Christopher Cook

2017-12-08 19:17:36 UTC

Is there a particular command to find the number of datoms in a current
database?

Thanks,
Chris

Christopher Cook

2017-12-08 19:19:25 UTC

Is there a particular command to find the number of datoms in a current
database?

Thanks,
Chris

Linus Ericsson

2017-12-09 00:25:05 UTC

Maybe

(count (seq (d/datoms (d/history (db conn)) :eavt)))

works even for larger databases.
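
A minimal sketch of the same idea, written so it can be tried without a database: it counts in a single pass with `reduce`. The names `count-items` and `fake-datoms` are hypothetical, and the vectors are only stand-ins shaped like datoms. Applying it to a real database assumes the `d` alias refers to `datomic.api` and that the result of `d/datoms` can be reduced directly, which has not been verified here.

```clojure
(defn count-items
  "Counts the elements of any reducible collection in one pass,
  keeping only the running total."
  [coll]
  (reduce (fn [n _] (inc n)) 0 coll))

;; Stand-in data: plain vectors shaped like [e a v tx added?].
;; The history view includes retractions (added? = false) as well as assertions.
(def fake-datoms
  [[1 :user/name "ann"  1000 true]
   [1 :user/name "ann"  1001 false]
   [1 :user/name "anne" 1001 true]])

(count-items fake-datoms) ; => 3

;; Against a real peer connection (assumption, not run here):
;; (count-items (d/datoms (d/history (d/db conn)) :eavt))
```

Either way this is a full scan of the index, so on a database with billions of datoms it will likely take a long time to finish.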

Timothy Baldridge

2017-12-09 03:42:45 UTC

The number of datoms in the DB is reported regularly via the metrics
system: http://docs.datomic.com/monitoring.html
See the metrics entry for "Datoms".

--
"One of the main causes of the fall of the Roman Empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs."
(Robert Firth)

Dustin Getz

2017-12-09 23:29:49 UTC

Is it true that a better metric would be the count of datoms in an index rather than in the log?
