Discussion:
gc-storage question
m***@wormbase.org
2017-05-22 13:11:45 UTC
Permalink
Hi,
We have an import job that takes approx. 7 hours to transact the EDN
generated from our legacy database; the resulting size on disk is currently
~140GB, which shrinks down to ~16GB after a backup and restore.

The import process necessitated storing ancillary information in a custom
entity, :import/temp.
Because this information is not needed after the import job has concluded,
we excise this data and then attempt to reclaim storage at the end of the
process.
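The excision step is essentially something along these lines (a simplified
sketch, not the exact code; it assumes :import/temp is an attribute ident,
whereas excising a specific entity would point :db/excise at that entity id
instead):

(require '[datomic.api :as d])

;; Transact an excision covering every datom whose attribute is :import/temp.
;; Attribute-wide excision works by pointing :db/excise at the attribute itself.
(defn excise-import-temp [conn]
  @(d/transact conn [{:db/id     (d/tempid :db.part/user)
                      :db/excise :import/temp}]))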

I'm experiencing an issue where datomic.api/gc-storage never seems to
return after the import job:
I'm seeing the datomic.garbage :garbage/collected event in the log when I'd
expect it (at roughly the same time the import job takes to run),
but the code we run to excise this temporary data and reclaim storage
(sync-index, gc-storage) does not seem to return (I've left it running for
> 3 days before giving up waiting).

If someone could point out anything that I'm doing wrong with regard to the
strategy for performing gc-storage, I'd be most grateful!

The code we run to do this can be seen here:
https://github.com/WormBase/pseudoace/blob/master/src/pseudoace/cli.clj#L319-L329


I wasn't sure of the "correct" approach, but stumbled upon:
https://hashrocket.com/blog/posts/bulk-imports-with-datomic
which is where the code was taken from, verbatim (it seemed to make sense
to me!).
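In outline, the index-then-collect sequence from that post looks roughly
like this (a simplified sketch under my own naming, not a verbatim copy of
our cli.clj):

(require '[datomic.api :as d])

;; Kick off an indexing job, block until indexing has caught up to the
;; current basis-t, then ask the transactor to collect storage garbage
;; older than "now".
(defn reclaim-storage [conn]
  (d/request-index conn)
  @(d/sync-index conn (d/basis-t (d/db conn)))
  (d/gc-storage conn (java.util.Date.)))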

Many thanks,
Matt
--
Jaret Binford
2017-05-22 14:58:26 UTC
Permalink
Matt,

Excision is not the recommended approach to reclaim database space, as
excision has a very specific use case (i.e., legal requirements to wipe out
an identity). In this case, I would recommend that you create a secondary
database for the temporary data required for your import and then delete
that database. After excision, the indexing job has to rewrite every
segment that contains one of the excised datoms. So in your example, you
are waiting on the indexing job to rewrite however many segments contain
:import/temp.
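Roughly, something like the following (a sketch only; the database URI and
function names here are hypothetical):

(require '[datomic.api :as d])

(def temp-uri "datomic:dev://localhost:4334/import-temp")  ;; hypothetical URI

;; Keep the ancillary import data in a throwaway database and delete the
;; whole database once the import has finished, instead of excising the
;; data from the main database afterwards.
(defn with-temp-db [import-fn]
  (d/create-database temp-uri)
  (let [conn (d/connect temp-uri)]
    (try
      (import-fn conn)
      (finally
        (d/release conn)
        (d/delete-database temp-uri)))))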

Thanks,
Jaret
m***@wormbase.org
2017-05-22 15:29:39 UTC
Permalink
Jaret,
thanks for the advice; that makes sense. I'm looking into making that change now.

Cheers,
Matt