Over your storage limit? Archiving may not be your friend

Janel
Kilo Sage

This is a continuation of the mini-series "Over your storage limit? Reducing our storage footprint; our lessons learned."

https://www.servicenow.com/community/developer-forum/over-your-storage-limit-reducing-our-storage-fo...

 

This post is about how we learned that archiving records actually consumed more space and where we stopped archiving in favor of doing straight deletions.

 

Disclaimer:

This post is about the deletion of records to reduce the storage footprint size at our company.  This is what worked for us and it may not work for you.  Know your policies and retention periods and as always, review and make sure you test everything in a sub-prod instance first.

 

 

Archiving may not be your friend

An archive table is any table that starts with ar_.  Archiving records is for performance purposes and will actually increase your storage footprint.  It has a few extra tables that go with it. 

  • sys_archive_log, this works as a middle-man for restoring records, but is not the actual archived record itself.  It is an XML of the original record that is separate from the ar_ version of the record.  Without this record you would not be able to restore an archived record.  This table grows substantially in size the more you archive since a sys_archive_log record is created in addition to each archived record.  This is our third largest table.
  • sys_archive_destroy_log, that contains the logging of records that were deleted via an archive destroy rule.  We generally found these pointless to keep after a few months.

Archived records caused us the most amount of pain.  You cannot delete archived records by any sort of conditions.  You can only delete archived records with an archive destroy rule.  Destroy rules are only based on how long the record has been archived.  There are no options for conditions.

 

Be careful how you setup archiving!

Here is the dilemma we keep running into due to the way we initially set up archiving.  Let's say we have 10,000 records that are anywhere from now to 4 years old and we start archiving them at 1 year old.  So our first round of archiving contains records that are 2-4 years old.  We find out later that we can delete any of these records that are 3 years or older, neat.  How do we delete the archived records that are only 3 years or older?  We can't. 

 

Since all the records were archived at the same time, they all have the same archive duration.  There is no way to pick and choose which of those records to delete.  It is all or nothing.  Again, destroy roles only give you the option to delete based on the archived duration, there are no options for conditions.

 

In these situations, Support can delete archived records based on a condition.

 

 

If you're thinking about archiving something for the first time, don't do a massive batch.

Begin archiving records in smaller batches (based on the original records create date) with a few days in between so they can be identified by a date.  Then once you have caught up to the duration you planned on using, we just turned it on and let it run.  E.G. only archive records that are 4 years old (or as far back as you need to).  Then a few days later, archive records that are 3 years old, a few days later do records 2 years old, then finally run steady-state at 1 years old.

 

What also trapped us were the number of auxiliary records we needed to archive as well that we actually didn't need to.  A requested item for example, needs to have catalog tasks archived, workflow contexts, approvals, the request, etc. 

But does it?

 

Our new approach is to try and delete as much as possible before we need to archive, thinking about multiple retention policies that could apply to a table.  We posed questions like,

  • Do we need those workflow contexts with requested items?
  • Are the approvals tracked somewhere else, can we the delete approval records?
  • Are there multiple retention policies that apply to this table?  Does it affect when some records can be deleted?
    • Can self-service incidents be deleted before priority 1 incidents?  Can they be deleted instead of archiving?
  • What scenarios would we need to restore a record?  If we restore a record, is there enough related records for the requestor to get what they need?

 

sys_email is a lesson in archiving tables that could be cross-processes or have multiple retention policies (sys_email).

This is what burned us the most.  We were archiving 2 different sets of data from sys_email.  One set at 7 years for HR related things and pretty much everything else after only 36 months.  Because we had multiple retention policies happening, we couldn't delete close to 40million archived emails, due to retention of other emails we needed to keep. 

 

After getting help from support to delete the ar_sys_emails, we did setup a job to delete as much email as we possible could before any archive rules run.  More information on wrangling sys_email can be found here:

https://www.servicenow.com/community/developer-forum/over-your-storage-limits-sys-email-is-loud-and-...

 

I hope this helps you get thinking about how you should be using archiving.

3 REPLIES 3

SørenC
Tera Contributor

Hi

I'm a bit confused about above. "We find out later that we can delete any of these records that are 3 years or older, neat.  How do we delete the archived records that are only 3 years or older?  We can't." 

 

In the archive rule, why can you just changed the destroy rule to 3 years or older? I'm not under the impression that the "archive duration" is stored on each record in the archive table, but controlled by the destroy rule. And if you change the destroy rule, then you can change the archive duration - but it is for all records 🙂

We had things mixed into the archive that were only 1-2 years old and archive destroy rules are based on the archived date.  So when we found out we could delete anything with a created date older than 3 years, we weren't able to because everything had the same archived date.   Had we archived in chunks (7+ years, 6-7 years, 5-6 years, etc.), then it would have been possible.

 

We thought about restoring the records and then deleting or archiving again, but we quickly found out the insanity of related records was not worth it (because when you restore, it doesn't restore everything).

 

I haven't looked in a while, but I hope SN has updated destroy rules to be more robust.

abhishek_s
Tera Contributor

Well, for us, there has been one step back with respect to the Archive Destroy Rules. I believe years ago, you could simply specify the Archived date as the condition on any archived table and set up a new Archive Destroy Rule to destroy everything before that date. Now it appears that on the Archive Destroy Rule you need to specify the reference to the Archive Rule so that the destruction happens on the basis of the corresponding sys_archive_log records. This is supposedly for better performance and better cleanup of related tables. sys_archive_log table gets crazy big and we had a Table Cleaner rule to keep it under control. So now our Archive Destroy Rules cannot destroy anything because we have deleted the corresponding sys_archive_log records!

 

But there is a step forward as well. You can destroy archives using the Table Cleaner where you can specify whatever conditions you want provided they are efficient.