Finding and deleting orphaned attachments
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-19-2023 11:04 AM - edited 02-26-2025 05:22 AM
Looks like I can't create articles anymore so I guess I'll post it as a question.
2025-01-29 Edit: As of Washington, there is now an out-of-box way to delete orphaned attachments...
...which ServiceNow is keeping really quiet about. Maybe this article will resurface one day.
https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB1706592
In the mean time, Mark Roethof was able to cover some of the out-of-box job here: https://www.servicenow.com/community/now-platform-articles/purge-orphan-attachments-custom-solution-...
Recently, we got to go through a lovely exercise of drastically reducing our storage footprint. I'd like to share how we are able to identify orphaned records on sys_attachment and delete them.
The biggest issue with orphans on this (or any) table is that it could happen at any time to any attachment record. A record from 2018 could be orphaned in 2024 and there is no way to know about via the sys_attachment record. There is no value change to the record, it still has a table_name and table_sys_id populated, but that table_sys_id is a deleted record. Not a chance you could find those in a list view.
So how do you know what is legit and what should be deleted?
Check every record.
First, we wanted to know where the orphans were coming from and fix those processes where we could. What tables are we commonly finding orphans on?
Since sys_attachment is so big, I manually ran this script in 6-month intervals and at the end, I had it updating a record so I had somewhere to hold on to the data.
Unfortunately, because an attachment could be orphaned at any time, I had to keep repeating this from the oldest record to the new one to get a few days worth of data.
//Capture how many attachments we have that have a table_name and table_sys_id populated, but not valid records.
var start_date = '2023-06-27';
var end_date = '2023-07-06';
var valid_attachments = new GlideRecord('sys_attachment');
valid_attachments.addEncodedQuery("table_nameNOT LIKEZZ_YY^table_nameNOT LIKEinvisible^table_name!=placeholder^sys_created_onBETWEENjavascript:gs.dateGenerate('" + start_date + "','00:00:00')@javascript:gs.dateGenerate('" + end_date + "','23:59:59')");
valid_attachments.orderBy('sys_created_on');
valid_attachments.query();
var skipped = 0;
var deleted = 0;
var orphaned_tables = []; //Capture number of times a table appears
var orphaned_ids = ''; //Only use for testing
while (valid_attachments.next()) {
if (!gs.tableExists(valid_attachments.table_name)) { //If the table_name is invalid, so nothing
//orphaned_tables.push('!table-' +valid_attachments.table_name); //Or capture it!
continue;
}
var record_check = new GlideRecord(valid_attachments.table_name);
if (record_check.get(valid_attachments.table_sys_id) && record_check.isValidRecord()) {
skipped++;
continue;
} else {
deleted++;
orphaned_tables.push(valid_attachments.getValue('table_name'));
//orphaned_ids += valid_attachments.sys_id + ','; //Uncomment for a SMALL amount if you want to look at some specific test records.
//valid_attachments.deleteRecord(); //We chose to delete the records as we went.
}
}
orphaned_tables.sort();
//We want to know how many times the same tables appear in our orphaned_table list.
var table = null; //The table we want to count on.
var table_count = 0;
var final_list = '';
for (var i = 0; i < orphaned_tables.length; i++) {
if (orphaned_tables[i] != table) { //If the current element is not equal to the currently set table we're checking on, that means we moved on to the next table and we capture the final count now.
if (table_count > 0) {
final_list += table + ': ' + table_count + '\n';
}
table = orphaned_tables[i]; //Reset this to the next table to start counting on.
table_count = 1; //And go back to one
} else {
table_count++; //If the current element is the same table, add to the counter.
}
}
if (table_count > 0) {
final_list += table + ': ' + table_count + '\n';
}
var text = '\n' + gs.nowDateTime() + " Valid attachment check: " + start_date + " to " + end_date + "\nValid: " + skipped + '\nDeleted: ' + deleted + '\n\n' + final_list + '-------- -------- -------- -------- \n'; // +orphaned_ids;
var updateStory = new GlideRecord('rm_story'); //Instead of trying to log all this, update the Description in STRY0057314
updateStory.get('544229f7c3a2a9d4caa7b1deb00131db');
updateStory.description += text;
updateStory.update();
The above helped us learn where we were seeing orphans come from and we were able to start addressing them. (mostly from archiving and ml_model_artifact).
Now that we figured out and fixed where some of them were coming from, we knew this was just going to be a thing going forward and needed to address them.
Next, I made a scheduled job that crawls the sys_attachment_table in increments of 200,000 records every hour and verifies if the record the attachment is linked to is valid or not. If it isn't, then delete it. This is very similar to above, just running in automated chunks (now that I think about it, I could have used this same setup below to automate the above with dates instead of manually adjusting the dates).
Adjust the values for what works for you and as always, test in a sub-prod first!
You will need 2 system properties to keep track of which chunk of records to look at (yes, this will shift slightly as records are added or deleted through normal means).
Both properties should be integers.
Property 1 (called "orphaned.attachment.cleanup.window1") should be set to zero
Property 2 (called "orphaned.attachment.cleanup.window2") should be set to the first chunk of records you want to delete (I started ours at 200,000).
Now you'll need the scheduled job.
Name: Delete orphaned attachments.
Run: Periodically
Repeat Interval 1 hour
//We get many orphaned attachments on sys_attachment where table_sys_id left with a dead sys_id. Find these records and attempt to look them up. If there is not a valid record found, delete the attachment.
//We know archiving leaves behind attachments and so does ml_model_artifact (there is a PRB for that).
//When an attachment gets orphaned there is no update to the attachment record so we can't do something simple like, look for updates made yesterday and only check those. An attachment from 2018 can get orphaned in 2024 and we'd never know about it unless we look at all the attachments. So we do that hourly and in chunks using chooseWindow to do paging. This is better than setLimit because setLimit will just look at the same first chunk of records over and over. We need to paginate.
var attach_total = new GlideAggregate('sys_attachment');
attach_total.addEncodedQuery("table_nameNOT LIKEZZ_YY^table_nameNOT LIKEinvisible^table_name!=placeholder");
attach_total.addAggregate('COUNT');
attach_total.query();
var total = 0;
attach_total.next();
total = attach_total.getAggregate('COUNT'); //We need to get the total so we know when to reset our windows sizes.
var chunky = 200000; //This is the number of records were will do at a time.
var window1 = Number(gs.getProperty('orphaned.attachment.cleanup.window1')); //This will be where we left off from last run
var window2 = Number(gs.getProperty('orphaned.attachment.cleanup.window2')); //This will be the next number of pages set by the last run.
var valid_attachments = new GlideRecord('sys_attachment');
valid_attachments.addEncodedQuery("table_nameNOT LIKEZZ_YY^table_nameNOT LIKEinvisible^table_name!=placeholder"); //ZZ_YY tables are fields that contain images (like catalog item list images), invisible is for OOB things, and placeholder has a ton of records that we have a table cleaner for already.
valid_attachments.chooseWindow(window1, window2);
valid_attachments.orderBy('sys_created_on');
valid_attachments.query();
while (valid_attachments.next()) {
if (!gs.tableExists(valid_attachments.table_name)) { //If the table doesn't exist, abort. This is up to you how you want to handle these.
continue;
}
var record_check = new GlideRecord(valid_attachments.table_name);
if (record_check.get(valid_attachments.table_sys_id) && record_check.isValidRecord()) { //If the record is valid...
continue; //Do nothing and continue to the next one.
} else {
valid_attachments.deleteRecord(); //If the record is not valid, then delete it.
}
} //End of while
//When window2 exceeds the total number of records, we reset the system properties.
//We get the properties via GlideRecord so we can use autoSysFields(false) and not flood the versions list with a zillion things since it runs hourly.
//If you don't care that the versions list will be huge, you can instead delete this and just do gs.setProperty('property.name', value);
var prop1 = new GlideRecord('sys_properties');
prop1.get('name', 'orphaned.attachment.cleanup.window1');
var prop2 = new GlideRecord('sys_properties');
prop2.get('name', 'orphaned.attachment.cleanup.window2');
if (total - window2 < 0) { //Once our total number of records minus our lagest pagnation number is less than zero, we restart everything.
prop1.value = 0;
prop2.value = chunky;
} else { //If we're not restart, then set our proeprties to the updated windows.
prop1.value = window2.toFixed(0);
prop2.value = (window2 + chunky).toFixed(0);
}
//If you don't care about the versions list, you can just use setProperty above instead.
prop1.autoSysFields(false); //Don't track as an update
prop1.update();
prop2.autoSysFields(false); //Don't track as an update
prop2.update();
And this has been working pretty well for us. 200,000 records is quick to get through and we haven't seen any sort of performance impact.
I hope this helps you identify and manage orphaned attachments!
- 3,932 Views
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-19-2023 11:12 AM
Thanks for sharing @Janel , this was worth reading and taking home good stuff 🙂
Regards,Sushant Malsure
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-19-2023 11:34 AM
A few tips if you like to improve on performance even further.
- avoid defining a new GlideRecord('..') inside of a loop.
every time you call the "new" keyword you are re-initializing a new class. This is very noticable in performance and time of execution. Specially when you do queries of 200 000 records.
You can define all the GlideRecords for all tables at start. and then empty the gliderecord for next loop with gr.initialize().
- Avoid multiple trips between the database and the server.
When you need to get multiple records by ID in a single query. Get the ID's in an array. and then query the records all a once.
Every time you do a gr.get() or gr.query()
Your server needs to go to the database, fetch the record and then come back to the server.
If you do this 200 000 times. that is 400 000 trips back and forward.
Do this instead:
1: get all the sys_id's that you want to query in an array.
2: do a single gliderecord query like this:
gr.addQuery('sys_id', 'IN', myArrayOfIDs);
The difference i got for 10 000 records is
Around 40 seconds for a gr.get() loop.
And around 2 seconds with this method.
So it's an insane improvement!
Good luck! 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-19-2023 11:55 AM
Thanks for the info! I'll give it a go tomorrow and report back (and edit) what I find.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-19-2023 12:51 PM
After rereading this, I don't think I understand what you're talking about. Putting it into an array is just moving the loop from a while to a for. The orphans could be across multiple tables so it isn't feasible to declare all the possible tables we need to hit up.
Do you mean something like this would be better because we aren't declaring a new GlideRecord each time?
This does work when I did a quick test in a background script.
var record_check = new GlideRecord('task'); //Delcare it once and reassign it to new tables as we go.
while (valid_attachments.next()) {
if (!gs.tableExists(valid_attachments.table_name)) {
continue;
}
record_check = GlideRecord(valid_attachments.table_name);
if (record_check.get(valid_attachments.table_sys_id) && record_check.isValidRecord()) {
continue;
} else {
valid_attachments.deleteRecord();
}
} //End of while