MyBB Community Forums

Full Version: Clean up >400K attachments
The forum has more than 400K attachments after 8 years. Running the existing AdminCP / Forum and Posts / Attachments / Find Orphaned Attachments does not seem practical at this volume. It has never finished, even after hours, and gives no indication of progress beyond "Scanning Step 2, Please Wait".

I know some attachments are missing because of incomplete transfers during server migrations over the past 5 years. I'd like to bring the database and the saved files back into agreement. So, some questions:

1. The orphan search in admin/modules/forum/attachment.php goes through the entire set. Is there a way to modify the existing core code to apply a date range to the search?
2. What query with date range would generate a suitable list for manual comparison?
3. Or perhaps a thread-based indication of missing attachments for moderation?
4. Or perhaps a UserCP-based indicator in Manage Attachments that looks to see if there is a file in the system?
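For question 2, here is a minimal sketch of a date-range listing. It assumes MyBB's default `mybb_` table prefix and that `mybb_attachments` has `aid`, `pid`, `attachname` and `dateuploaded` columns, with `dateuploaded` stored as a Unix timestamp; check these against your own schema before relying on it. SQLite stands in for the forum database so the example runs anywhere. With a MySQL driver such as PyMySQL, the `?` placeholders become `%s`.

```python
import sqlite3
from datetime import datetime, timezone

# Assumption: default "mybb_" prefix; aid, pid, attachname, dateuploaded columns,
# with dateuploaded as a Unix timestamp. Verify against your own database.
QUERY = (
    "SELECT aid, pid, attachname, dateuploaded "
    "FROM mybb_attachments "
    "WHERE dateuploaded >= ? AND dateuploaded < ? "
    "ORDER BY aid"
)


def to_ts(year: int, month: int, day: int) -> int:
    """Convert a UTC calendar date to a Unix timestamp."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def attachments_between(conn, start: int, end: int):
    """Return attachment rows uploaded in the half-open range [start, end)."""
    return conn.execute(QUERY, (start, end)).fetchall()


if __name__ == "__main__":
    # In-memory stand-in for the forum database, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mybb_attachments "
        "(aid INTEGER PRIMARY KEY, pid INTEGER, attachname TEXT, dateuploaded INTEGER)"
    )
    conn.executemany(
        "INSERT INTO mybb_attachments VALUES (?, ?, ?, ?)",
        [
            (1, 10, "201801/post_1_a.attach", to_ts(2018, 1, 5)),
            (2, 11, "201906/post_2_b.attach", to_ts(2019, 6, 1)),
            (3, 12, "202003/post_3_c.attach", to_ts(2020, 3, 9)),
        ],
    )
    for row in attachments_between(conn, to_ts(2019, 1, 1), to_ts(2020, 1, 1)):
        print(row)
```

Working one year (or month) at a time keeps each list short enough to compare against the upload directory by hand.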

Thoughts?

How does a Big Board manage this?
I have > 700k attachments and would love feedback on this subject as well.
Is anyone able to offer feedback on this? I need to find/remove orphaned attachments and clean things up a bit.
We had over 10k at one time, which is nowhere near the number you're talking about. Pre-upgrade, our board was on 1806 and had a devil's time cleaning up about 11GB of attachments.

If I remember right, one strategy I used was searching by download count, sorted ascending. It seems to me that files without an associated poster or thread were auto-selected for deletion. I just deleted them, and afterwards I always seemed to be able to run the orphaned attachments search. That could be a number/resource thing.

Since you control the number of results per page, it might be worth experimenting. Running the query shouldn't do any harm, at least until someone from MyBB weighs in with a better answer.

good luck...
Of the options in my original post, it turns out the first one yields results.

I had previously installed the MyConversations plugin (an excellent extension of the standard private message system), which already includes code to ensure MyConversations attachments are searched for orphans.

The biggest problem is that the original code searches for *all* orphans before returning any results that can be evaluated or deleted. On a forum like mine with 443K attachments, this takes a seemingly interminable amount of time, long enough that I was convinced there was a flaw in the code or a server limit had been reached. Aborting the process would be anyone's first response.

Working with @Laird, the solution was to break the core two-step orphan search into independent processes, add a progress indicator to show that the process was still running, and allow a limit on the number of results. The limit matters because even if you found and listed every orphan, PHP's max_input_vars setting caps how many of them can be deleted in one submission, so you would have to repeat the search until the result was "No orphans found."
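A minimal sketch of the two steps as independent, limited scans, written in Python rather than the plugin's PHP; it is not the plugin's code and the function names are hypothetical. It assumes uploads live under one upload directory as `.attach` files whose path relative to that directory matches the `attachname` column. A limit of 985 matches the batch size reported in this thread; PHP's default `max_input_vars` is 1000, which is presumably why the batch stays just under it.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Set, Tuple


def step1_files_not_in_db(upload_dir: str, known_names: Set[str], limit: int,
                          progress: Optional[Callable[[int], None]] = None) -> List[str]:
    """Step 1: files on disk with no attachment record, stopping at `limit`."""
    base = Path(upload_dir)
    found: List[str] = []
    # Assumption: stored uploads end in ".attach" (thumbnails are skipped).
    for checked, path in enumerate(sorted(base.rglob("*.attach")), start=1):
        if progress and checked % 1000 == 0:
            progress(checked)  # show that the scan is still moving
        rel = path.relative_to(base).as_posix()
        if rel not in known_names:
            found.append(rel)
            if len(found) >= limit:
                break  # return a deletable batch instead of scanning everything
    return found


def step2_records_without_file(upload_dir: str, records: Iterable[Tuple[int, str]],
                               limit: int) -> List[int]:
    """Step 2: ids of (aid, attachname) records whose file is missing, up to `limit`."""
    base = Path(upload_dir)
    found: List[int] = []
    for aid, attachname in records:
        if not (base / attachname).is_file():
            found.append(aid)
            if len(found) >= limit:
                break
    return found


LIMIT = 985  # batch size reported in the thread
```

Because each step returns at most `LIMIT` items, either one can run on its own and report results long before a full 400K+ scan would finish.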

My production server has 455K attachments in the database and somewhat fewer than that in the file system. Over the last few years, I've introduced orphans (missing files) while migrating the forum to other servers (changing hosts and servers maybe 5 times in 6 years). Orphans also arise when files are uploaded but never incorporated into a post, or when the containing post is deleted.
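Those three causes can be told apart record by record. A hedged sketch, assuming each record is `(aid, pid, attachname, dateuploaded)` and that `pid` stays 0 for uploads never attached to a saved post. The one-day grace period is an arbitrary choice so that attachments in a post still being written are not flagged. All names here are hypothetical.

```python
from enum import Enum
from typing import List, Set, Tuple


class Orphan(Enum):
    MISSING_FILE = "record exists but the file is gone"
    NEVER_POSTED = "uploaded but never incorporated into a post"
    POST_DELETED = "containing post no longer exists"


GRACE_SECONDS = 24 * 60 * 60  # assumption: ignore very recent pid-0 uploads


def classify(record: Tuple[int, int, str, int], existing_files: Set[str],
             post_ids: Set[int], now: int) -> List[Orphan]:
    """Return every reason the record counts as orphaned (empty list if none)."""
    _aid, pid, attachname, dateuploaded = record
    reasons: List[Orphan] = []
    if attachname not in existing_files:
        reasons.append(Orphan.MISSING_FILE)  # e.g. lost during a migration
    if pid == 0:
        if now - dateuploaded > GRACE_SECONDS:
            reasons.append(Orphan.NEVER_POSTED)
    elif pid not in post_ids:
        reasons.append(Orphan.POST_DELETED)
    return reasons
```

A record can carry more than one reason, for example an upload that was never posted and whose file was also lost in a migration.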

The development work was tested on a similar server holding 15% of the uploads from production, plus several thousand attachment files that were on disk but not in the database.

The screenshot shows the current work in progress. This process can be run and completed satisfactorily.

[attachment=44802]

On a big board, the core code is far less capable of finishing the job of clearing out orphans. The modifications were made to an existing paid plugin that is already valuable to my forum members, and on a big board the added ability to clear orphans pays off in recovered server resources.

On the production server, step 1 found 6,495 orphans (files on the server but not in the database) using approximately 2.5GB of disk space. Additionally, there were 12,808 records in the database with no corresponding file.

Autodelete cycled through the orphans, deleting 985 at a time, and was efficient because each query started where the previous cycle left off. The on-screen progress counter worked on the test server but not on the production server. I did not make any .htaccess changes, because I could see progress without waiting very long. In about 4 seconds or less, each scan reached the 985 limit, paused for 4 seconds, and then deleted the orphans it found in about the same amount of time.
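Starting each query where the previous cycle left off is keyset pagination. A minimal sketch, assuming orphan ids can be fetched in ascending order; `fetch` and `delete` are hypothetical callbacks standing in for a query along the lines of `... WHERE aid > %s ORDER BY aid LIMIT %s` and the matching delete. The 4-second pause between scan and delete is left out.

```python
from typing import Callable, List


def autodelete(fetch: Callable[[int, int], List[int]],
               delete: Callable[[List[int]], None],
               batch_size: int = 985) -> int:
    """Find and delete orphans in batches, resuming after the last id seen.

    fetch(after_id, limit) must return up to `limit` orphan ids greater than
    after_id, in ascending order. Returns the number of cycles run.
    """
    after_id = 0
    cycles = 0
    while True:
        batch = fetch(after_id, batch_size)
        if not batch:
            return cycles  # equivalent to "No orphans found."
        delete(batch)
        after_id = batch[-1]  # the next scan starts where this one stopped
        cycles += 1
```

With the 6,495 orphans found in step 1 and a batch of 985, the loop runs seven cycles (six full batches plus one of 585). The cursor saves the rescanning of rows already checked, which is where the time goes on a large table.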

Once all orphans had been detected and deleted, I ran one more scan as a final check. The step 1 scan took about 10 seconds, and the step 2 scan took about 30 seconds to finish with "No orphans found."

[attachment=44803]

This plugin took a while to develop, and I think it will work on the average server. We tested on a VPS under various PHP configurations, and I think this will solve the problem of orphans on big boards. It works on my server.

Thank you @Laird. Job well done. The paid plugin is well worth its price.
This solution was a collaborative effort. Thanks to @Laird for his excellent MyBB programming skills.
We have put the finishing touches on the plugin and its description, which can be seen here:
https://creativeandcritical.net/myconver...ts-support

"It just works."