Comment to 'UNA 14 Timeline Cleanup Script – Expert Review Requested'
  • Hi @Michael Newton

    Thank you so much for the comprehensive code review and for being on standby during this critical period. Your timing couldn't have been better—I really appreciate the detailed analysis.

    Current Status

    I did test the script on a dev environment (pulled from production backup), and it successfully reduced the record count as expected. However, I ran into display issues on the dev site due to symlinked storage that I forgot to account for when setting up the test environment. This was compounded by production hitting critical disk space (90%+ full), which pulled my attention away from the test script that I'd already started.

    The good news: Production is now stable (55% disk usage) after moving media files to remote NFS storage and fixing some symlink/permission issues. The bad news: I haven't yet run the cleanup script on production due to the symlink complications discovered during testing.

    Addressing Your Review Points

    You're absolutely right about the high-risk items:

    1. The type='post' filter - I need to run your distribution query first. I was making assumptions about the schema without verifying.
    2. Using bx_srv('bx_timeline', 'delete', [$iId]) - This is exactly the kind of version-stable approach I should be using instead of calling internal methods directly.
    3. Web accessibility - The CLI-only check is a must. I hadn't considered this could be triggered accidentally via web.

    Next Steps Before Production Run

    Before I run this on production, I'll:

    1. Run the distribution query to verify actual type/action combinations:

    sql

       SELECT `type`, `action`, COUNT(*) c
       FROM `bx_timeline_events`
       GROUP BY `type`, `action`
       ORDER BY c DESC;
    
    1. Add the exclusion filters for system/pinned/promoted events (after verifying column names exist in my schema)
    2. Switch to the documented service call (bx_srv) instead of direct method calls
    3. Implement CLI-only restriction and improve dry-run logging clarity
    4. Start with a very small batch (10-20 records) in dry-run mode to verify targeting

    Question for You

    My production environment is UNACMS 14 (I'll confirm exact version and post it). Once I run the distribution query, would you mind helping me craft the precise WHERE clause for my specific setup?

    Also, regarding the symlinked storage issue I encountered in testing—when Timeline events are deleted via bx_srv('bx_timeline', 'delete', [$iId]), should it properly clean up associated media files even when they're symlinked to remote NFS storage? Or do I need additional cleanup steps?

    Appreciation

    Your tightened-up "safe version" is exactly what I needed. The improvements around:

    • Logging type/action/owner for better audit trails
    • Excluding system events
    • Using documented APIs
    • Misleading dry-run wording

    ...are all spot-on and will make this much more maintainable going forward.

    Thanks again for the thorough review and for standing by. I'll post the distribution query results shortly.

    Regards Chris

    • Understood on UNA CMS 14.x. Yes—once you paste the (type, action) distribution output, I can help you narrow it to a precise, low-risk WHERE clause that matches your instance’s real event taxonomy (and avoids collateral deletes).

      Below is the exact workflow I recommend so the WHERE clause is grounded in your data, not assumptions.

      1) Crafting the precise WHERE clause from your distribution query

      Step A — Confirm what you’re actually deleting (taxonomy + sample rows)

      Run the distribution query you mentioned:

      SELECT `type`, `action`, COUNT(*) AS `cnt`
      FROM `bx_timeline_events`
      GROUP BY `type`, `action`
      ORDER BY `cnt` DESC;
      

      Then, for the top 5–10 combos (and any combo you think are user “posts”), pull a small sample to validate they are the intended target:

      SELECT `id`, `type`, `action`, `owner_id`, `object_id`, `content`, `date`
      FROM `bx_timeline_events`
      WHERE `type` = 'X' AND `action` = 'Y'
      ORDER BY `date` DESC
      LIMIT 20;
      

      This is what prevents the classic failure mode: deleting things that look like “posts” but are actually system-generated events, module activity, profile changes, etc.

      Step B — Build a “candidate delete set” filter, then tighten it

      In almost every UNA Timeline cleanup, the final WHERE ends up being:

      1. Age cutoff
      2. One or more specific (type, action) pairs
      3. Explicit exclusions for system/admin/sticky/promoted/special events (based on what columns/flags exist in your schema)

      A safe template (to be adapted to your actual columns/taxonomy) looks like:

      WHERE `date` < {CUTOFF_UNIX_TS}
        AND (
              (`type` = '...' AND `action` = '...')
           OR (`type` = '...' AND `action` = '...')
           OR (`type` = '...' AND `action` = '...')
        )
        AND `owner_id` > 0
        -- Optional, only if these columns exist in your install:
        -- AND `system` = 0
        -- AND `sticked` = 0
        -- AND `promoted` = 0
        -- AND `pinned` = 0
      ;
      

      Step C — Prove the WHERE clause before deleting anything

      Before you run the deleter, validate with:

      SELECT COUNT(*) AS `to_delete`
      FROM `bx_timeline_events`
      <WHERE_CLAUSE>;
      

      And then eyeball a stratified sample:

      SELECT `id`, `type`, `action`, `owner_id`, `date`
      FROM `bx_timeline_events`
      <WHERE_CLAUSE>
      ORDER BY `date` DESC
      LIMIT 200;
      

      If you paste your distribution output, I’ll translate it into the minimal, correct WHERE for your environment (including exclusions that match what’s actually present in your schema).

      2) Symlinked/NFS storage: will bx_srv('bx_timeline','delete',[$iId]) remove media?

      What should happen

      In UNA, deleting via the module service (bx_srv('bx_timeline','delete',[$eventId])) is the right call because it is expected to:

      • delete the timeline event row
      • delete any module-specific media references
      • call into the storage layer to remove stored files (and their DB records)

      So, in principle, the service delete is the mechanism that should properly clean up associated media—regardless of whether the underlying storage path is “local” or happens to be a symlink pointing at NFS.

      Why symlink/NFS can still leave leftovers

      Even with the correct service call, you can still end up with orphaned files if any of the following are true:

      • Permission/ownership mismatch on the NFS mount (PHP user can delete DB rows but cannot unlink the remote file).
      • realpath/symlink path validation inside the storage engine behaves unexpectedly (less common, but possible in hardened filesystem checks).
      • NFS latency / transient I/O errors during unlink cause the physical delete to fail while the DB transaction succeeds (or vice-versa).
      • Your timeline attachments are stored in a different storage object than you think (e.g., separate storage objects for photos/videos/files), and some objects are not being invoked by that specific deletion path (version/module variations).

      The practical answer: rely on service delete, but add verification + “orphan sweep”

      I recommend treating this as two-phase:

      1. Primary deletion via bx_srv('bx_timeline','delete', ...) (what you are already doing)
      2. Post-run verification that catches orphaned storage records/files, and optionally deletes them via the storage API (not manual unlink())

      Verification approach (DB-level)

      After deleting a small test batch, check whether storage DB rows tied to timeline are decreasing as expected.

      Because table/object names can vary by install, the most robust approach is:

      • Identify the storage object(s) used by timeline attachments via:
      SELECT *
      FROM `sys_objects_storage`
      WHERE `object` LIKE '%timeline%';
      

      Then for any relevant storage object, inspect its files table (commonly sys_storage_files or bx_storage_files, depending on UNA generation/customization). If your install uses sys_storage_files, you can usually filter by storage_id and confirm rows aren’t piling up without owners.

      If you paste the result of the sys_objects_storage query (just the rows that match timeline), I can give you the exact orphan-detection SQL for your schema.

      If you do need additional cleanup

      Do not delete by filesystem path. Instead, delete orphans by calling the storage object’s delete method on file IDs. That ensures thumbnails/transcodings/variants and DB rows are handled correctly.

      The “right” cleanup step is therefore:

      • find orphaned file IDs (DB query)
      • for each file ID, call the storage API delete for that storage object

      I can provide a small, safe “orphan sweeper” routine once you confirm:

      • the storage object name(s) for timeline, and
      • the storage files table name in your install

      What to paste next (and I’ll turn it into the exact WHERE)

      1. Output of:
      SELECT `type`, `action`, COUNT(*) AS `cnt`
      FROM `bx_timeline_events`
      GROUP BY `type`, `action`
      ORDER BY `cnt` DESC;
      
      1. Output of:
      SELECT `object`, `engine`, `params`
      FROM `sys_objects_storage`
      WHERE `object` LIKE '%timeline%';
      

      With those two, I’ll:

      • produce the precise WHERE clause for your setup (including the safest exclusions), and
      • tell you whether your symlink/NFS case should be fully covered by service delete or whether an orphan sweep is advisable in your environment.