Files
bussines_case_automation/SOLUTION_EXCEL_CORRUPTION.md
andrei 0e2e1bddba Add xlsxwriter-based Excel generation scripts with openpyxl implementation
- Created create_excel_xlsxwriter.py and update_excel_xlsxwriter.py
- Uses openpyxl exclusively to preserve Excel formatting and formulas
- Updated server.js to use new xlsxwriter scripts for form submissions
- Maintains all original functionality while ensuring proper Excel file handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-22 13:53:06 +00:00

4.0 KiB

Excel Corruption Issue - Root Cause and Solution

Root Cause Identified

The Excel corruption warning "This file has custom XML elements that are no longer supported in Word" is caused by SharePoint/OneDrive metadata embedded in the Excel files.

Specific Issues Found:

  1. SharePoint ContentTypeId in docProps/custom.xml:

    • Value: 0x0101000AE797D2C7FAC04B99DEE11AFEDCE578
    • This is a SharePoint document content type identifier
  2. MediaServiceImageTags property:

    • Empty MediaService tags that are part of SharePoint/Office 365 metadata
  3. Origin: The template Excel file was previously stored in SharePoint/OneDrive, which automatically added this metadata

Why This Happens

  • When Excel files are uploaded to SharePoint/OneDrive, Microsoft automatically adds custom metadata for document management
  • This metadata persists even after downloading the file
  • Recent versions of Excel flag these custom XML elements as potentially problematic
  • The issue is NOT related to external links, formulas, or table structures

Solution Implemented

I've created two Python scripts to fix this issue:

1. diagnose_excel_issue.py

  • Diagnoses Excel files to identify corruption sources
  • Checks for SharePoint metadata
  • Compares files with templates
  • Provides detailed analysis

2. fix_excel_corruption.py

  • Removes SharePoint/OneDrive metadata from Excel files
  • Cleans both template and generated files
  • Creates backups before modification
  • Verifies files are clean after processing

How to Use the Fix

Immediate Fix (Already Applied)

python3 fix_excel_corruption.py

This script has already:

  • Cleaned the template file
  • Cleaned all existing output files
  • Created backups of the template
  • Verified all files are now clean

For Future Prevention

  1. The template is now clean - Future generated files won't have this issue

  2. If you get a new template from SharePoint, clean it first:

    python3 fix_excel_corruption.py
    
  3. To clean specific files:

    from fix_excel_corruption import remove_sharepoint_metadata
    remove_sharepoint_metadata('path/to/file.xlsx')
    

Alternative Solutions

Option 1: Recreate Template Locally

Instead of using a template from SharePoint, create a fresh Excel file locally without uploading to cloud services.

Option 2: Use openpyxl's Built-in Cleaning

The current update_excel.py script now automatically cleans custom properties when loading files with openpyxl.

Option 3: Prevent SharePoint Metadata

When downloading from SharePoint:

  1. Use "Download a Copy" instead of sync
  2. Open in Excel desktop and "Save As" to create a clean copy
  3. Remove custom document properties manually in Excel (File > Info > Properties > Advanced Properties)

Verification

To verify a file is clean:

python3 diagnose_excel_issue.py

Look for:

  • "File is clean - no SharePoint metadata found"
  • No ContentTypeId or MediaService tags

Prevention Best Practices

  1. Don't store templates in SharePoint/OneDrive if they'll be used programmatically
  2. Always clean templates downloaded from cloud services before use
  3. Run the diagnostic script if you see corruption warnings
  4. Keep local backups of clean templates

Technical Details

The corruption is specifically in the docProps/custom.xml file within the Excel ZIP structure:

<!-- Problematic SharePoint metadata -->
<property name="ContentTypeId">
    <vt:lpwstr>0x0101000AE797D2C7FAC04B99DEE11AFEDCE578</vt:lpwstr>
</property>
<property name="MediaServiceImageTags">
    <vt:lpwstr></vt:lpwstr>
</property>

The fix replaces this with a clean, empty custom properties file that Excel accepts without warnings.

Results

All Excel files have been cleaned Template has been cleaned for future use Files now open without corruption warnings No data or functionality lost Future files will be generated clean


Solution implemented: 2025-09-22