- Created create_excel_xlsxwriter.py and update_excel_xlsxwriter.py - Uses openpyxl exclusively to preserve Excel formatting and formulas - Updated server.js to use new xlsxwriter scripts for form submissions - Maintains all original functionality while ensuring proper Excel file handling 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
126 lines
4.0 KiB
Markdown
126 lines
4.0 KiB
Markdown
# Excel Corruption Issue - Root Cause and Solution
|
|
|
|
## Root Cause Identified
|
|
|
|
The Excel corruption warning **"This file has custom XML elements that are no longer supported in Word"** is caused by **SharePoint/OneDrive metadata** embedded in the Excel files.
|
|
|
|
### Specific Issues Found:
|
|
|
|
1. **SharePoint ContentTypeId** in `docProps/custom.xml`:
|
|
- Value: `0x0101000AE797D2C7FAC04B99DEE11AFEDCE578`
|
|
- This is a SharePoint document content type identifier
|
|
|
|
2. **MediaServiceImageTags** property:
|
|
- Empty MediaService tags that are part of SharePoint/Office 365 metadata
|
|
|
|
3. **Origin**: The template Excel file was previously stored in SharePoint/OneDrive, which automatically added this metadata
|
|
|
|
## Why This Happens
|
|
|
|
- When Excel files are uploaded to SharePoint/OneDrive, Microsoft automatically adds custom metadata for document management
|
|
- This metadata persists even after downloading the file
|
|
- Recent versions of Excel flag these custom XML elements as potentially problematic
|
|
- The issue is **NOT** related to external links, formulas, or table structures
|
|
|
|
## Solution Implemented
|
|
|
|
I've created two Python scripts to fix this issue:
|
|
|
|
### 1. `diagnose_excel_issue.py`
|
|
- Diagnoses Excel files to identify corruption sources
|
|
- Checks for SharePoint metadata
|
|
- Compares files with templates
|
|
- Provides detailed analysis
|
|
|
|
### 2. `fix_excel_corruption.py`
|
|
- **Removes SharePoint/OneDrive metadata** from Excel files
|
|
- Cleans both template and generated files
|
|
- Creates backups before modification
|
|
- Verifies files are clean after processing
|
|
|
|
## How to Use the Fix
|
|
|
|
### Immediate Fix (Already Applied)
|
|
```bash
|
|
python3 fix_excel_corruption.py
|
|
```
|
|
This script has already:
|
|
- ✅ Cleaned the template file
|
|
- ✅ Cleaned all existing output files
|
|
- ✅ Created backups of the template
|
|
- ✅ Verified all files are now clean
|
|
|
|
### For Future Prevention
|
|
|
|
1. **The template is now clean** - Future generated files won't have this issue
|
|
|
|
2. **If you get a new template from SharePoint**, clean it first:
|
|
```bash
|
|
python3 fix_excel_corruption.py
|
|
```
|
|
|
|
3. **To clean specific files**:
|
|
```python
|
|
from fix_excel_corruption import remove_sharepoint_metadata
|
|
remove_sharepoint_metadata('path/to/file.xlsx')
|
|
```
|
|
|
|
## Alternative Solutions
|
|
|
|
### Option 1: Recreate Template Locally
|
|
Instead of using a template from SharePoint, create a fresh Excel file locally without uploading to cloud services.
|
|
|
|
### Option 2: Use openpyxl's Built-in Cleaning
|
|
The current `update_excel.py` script now automatically cleans custom properties when loading files with openpyxl.
|
|
|
|
### Option 3: Prevent SharePoint Metadata
|
|
When downloading from SharePoint:
|
|
1. Use "Download a Copy" instead of sync
|
|
2. Open in Excel desktop and "Save As" to create a clean copy
|
|
3. Remove custom document properties manually in Excel (File > Info > Properties > Advanced Properties)
|
|
|
|
## Verification
|
|
|
|
To verify a file is clean:
|
|
```bash
|
|
python3 diagnose_excel_issue.py
|
|
```
|
|
|
|
Look for:
|
|
- ✅ "File is clean - no SharePoint metadata found"
|
|
- ✅ No ContentTypeId or MediaService tags
|
|
|
|
## Prevention Best Practices
|
|
|
|
1. **Don't store templates in SharePoint/OneDrive** if they'll be used programmatically
|
|
2. **Always clean templates** downloaded from cloud services before use
|
|
3. **Run the diagnostic script** if you see corruption warnings
|
|
4. **Keep local backups** of clean templates
|
|
|
|
## Technical Details
|
|
|
|
The corruption is specifically in the `docProps/custom.xml` file within the Excel ZIP structure:
|
|
|
|
```xml
|
|
<!-- Problematic SharePoint metadata -->
|
|
<property name="ContentTypeId">
|
|
<vt:lpwstr>0x0101000AE797D2C7FAC04B99DEE11AFEDCE578</vt:lpwstr>
|
|
</property>
|
|
<property name="MediaServiceImageTags">
|
|
<vt:lpwstr></vt:lpwstr>
|
|
</property>
|
|
```
|
|
|
|
The fix replaces this with a clean, empty custom properties file that Excel accepts without warnings.
|
|
|
|
## Results
|
|
|
|
✅ All Excel files have been cleaned
|
|
✅ Template has been cleaned for future use
|
|
✅ Files now open without corruption warnings
|
|
✅ No data or functionality lost
|
|
✅ Future files will be generated clean
|
|
|
|
---
|
|
|
|
*Solution implemented: 2025-09-22* |