bussines_case_automation/excel_table_repair_analysis.md
andrei 0e2e1bddba Add xlsxwriter-based Excel generation scripts with openpyxl implementation
2025-09-22 13:53:06 +00:00


Excel Table Repair Error Analysis

Issue Summary

When opening Ubuntu-generated Excel files, Excel displays repair errors specifically for tables:

  • Repaired Records: Table from /xl/tables/table1.xml part (Table)
  • Repaired Records: Table from /xl/tables/table2.xml part (Table)

Key finding: The same script generates working files on macOS but files that need repair on Ubuntu, which points to a platform-specific issue rather than a general problem with the generated Excel format.

Investigation Findings

Three-Way Table Structure Comparison

Template File (Original - Working)

  • Contains proper XML declaration: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  • Includes comprehensive namespace declarations:
    • xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    • xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
    • xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3"
  • Has mc:Ignorable="xr xr3" compatibility directive
  • Contains unique identifiers (xr:uid, xr3:uid) for tables and columns
  • Proper table ID sequence (table1 has id="2", table2 has id="3")

macOS Generated File (Working - No Repair Errors)

  • Missing XML declaration - no <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  • Missing namespace declarations for revision extensions
  • No compatibility directives (mc:Ignorable)
  • Missing unique identifiers for tables and columns
  • Different table ID sequence (table1 has id="1", table2 has id="2")
  • File sizes: 1,032 bytes (table1), 1,121 bytes (table2)

Ubuntu Generated File (Broken - Requires Repair)

  • Missing XML declaration - no <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  • Missing namespace declarations for revision extensions
  • No compatibility directives (mc:Ignorable)
  • Missing unique identifiers for tables and columns
  • Same table ID sequence as macOS (table1 has id="1", table2 has id="2")
  • Identical file sizes to macOS: 1,032 bytes (table1), 1,121 bytes (table2)
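The feature checks listed above can be automated. The following is a minimal sketch using only the Python standard library; the function names `summarize_table_xml` and `summarize_tables` are hypothetical helpers, not part of the existing scripts, and the checks cover only the attributes named in this comparison.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace URIs taken from the template file's declarations.
MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"
XR_NS = "http://schemas.microsoft.com/office/spreadsheetml/2014/revision"


def summarize_table_xml(data: bytes) -> dict:
    """Report the structural features compared in the lists above."""
    root = ET.fromstring(data)
    return {
        "has_xml_declaration": data.lstrip().startswith(b"<?xml"),
        "has_mc_ignorable": f"{{{MC_NS}}}Ignorable" in root.attrib,
        "has_xr_uid": f"{{{XR_NS}}}uid" in root.attrib,
        "table_id": root.get("id"),
        "size_bytes": len(data),
    }


def summarize_tables(xlsx_path: str) -> dict:
    """Summarize every xl/tables/*.xml part in a workbook (hypothetical helper)."""
    with zipfile.ZipFile(xlsx_path) as zf:
        return {
            name: summarize_table_xml(zf.read(name))
            for name in zf.namelist()
            if name.startswith("xl/tables/") and name.endswith(".xml")
        }
```

Running `summarize_tables` on the template, macOS, and Ubuntu workbooks should reproduce the three lists above in a form that is easy to diff.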

Key Discovery: XML Content is Identical

The table XML in the macOS-generated and Ubuntu-generated files is byte-for-byte identical. Both have:

  1. Missing XML declarations
  2. Missing namespace extensions
  3. Missing unique identifiers
  4. Same table ID sequence (1, 2)
  5. Identical file sizes

macOS table1.xml vs Ubuntu table1.xml:

<table id="1" name="Table8" displayName="Table8" ref="A43:H47" headerRowCount="1" totalsRowShown="0" headerRowDxfId="53" dataDxfId="52" xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">...

(Completely identical)
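A byte-for-byte comparison like this one can be reproduced with a short script. The sketch below hashes every part inside two workbooks and reports which parts differ; `part_digests` and `diff_parts` are hypothetical names, and the file paths are whatever the two generated workbooks are called locally. Because it walks the whole archive, it also shows whether any part other than the table XML differs between the two platforms.

```python
import hashlib
import zipfile


def part_digests(xlsx_path: str) -> dict:
    """Map each part name in the workbook to the SHA-256 of its uncompressed bytes."""
    with zipfile.ZipFile(xlsx_path) as zf:
        return {
            name: hashlib.sha256(zf.read(name)).hexdigest()
            for name in zf.namelist()
        }


def diff_parts(path_a: str, path_b: str) -> dict:
    """List parts that exist in only one workbook or whose content differs."""
    a, b = part_digests(path_a), part_digests(path_b)
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "different": sorted(n for n in set(a) & set(b) if a[n] != b[n]),
    }
```

An empty `different` list for `xl/tables/table1.xml` and `xl/tables/table2.xml` corresponds to the "identical" result reported here; any other names that appear are candidates for further inspection.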

Root Cause Analysis - Platform Dependency

Because the table XML is identical yet only the Ubuntu-generated files trigger a repair, the table XML content alone cannot explain the errors. The cause has not been confirmed; the remaining candidates are:

  1. File encoding differences during ZIP assembly
  2. ZIP compression differences between platforms
  3. File timestamp/metadata differences in the ZIP archive
  4. Different Python library versions handling ZIP creation differently
  5. Excel's validation logic being stricter on certain systems
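Several of these candidates concern the ZIP container rather than the XML, and they can be checked directly. The sketch below compares per-entry metadata between two workbooks using attributes that `zipfile.ZipInfo` exposes; the helper names and the choice of fields are assumptions made for illustration, not an exhaustive list of what Excel might inspect.

```python
import zipfile

# ZipInfo attributes that commonly vary between platforms or library versions.
FIELDS = (
    "compress_type",   # e.g. 0 = stored, 8 = deflated
    "date_time",       # entry timestamp
    "create_system",   # 0 = Windows/DOS, 3 = Unix
    "external_attr",   # file permission bits
    "flag_bits",       # general-purpose flags (e.g. UTF-8 filename flag)
    "compress_size",   # differs if compression level or zlib build differs
)


def entry_metadata(xlsx_path: str) -> dict:
    """Collect the selected metadata fields for every entry in the archive."""
    with zipfile.ZipFile(xlsx_path) as zf:
        return {
            info.filename: {f: getattr(info, f) for f in FIELDS}
            for info in zf.infolist()
        }


def diff_metadata(path_a: str, path_b: str) -> dict:
    """Return {entry: {field: (value_a, value_b)}} for fields that differ."""
    a, b = entry_metadata(path_a), entry_metadata(path_b)
    diffs = {}
    for name in sorted(set(a) & set(b)):
        changed = {
            f: (a[name][f], b[name][f])
            for f in FIELDS
            if a[name][f] != b[name][f]
        }
        if changed:
            diffs[name] = changed
    return diffs
```

Timestamps will almost always differ between two runs, so the more interesting fields are `compress_type`, `create_system`, `external_attr`, and `flag_bits`.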

Common Formula Issues

Both versions contain #REF! errors in calculated columns:

<calculatedColumnFormula>#REF!</calculatedColumnFormula>

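This indicates broken cell references. Because the #REF! formulas also appear in the macOS file, which opens cleanly, they do not appear to be what triggers the repair errors.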
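To see which calculated columns are affected, the table parts can be scanned for this element. A minimal sketch, assuming the table parts live under `xl/tables/` as in the error messages above; `broken_calculated_columns` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
import zipfile

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def broken_calculated_columns(xlsx_path: str) -> dict:
    """Map each table part to the column names whose formula contains #REF!."""
    results = {}
    with zipfile.ZipFile(xlsx_path) as zf:
        for name in zf.namelist():
            if not (name.startswith("xl/tables/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            columns = []
            for col in root.iter(f"{{{MAIN_NS}}}tableColumn"):
                formula = col.find(f"{{{MAIN_NS}}}calculatedColumnFormula")
                if formula is not None and "#REF!" in (formula.text or ""):
                    columns.append(col.get("name"))
            if columns:
                results[name] = columns
    return results
```

Running it against the template as well would show whether the broken references are already present there or are introduced during generation.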

Impact Assessment

  • Functionality: No data loss, tables work after repair
  • User Experience: Excel shows a repair dialog that requires user action, but only for Ubuntu-generated files
  • Automation: Automated processing workflows break, but only for Ubuntu deployments
  • Platform Consistency: Same code produces different results across platforms

Recommendations

Platform-Specific Investigation Priorities

  1. Compare Python library versions between macOS and Ubuntu environments
  2. Check ZIP file metadata (timestamps, compression levels, file attributes)
  3. Examine file encoding during Excel assembly process
  4. Test with different Python Excel libraries (openpyxl vs xlsxwriter vs others)
  5. Analyze ZIP file internals with hex editors for subtle differences
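For the first priority, it helps to capture the same environment report on both machines and diff the output. The sketch below is one way to do that; the package list is an assumption (in particular, `lxml` is included only because it is an optional dependency that may influence XML serialization) and should be adjusted to whatever the generation scripts actually import.

```python
import platform
import sys
import zlib
from importlib import metadata


def environment_report(packages=("openpyxl", "XlsxWriter", "lxml")) -> dict:
    """Collect interpreter, zlib, and package versions for cross-platform comparison."""
    report = {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
        "zlib": zlib.ZLIB_RUNTIME_VERSION,
    }
    for pkg in packages:
        try:
            report[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Saving the output on macOS and on Ubuntu gives a concrete list of version differences to rule in or out.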

Immediate Workarounds

  1. Document platform dependency in deployment guides
  2. Test all generated files on target Excel environment before distribution
  3. Consider generating files on macOS for production use
  4. Implement automated repair detection in the workflow

Long-term Fixes

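  1. Standardize to the template format with proper XML declarations and namespaces (a hardening step rather than a confirmed fix, since the macOS file lacks them and opens without repair)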
  2. Use established Excel libraries with proven cross-platform compatibility
  3. Implement comprehensive testing across multiple platforms
  4. Add ZIP file validation to detect platform-specific differences
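A basic form of the ZIP validation in item 4 could look like the sketch below. It checks archive integrity and XML well-formedness only; it cannot predict whether Excel will show a repair dialog, and `validate_package` is a hypothetical helper name rather than part of the existing scripts.

```python
import xml.etree.ElementTree as ET
import zipfile


def validate_package(xlsx_path: str) -> list:
    """Return a list of human-readable problems found in the workbook archive."""
    problems = []
    with zipfile.ZipFile(xlsx_path) as zf:
        # testzip() returns the first entry with a bad CRC or header, else None.
        bad_entry = zf.testzip()
        if bad_entry is not None:
            problems.append(f"CRC check failed for {bad_entry}")
        names = zf.namelist()
        if "[Content_Types].xml" not in names:
            problems.append("missing [Content_Types].xml")
        for name in names:
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(zf.read(name))
            except ET.ParseError as exc:
                problems.append(f"{name}: not well-formed XML ({exc})")
            except zipfile.BadZipFile as exc:
                problems.append(f"{name}: unreadable entry ({exc})")
    return problems
```

An empty list means only that the container and XML are structurally sound, so this is best used as a pre-flight check alongside opening the file in the target Excel environment.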

Technical Details

File Comparison Results

| File | Template | macOS Generated | Ubuntu Generated | Ubuntu vs macOS |
|---|---|---|---|---|
| table1.xml | 1,755 bytes | 1,032 bytes | 1,032 bytes | Identical |
| table2.xml | 1,844 bytes | 1,121 bytes | 1,121 bytes | Identical |

Platform Dependency Evidence

  • Identical table XML content between macOS and Ubuntu
  • Same missing features (declarations, namespaces, UIDs)
  • Different Excel behavior (repair required only for Ubuntu-generated files)
  • Suggests ZIP-level or metadata differences, which have not yet been confirmed

Analysis completed: 2025-09-19

Files examined: Template vs Test5 generated Excel workbooks