bussines_case_automation/excel_repair_solution_proposal.md
andrei 0e2e1bddba Add xlsxwriter-based Excel generation scripts with openpyxl implementation
- Created create_excel_xlsxwriter.py and update_excel_xlsxwriter.py
- Uses openpyxl exclusively to preserve Excel formatting and formulas
- Updated server.js to use new xlsxwriter scripts for form submissions
- Maintains all original functionality while ensuring proper Excel file handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-22 13:53:06 +00:00


Excel Table Repair - Solution Proposal

Executive Summary

The Excel table repair errors are caused by platform-specific differences in ZIP file assembly, not XML content issues. Since the table XML is identical between working (macOS) and broken (Ubuntu) files, the solution requires addressing the underlying file generation process rather than XML formatting.
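
Since the claim above is that the archives differ in how they are zipped rather than in their XML, one way to test it is to compare the ZIP entry metadata of a working file and a broken file. The sketch below is a minimal illustration using only the standard library. The function names `zip_metadata` and `diff_zip_metadata` are hypothetical and not part of the existing scripts.

```python
import zipfile


def zip_metadata(path_or_file):
    """Return {member name: metadata dict} for every entry in an .xlsx (ZIP) archive."""
    with zipfile.ZipFile(path_or_file) as zf:
        return {
            info.filename: {
                "order": index,                       # position inside the archive
                "compress_type": info.compress_type,  # 0 = stored, 8 = deflated
                "create_system": info.create_system,  # 0 = FAT/Windows, 3 = Unix
                "external_attr": info.external_attr,
                "flag_bits": info.flag_bits,
                "date_time": info.date_time,
            }
            for index, info in enumerate(zf.infolist())
        }


def diff_zip_metadata(working, broken):
    """List (member, field, working value, broken value) for every difference."""
    a, b = zip_metadata(working), zip_metadata(broken)
    diffs = []
    for name in sorted(set(a) | set(b)):
        if name not in a or name not in b:
            # The member exists in only one of the two archives
            diffs.append((name, "presence", name in a, name in b))
            continue
        for field, value in a[name].items():
            if value != b[name][field]:
                diffs.append((name, field, value, b[name][field]))
    return diffs
```

Calling `diff_zip_metadata("working_macos.xlsx", "broken_ubuntu.xlsx")` (placeholder file names) returns an empty list when the entries match on every listed field, which would suggest looking elsewhere, for example at the compressed byte streams.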

Solution Strategy

Option 1: Template-Based XML Generation

Approach: Modify the script to generate Excel tables using the exact XML format from the working template.

Implementation:

  1. Extract template table XML as reference patterns
  2. Generate proper XML declarations for all table files
  3. Add missing namespace declarations and compatibility directives
  4. Implement UID generation for tables and columns
  5. Fix table ID sequencing to match Excel expectations

Advantages:

  • Addresses root XML format issues
  • Works across all platforms
  • Future-proof against Excel updates
  • No dependency on external libraries

Implementation Timeline: 2-3 days
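
Step 1 of the implementation list above (extracting the template's table XML as a reference) could be approached roughly as follows. This is a sketch under assumptions: it presumes the table parts live under `xl/tables/` inside the workbook archive, and the function name `read_table_parts` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_table_parts(xlsx_path_or_file):
    """Map each xl/tables/*.xml part to its declaration flag and root attributes."""
    parts = {}
    with zipfile.ZipFile(xlsx_path_or_file) as zf:
        for name in zf.namelist():
            if name.startswith("xl/tables/") and name.endswith(".xml"):
                raw = zf.read(name)
                root = ET.fromstring(raw)
                parts[name] = {
                    # True when the part begins with an <?xml ...?> declaration
                    "has_declaration": raw.lstrip().startswith(b"<?xml"),
                    # ElementTree reports namespaced names as "{uri}local"
                    "root_tag": root.tag,
                    "attributes": dict(root.attrib),
                }
    return parts
```

Running this against both the template and a generated file and comparing the two dictionaries shows which root attributes (for example `id` or `ref`) the generator omits or writes differently.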

Option 2: Python Library Standardization

Approach: Replace custom Excel generation with established cross-platform libraries.

Implementation Options:

  1. openpyxl - Most popular, excellent table support
  2. xlsxwriter - Fast performance, good formatting
  3. pandas + openpyxl - High-level data operations

Advantages:

  • Proven cross-platform compatibility
  • Handles XML complexities automatically
  • Better maintenance and updates
  • Extensive documentation and community

Implementation Timeline: 1-2 weeks (requires rewriting generation logic)

Option 3: Platform Environment Isolation

Approach: Standardize the Python environment across platforms.

Implementation:

  1. Docker containerization with fixed Python/library versions
  2. Virtual environment with pinned dependencies
  3. CI/CD pipeline generating files on controlled environment

Advantages:

  • Ensures identical execution environment
  • Minimal code changes required
  • Reproducible builds

Implementation Timeline: 3-5 days
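
Before pinning versions, it may help to record what actually differs between the macOS and Ubuntu environments. The sketch below collects a small fingerprint with the standard library only. The package list is an assumption based on the libraries named in this proposal, and `environment_fingerprint` is a hypothetical helper.

```python
import platform
import sys
import zlib
from importlib import metadata


def environment_fingerprint(packages=("openpyxl", "XlsxWriter")):
    """Collect the version details most likely to differ between machines."""
    versions = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "zlib_compiled": zlib.ZLIB_VERSION,         # zlib Python was built against
        "zlib_runtime": zlib.ZLIB_RUNTIME_VERSION,  # zlib actually loaded at runtime
        "packages": versions,
    }
```

Printing the result on each machine (for instance with `json.dumps(environment_fingerprint(), indent=2)`) gives a concrete list of versions to pin in the container or virtual environment.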

Phase 1: Immediate Fix (Template-Based XML)

Step 1: XML Template Extraction

def extract_template_xml_patterns():
    """Extract proper XML patterns from working template"""
    template_tables = {
        'table1': {
            'declaration': '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>',
            'namespaces': {
                'main': 'http://schemas.openxmlformats.org/spreadsheetml/2006/main',
                'mc': 'http://schemas.openxmlformats.org/markup-compatibility/2006',
                'xr': 'http://schemas.microsoft.com/office/spreadsheetml/2014/revision',
                'xr3': 'http://schemas.microsoft.com/office/spreadsheetml/2016/revision3'
            },
            'compatibility': 'mc:Ignorable="xr xr3"',
            # Outer braces are doubled so str.format() emits them literally
            'uid_pattern': '{{00000000-000C-0000-FFFF-FFFF{:02d}000000}}'
        }
    }
    return template_tables

Step 2: XML Generation Functions

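from xml.sax.saxutils import quoteattr

# Namespace URIs, matching the template patterns in Step 1
MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'
MC_NS = 'http://schemas.openxmlformats.org/markup-compatibility/2006'
XR_NS = 'http://schemas.microsoft.com/office/spreadsheetml/2014/revision'
XR3_NS = 'http://schemas.microsoft.com/office/spreadsheetml/2016/revision3'

def generate_proper_table_xml(table_data, table_id):
    """Generate Excel-compliant table XML with proper format"""

    # XML Declaration
    xml_content = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'

    # Table element with all namespaces
    xml_content += f'<table xmlns="{MAIN_NS}" xmlns:mc="{MC_NS}" '
    xml_content += f'mc:Ignorable="xr xr3" xmlns:xr="{XR_NS}" '
    xml_content += f'xmlns:xr3="{XR3_NS}" '
    xml_content += f'id="{table_id + 1}" '  # Correct ID sequence
    xml_content += f'xr:uid="{generate_table_uid(table_id)}" '
    # quoteattr() escapes &, < and > and adds the surrounding quotes
    xml_content += f'name={quoteattr(table_data.name)} '
    xml_content += f'displayName={quoteattr(table_data.display_name)} '
    xml_content += f'ref={quoteattr(table_data.ref)}>\n'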

    # Table columns with UIDs
    xml_content += generate_table_columns_xml(table_data.columns, table_id)

    # Table style info
    xml_content += generate_table_style_xml(table_data.style)

    xml_content += '</table>'

    return xml_content

def generate_table_uid(table_id):
    """Generate proper UIDs for tables"""
    return f"{{00000000-000C-0000-FFFF-FFFF{table_id:02d}000000}}"

def generate_column_uid(table_id, column_id):
    """Generate proper UIDs for table columns"""
    return f"{{00000000-0010-0000-{table_id:04d}-{column_id:06d}000000}}"
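
The UID helpers above rely on fixed-width formatting, so an id that needs more digits than its field allows would produce a string that is no longer a valid braced GUID. A minimal guard, assuming the same table pattern as `generate_table_uid`, might look like this. The names `is_braced_guid` and `table_uid` are hypothetical.

```python
import re

# Braced GUID: 8-4-4-4-12 hexadecimal digits,
# e.g. {00000000-000C-0000-FFFF-FFFF01000000}
BRACED_GUID = re.compile(
    r"\{[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}"
    r"-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\}"
)


def is_braced_guid(value):
    """Return True when value is exactly one braced 8-4-4-4-12 GUID."""
    return BRACED_GUID.fullmatch(value) is not None


def table_uid(table_id):
    """Same pattern as generate_table_uid, but rejects ids wider than two digits."""
    if not 0 <= table_id <= 99:
        raise ValueError("table_id must fit in two digits to keep the GUID length fixed")
    return f"{{00000000-000C-0000-FFFF-FFFF{table_id:02d}000000}}"
```

A check such as `is_braced_guid(generate_column_uid(table_id, column_id))` can be applied to the column helper in the same way before the value is written into the XML.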

Step 3: File Assembly Improvements

import zipfile

def create_excel_file_with_proper_compression(output_path, excel_files):
    """Create Excel file with consistent ZIP compression

    excel_files maps archive member names to their content (str or bytes).
    """

    # Use consistent compression settings
    with zipfile.ZipFile(output_path, 'w',
                        compression=zipfile.ZIP_DEFLATED,
                        compresslevel=6,  # Consistent compression level
                        allowZip64=False) as zipf:

        # Set consistent file timestamps
        fixed_time = (2023, 1, 1, 0, 0, 0)

        for file_path, content in excel_files.items():
            zinfo = zipfile.ZipInfo(file_path)
            zinfo.date_time = fixed_time
            zinfo.compress_type = zipfile.ZIP_DEFLATED

            # The archive-level compresslevel does not carry over to a
            # pre-built ZipInfo, so pass the level explicitly
            zipf.writestr(zinfo, content, compresslevel=6)
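
To confirm that the assembly step is repeatable, the archive can be built in memory and hashed. The sketch below is an illustration, not the project's implementation: it additionally pins `create_system` and `external_attr` (fields whose defaults can depend on the host OS) and sorts member names, which is an assumed ordering rather than one taken from the template. The names `build_archive_bytes` and `archive_digest` are hypothetical.

```python
import hashlib
import io
import zipfile

FIXED_TIME = (2023, 1, 1, 0, 0, 0)


def build_archive_bytes(parts):
    """Write parts (member name -> bytes) into an in-memory ZIP with pinned metadata."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(parts):  # fixed member order (assumption)
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 0  # pin instead of inheriting the host OS default
            info.external_attr = 0
            zf.writestr(info, parts[name], compresslevel=6)
    return buffer.getvalue()


def archive_digest(parts):
    """SHA-256 of the archive bytes, for comparing two builds."""
    return hashlib.sha256(build_archive_bytes(parts)).hexdigest()
```

Two builds on the same machine should yield the same digest regardless of dictionary order. Digests from different machines may still differ if the underlying zlib builds compress differently, so a mismatch across platforms is a prompt to inspect the entries rather than proof of a defect.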

Phase 2: Testing and Validation

Cross-Platform Testing Matrix

Platform      Python Version  Library Versions  Test Status
Ubuntu 22.04  3.10+           openpyxl==3.x     Pending
macOS         3.10+           openpyxl==3.x     Working
Windows       3.10+           openpyxl==3.x     TBD

Validation Script

def validate_excel_file(file_path):
    """Validate generated Excel file for repair issues"""

    checks = {
        'table_xml_format': check_table_xml_declarations,
        'namespace_compliance': check_namespace_declarations,
        'uid_presence': check_unique_identifiers,
        'zip_metadata': check_zip_file_metadata,
        'excel_compatibility': test_excel_opening
    }

    results = {}
    for check_name, check_func in checks.items():
        results[check_name] = check_func(file_path)

    return results
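
The check functions referenced in the validation script are not defined in this proposal. As a rough sketch of how one of them and a more defensive runner might look, the example below records a failure instead of aborting when a check raises. Both `run_checks` and `check_tables_have_declaration` are hypothetical names, and the check assumes table parts are stored under `xl/tables/`.

```python
import zipfile


def run_checks(file_path, checks):
    """Run each check; record (passed, detail) and keep going if one raises."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = (bool(check(file_path)), "")
        except Exception as exc:  # a crashing check counts as a failure
            results[name] = (False, f"{type(exc).__name__}: {exc}")
    return results


def check_tables_have_declaration(file_path):
    """Hypothetical check: every xl/tables/*.xml part starts with an XML declaration."""
    with zipfile.ZipFile(file_path) as zf:
        tables = [n for n in zf.namelist() if n.startswith("xl/tables/")]
        return all(zf.read(n).lstrip().startswith(b"<?xml") for n in tables)
```

Keeping the error text alongside the pass/fail flag makes it easier to tell a file that fails validation from a file that could not be opened at all.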

Phase 3: Long-term Improvements

Migration to openpyxl

# Example migration approach
from openpyxl import Workbook
from openpyxl.worksheet.table import Table, TableStyleInfo

def create_excel_with_openpyxl(business_case_data, output_path):
    """Generate Excel using openpyxl for cross-platform compatibility"""

    wb = Workbook()
    ws = wb.active

    # Add data (the first row becomes the table's header row)
    for row in business_case_data:
        ws.append(row)

    # Create table with proper formatting; ws.dimensions (e.g. "A1:H47")
    # keeps the table range in step with the rows actually written
    table = Table(displayName="BusinessCaseTable", ref=ws.dimensions)
    style = TableStyleInfo(name="TableStyleMedium3",
                          showFirstColumn=False,
                          showLastColumn=False,
                          showRowStripes=True,
                          showColumnStripes=False)
    table.tableStyleInfo = style

    ws.add_table(table)

    # Save with consistent settings
    wb.save(output_path)

Implementation Checklist

Immediate Actions (Week 1)

  • Extract XML patterns from working template
  • Implement proper XML declaration generation
  • Add namespace declarations and compatibility directives
  • Implement UID generation algorithms
  • Fix table ID sequencing logic
  • Test on Ubuntu environment

Validation Actions (Week 2)

  • Create comprehensive test suite
  • Validate across multiple platforms
  • Performance testing with large datasets
  • Excel compatibility testing (different versions)
  • Automated repair detection

Future Improvements (Month 2)

  • Migration to openpyxl library
  • Docker containerization for consistent environment
  • CI/CD pipeline with cross-platform testing
  • Comprehensive documentation updates

Risk Assessment

High Priority Risks

  • Platform dependency: Current solution may not work on Windows
  • Excel version compatibility: Different Excel versions may have different validation
  • Performance impact: Proper XML generation may be slower

Mitigation Strategies

  • Comprehensive testing: Test on all target platforms before deployment
  • Fallback mechanism: Keep current generation as backup
  • Performance optimization: Profile and optimize XML generation code

Success Metrics

Primary Goals

  • Zero Excel repair dialogs on Ubuntu-generated files
  • Identical behavior across macOS and Ubuntu
  • No data loss or functionality regression

Secondary Goals

  • Improved file generation performance
  • Better code maintainability
  • Enhanced error handling and logging

Conclusion

The recommended solution addresses the root cause by implementing proper Excel XML format generation while maintaining cross-platform compatibility. The template-based approach provides immediate relief while the library migration offers long-term stability.

Next Steps: Begin with Phase 1 implementation focusing on proper XML generation, followed by comprehensive testing across platforms.


Proposal created: 2025-09-19
Estimated implementation time: 2-3 weeks
Priority: High - affects production workflows