Add xlsxwriter-based Excel generation scripts with openpyxl implementation

- Created create_excel_xlsxwriter.py and update_excel_xlsxwriter.py
- Uses openpyxl exclusively to preserve Excel formatting and formulas
- Updated server.js to use new xlsxwriter scripts for form submissions
- Maintains all original functionality while ensuring proper Excel file handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
260
excel_repair_solution_proposal.md
Normal file
@@ -0,0 +1,260 @@
# Excel Table Repair - Solution Proposal

## Executive Summary

The Excel table repair errors are caused by **platform-specific differences in ZIP file assembly**, not XML content issues. Since the table XML is identical between working (macOS) and broken (Ubuntu) files, the solution requires addressing the underlying file generation process rather than XML formatting.

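
One way to confirm where the two files actually differ is to compare the ZIP member metadata of the working and broken workbooks directly. The following is a minimal sketch using only the standard-library `zipfile` module; the function names and the choice of compared fields are assumptions made for illustration, not part of the existing scripts.

```python
import zipfile

# Header fields compared per archive member; all are zipfile.ZipInfo attributes.
FIELDS = ("compress_type", "create_system", "create_version",
          "extract_version", "flag_bits", "external_attr", "date_time")


def zip_metadata(path):
    """Map each member name to (position in archive, dict of header fields)."""
    with zipfile.ZipFile(path) as zf:
        return {
            info.filename: (index, {f: getattr(info, f) for f in FIELDS})
            for index, info in enumerate(zf.infolist())
        }


def diff_zip_metadata(working_path, broken_path):
    """Return readable differences between two archives' member metadata."""
    working = zip_metadata(working_path)
    broken = zip_metadata(broken_path)
    differences = []
    for name in sorted(set(working) | set(broken)):
        if name not in working or name not in broken:
            differences.append(f"{name}: present in only one archive")
            continue
        (w_index, w_meta), (b_index, b_meta) = working[name], broken[name]
        if w_index != b_index:
            differences.append(f"{name}: member order {w_index} vs {b_index}")
        for field in FIELDS:
            if w_meta[field] != b_meta[field]:
                differences.append(
                    f"{name}: {field} {w_meta[field]!r} vs {b_meta[field]!r}")
    return differences
```

Calling `diff_zip_metadata("working_macos.xlsx", "broken_ubuntu.xlsx")` (hypothetical file names) would list differences in member order, compression method, and header flags, which helps narrow down which assembly detail Excel objects to.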
## Solution Strategy

### Option 1: Template-Based XML Injection (Recommended)

**Approach**: Modify the script to generate Excel tables using the exact XML format from the working template.

**Implementation**:

1. **Extract template table XML** as reference patterns
2. **Generate proper XML declarations** for all table files
3. **Add missing namespace declarations** and compatibility directives
4. **Implement UID generation** for tables and columns
5. **Fix table ID sequencing** to match Excel expectations

**Advantages**:

- ✅ Addresses root XML format issues
- ✅ Works across all platforms
- ✅ Future-proof against Excel updates
- ✅ No dependency on external libraries

**Implementation Timeline**: 2-3 days

### Option 2: Python Library Standardization

**Approach**: Replace custom Excel generation with established cross-platform libraries.

**Implementation Options**:

1. **openpyxl** - Most popular, excellent table support
2. **xlsxwriter** - Fast performance, good formatting
3. **pandas + openpyxl** - High-level data operations

**Advantages**:

- ✅ Proven cross-platform compatibility
- ✅ Handles XML complexities automatically
- ✅ Better maintenance and updates
- ✅ Extensive documentation and community

**Implementation Timeline**: 1-2 weeks (requires rewriting generation logic)

### Option 3: Platform Environment Isolation

**Approach**: Standardize the Python environment across platforms.

**Implementation**:

1. **Docker containerization** with fixed Python/library versions
2. **Virtual environment** with pinned dependencies
3. **CI/CD pipeline** generating files in a controlled environment

**Advantages**:

- ✅ Ensures identical execution environment
- ✅ Minimal code changes required
- ✅ Reproducible builds

**Implementation Timeline**: 3-5 days

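
Before pinning versions, it can help to record what each platform is actually running. The sketch below collects the interpreter, OS, zlib, and library versions so the macOS and Ubuntu environments can be compared side by side. The function name `environment_fingerprint` and the default package list are assumptions for illustration.

```python
import platform
import zlib
from importlib import metadata


def environment_fingerprint(packages=("openpyxl", "XlsxWriter")):
    """Collect version details that may influence the generated .xlsx bytes."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        # zlib performs the DEFLATE compression used by zipfile
        "zlib_runtime": zlib.ZLIB_RUNTIME_VERSION,
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in environment_fingerprint().items():
        print(f"{key}: {value}")
```

Running this on both machines and diffing the output gives a concrete list of versions to pin in the Docker image or virtual environment.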
## Recommended Implementation Plan

### Phase 1: Immediate Fix (Template-Based XML)

#### Step 1: XML Template Extraction
```python
def extract_template_xml_patterns():
    """Return the XML patterns taken from the working template."""
    template_tables = {
        'table1': {
            'declaration': '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>',
            'namespaces': {
                'main': 'http://schemas.openxmlformats.org/spreadsheetml/2006/main',
                'mc': 'http://schemas.openxmlformats.org/markup-compatibility/2006',
                'xr': 'http://schemas.microsoft.com/office/spreadsheetml/2014/revision',
                'xr3': 'http://schemas.microsoft.com/office/spreadsheetml/2016/revision3'
            },
            'compatibility': 'mc:Ignorable="xr xr3"',
            # Literal braces are doubled so that str.format() fills in only
            # the table index, e.g. uid_pattern.format(0)
            'uid_pattern': '{{00000000-000C-0000-FFFF-FFFF{:02d}000000}}'
        }
    }
    return template_tables
```
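
The patterns above are written out by hand. As a complement, the table parts can be read straight from the working template so the reference values come from the file itself. This is a minimal sketch using `zipfile` and `xml.etree.ElementTree`; the function name `read_template_tables` and the shape of the returned dictionary are assumptions for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def read_template_tables(template_path):
    """Return {part name: details} for every table part in a workbook."""
    tables = {}
    with zipfile.ZipFile(template_path) as zf:
        for name in zf.namelist():
            if not (name.startswith("xl/tables/") and name.endswith(".xml")):
                continue
            raw = zf.read(name)
            root = ET.fromstring(raw)
            columns = root.findall(
                f"{{{MAIN_NS}}}tableColumns/{{{MAIN_NS}}}tableColumn")
            tables[name] = {
                # Whether the part begins with an <?xml ...?> declaration
                "has_declaration": raw.lstrip().startswith(b"<?xml"),
                # Namespaced attributes appear as "{namespace-uri}name" keys
                "attributes": dict(root.attrib),
                "column_names": [c.get("name") for c in columns],
            }
    return tables
```

Running the same function on a generated file and comparing the two dictionaries shows which attributes or declarations the generated table parts are missing.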

#### Step 2: XML Generation Functions
```python
# Namespace URIs, matching the values listed in Step 1
MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'
MC_NS = 'http://schemas.openxmlformats.org/markup-compatibility/2006'
XR_NS = 'http://schemas.microsoft.com/office/spreadsheetml/2014/revision'
XR3_NS = 'http://schemas.microsoft.com/office/spreadsheetml/2016/revision3'


def generate_proper_table_xml(table_data, table_id):
    """Generate Excel-compliant table XML with proper format.

    table_id is zero-based; the id attribute written to the XML is one-based.
    """
    # XML declaration
    xml_content = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'

    # Table element with all namespaces
    xml_content += f'<table xmlns="{MAIN_NS}" xmlns:mc="{MC_NS}" '
    xml_content += f'mc:Ignorable="xr xr3" xmlns:xr="{XR_NS}" '
    xml_content += f'xmlns:xr3="{XR3_NS}" '
    xml_content += f'id="{table_id + 1}" '  # Correct ID sequence
    xml_content += f'xr:uid="{generate_table_uid(table_id)}" '
    xml_content += f'name="{table_data.name}" '
    xml_content += f'displayName="{table_data.display_name}" '
    xml_content += f'ref="{table_data.ref}">\n'

    # Table columns with UIDs (helper not shown here)
    xml_content += generate_table_columns_xml(table_data.columns, table_id)

    # Table style info (helper not shown here)
    xml_content += generate_table_style_xml(table_data.style)

    xml_content += '</table>'

    return xml_content


def generate_table_uid(table_id):
    """Generate proper UIDs for tables"""
    return f"{{00000000-000C-0000-FFFF-FFFF{table_id:02d}000000}}"


def generate_column_uid(table_id, column_id):
    """Generate proper UIDs for table columns"""
    return f"{{00000000-0010-0000-{table_id:04d}-{column_id:06d}000000}}"
```
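
`generate_table_columns_xml` and `generate_table_style_xml` are called above but not defined in the proposal. The sketch below shows one possible shape for them, assuming `columns` is a list of header strings and the style is passed as a built-in style name; both assumptions, and the hard-coded stripe flags, are illustrative. Attribute values are escaped with `xml.sax.saxutils.quoteattr` so that headers containing `&` or quotes do not produce invalid XML.

```python
from xml.sax.saxutils import quoteattr


def generate_column_uid(table_id, column_id):
    """Same pattern as the proposal's helper, repeated so this block runs alone."""
    return f"{{00000000-0010-0000-{table_id:04d}-{column_id:06d}000000}}"


def generate_table_columns_xml(columns, table_id):
    """Build <tableColumns>; `columns` is assumed to be a list of header strings."""
    parts = [f'<tableColumns count="{len(columns)}">']
    for column_id, name in enumerate(columns, start=1):
        uid = generate_column_uid(table_id, column_id)
        # quoteattr() returns the value already wrapped in quotes and escaped
        parts.append(
            f'<tableColumn id="{column_id}" xr3:uid="{uid}" name={quoteattr(name)}/>')
    parts.append("</tableColumns>\n")
    return "".join(parts)


def generate_table_style_xml(style_name):
    """Build <tableStyleInfo>; `style_name` is assumed to be a style name string."""
    return (f'<tableStyleInfo name={quoteattr(style_name)} showFirstColumn="0" '
            'showLastColumn="0" showRowStripes="1" showColumnStripes="0"/>\n')
```

The `xr3:` prefix only resolves when the enclosing `<table>` element declares the `xr3` namespace, as the Step 2 function does.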

#### Step 3: File Assembly Improvements
```python
import zipfile


def create_excel_file_with_proper_compression(output_path, excel_files):
    """Create Excel file with consistent ZIP compression.

    excel_files maps archive paths to their str or bytes content.
    """
    # Set consistent file timestamps
    fixed_time = (2023, 1, 1, 0, 0, 0)

    # Use consistent compression settings
    with zipfile.ZipFile(output_path, 'w',
                         compression=zipfile.ZIP_DEFLATED,
                         compresslevel=6,  # Consistent compression level
                         allowZip64=False) as zipf:
        for file_path, content in excel_files.items():
            zinfo = zipfile.ZipInfo(file_path)
            zinfo.date_time = fixed_time
            zinfo.compress_type = zipfile.ZIP_DEFLATED

            # An explicit ZipInfo does not inherit the archive-level
            # compresslevel, so pass it for each member
            zipf.writestr(zinfo, content, compresslevel=6)
```
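
Whether the assembly is actually deterministic can be checked by building the archive twice and comparing hashes. The sketch below writes to memory instead of disk and additionally fixes the member order and the `create_system` field; those two extra steps, and the function names, are assumptions that go beyond the Step 3 code.

```python
import hashlib
import io
import zipfile

FIXED_TIME = (2023, 1, 1, 0, 0, 0)


def build_archive_bytes(parts):
    """Write parts ({archive path: bytes}) to an in-memory ZIP with fixed metadata."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zipf:
        for path in sorted(parts):  # fixed member order, independent of dict order
            zinfo = zipfile.ZipInfo(path, date_time=FIXED_TIME)
            zinfo.compress_type = zipfile.ZIP_DEFLATED
            zinfo.create_system = 3  # record "Unix" on every platform
            zipf.writestr(zinfo, parts[path], compresslevel=6)
    return buffer.getvalue()


def archive_digest(parts):
    """SHA-256 of the assembled archive, for comparing builds across machines."""
    return hashlib.sha256(build_archive_bytes(parts)).hexdigest()
```

If the digests for the same `parts` match on macOS and Ubuntu, the remaining differences lie in the part contents and not in the ZIP assembly. A mismatch with identical inputs points at the compression library or Python version.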

### Phase 2: Testing and Validation

#### Cross-Platform Testing Matrix
| Platform | Python Version | Library Versions | Test Status |
|----------|----------------|------------------|-------------|
| Ubuntu 22.04 | 3.10+ | openpyxl==3.x | ⏳ Pending |
| macOS | 3.10+ | openpyxl==3.x | ✅ Working |
| Windows | 3.10+ | openpyxl==3.x | ⏳ TBD |

#### Validation Script
```python
def validate_excel_file(file_path):
    """Validate generated Excel file for repair issues"""

    # Each check is a callable that takes the file path and returns its
    # result; all of them must be defined before this function is called
    checks = {
        'table_xml_format': check_table_xml_declarations,
        'namespace_compliance': check_namespace_declarations,
        'uid_presence': check_unique_identifiers,
        'zip_metadata': check_zip_file_metadata,
        'excel_compatibility': test_excel_opening
    }

    results = {}
    for check_name, check_func in checks.items():
        results[check_name] = check_func(file_path)

    return results
```
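
The individual checks are named in the proposal but not spelled out. As a rough illustration, two of them could look like the sketch below, which uses only `zipfile`. The pass criteria chosen here (an XML declaration on every table part, an intact archive, and only stored or deflated members) are assumptions, not confirmed requirements from Excel.

```python
import zipfile


def table_part_names(zf):
    """List the table parts inside an open workbook archive."""
    return [n for n in zf.namelist()
            if n.startswith("xl/tables/") and n.endswith(".xml")]


def check_table_xml_declarations(file_path):
    """True if every table part starts with an XML declaration.

    A workbook with no table parts also returns True.
    """
    with zipfile.ZipFile(file_path) as zf:
        return all(zf.read(n).lstrip().startswith(b"<?xml")
                   for n in table_part_names(zf))


def check_zip_file_metadata(file_path):
    """True if the archive is intact and every member is stored or deflated."""
    allowed = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
    with zipfile.ZipFile(file_path) as zf:
        # testzip() returns the name of the first corrupt member, or None
        if zf.testzip() is not None:
            return False
        return all(info.compress_type in allowed for info in zf.infolist())
```

Both functions take a file path and return a boolean, so they fit the `checks` dictionary in `validate_excel_file` without changes.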

### Phase 3: Long-term Improvements

#### Migration to openpyxl
```python
# Example migration approach
from openpyxl import Workbook
from openpyxl.worksheet.table import Table, TableStyleInfo


def create_excel_with_openpyxl(business_case_data, output_path):
    """Generate Excel using openpyxl for cross-platform compatibility"""

    wb = Workbook()
    ws = wb.active

    # Add data; the first row supplies the table's column headers
    for row in business_case_data:
        ws.append(row)

    # Create table with proper formatting; ws.dimensions is the populated
    # range (for example "A1:H47"), so the table always matches the data
    table = Table(displayName="BusinessCaseTable", ref=ws.dimensions)
    style = TableStyleInfo(name="TableStyleMedium3",
                           showFirstColumn=False,
                           showLastColumn=False,
                           showRowStripes=True,
                           showColumnStripes=False)
    table.tableStyleInfo = style

    ws.add_table(table)

    # Save with consistent settings
    wb.save(output_path)
```

## Implementation Checklist

### Immediate Actions (Week 1)
- [ ] Extract XML patterns from working template
- [ ] Implement proper XML declaration generation
- [ ] Add namespace declarations and compatibility directives
- [ ] Implement UID generation algorithms
- [ ] Fix table ID sequencing logic
- [ ] Test on Ubuntu environment

### Validation Actions (Week 2)
- [ ] Create comprehensive test suite
- [ ] Validate across multiple platforms
- [ ] Performance testing with large datasets
- [ ] Excel compatibility testing (different versions)
- [ ] Automated repair detection

### Future Improvements (Month 2)
- [ ] Migration to openpyxl library
- [ ] Docker containerization for consistent environment
- [ ] CI/CD pipeline with cross-platform testing
- [ ] Comprehensive documentation updates

## Risk Assessment

### High Priority Risks
- **Platform dependency**: Current solution may not work on Windows
- **Excel version compatibility**: Different Excel versions may apply different validation rules
- **Performance impact**: Proper XML generation may be slower

### Mitigation Strategies
- **Comprehensive testing**: Test on all target platforms before deployment
- **Fallback mechanism**: Keep current generation as backup
- **Performance optimization**: Profile and optimize XML generation code

## Success Metrics

### Primary Goals
- ✅ Zero Excel repair dialogs on Ubuntu-generated files
- ✅ Identical behavior across macOS and Ubuntu
- ✅ No data loss or functionality regression

### Secondary Goals
- ✅ Improved file generation performance
- ✅ Better code maintainability
- ✅ Enhanced error handling and logging

## Conclusion

The recommended solution addresses the root cause by implementing proper Excel XML format generation while maintaining cross-platform compatibility. The template-based approach provides immediate relief while the library migration offers long-term stability.

**Next Steps**: Begin with Phase 1 implementation focusing on proper XML generation, followed by comprehensive testing across platforms.

---

*Proposal created: 2025-09-19*
*Estimated implementation time: 2-3 weeks*
*Priority: High - affects production workflows*