Add xlsxwriter-based Excel generation scripts with openpyxl implementation

- Created create_excel_xlsxwriter.py and update_excel_xlsxwriter.py
- Uses openpyxl exclusively to preserve Excel formatting and formulas
- Updated server.js to use new xlsxwriter scripts for form submissions
- Maintains all original functionality while ensuring proper Excel file handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
andrei
2025-09-22 13:53:06 +00:00
commit 0e2e1bddba
842 changed files with 316330 additions and 0 deletions


@@ -0,0 +1,260 @@
# Excel Table Repair - Solution Proposal
## Executive Summary
The Excel table repair errors are caused by **platform-specific differences in ZIP file assembly**, not XML content issues. Since the table XML is identical between working (macOS) and broken (Ubuntu) files, the solution requires addressing the underlying file generation process rather than XML formatting.
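
The ZIP-assembly hypothesis can be tested directly by diffing the entry order and per-entry metadata of a working (macOS) file against a broken (Ubuntu) one. The sketch below uses only the standard-library `zipfile` module. The function names are hypothetical, and the list of compared `ZipInfo` fields is a reasonable starting set rather than an exhaustive one.

```python
import zipfile

# ZipInfo attributes that can differ between hosts even when part contents match
FIELDS = ("compress_type", "create_system", "create_version",
          "extract_version", "flag_bits", "external_attr", "date_time")


def zip_entry_metadata(path):
    """Return (ordered entry names, {name: {field: value}}) for an .xlsx/.zip."""
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
    names = [info.filename for info in infos]
    meta = {info.filename: {f: getattr(info, f) for f in FIELDS} for info in infos}
    return names, meta


def compare_zip_metadata(working_path, broken_path):
    """List human-readable differences in entry order and per-entry metadata."""
    names_a, meta_a = zip_entry_metadata(working_path)
    names_b, meta_b = zip_entry_metadata(broken_path)
    diffs = []
    if names_a != names_b:
        diffs.append(f"entry order differs: {names_a} vs {names_b}")
    for name in sorted(set(meta_a) & set(meta_b)):
        for field in FIELDS:
            a, b = meta_a[name][field], meta_b[name][field]
            if a != b:
                diffs.append(f"{name}: {field} {a!r} != {b!r}")
    for name in sorted(set(meta_a) ^ set(meta_b)):
        diffs.append(f"{name}: present in only one file")
    return diffs
```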
## Solution Strategy
### Option 1: Template-Based XML Injection (Recommended)
**Approach**: Modify the script to generate Excel tables using the exact XML format from the working template.
**Implementation**:
1. **Extract template table XML** as reference patterns
2. **Generate proper XML declarations** for all table files
3. **Add missing namespace declarations** and compatibility directives
4. **Implement UID generation** for tables and columns
5. **Fix table ID sequencing** to match Excel expectations
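
Steps 4 and 5 amount to a single workbook-wide numbering pass. A minimal sketch, assuming a hypothetical input of `(sheet_name, [table_display_name, ...])` pairs and assuming table names must be unique regardless of case; the returned `id` corresponds to `table_id + 1` in the Phase 1 generation code.

```python
def assign_table_ids(sheets):
    """Assign workbook-wide, 1-based table IDs and matching part names.

    `sheets` is a hypothetical list of (sheet_name, [table_display_name, ...]).
    """
    assignments = []
    seen = set()
    for sheet_name, display_names in sheets:
        for display_name in display_names:
            # Assumption: table names must be unique regardless of case
            key = display_name.casefold()
            if key in seen:
                raise ValueError(f"duplicate table name: {display_name!r}")
            seen.add(key)
            table_id = len(assignments) + 1  # IDs run 1..N across all sheets
            assignments.append({
                "sheet": sheet_name,
                "display_name": display_name,
                "id": table_id,
                "part": f"xl/tables/table{table_id}.xml",
            })
    return assignments
```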
**Advantages**:
- ✅ Addresses root XML format issues
- ✅ Works across all platforms
- ✅ Future-proof against Excel updates
- ✅ No dependency on external libraries
**Implementation Timeline**: 2-3 days
### Option 2: Python Library Standardization
**Approach**: Replace custom Excel generation with established cross-platform libraries.
**Implementation Options**:
1. **openpyxl** - Most popular, excellent table support
2. **xlsxwriter** - Fast performance, good formatting
3. **pandas + openpyxl** - High-level data operations
**Advantages**:
- ✅ Proven cross-platform compatibility
- ✅ Handles XML complexities automatically
- ✅ Better maintenance and updates
- ✅ Extensive documentation and community
**Implementation Timeline**: 1-2 weeks (requires rewriting generation logic)
### Option 3: Platform Environment Isolation
**Approach**: Standardize the Python environment across platforms.
**Implementation**:
1. **Docker containerization** with fixed Python/library versions
2. **Virtual environment** with pinned dependencies
3. **CI/CD pipeline** generating files on controlled environment
**Advantages**:
- ✅ Ensures identical execution environment
- ✅ Minimal code changes required
- ✅ Reproducible builds
**Implementation Timeline**: 3-5 days
## Recommended Implementation Plan
### Phase 1: Immediate Fix (Template-Based XML)
#### Step 1: XML Template Extraction
```python
def extract_template_xml_patterns():
    """Return the XML patterns observed in the working (macOS) template."""
    template_tables = {
        'table1': {
            'declaration': '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>',
            'namespaces': {
                'main': 'http://schemas.openxmlformats.org/spreadsheetml/2006/main',
                'mc': 'http://schemas.openxmlformats.org/markup-compatibility/2006',
                'xr': 'http://schemas.microsoft.com/office/spreadsheetml/2014/revision',
                'xr3': 'http://schemas.microsoft.com/office/spreadsheetml/2016/revision3',
            },
            'compatibility': 'mc:Ignorable="xr xr3"',
            # Doubled braces survive str.format() as literal "{" and "}",
            # e.g. uid_pattern.format(1) -> "{00000000-000C-0000-FFFF-FFFF01000000}"
            'uid_pattern': '{{00000000-000C-0000-FFFF-FFFF{:02d}000000}}',
        }
    }
    return template_tables
```
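
The dictionary above is hard-coded. A sketch of pulling the same information out of the working template itself, assuming its tables live at the conventional `xl/tables/tableN.xml` part names. `extract_template_table_headers` is a hypothetical helper that returns each part's XML declaration and root `<table>` start tag as raw text, so attribute order and namespace prefixes are preserved exactly as Excel wrote them.

```python
import re
import zipfile

# Conventional part names for table definitions inside an .xlsx package
TABLE_PART = re.compile(r"^xl/tables/table\d+\.xml$")


def extract_template_table_headers(xlsx_path):
    """Return {part_name: (xml_declaration, table_start_tag)} from a template.

    Values are raw text (or None if absent), so attribute order and
    namespace prefixes are kept exactly as the template wrote them.
    """
    patterns = {}
    with zipfile.ZipFile(xlsx_path) as zf:
        for name in sorted(zf.namelist()):
            if not TABLE_PART.match(name):
                continue
            text = zf.read(name).decode("utf-8-sig")
            declaration = re.match(r"<\?xml[^>]*\?>", text)
            # \b keeps this from matching <tableColumns> or <tableStyleInfo>
            start_tag = re.search(r"<table\b[^>]*>", text)
            patterns[name] = (
                declaration.group(0) if declaration else None,
                start_tag.group(0) if start_tag else None,
            )
    return patterns
```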
#### Step 2: XML Generation Functions
```python
# Namespace URIs found on the template's <table> root element
MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'
MC_NS = 'http://schemas.openxmlformats.org/markup-compatibility/2006'
XR_NS = 'http://schemas.microsoft.com/office/spreadsheetml/2014/revision'
XR3_NS = 'http://schemas.microsoft.com/office/spreadsheetml/2016/revision3'


def generate_proper_table_xml(table_data, table_id):
    """Generate Excel-compliant table XML with proper format.

    table_id is 0-based; generate_table_columns_xml and
    generate_table_style_xml are helpers not shown here.
    """
    # XML declaration
    xml_content = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    # Table element with all namespaces and the mc:Ignorable directive
    xml_content += f'<table xmlns="{MAIN_NS}" xmlns:mc="{MC_NS}" '
    xml_content += f'mc:Ignorable="xr xr3" xmlns:xr="{XR_NS}" '
    xml_content += f'xmlns:xr3="{XR3_NS}" '
    xml_content += f'id="{table_id + 1}" '  # 1-based ID from the 0-based index
    xml_content += f'xr:uid="{generate_table_uid(table_id)}" '
    xml_content += f'name="{table_data.name}" '
    xml_content += f'displayName="{table_data.display_name}" '
    xml_content += f'ref="{table_data.ref}">\n'
    # Table columns with UIDs
    xml_content += generate_table_columns_xml(table_data.columns, table_id)
    # Table style info
    xml_content += generate_table_style_xml(table_data.style)
    xml_content += '</table>'
    return xml_content


def generate_table_uid(table_id):
    """Generate a deterministic, GUID-shaped UID for a table."""
    return f"{{00000000-000C-0000-FFFF-FFFF{table_id:02d}000000}}"


def generate_column_uid(table_id, column_id):
    """Generate a deterministic, GUID-shaped UID for a table column."""
    return f"{{00000000-0010-0000-{table_id:04d}-{column_id:06d}000000}}"
```
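
`generate_table_columns_xml` and `generate_table_style_xml` are called above but not defined. One possible sketch, assuming `columns` is a list of header strings and the style is passed as a style name. The `xr3:uid` attribute relies on the root `<table>` element declaring the `xr3` prefix, as the generator above does, and the `tableStyleInfo` flags are assumed defaults rather than values taken from the template.

```python
from xml.sax.saxutils import escape


def generate_column_uid(table_id, column_id):
    """Same GUID-shaped pattern as the column UID helper above."""
    return f"{{00000000-0010-0000-{table_id:04d}-{column_id:06d}000000}}"


def _attr(value):
    """Escape a string for use inside a double-quoted XML attribute."""
    return escape(value, {'"': "&quot;"})


def generate_table_columns_xml(columns, table_id):
    """Emit <tableColumns>; `columns` is assumed to be a list of header strings."""
    parts = [f'<tableColumns count="{len(columns)}">']
    for index, header in enumerate(columns, start=1):
        parts.append(
            f'<tableColumn id="{index}" '
            f'xr3:uid="{generate_column_uid(table_id, index)}" '
            f'name="{_attr(header)}"/>'
        )
    parts.append("</tableColumns>")
    return "".join(parts)


def generate_table_style_xml(style_name="TableStyleMedium2"):
    """Emit <tableStyleInfo>; the show* flags are assumed defaults."""
    return (f'<tableStyleInfo name="{_attr(style_name)}" showFirstColumn="0" '
            'showLastColumn="0" showRowStripes="1" showColumnStripes="0"/>')
```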
#### Step 3: File Assembly Improvements
```python
import zipfile


def create_excel_file_with_proper_compression(output_path, excel_files):
    """Create Excel file with consistent ZIP compression.

    excel_files maps part names (e.g. "xl/workbook.xml") to their content.
    """
    # Fixed timestamp so the archive does not depend on the wall clock
    fixed_time = (2023, 1, 1, 0, 0, 0)
    with zipfile.ZipFile(output_path, 'w',
                         compression=zipfile.ZIP_DEFLATED,
                         compresslevel=6,  # Consistent compression level
                         allowZip64=False) as zipf:
        for file_path, content in excel_files.items():
            zinfo = zipfile.ZipInfo(file_path)
            zinfo.date_time = fixed_time
            zinfo.compress_type = zipfile.ZIP_DEFLATED
            # A caller-supplied ZipInfo does not inherit the archive's
            # compresslevel, so pass it explicitly
            zipf.writestr(zinfo, content, compresslevel=6)
```
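
Fixing timestamps is only part of making the package byte-for-byte reproducible: entry order follows the caller's dict, and `ZipInfo.create_system` defaults to a host-dependent value. A hedged in-memory variant, assuming part contents are supplied as a dict. Placing `[Content_Types].xml` first and pinning `create_system` to 0 (MS-DOS) are consistency choices made here, not documented Excel requirements.

```python
import io
import zipfile

FIXED_TIME = (2023, 1, 1, 0, 0, 0)  # same fixed timestamp as above


def build_deterministic_xlsx(parts, compresslevel=6):
    """Return .xlsx bytes whose ZIP metadata does not depend on host or clock.

    `parts` maps part names (e.g. "xl/workbook.xml") to str or bytes content.
    """
    # [Content_Types].xml first, then the rest in sorted order, so the
    # entry order never depends on dict insertion order.
    names = sorted(parts, key=lambda name: (name != "[Content_Types].xml", name))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=False) as zf:
        for name in names:
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 0  # ZipInfo defaults to 3 (Unix) off Windows
            info.external_attr = 0  # no host permission bits
            zf.writestr(info, parts[name], compresslevel=compresslevel)
    return buffer.getvalue()
```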
### Phase 2: Testing and Validation
#### Cross-Platform Testing Matrix
| Platform | Python Version | Library Versions | Test Status |
|----------|---------------|-----------------|-------------|
| Ubuntu 22.04 | 3.10+ | openpyxl==3.x | ⏳ Pending |
| macOS | 3.10+ | openpyxl==3.x | ✅ Working |
| Windows | 3.10+ | openpyxl==3.x | ⏳ TBD |
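
To fill the matrix from what each environment actually resolved, rather than from what it was meant to install, a small report can be run on every platform and the dictionaries compared. A sketch using `importlib.metadata`; the default package names are assumptions, and the zlib runtime version is included because DEFLATE output can differ between zlib builds.

```python
import platform
import zlib
from importlib import metadata


def environment_fingerprint(packages=("openpyxl", "XlsxWriter")):
    """Report the interpreter, zlib and package versions of this environment."""
    info = {
        "platform": platform.platform(),
        "python": platform.python_version(),
        # DEFLATE output can differ between zlib builds, so record it too
        "zlib": zlib.ZLIB_RUNTIME_VERSION,
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # not installed here
    return info
```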
#### Validation Script
```python
def validate_excel_file(file_path):
"""Validate generated Excel file for repair issues"""
checks = {
'table_xml_format': check_table_xml_declarations,
'namespace_compliance': check_namespace_declarations,
'uid_presence': check_unique_identifiers,
'zip_metadata': check_zip_file_metadata,
'excel_compatibility': test_excel_opening
}
results = {}
for check_name, check_func in checks.items():
results[check_name] = check_func(file_path)
return results
```
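
The `check_*` helpers referenced above are not defined in this proposal. A sketch of three of them using only `zipfile` and `re`, each taking the file path and returning a boolean. They encode the Option 1 template (exact declaration, `xr`/`xr3` namespaces, `xr:uid`), so files produced by a library such as openpyxl may legitimately fail them; a workbook with no table parts passes vacuously.

```python
import re
import zipfile

TABLE_PART = re.compile(r"^xl/tables/table\d+\.xml$")
TABLE_ROOT = re.compile(r"<table\b[^>]*>")
DECLARATION = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
REQUIRED_NS = (
    'xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"',
    'xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"',
    'xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"',
    'xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3"',
    'mc:Ignorable="xr xr3"',
)


def _table_parts(file_path):
    """Return [(part_name, decoded_xml)] for every table part in the package."""
    with zipfile.ZipFile(file_path) as zf:
        return [(name, zf.read(name).decode("utf-8-sig"))
                for name in sorted(zf.namelist()) if TABLE_PART.match(name)]


def check_table_xml_declarations(file_path):
    """Every table part starts with the template's XML declaration."""
    return all(xml.startswith(DECLARATION) for _, xml in _table_parts(file_path))


def check_namespace_declarations(file_path):
    """Every table root carries the template's namespaces and mc:Ignorable."""
    for _, xml in _table_parts(file_path):
        root = TABLE_ROOT.search(xml)
        if root is None or not all(ns in root.group(0) for ns in REQUIRED_NS):
            return False
    return True


def check_unique_identifiers(file_path):
    """Table ids and xr:uid values are present and unique across the workbook."""
    ids, uids = [], []
    for _, xml in _table_parts(file_path):
        root = TABLE_ROOT.search(xml)
        if root is None:
            return False
        table_id = re.search(r'\sid="(\d+)"', root.group(0))
        uid = re.search(r'\sxr:uid="([^"]+)"', root.group(0))
        if table_id is None or uid is None:
            return False
        ids.append(table_id.group(1))
        uids.append(uid.group(1))
    return len(set(ids)) == len(ids) and len(set(uids)) == len(uids)
```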
### Phase 3: Long-term Improvements
#### Migration to openpyxl
```python
# Example migration approach
from openpyxl import Workbook
from openpyxl.worksheet.table import Table, TableStyleInfo


def create_excel_with_openpyxl(business_case_data, output_path):
    """Generate Excel using openpyxl for cross-platform compatibility.

    business_case_data: iterable of rows; the first row must hold unique
    string headers, which Excel requires for a table.
    """
    wb = Workbook()
    ws = wb.active
    # Add data
    for row in business_case_data:
        ws.append(row)
    # Size the table to the data actually written (e.g. "A1:H47")
    # instead of hard-coding the range
    table = Table(displayName="BusinessCaseTable", ref=ws.dimensions)
    style = TableStyleInfo(name="TableStyleMedium3",
                           showFirstColumn=False,
                           showLastColumn=False,
                           showRowStripes=True,
                           showColumnStripes=False)
    table.tableStyleInfo = style
    ws.add_table(table)
    # openpyxl assembles the ZIP package itself
    wb.save(output_path)
```
## Implementation Checklist
### Immediate Actions (Week 1)
- [ ] Extract XML patterns from working template
- [ ] Implement proper XML declaration generation
- [ ] Add namespace declarations and compatibility directives
- [ ] Implement UID generation algorithms
- [ ] Fix table ID sequencing logic
- [ ] Test on Ubuntu environment
### Validation Actions (Week 2)
- [ ] Create comprehensive test suite
- [ ] Validate across multiple platforms
- [ ] Performance testing with large datasets
- [ ] Excel compatibility testing (different versions)
- [ ] Automated repair detection
### Future Improvements (Month 2)
- [ ] Migration to openpyxl library
- [ ] Docker containerization for consistent environment
- [ ] CI/CD pipeline with cross-platform testing
- [ ] Comprehensive documentation updates
## Risk Assessment
### High Priority Risks
- **Platform dependency**: Current solution may not work on Windows
- **Excel version compatibility**: Different Excel versions may have different validation
- **Performance impact**: Proper XML generation may be slower
### Mitigation Strategies
- **Comprehensive testing**: Test on all target platforms before deployment
- **Fallback mechanism**: Keep current generation as backup
- **Performance optimization**: Profile and optimize XML generation code
## Success Metrics
### Primary Goals
- ✅ Zero Excel repair dialogs on Ubuntu-generated files
- ✅ Identical behavior across macOS and Ubuntu
- ✅ No data loss or functionality regression
### Secondary Goals
- ✅ Improved file generation performance
- ✅ Better code maintainability
- ✅ Enhanced error handling and logging
## Conclusion
The recommended solution addresses the root cause by implementing proper Excel XML format generation while maintaining cross-platform compatibility. The template-based approach provides immediate relief while the library migration offers long-term stability.
**Next Steps**: Begin with Phase 1 implementation focusing on proper XML generation, followed by comprehensive testing across platforms.
---
*Proposal created: 2025-09-19*
*Estimated implementation time: 2-3 weeks*
*Priority: High - affects production workflows*