In Part 1 of this series, we explored how to generate Excel files in C# using the Open XML SDK, leveraging the standard structure and document model provided by Microsoft. While libraries like DocumentFormat.OpenXml offer convenience, they come with overhead, including memory usage, abstraction layers, and rigid APIs.
In this installment, we go deeper. We’ll peel back the layers and demystify what an Excel .xlsx file really is under the hood: a ZIP archive containing XML files in a well-defined structure. This realization opens the door to writing raw Excel files from scratch, with complete control over the output and, for large exports, potentially much lower memory use and faster generation.
What is an Excel .xlsx File?
At its core, an .xlsx file is just a ZIP archive with a specific internal layout and a collection of XML files conforming to the Office Open XML (OOXML) specification.
To confirm this, take any .xlsx file on your computer and rename the extension from .xlsx to .zip. Then, unzip it. You’ll find a structure like this:
    /
    ├── [Content_Types].xml
    ├── _rels/.rels
    ├── xl/
    │   ├── workbook.xml
    │   ├── _rels/workbook.xml.rels
    │   ├── styles.xml
    │   ├── sharedStrings.xml
    │   └── worksheets/
    │       └── sheet1.xml
Each of these files serves a specific purpose:
- workbook.xml: Defines the workbook and sheets
- sharedStrings.xml: Stores reused string values
- sheet1.xml: Contains cell data for the worksheet
- styles.xml: Defines formatting
By writing these XML files ourselves and packaging them into a ZIP file, we can generate .xlsx documents from scratch.
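The same check can be done in code rather than by renaming the file. The following is a minimal sketch, assuming .NET 6 or later with top-level statements; ListParts is a hypothetical helper name, and the in-memory archive is only a stand-in so the example runs without a real workbook on disk.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: returns the ZIP entry names (the "parts") found in a package stream.
string[] ListParts(Stream package)
{
    using var reader = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true);
    return reader.Entries.Select(e => e.FullName).ToArray();
}

// Stand-in package so the sketch is self-contained. To inspect a real
// workbook, pass File.OpenRead("your-file.xlsx") to ListParts instead.
using var demo = new MemoryStream();
using (var writer = new ZipArchive(demo, ZipArchiveMode.Create, leaveOpen: true))
{
    writer.CreateEntry("[Content_Types].xml");
    writer.CreateEntry("xl/workbook.xml");
}

demo.Position = 0;
foreach (var name in ListParts(demo))
    Console.WriteLine(name);
```

Run against a workbook saved by Excel, this should list the same paths shown in the tree above.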
Why Go Low-Level?
While libraries like the Open XML SDK work well for most use cases, there are scenarios where writing the XML manually makes sense:
- Performance & Memory: Avoid unnecessary object instantiations and reduce memory usage by streaming your XML output.
- Control: Customize file structure, format, and cell encoding without being limited by API constraints.
- Simplicity for Simple Needs: If your Excel file needs are minimal (e.g., a single sheet with plain text), you can generate exactly what you need without bloated dependencies.
Step-by-Step: Generating an Excel File From Scratch
Here’s a high-level outline of how to do it:
1. Create the Required XML Files
Write C# code that generates the following files:
- [Content_Types].xml
- _rels/.rels
- xl/workbook.xml
- xl/_rels/workbook.xml.rels (links the workbook to its worksheet parts)
- xl/worksheets/sheet1.xml
- Optionally: xl/styles.xml, xl/sharedStrings.xml
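The package-level parts are small and essentially fixed for a single-sheet workbook. Below is a sketch of what they might contain, held as C# strings keyed by their path inside the archive. It assumes C# 11 or later for raw string literals; the sheet name "Sheet1" and the relationship ID "rId1" are illustrative choices. It also includes xl/_rels/workbook.xml.rels, the part that connects the workbook to sheet1.xml. The loop at the end only checks that each string is well-formed XML, not that it is valid against the OOXML schemas.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Part path inside the package -> XML content (raw string literals need C# 11+).
var parts = new Dictionary<string, string>
{
    ["[Content_Types].xml"] = """
        <?xml version="1.0" encoding="UTF-8"?>
        <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
          <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
          <Default Extension="xml" ContentType="application/xml"/>
          <Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
          <Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
        </Types>
        """,
    ["_rels/.rels"] = """
        <?xml version="1.0" encoding="UTF-8"?>
        <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
          <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="xl/workbook.xml"/>
        </Relationships>
        """,
    ["xl/workbook.xml"] = """
        <?xml version="1.0" encoding="UTF-8"?>
        <workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
                  xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
          <sheets>
            <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
          </sheets>
        </workbook>
        """,
    ["xl/_rels/workbook.xml.rels"] = """
        <?xml version="1.0" encoding="UTF-8"?>
        <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
          <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>
        </Relationships>
        """,
};

// Sanity check only: confirms each part parses as XML.
foreach (var (path, xml) in parts)
{
    XDocument.Parse(xml);
    Console.WriteLine($"{path}: well-formed");
}
```

Note how the r:id on the sheet element in workbook.xml matches the Id of the relationship in workbook.xml.rels; that pairing is what lets a reader resolve the sheet to its XML file.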
For instance, here’s a minimal sheet1.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
      <sheetData>
        <row r="1">
          <c r="A1" t="inlineStr"><is><t>Hello</t></is></c>
          <c r="B1" t="inlineStr"><is><t>World</t></is></c>
        </row>
      </sheetData>
    </worksheet>
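Rather than concatenating strings, the sheet XML can be produced with XmlWriter, which also takes care of escaping characters such as < and &. The following is a rough sketch assuming every cell is plain text written as an inline string; WriteSheet and ColumnName are hypothetical helper names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

const string MainNs = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

// Hypothetical helper: zero-based column index to column letters (0 -> A, 25 -> Z, 26 -> AA).
string ColumnName(int index)
{
    string name = "";
    for (int n = index + 1; n > 0; n = (n - 1) / 26)
        name = (char)('A' + (n - 1) % 26) + name;
    return name;
}

// Hypothetical helper: writes a worksheet part where every cell is an inline string.
void WriteSheet(Stream output, IEnumerable<string[]> rows)
{
    var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
    using var writer = XmlWriter.Create(output, settings);

    writer.WriteStartDocument();
    writer.WriteStartElement("worksheet", MainNs);
    writer.WriteStartElement("sheetData", MainNs);

    int rowNumber = 0;
    foreach (string[] row in rows)
    {
        rowNumber++;
        writer.WriteStartElement("row", MainNs);
        writer.WriteAttributeString("r", rowNumber.ToString());
        for (int col = 0; col < row.Length; col++)
        {
            writer.WriteStartElement("c", MainNs);
            writer.WriteAttributeString("r", ColumnName(col) + rowNumber); // e.g. B1
            writer.WriteAttributeString("t", "inlineStr");
            writer.WriteStartElement("is", MainNs);
            writer.WriteElementString("t", MainNs, row[col]); // text is escaped by XmlWriter
            writer.WriteEndElement(); // is
            writer.WriteEndElement(); // c
        }
        writer.WriteEndElement(); // row
    }

    writer.WriteEndElement(); // sheetData
    writer.WriteEndElement(); // worksheet
    writer.WriteEndDocument();
}

// Usage: render two rows to memory and print the resulting XML.
using var buffer = new MemoryStream();
WriteSheet(buffer, new[] { new[] { "Hello", "World" }, new[] { "a < b", "R&D" } });
string sheetXml = Encoding.UTF8.GetString(buffer.ToArray());
Console.WriteLine(sheetXml);
```

Because WriteSheet accepts any Stream, the same method could target a file on disk or an entry inside a ZIP archive.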
2. Save the Files in a Temporary Directory
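Use standard file system I/O in C# (for example, Directory.CreateDirectory and File.WriteAllText) to create the required folder structure (_rels, xl, xl/_rels, and xl/worksheets) and write each XML file into place.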
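One way this step might look is sketched below. WriteParts is a hypothetical helper, and the part contents are placeholders standing in for the real XML so the example stays short; only the paths matter here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: writes each part under rootDirectory, creating subfolders as needed.
void WriteParts(string rootDirectory, IReadOnlyDictionary<string, string> parts)
{
    var utf8NoBom = new UTF8Encoding(false); // UTF-8 without a byte order mark
    foreach (var part in parts)
    {
        // Part paths use '/', so convert them to the platform's separator.
        string fullPath = Path.Combine(rootDirectory, part.Key.Replace('/', Path.DirectorySeparatorChar));
        Directory.CreateDirectory(Path.GetDirectoryName(fullPath) ?? rootDirectory);
        File.WriteAllText(fullPath, part.Value, utf8NoBom);
    }
}

// Placeholder contents: a real package needs the actual XML for each part.
var placeholderParts = new Dictionary<string, string>
{
    ["[Content_Types].xml"] = "<placeholder/>",
    ["_rels/.rels"] = "<placeholder/>",
    ["xl/workbook.xml"] = "<placeholder/>",
    ["xl/_rels/workbook.xml.rels"] = "<placeholder/>",
    ["xl/worksheets/sheet1.xml"] = "<placeholder/>",
};

// A unique folder name avoids collisions between concurrent exports.
string tempDirectory = Path.Combine(Path.GetTempPath(), "xlsx-" + Guid.NewGuid().ToString("N"));
WriteParts(tempDirectory, placeholderParts);

foreach (string file in Directory.EnumerateFiles(tempDirectory, "*", SearchOption.AllDirectories))
    Console.WriteLine(Path.GetRelativePath(tempDirectory, file));

Directory.Delete(tempDirectory, recursive: true); // tidy up after the demo
```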
3. Zip the Directory Into a .xlsx File
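Use System.IO.Compression.ZipFile to compress the directory, writing the archive directly to a path that ends in .xlsx. The destination file must not already exist, and it should live outside the directory being zipped:

    ZipFile.CreateFromDirectory(tempDirectory, outputFilePath);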
4. Clean Up
Delete the temporary files/folder once the archive is created.
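Steps 3 and 4 fit naturally into a try/finally, so the temporary folder is removed even if zipping fails. The following is a sketch under the assumption that the output path lies outside the temporary folder; PackageAndCleanUp is a hypothetical helper name, and the demo zips a single placeholder file rather than a complete workbook.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: zips sourceDirectory into xlsxPath, then deletes
// sourceDirectory whether or not packaging succeeded.
void PackageAndCleanUp(string sourceDirectory, string xlsxPath)
{
    try
    {
        // CreateFromDirectory throws if the destination exists, so remove any old copy first.
        File.Delete(xlsxPath);
        ZipFile.CreateFromDirectory(sourceDirectory, xlsxPath);
    }
    finally
    {
        Directory.Delete(sourceDirectory, recursive: true);
    }
}

// Demo with one placeholder part; a real package needs all required parts.
string tempDirectory = Path.Combine(Path.GetTempPath(), "xlsx-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(Path.Combine(tempDirectory, "xl"));
File.WriteAllText(Path.Combine(tempDirectory, "xl", "workbook.xml"), "<placeholder/>");
string outputFilePath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".xlsx");

PackageAndCleanUp(tempDirectory, outputFilePath);

using (var package = ZipFile.OpenRead(outputFilePath))
{
    foreach (var entry in package.Entries)
        Console.WriteLine(entry.FullName);
}
```

The two-argument overload of CreateFromDirectory does not include the base directory in entry names, which is what the package layout requires: [Content_Types].xml has to sit at the root of the archive.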
Tips for Real-World Scenarios
- Stream the XML: Avoid holding large datasets in memory. Use XmlWriter to stream rows as needed.
- Validate Against OOXML Spec: Ensure your XML follows the standard. Minor mistakes can cause Excel to reject or repair the file.
- Re-use Shared Strings: If many cells share the same text, use sharedStrings.xml to reduce file size.
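To illustrate the streaming tip, the sketch below skips the temporary directory and writes rows straight into a ZIP entry as they are produced, so neither the dataset nor the sheet XML is ever fully materialized. It assumes a single column of text values; GenerateValues and StreamSheet are hypothetical names, and the MemoryStream stands in for the FileStream you would likely use in practice.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml;

const string MainNs = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

// Hypothetical data source: yields values lazily, one at a time.
IEnumerable<string> GenerateValues(int count)
{
    for (int i = 1; i <= count; i++)
        yield return "Item " + i;
}

// Hypothetical helper: streams one value per row into column A of the sheet1.xml entry.
void StreamSheet(ZipArchive archive, IEnumerable<string> values)
{
    ZipArchiveEntry entry = archive.CreateEntry("xl/worksheets/sheet1.xml", CompressionLevel.Fastest);
    using Stream entryStream = entry.Open();
    using XmlWriter xmlWriter = XmlWriter.Create(
        entryStream, new XmlWriterSettings { Encoding = new UTF8Encoding(false) });

    xmlWriter.WriteStartDocument();
    xmlWriter.WriteStartElement("worksheet", MainNs);
    xmlWriter.WriteStartElement("sheetData", MainNs);

    int rowNumber = 0;
    foreach (string value in values)
    {
        rowNumber++;
        xmlWriter.WriteStartElement("row", MainNs);
        xmlWriter.WriteAttributeString("r", rowNumber.ToString());
        xmlWriter.WriteStartElement("c", MainNs);
        xmlWriter.WriteAttributeString("r", "A" + rowNumber);
        xmlWriter.WriteAttributeString("t", "inlineStr");
        xmlWriter.WriteStartElement("is", MainNs);
        xmlWriter.WriteElementString("t", MainNs, value);
        xmlWriter.WriteEndElement(); // is
        xmlWriter.WriteEndElement(); // c
        xmlWriter.WriteEndElement(); // row
    }

    xmlWriter.WriteEndElement(); // sheetData
    xmlWriter.WriteEndElement(); // worksheet
    xmlWriter.WriteEndDocument();
}

using var output = new MemoryStream();
using (var zip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
{
    StreamSheet(zip, GenerateValues(100_000));
    // For Excel to open the result, the other required parts
    // ([Content_Types].xml, _rels/.rels, xl/workbook.xml,
    // xl/_rels/workbook.xml.rels) would be added as further entries here.
}
Console.WriteLine($"Package size: {output.Length} bytes");
```

In Create mode, ZipArchive allows only one entry to be open for writing at a time, so each part has to be written and closed before the next one is started.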
Final Thoughts
Generating Excel files without Excel or third-party libraries gives you unmatched control. While it requires a deeper understanding of the OOXML structure, it allows you to craft highly efficient, custom-tailored spreadsheets.