Converting XML to HTML via XSLT is one of the most common transformation tasks in document processing, and one of the most revealing tests of stylesheet design quality. A well-designed conversion stylesheet handles edge cases gracefully, produces clean semantic markup, and remains maintainable as the source XML evolves. A poorly designed one produces output that looks right until someone inspects the HTML or feeds it through a validator. This guide covers the approach I use for building reliable XML-to-HTML transforms, connecting to the broader patterns in the XSLT workflows reference and the structural considerations in the XML reference.

Starting with the Identity Transform

Every XML-to-HTML conversion project should begin with an identity transform. This is a single template rule that copies every node from the input to the output unchanged. From that baseline, you selectively override templates for the elements you need to convert.

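The reason is practical: an identity transform ensures that nothing in the source document disappears silently. Without it, XSLT's built-in template rules handle unmatched elements. Those rules drop the element's tags and attributes and output only its text, so a missed element quietly dissolves into the surrounding content. With the identity transform in place, an element you forgot to handle passes through to the output as-is, where it is easy to spot in the HTML. This makes debugging dramatically easier, because unexpected output traces back to a missing or incorrect template override rather than to a mystery about where content went.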

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

Start with this, then build your conversion templates on top of it. Remove the identity transform only after you are confident that every element type in your source documents is handled by an explicit template.
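Here is a minimal sketch of that starting point. It assumes XSLT 1.0 and a hypothetical source vocabulary in which <para> should become <p>. The override wins over the identity rule because a pattern that names a specific element has a default priority of 0, while each alternative in "@*|node()" has a default priority of -0.5.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Baseline: copy every attribute and node unchanged.
       Each alternative of this pattern has default priority -0.5,
       so any template matching a specific element name beats it. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override for the hypothetical <para> element: emit <p> instead.
       Selecting node() rather than @*|node() keeps source attributes
       off the generated HTML element. -->
  <xsl:template match="para">
    <p>
      <xsl:apply-templates select="node()"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Run this against input like <doc><para>Hello</para><note>Check this</note></doc>. The <para> becomes a <p>, while <doc> and <note> come through untouched, which makes the still-unconverted elements easy to find in the output.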

Mapping Source Elements to HTML

The core of the conversion is a set of template rules that match source XML elements and produce HTML output. The design depends on your source vocabulary, but the patterns are consistent.

For document-oriented XML (articles, reports, manuals), the mapping typically looks like this:

  • Top-level document element maps to <article> or to a <div> with a semantic class name.
  • Section elements map to <section> with headings derived from title children.
  • Paragraph elements map to <p>.
  • Emphasis and formatting elements map to <em>, <strong>, <code>, etc.
  • List elements map to <ul>, <ol>, <li>.
  • Figure and image elements map to <figure>, <img>, <figcaption>.
  • Table elements map to <table>, <thead>, <tbody>, <tr>, <th>, and <td>.
  • Cross-reference elements map to <a>, with the href generated from the target element's ID.
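Several of these mappings can be sketched as XSLT 1.0 template rules. The source element names are assumed and DocBook-style: article, section, title, para, emphasis with an optional role="strong", literal, itemizedlist, orderedlist, listitem, and xref with a linkend attribute. Substitute your own vocabulary. Figures and tables are left out for brevity.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Index every element that has an @id so cross-references can find targets. -->
  <xsl:key name="by-id" match="*[@id]" use="@id"/>

  <xsl:template match="/article">
    <article><xsl:apply-templates/></article>
  </xsl:template>

  <xsl:template match="article/title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <!-- Keep the source @id (if any) so links and bookmarks can target the section. -->
  <xsl:template match="section">
    <section>
      <xsl:copy-of select="@id"/>
      <xsl:apply-templates/>
    </section>
  </xsl:template>

  <!-- Heading level follows nesting depth: h2 for top-level sections, capped at h6. -->
  <xsl:template match="section/title">
    <xsl:variable name="depth" select="count(ancestor::section)"/>
    <xsl:variable name="level">
      <xsl:choose>
        <xsl:when test="$depth &lt; 5"><xsl:value-of select="$depth + 1"/></xsl:when>
        <xsl:otherwise>6</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <xsl:element name="h{$level}"><xsl:apply-templates/></xsl:element>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="emphasis">
    <em><xsl:apply-templates/></em>
  </xsl:template>

  <!-- The predicate gives this pattern priority 0.5, so it wins over plain "emphasis" (0). -->
  <xsl:template match="emphasis[@role = 'strong']">
    <strong><xsl:apply-templates/></strong>
  </xsl:template>

  <xsl:template match="literal">
    <code><xsl:apply-templates/></code>
  </xsl:template>

  <xsl:template match="itemizedlist">
    <ul><xsl:apply-templates select="listitem"/></ul>
  </xsl:template>

  <xsl:template match="orderedlist">
    <ol><xsl:apply-templates select="listitem"/></ol>
  </xsl:template>

  <xsl:template match="listitem">
    <li><xsl:apply-templates/></li>
  </xsl:template>

  <!-- href points at the target's @id; an empty xref borrows the target's title as link text. -->
  <xsl:template match="xref">
    <a href="#{@linkend}">
      <xsl:choose>
        <xsl:when test="node()"><xsl:apply-templates/></xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="key('by-id', @linkend)/title"/>
        </xsl:otherwise>
      </xsl:choose>
    </a>
  </xsl:template>

</xsl:stylesheet>
```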

For data-oriented XML (records, feeds, API responses), the mapping is more about presentation: wrapping field values in display containers, building tables from record structures, and generating navigation from hierarchical data.
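For the data-oriented case, here is a rough sketch that turns a flat list of records into an HTML table. The <products>/<product> input structure is hypothetical. The stylesheet also assumes that every record has the same fields in the same order, because it takes the column headers from the first record.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Hypothetical input:
       <products>
         <product><name>Widget</name><price>9.99</price></product>
         <product><name>Gadget</name><price>24.50</price></product>
       </products> -->
  <xsl:template match="/products">
    <table>
      <thead>
        <tr>
          <!-- Column headers come from the first record's field names. -->
          <xsl:for-each select="product[1]/*">
            <th><xsl:value-of select="local-name()"/></th>
          </xsl:for-each>
        </tr>
      </thead>
      <tbody>
        <xsl:apply-templates select="product"/>
      </tbody>
    </table>
  </xsl:template>

  <!-- One row per record, one cell per field. Plain string values are
       enough here because the fields carry no inline markup. -->
  <xsl:template match="product">
    <tr>
      <xsl:for-each select="*">
        <td><xsl:value-of select="."/></td>
      </xsl:for-each>
    </tr>
  </xsl:template>

</xsl:stylesheet>
```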

Design tip: Map source elements to the most semantically appropriate HTML5 elements, not just generic divs. Semantic output improves accessibility, search engine understanding, and long-term maintainability of the generated HTML.

Handling Mixed Content

Mixed content (text interleaved with inline markup) is where XML-to-HTML conversion gets interesting. A paragraph that contains plain text, emphasis, cross-references, and footnote markers requires careful template design to preserve the reading order and inline structure.

The key is to use xsl:apply-templates inside block-level templates rather than xsl:value-of. The apply-templates instruction processes child nodes in document order, triggering appropriate templates for each child element while preserving text nodes between them. Using xsl:value-of instead would flatten the content to plain text, losing all inline markup.

This distinction catches people who are accustomed to data-oriented processing where xsl:value-of is sufficient. In document-oriented conversion, xsl:apply-templates is almost always the correct choice for processing element content.
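A small illustration, using hypothetical para, command, and emphasis elements. The default-mode para template keeps the inline structure, so the sample input becomes <p>Run <code>make</code> before <em>every</em> commit.</p>. The flat-mode template is included only as a counterexample: it produces the same words with no inline elements.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Hypothetical input:
       <para>Run <command>make</command> before <emphasis>every</emphasis> commit.</para> -->

  <!-- apply-templates visits text nodes and child elements in document order.
       Text nodes fall through to the built-in rule, which copies them. -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="command">
    <code><xsl:apply-templates/></code>
  </xsl:template>

  <xsl:template match="emphasis">
    <em><xsl:apply-templates/></em>
  </xsl:template>

  <!-- Counterexample, kept in its own mode: value-of emits only the string
       value "Run make before every commit.", losing <code> and <em>. -->
  <xsl:template match="para" mode="flat">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
```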

Output Method and Serialization

XSLT 1.0 defines three output methods: xml, html, and text. XSLT 2.0 adds xhtml. For XML-to-HTML conversion, the output method matters.

Using method="html" produces HTML output with HTML serialization rules: void elements like <br> are serialized without a closing slash or end tag, boolean attributes such as selected can be written in minimized form, and the HTML DOCTYPE can be specified.

Using method="xml" produces XHTML output with XML serialization rules: all elements have closing tags (or self-closing syntax), attributes are always quoted, and the output is well-formed XML.

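For modern web use, either approach can work, with one caveat. XHTML served as text/html is read by the browser's HTML parser, which ignores the self-closing slash on non-void HTML elements, so an empty <div/> or <script/> is treated as an unclosed start tag. The choice depends on your downstream processing needs. If the output feeds into another XML pipeline, use the xml output method. If it is rendered directly in browsers, the html output method avoids that pitfall and produces cleaner results.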

<xsl:output method="html" encoding="UTF-8" indent="yes"
            doctype-system="about:legacy-compat"/>
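If you do choose the xml method for browser-bound output, a configuration along these lines is a reasonable starting point. It is a sketch, not a complete page template. The script path app.js is a placeholder, and the stylesheet assumes the source root element has a title child. The notable detail is the script element, which is given explicit content so the serializer cannot collapse it into self-closing form.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- XHTML variant: XML serialization rules, no XML declaration,
       HTML5-compatible doctype. The default xmlns above puts the
       literal result elements below in the XHTML namespace. -->
  <xsl:output method="xml" encoding="UTF-8"
              omit-xml-declaration="yes"
              doctype-system="about:legacy-compat"/>

  <xsl:template match="/">
    <html>
      <head>
        <meta charset="UTF-8"/>
        <!-- Unprefixed names in XPath 1.0 match no-namespace source elements. -->
        <title><xsl:value-of select="/*/title"/></title>
        <!-- Under method="xml" an empty element is written as <script .../>,
             which an HTML parser treats as an unclosed start tag. A single
             space keeps the element non-empty and forces an explicit end tag. -->
        <script src="app.js"><xsl:text> </xsl:text></script>
      </head>
      <body>
        <xsl:apply-templates select="/*/*[not(self::title)]"/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```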

Generating Navigation and Structure

For longer documents, generating a table of contents, breadcrumb navigation, or section links from the source document structure adds substantial value. This is an area where XSLT excels because the source document tree contains all the structural information needed.

A typical table of contents generator uses xsl:for-each or xsl:apply-templates in a mode to iterate over section elements, extract their titles, generate anchor IDs, and build a nested list of links. The same IDs are added to the corresponding sections or their headings so the navigation links work.
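Here is a sketch of the two-pass, mode-based approach in XSLT 1.0. It assumes an article root with nested section elements, each carrying a title. The same generate-id() call on the same node is used in both modes, which keeps the links and their targets in sync. Heading levels are fixed at h2 for brevity.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <xsl:template match="/article">
    <article>
      <h1><xsl:value-of select="title"/></h1>
      <!-- First pass, in "toc" mode, builds the navigation. -->
      <nav>
        <ul><xsl:apply-templates select="section" mode="toc"/></ul>
      </nav>
      <!-- Second pass, in the default mode, renders the content. -->
      <xsl:apply-templates select="section"/>
    </article>
  </xsl:template>

  <!-- One <li> per section; subsections become a nested <ul>. -->
  <xsl:template match="section" mode="toc">
    <li>
      <a href="#{generate-id()}"><xsl:value-of select="title"/></a>
      <xsl:if test="section">
        <ul><xsl:apply-templates select="section" mode="toc"/></ul>
      </xsl:if>
    </li>
  </xsl:template>

  <!-- generate-id() returns the same value for the same node throughout one
       transformation, so these ids match the hrefs built in toc mode. -->
  <xsl:template match="section">
    <section id="{generate-id()}">
      <h2><xsl:value-of select="title"/></h2>
      <xsl:apply-templates select="*[not(self::title)]"/>
    </section>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```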

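The important detail is ID generation. Generated IDs must be unique and stable. generate-id() produces IDs that are unique within a single transformation, but a processor is under no obligation to produce the same values each time a document is transformed. They can change between runs, between processors, or after any edit to the source, which breaks bookmarks and external links. Deriving IDs from section titles (lowercased, with spaces and punctuation normalized) produces stable IDs but requires handling duplicate titles. Choose the approach that matches your use case.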
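For the title-derived approach, one possible XSLT 1.0 sketch normalizes each title with translate() and appends a numeric suffix when an earlier section, or an enclosing one, yields the same slug. The characters it lowercases and the punctuation it drops are assumptions to adjust for your content.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <!-- Used with translate(..., $from, '-'): the first character (space)
       becomes a hyphen and the remaining punctuation is deleted. -->
  <xsl:variable name="from" select="' .,:;?!()'"/>

  <!-- Outputs the id for the context section: a slug of its title, plus a
       numeric suffix when an earlier or enclosing section has the same slug. -->
  <xsl:template match="section" mode="id">
    <xsl:variable name="slug"
        select="translate(normalize-space(translate(title, $upper, $lower)), $from, '-')"/>
    <xsl:variable name="dups"
        select="count((preceding::section | ancestor::section)
                [translate(normalize-space(translate(title, $upper, $lower)), $from, '-') = $slug])"/>
    <xsl:value-of select="$slug"/>
    <xsl:if test="$dups &gt; 0">
      <xsl:value-of select="concat('-', $dups + 1)"/>
    </xsl:if>
  </xsl:template>

  <xsl:template match="section">
    <xsl:variable name="id">
      <xsl:apply-templates select="." mode="id"/>
    </xsl:variable>
    <section id="{$id}">
      <h2><xsl:value-of select="title"/></h2>
      <xsl:apply-templates select="*[not(self::title)]"/>
    </section>
  </xsl:template>

</xsl:stylesheet>
```

With three sections titled "Overview", this yields overview, overview-2, and overview-3. Several gaps remain. Non-ASCII letters pass through unchanged, a blank title yields an empty id, and a title such as "Overview 2" would collide with a generated suffix. Treat it as a starting point. A table of contents built alongside it should obtain its href values by applying the same id mode to each section.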

Common Conversion Pitfalls

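Whitespace handling. Indentation in the source XML is made of whitespace-only text nodes, and by default they are copied through to the output. There they can show up as unwanted gaps, for example between inline-block elements. Use xsl:strip-space to control which elements have whitespace-only text nodes stripped. Keep mixed-content elements out of that set, because in mixed content a whitespace-only node may be the only space between two words. Then test the visual output in a browser.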
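One common arrangement, sketched for XSLT 1.0 with hypothetical element names, strips whitespace-only text nodes everywhere and then opts mixed-content and preformatted elements back in with xsl:preserve-space.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Drop whitespace-only text nodes (source indentation) everywhere... -->
  <xsl:strip-space elements="*"/>
  <!-- ...except inside mixed-content and preformatted elements. In
       <para><emphasis>a</emphasis> <emphasis>b</emphasis></para> the single
       space between the two emphasis elements is a whitespace-only text node;
       stripping it would render "ab". Named elements take precedence over "*". -->
  <xsl:preserve-space elements="para programlisting"/>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="emphasis">
    <em><xsl:apply-templates/></em>
  </xsl:template>

  <!-- <pre> renders its whitespace, so its content must survive stripping. -->
  <xsl:template match="programlisting">
    <pre><xsl:apply-templates/></pre>
  </xsl:template>

</xsl:stylesheet>
```

Note that xsl:strip-space only removes text nodes that consist entirely of whitespace. Leading or trailing spaces inside a text node that also contains words are left alone.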

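Namespace stripping. If your source XML uses namespaces, the stylesheet must declare them in order to match source elements. Those declarations are then copied onto every literal result element unless you list their prefixes in exclude-result-prefixes on the stylesheet element. That attribute only affects literal result elements. Nodes copied with xsl:copy or xsl:copy-of keep their source namespaces, and that includes anything passed through by the identity transform.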
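Here is a sketch that assumes the source elements live in a hypothetical namespace, urn:example:doc. It handles both halves of the problem. exclude-result-prefixes keeps the prefix declaration off literal result elements, and a fallback template rebuilds unhandled elements in no namespace rather than copying them.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="urn:example:doc"
    exclude-result-prefixes="d">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Source elements are in urn:example:doc, so match patterns need d:. -->
  <xsl:template match="d:para">
    <!-- Without exclude-result-prefixes="d", this literal <p> would be
         written with an xmlns:d="urn:example:doc" declaration. -->
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Fallback for unhandled elements: xsl:copy would keep the source
       namespace, so rebuild the element by local name instead. The
       stylesheet declares no default namespace, so the result element
       ends up in no namespace. -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Rebuild attributes by local name too, dropping any namespace. -->
  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```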

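Character encoding. Ensure the output encoding matches what your serving infrastructure expects. UTF-8 is the safe default. With method="html", an XSLT 1.0 processor is expected to add a meta element declaring the output encoding right after the <head> start tag. If you declare one encoding in xsl:output but serve the page with a different charset in the Content-Type header, browsers follow the header, and the result is garbled characters that are hard to trace.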

Empty elements. Some source XML elements may be empty, either self-closing or containing only whitespace. Your templates should handle these gracefully rather than producing empty HTML elements, such as <p></p> or blank headings, that add stray spacing to the layout.
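One way to handle this in XSLT 1.0 is sketched below, using hypothetical para, section, and title elements. Empty templates match the empty cases and produce nothing, so the normal rules only ever see elements with content.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- A para with no child elements and no non-whitespace text, such as
       <para/> or <para>  </para>, produces nothing. The predicate gives this
       pattern default priority 0.5, so it beats the plain "para" rule (0). -->
  <xsl:template match="para[not(*) and not(normalize-space())]"/>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="section">
    <section><xsl:apply-templates/></section>
  </xsl:template>

  <!-- A blank title is skipped so no empty heading is emitted. Both title
       patterns have default priority 0.5, so an explicit priority is needed
       to make this one win. -->
  <xsl:template match="section/title[not(normalize-space())]" priority="1"/>

  <xsl:template match="section/title">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

</xsl:stylesheet>
```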