Content Operations

Content operations at the XML layer involve a different set of concerns than application development. The documents are the product. The tooling exists to move those documents reliably from authoring through validation to publication. This note captures the operational patterns that have proven durable in practice.

Versioning Strategy

XML content needs versioning at two levels: the schema version and the document version.

Schema versioning tracks changes to the document structure. Use a major.minor scheme. Major versions indicate breaking changes (removed elements, changed cardinality). Minor versions indicate additive changes (new optional elements). Never make breaking changes in a minor version.
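
The major.minor rule can be turned into a mechanical compatibility check. The following is a minimal sketch, assuming versions are written as strings like "2.3"; the names SchemaVersion, parse_schema_version, and validates_under are hypothetical and chosen for illustration only.

```python
from typing import NamedTuple


class SchemaVersion(NamedTuple):
    major: int
    minor: int


def parse_schema_version(text: str) -> SchemaVersion:
    """Parse a 'major.minor' string such as '2.3'."""
    major, minor = text.strip().split(".")
    return SchemaVersion(int(major), int(minor))


def validates_under(document: SchemaVersion, schema: SchemaVersion) -> bool:
    """True if a document written for `document` should validate under `schema`.

    Assumption: minor releases only add optional elements, so a document from
    an older minor release stays valid under a newer minor release of the
    same major line. A different major version is never assumed compatible.
    """
    return document.major == schema.major and document.minor <= schema.minor
```

Under this sketch, a document written for 2.1 is accepted by schema 2.3, while a document written for 2.3 is rejected by schema 2.1 because it may use elements that 2.1 does not define.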

Document versioning tracks changes to individual document instances. Store the current version in document metadata. When a document is modified, increment the version and record the change in an audit element. This creates a self-contained change history inside the document itself.
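
As an illustration of the increment-and-record step, here is a sketch using Python's standard xml.etree.ElementTree. The element names (metadata, version, audit, change) and the attribute names are assumptions made for this example; the actual names depend on your schema.

```python
import xml.etree.ElementTree as ET


def record_change(root: ET.Element, author: str, date: str, summary: str) -> int:
    """Increment the document version and append an audit entry.

    Assumes a hypothetical layout: <metadata><version>N</version>
    <audit><change .../></audit></metadata> under the document root.
    """
    meta = root.find("metadata")
    if meta is None:
        raise ValueError("document has no <metadata> element")
    version_el = meta.find("version")
    if version_el is None or not (version_el.text or "").strip().isdigit():
        raise ValueError("missing or non-numeric <version>")

    new_version = int(version_el.text) + 1
    version_el.text = str(new_version)

    # Create the audit container on first use, then append one entry per change.
    audit = meta.find("audit")
    if audit is None:
        audit = ET.SubElement(meta, "audit")
    change = ET.SubElement(
        audit, "change",
        {"version": str(new_version), "author": author, "date": date},
    )
    change.text = summary
    return new_version
```

Calling record_change on a document at version 14 sets the version to 15 and leaves one change entry in the audit element that carries the same version number.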

Keep these two version numbers separate. A document at version 14 might conform to schema version 2.3. Conflating the two makes migration planning difficult.

Validation Pipeline

Every content operation that modifies a document should run validation before and after the operation.

Pre-operation validation confirms the input document is in a known-good state. If the input fails validation, the operation should not proceed. This prevents cascading corruption.

Post-operation validation confirms the output document is valid. If the output fails validation, the operation has introduced an error and should be rolled back.
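
The validate, operate, validate, roll back sequence can be sketched as a wrapper around any file-modifying operation. This is one possible shape, not a prescribed implementation: run_guarded and ValidationError are hypothetical names, the validate callable is assumed to return a list of error messages (empty when the document is valid), and rollback is done with a simple backup copy next to the file.

```python
import shutil
from pathlib import Path
from typing import Callable, List


class ValidationError(Exception):
    """Raised when a document fails validation before or after an operation."""


def run_guarded(
    path: Path,
    operation: Callable[[Path], None],
    validate: Callable[[Path], List[str]],
) -> None:
    # Pre-operation: refuse to start from a document that is already invalid.
    errors = validate(path)
    if errors:
        raise ValidationError(f"input invalid, operation not started: {errors}")

    backup = path.with_name(path.name + ".bak")  # hypothetical backup location
    shutil.copy2(path, backup)
    try:
        operation(path)
        # Post-operation: the operation must leave the document valid.
        errors = validate(path)
        if errors:
            raise ValidationError(f"output invalid, rolling back: {errors}")
    except Exception:
        shutil.copy2(backup, path)  # restore the known-good input
        raise
    finally:
        backup.unlink()
```

Any exception raised by the operation itself also triggers the rollback, so a crash partway through a write does not leave a half-modified document behind.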

The validation pipeline should check:

Well-formedness. Parse the document and confirm it is well-formed XML.
Schema conformance. Validate against the declared schema version.
Business rules. Apply domain-specific rules that are awkward or impossible to express in XSD 1.0 (cross-field dependencies, value ranges that depend on other fields, referential integrity across documents).
Encoding. Confirm the declared encoding matches the actual encoding.

Run these checks in order. Each check assumes the previous checks passed.
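
A sketch of the ordered, stop-at-first-failure behavior follows. Only the well-formedness and encoding checks are implemented here, using the standard library; the schema and business-rule entries are placeholders that always pass, because the standard library has no XSD validator and the rules are domain-specific. All function names are hypothetical. The encoding check is deliberately limited: it assumes an ASCII-compatible encoding whose XML declaration can be read as ASCII, and it only confirms that the bytes decode under the declared codec.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

# A check takes the raw document bytes and returns an error message or None.
Check = Callable[[bytes], Optional[str]]

_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def check_well_formed(data: bytes) -> Optional[str]:
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return f"not well-formed: {exc}"
    return None


def check_encoding(data: bytes) -> Optional[str]:
    # XML defaults to UTF-8 when no encoding is declared (UTF-16 not handled here).
    match = _ENCODING_DECL.match(data)
    declared = match.group(1).decode("ascii") if match else "utf-8"
    try:
        data.decode(declared)
    except LookupError:
        return f"unknown declared encoding: {declared}"
    except UnicodeDecodeError as exc:
        return f"bytes are not valid {declared}: {exc.reason}"
    return None


def run_checks(
    data: bytes, checks: List[Tuple[str, Check]]
) -> Optional[Tuple[str, str]]:
    """Run checks in order; return (check name, error) for the first failure."""
    for name, check in checks:
        error = check(data)
        if error is not None:
            return (name, error)  # later checks assume this one passed
    return None


PIPELINE: List[Tuple[str, Check]] = [
    ("well-formedness", check_well_formed),
    ("schema conformance", lambda data: None),  # placeholder: plug in an XSD validator
    ("business rules", lambda data: None),      # placeholder: domain-specific rules
    ("encoding", check_encoding),
]
```

Returning at the first failure keeps later checks from running against input they cannot interpret, for example schema validation on a document that does not parse.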

Publishing Workflow

The publishing workflow moves validated documents from the authoring environment to the public-facing site.

Author. Create or edit a document in the content repository.
Validate. Run the full validation pipeline on the changed document.
Transform. Apply the appropriate XSLT stylesheet to produce the output format.
Review. Present the transformed output for human review (optional but recommended for significant changes).
Stage. Copy the output to a staging environment for final verification.
Publish. Promote the staged output to production.

Each step should be independently repeatable. If the transform step fails, you should be able to re-run it without repeating validation. If the publish step fails, you should be able to re-run it without re-staging.
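
One way to make steps independently repeatable is to record each step's outcome in a small state file and let any step run as long as its predecessors are marked done. The sketch below assumes a JSON state file and a fixed step order; run_step, load_state, and STEP_ORDER are hypothetical names, and the optional review step is left out for brevity.

```python
import json
from pathlib import Path
from typing import Callable, Dict

# Automated steps in the order they must complete.
STEP_ORDER = ["validate", "transform", "stage", "publish"]


def load_state(state_file: Path) -> Dict[str, str]:
    """Return the recorded step outcomes, or an empty dict on first run."""
    if state_file.exists():
        return json.loads(state_file.read_text())
    return {}


def run_step(state_file: Path, step: str, action: Callable[[], None]) -> None:
    """Run one step if every earlier step is recorded as done."""
    state = load_state(state_file)
    index = STEP_ORDER.index(step)
    missing = [s for s in STEP_ORDER[:index] if state.get(s) != "done"]
    if missing:
        raise RuntimeError(f"cannot run {step!r}; incomplete steps: {missing}")

    try:
        action()
    except Exception:
        # Record the failure so the step can be re-run on its own later.
        state[step] = "failed"
        state_file.write_text(json.dumps(state))
        raise
    state[step] = "done"
    state_file.write_text(json.dumps(state))
```

With this shape, a failed transform is retried by calling run_step for "transform" again; the recorded "done" status of validate is enough, so validation is not repeated.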

Batch Processing

When processing large document sets (hundreds or thousands of files), the single-document workflow does not scale. Batch processing requires:

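Parallel execution. Transform multiple documents simultaneously. Transformations of separate documents do not depend on one another and parallelize well, provided each worker uses its own processor instance or the processor is documented as thread-safe.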
Progress tracking. Log which documents have been processed and which remain. A batch that fails partway through should be resumable.
Error isolation. A failure in one document should not stop the entire batch. Process all documents, collect errors, and report them at the end.
Resource management. Monitor memory usage during batch runs. Some XSLT processors accumulate memory across transformations. Consider recycling processor instances periodically.
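
The first three requirements can be combined in a short driver. This is a sketch under stated assumptions: run_batch is a hypothetical name, transform stands in for whatever function invokes your XSLT processor on one file, and progress is tracked in a plain text log with one completed path per line. A thread pool is used for simplicity; if the transformation is CPU-bound pure Python, a process pool may be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable


def run_batch(
    paths: Iterable[str],
    transform: Callable[[str], None],
    done_log: Path,
    max_workers: int = 4,
) -> Dict[str, str]:
    """Transform every pending document; return {path: error} for failures."""
    # Progress tracking: skip anything a previous run already completed.
    done = set(done_log.read_text().splitlines()) if done_log.exists() else set()
    pending = [p for p in paths if p not in done]

    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool, \
            done_log.open("a") as log:
        futures = {pool.submit(transform, p): p for p in pending}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()
            except Exception as exc:
                # Error isolation: record the failure and keep going.
                errors[path] = repr(exc)
            else:
                log.write(path + "\n")
                log.flush()  # keep the log current if the batch is interrupted
    return errors
```

Re-running run_batch with the same log file processes only the documents that failed or were never reached, which is what makes an interrupted batch resumable.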

Monitoring

Operational monitoring for content systems should track:

Document count by type and schema version
Validation failure rate (should be near zero for automated pipelines)
Transformation time per document (baseline and trend)
Publishing latency (time from commit to live)
Stale content detection (documents not updated within expected intervals)

These metrics reveal problems early. A rising validation failure rate suggests upstream authoring issues. Increasing transformation time suggests document complexity growth or processor degradation.
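
A few of these metrics reduce to simple arithmetic over data the pipeline already has. The sketch below is illustrative only: the function names are hypothetical, and the window sizes and the 1.5x tolerance are placeholder values to tune against your own baseline.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List


def failure_rate(passed: int, failed: int) -> float:
    """Fraction of validated documents that failed (0.0 when none were run)."""
    total = passed + failed
    return failed / total if total else 0.0


def is_slowing(
    durations: List[float],
    baseline_n: int = 20,
    recent_n: int = 5,
    tolerance: float = 1.5,
) -> bool:
    """Compare the mean of the most recent runs with the mean of the earliest runs."""
    if len(durations) < baseline_n + recent_n:
        return False  # not enough history to establish a baseline
    baseline = mean(durations[:baseline_n])
    recent = mean(durations[-recent_n:])
    return recent > baseline * tolerance


def stale_documents(
    last_updated: Dict[str, datetime], max_age: timedelta, now: datetime
) -> List[str]:
    """Return the documents whose last update is older than max_age."""
    return sorted(doc for doc, ts in last_updated.items() if now - ts > max_age)
```

The baseline here is simply the earliest recorded runs; a rolling window or percentile would be more robust against a slow first deployment.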

For transformation-specific performance data, see the benchmarks section. For schema-related best practices, see the note on namespace handling.