Why compile XSLT?
By Jacek R. Ambroziak

XSLT is a W3C standard high-level language specialized to process XML data 'natively'. It has direct support for navigating XML structures, recognizing patterns, and generating output as XML, HTML, or plain text. The generic term 'Transformation' (the 'T' in 'XSLT') covers a wide range of processing tasks, including:

1. formatting XML for display in HTML browsers and, more generally, adding presentation qualities to purely content-oriented XML
2. searching for data in XML documents
3. creating multiple 'document views', e.g. lists sorted by different criteria, or detailed/sensitive information suppressed or expanded

As I have demonstrated before, XSLT can also be applied successfully intra-application: instead of generating any output (text, trees), a processor can call methods of some API. In a search engine application I built, 'indexing stylesheets' call methods of an IndexBuilder object that controls the process of full-text index building. There are plenty of applications for XSLT, some less obvious than others.

Why compile XSLT? For the same reasons Java or C/C++/C# is compiled instead of interpreted:

1. execution of the binary form is much faster
2. execution of the binary form doesn't require the presence of the original translator, only an appropriate physical or virtual machine

Gregor (and previously XSLTC: see my article) compiles XSLT stylesheets into binary Java classes that can be executed anywhere, stored in databases, collected in jar files, sent over the net, etc. An additional benefit (noticed by Ken Holman of Crane Softwrights Ltd.) is that the stylesheet author's intellectual property is protected: stylesheets don't have to be distributed in source form.

Please note that I use the term 'compilation' in its strict Computer Science sense, meaning translation from a source language into a (typically lower-level) target language. Sometimes the term is abused to mean some other form of manipulation of a stylesheet's structure.
The litmus test is whether or not executable code is generated. Gregor's output, Java classes called 'translets' (a term I coined for my first presentation of XSLTC at the WWW9 Conference in Amsterdam in 2000), are just regular Java classes of the form Java compilers produce. Translets capture the source transformation's logic in a high-performance binary form.

I find it useful to think about Java as comprising the following constituents:

1. Java, the programming language
2. Java Virtual Machine (JVM)
3. Java class libraries

What gives Java its appeal is its ubiquity, and this is due to the JVM's ubiquity. JVMs exist in all popular Internet browsers, some PDAs and cell phones, and naturally on servers and desktops (check out JRockIt). Now, to exploit Java's ubiquity and the utility of its libraries you don't necessarily have to write Java programs: all you need is Java classes! And the classes can be generated from other high-level languages that are sometimes more suitable for the task at hand. Such is the case with XSLT and Gregor: what is common is the Java platform (JVM and class libraries), but for programming the processing logic you can choose either the general-purpose Java language or the specialized formalism of XSLT (and XQuery in the near future). If you need to process XML, XSLT can be a better choice than programming against Java XML APIs, and with Gregor the end result will be the same as if the logic had been written in Java (or most likely better in several respects).

The bottom line is that you can write your XML processing application in Java, in XSLT, or in a combination: XSLT that calls arbitrary Java methods. With Gregor you can do it without sacrificing performance.

In summary, the benefits (so far) of Gregor/XSLT compilation are peerless performance, seamless integration with Java that leaves you a free choice of how to express processing logic, and the persistence and 'run anywhere' character of translets. Of crucial importance is also Gregor's enabling factor.
Gregor not only surpasses all other XSLT processors for Java in performance, it also enables XSLT processing logic on small devices where other full-fledged processors such as Saxon or Xalan wouldn't fit due to memory limitations. At XML Europe 2000 I demonstrated a toy PalmPilot XSLT application; today I work on real Gregor/XSLT applications for the Sharp Zaurus SL5500.

Why another XSLT Compiler?

Because the market has recognized a need for an XSLT compiler, and no other XSLT compiler for Java exists that matches Gregor's goals. Gregor improves on its predecessor (XSLTC) in a number of areas:

1. New modular architecture:
   · enables true performance optimization
   · reduces complexity, which automatically avoids the kinds of bugs that have traditionally plagued XSLTC
   · leads to extensibility towards different source and target languages (for instance, compiling XQuery to C/C#/MSIL code)
2. New compilation strategy that clarifies complex issues and involves true code optimization
3. Support for arbitrary Java extension functions, both static and virtual (instance methods)
4. All the runtime and translet algorithms revisited and in most cases rewritten from scratch, with a unified vision and programming standard
5. New persistent, compact DOM implementation for small devices
6. Massive reduction of runtime footprint and temporary object creation; since Gregor uses a different strategy of XPath evaluation, critical runtime data structures have been simplified, with gains in both lower memory consumption and higher transformation speed
7. Reduced Runtime Library footprint for wireless/PDA applications
8. Depending on demand, profiling/debugging tools
9. Depending on demand, extensibility towards other sources of XML data such as streams and XML databases
10. Modifiable internal DOM implementation required by my forthcoming Authoring Tool product

Gregor's Anatomy

Gregor takes a different approach to XSLT compilation than its predecessor and follows the standard model of an optimizing compiler.
It basically consists of three modules:

1. Front End: XSLT Processor
2. Intermediate Representation Optimizer
3. Back End (today emitting Java bytecodes)

The first module, the XSLT processor, is the only one that knows anything about XSLT as a source language. It knows how to parse and interpret stylesheets. It doesn't know anything about optimization or Java bytecodes. The second module, in turn, is agnostic about XSLT and Java bytecodes. It does, however, perform sophisticated reasoning about the dynamics of XML data manipulation. Finally, the Back End is the only module that knows anything about the Java Virtual Machine. It doesn't care about XSLT or higher-level interactions between data structures. Its role is to perform low-level optimizations and to emit the final classes.

A careful reader will realize some consequences of this architecture:

· While today the Back End emits Java bytecodes, it can be replaced by a different generator that outputs C code, Java code, C# code, or .NET's MSIL (Microsoft Intermediate Language). This can be done without touching the other modules: XSLT-specific processing and the Intermediate Optimizer are isolated from the concerns of target language generation.
· While today the Front End processes XSLT 1.0, tomorrow it may process XSLT 2.0 and XQuery, again without touching the other modules in a major way.
· It is the middle module that is the real heart of the system. While it is by far the most advanced of the modules, it is not burdened by the numerous details of XSLT or the JVM. It is designed to stay the same when one day the Front End processes XQuery and the Back End emits .NET's MSIL or StrongARM assembly.
· In contrast to either the Front End or the Back End, the Intermediate Optimizer can afford to take a global view of the XML processing required, without being burdened by details irrelevant to its task. As a consequence it can perform optimizations that would be too difficult to implement in XSLTC's architecture.
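
The three-module split described above can be sketched in Java. This is only an illustration, not Gregor's actual design: the names `IrOp`, `FrontEnd`, `Optimizer`, `BackEnd` and `PipelineSketch` are hypothetical, the "source language" is a toy (one text fragment per line) rather than XSLT, and the back ends emit source-like text rather than Java bytecodes. What it tries to show is that the optimizer sees only the intermediate representation, so a back end can be swapped without touching the other two stages.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical IR instruction; the article does not describe Gregor's real IR. */
final class IrOp {
    final String kind; // only "EMIT_TEXT" is used in this toy example
    final String arg;
    IrOp(String kind, String arg) { this.kind = kind; this.arg = arg; }
}

/** Front end: the only stage that knows the source language. */
interface FrontEnd { List<IrOp> parse(String source); }

/** Optimizer: works on IR only, unaware of source syntax or target code. */
interface Optimizer { List<IrOp> optimize(List<IrOp> ir); }

/** Back end: the only stage that knows the target. */
interface BackEnd { String emit(List<IrOp> ir); }

final class PipelineSketch {
    /** Chains the three stages; any stage can be replaced independently. */
    static String compile(FrontEnd fe, Optimizer opt, BackEnd be, String source) {
        return be.emit(opt.optimize(fe.parse(source)));
    }

    // Toy front end: each non-empty line of the source becomes one EMIT_TEXT op.
    static final FrontEnd TOY_FRONT = src -> {
        List<IrOp> ops = new ArrayList<>();
        for (String line : src.split("\n")) {
            if (!line.isEmpty()) ops.add(new IrOp("EMIT_TEXT", line));
        }
        return ops;
    };

    // Toy optimization: merge adjacent EMIT_TEXT ops into a single op.
    static final Optimizer MERGE_TEXT = ir -> {
        List<IrOp> out = new ArrayList<>();
        for (IrOp op : ir) {
            int last = out.size() - 1;
            boolean bothText = last >= 0
                    && op.kind.equals("EMIT_TEXT")
                    && out.get(last).kind.equals("EMIT_TEXT");
            if (bothText) {
                out.set(last, new IrOp("EMIT_TEXT", out.get(last).arg + op.arg));
            } else {
                out.add(op);
            }
        }
        return out;
    };

    // Two interchangeable back ends rendering the same IR for different targets.
    static final BackEnd JAVA_LIKE = ir -> render(ir, "out.write(\"", "\");\n");
    static final BackEnd C_LIKE = ir -> render(ir, "fputs(\"", "\", out);\n");

    private static String render(List<IrOp> ir, String prefix, String suffix) {
        StringBuilder sb = new StringBuilder();
        for (IrOp op : ir) sb.append(prefix).append(op.arg).append(suffix);
        return sb.toString();
    }

    public static void main(String[] args) {
        String source = "Hello, \nworld";
        System.out.print(compile(TOY_FRONT, MERGE_TEXT, JAVA_LIKE, source));
        System.out.print(compile(TOY_FRONT, MERGE_TEXT, C_LIKE, source));
    }
}
```

In this sketch, switching from `JAVA_LIKE` to `C_LIKE` changes only the last argument to `compile`; the front end and the text-merging optimization are reused unchanged, which mirrors the substitution argument made in the list above.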
While this basic architecture is well known and neither the Front End nor the Back End is complicated, it is the middle module that is the pivotal "secret sauce" and the essence of Gregor (and/or its extended family). Today only Gregor/XSLT/JVM exists; Gregor/XQuery/MSIL or Gregor/XSLT2.0/C is possible. All these combinations would share the same XML processing logic optimization.

Info: info@ambrosoft.com