Title:
The LaTeX Web companion : integrating TeX, HTML, and XML
Publication Information:
Reading, Mass. : Addison Wesley, 1999
ISBN:
9780201433111
General Note:
Accompanies text entitled: The LaTeX companion (Z253.4.L38 M57 2004)

Available:*

Item Barcode     Call Number            Material Type      Item Category 1
30000004690404   QA76.76.H94 L37 1999   Open Access Book   Book
30000010122350   QA76.76.H94 L37 1999   Open Access Book   Book


Summary

Shows how you can publish LaTeX documents on the Web. This book describes tools and techniques for transforming LaTeX sources into Web formats for electronic publication, and for transforming Web sources into LaTeX documents for optimal printing.


Author Notes

Michel Goossens is past president of the TeX Users Group. A research physicist at CERN, where the Web paradigm was born, he is responsible for LaTeX, HTML, SGML, and, more recently, XML support for scientific documents.

Sebastian Rahtz is past secretary of TUG, a cofounder of CTAN, creator of the TeX Live CD-ROM, and a coauthor of The LaTeX Graphics Companion. He is an IT analyst at Elsevier Science Ltd.

Eitan Gurari, Ross Moore, and Robert Sutor are, respectively, the principal architects of TeX4ht, LaTeX2HTML, and techexplorer.






Excerpts

The aim of this book is to provide help for authors, primarily scientists, who want to invest in the Web or other hypertext presentation systems but are not living in the world of Microsoft Word or QuarkXPress. They have an investment in markup systems such as LaTeX and have special needs in fields like mathematics, non-European languages, and algorithmic graphics. The book will tell them how to make full use of the Adobe Acrobat format from LaTeX; convert their legacy documents to HTML or XML; make use of their math in Web applications; use LaTeX as a tool in preparing Web pages; read and write simple XML/SGML; produce high-quality printed pages from their Web-hosted XML or HTML pages using TeX or PDF.

LaTeX as a document repository for the Internet

The World Wide Web has invaded all areas of society, and science is no exception to this rule. This should come as no surprise since the Web paradigm was born at CERN, one of the largest scientific laboratories in the world. The present ubiquitous Web interface is the result of basic research that took place in the first years of the 1990s at CERN. Before then use of the Internet had been mostly an affair of specialists. It needed the genius and insight of Tim Berners-Lee and collaborators to create a tool that allowed physicists participating in CERN's high-energy physics program but located all over the world to exchange data and information via the Internet in an intuitive and "user-friendly" way. Their work led directly to the development of the HTML language, the HTTP protocol, and the URL addressing scheme--the three basic pillars on which the Web is built. From the very beginning, the group took the farsighted decision to share their work freely with the Internet community. Then, thanks also to the appearance of the graphic interface of the Mosaic browser, the Web paradigm was received enthusiastically by developers and users alike.
The growth of the number of Web sites and users became exponential, culminating in the Web Woodstock at CERN in May 1994. CERN, a scientific laboratory dedicated to basic research, did not have the resources to coordinate Web development further, and hence these responsibilities were transferred to the international World Wide Web Consortium (W3C), which at present consists of three main components: the Laboratory for Computer Science at MIT, USA; INRIA, France; and Keio University, Japan. The Consortium is supported by DARPA and the European Commission. One lesson to be learned from the history of the advent of the Web is that basic research, in completely unexpected ways, can lead to very important and wide-ranging spin-offs for society.

Although most people do not realize it, SGML (in the form of the ubiquitous lingua franca of the Web, HTML) is today without doubt the leading markup language for electronic documents. Similarly, LaTeX has been used for over a decade for marking up scientific documents. Even today there is no viable alternative to LaTeX for printing texts that contain a lot of mathematics. Therefore it seems reasonable to look for a (possibly) automatic procedure to translate LaTeX documents into a form that is exploitable on the Web. Conversely, documents marked up in XML and HTML should be able to benefit from the high typographic qualities of the TeX processor.

Therefore in this book we explain how LaTeX can be used as the central component of an electronic document strategy for the Web. We show how you can reuse your existing LaTeX documents on the Web by translating them into HTML, and how, by using some LaTeX extension packages, you can more fully exploit the hypertext capabilities of HTML. Today HTML and Web browsers cannot deal very well with nontextual document components, such as pictures (which are translated into bitmap images) or mathematics.
We also address the translation of LaTeX into PDF and the possibilities of interpreting LaTeX commands directly by extensions of a browser. We also introduce you to the secrets of XML, the extensible markup language, which uses a subset of SGML and which is set to replace HTML as it allows for application-dependent extensions. In particular, we look at MML--the mathematical markup language--its syntax, how it can be generated, and what it can be used for. Going in the other direction, we discuss various strategies to transform Web source documents marked up in XML or HTML into LaTeX or PDF for optimal printing, in particular using DSSSL and XSL style sheets.

Many tools for transforming TeX-based source files into HTML have been developed over the years. The programs described in this book are a representative sample chosen mainly because we were familiar with them and have used them ourselves. The absence of a description of other tools in this book in no way implies that we consider them to be less useful or of inferior quality.

Logical structure of the book

We suggest that all readers look at Chapter 1 before going any further, because this chapter introduces how we think: that the Web is not a threat to LaTeX but an opportunity, and why you should or should not continue to write in LaTeX. We also present a short introduction to the Web from the point of view of the LaTeX user. Chapter 2 treats the subject of how to marry hyperdocuments with page fidelity using the Portable Document Format (PDF). The conversion of LaTeX documents into HTML is tackled in Chapters 3 and 4. In Chapter 3 we discuss LaTeX2HTML, which uses Perl to interpret LaTeX source documents and to generate HTML code. Extension packages can be easily added in the form of Perl routines, while various extensions to the LaTeX language make LaTeX2HTML a real high-performance tool to generate hypertext documents.
We take a different approach in Chapter 4, where TeX4ht uses a redefinition of LaTeX's TeX macros to generate HTML or XML, possibly also using the MML application for expressing the mathematics. Recently we have seen the development of browsers (with plug-ins) that are able to interpret mathematical markup directly. Chapter 5 looks at implementations that can directly interpret large subsets of native LaTeX code without prior translation into HTML, in particular techexplorer, a plug-in for Netscape and Internet Explorer developed by IBM, and WebEQ, a Java applet for rendering math.

Chapter 6 looks at the broader picture and gives a gentle introduction to SGML (Standard Generalized Markup Language); it explains how XML (eXtensible Markup Language), a simpler and more "Internet and user-friendly" variant of SGML, will become an important element in any future document strategy for the Internet. It is anticipated that XML, combined with object databases and other current object-oriented technologies, will revolutionize our document management at all levels. Tools for authoring and interpreting XML will be described, and we will spend some time building a LaTeX-like XML markup language.

TeX was originally developed by Don Knuth to print his math books in accordance with the highest standards of the typographic art. Therefore it should come as no surprise that TeX has been proposed as a typesetting engine for Web material. Tools to translate XML sources into various output formats are described in Chapter 7. The use of Cascading Style Sheets (CSS), Document Style Semantics and Specification Language (DSSSL), and Extensible Style Language (XSL) for controlling the translation process will be detailed. Chapter 8 tackles the "hot" issue of how to take maximal advantage of LaTeX's optimal mathematical notation to translate LaTeX markup into XML and MathML (Mathematical Markup Language), a companion to XML to present and work with math on the Web.
The book ends with appendixes that contain technical information to complement the chapters in the book. We provide an introduction to Web name spaces, discuss internationalization issues, and review a few important XML DTDs. We also explain where you can find the software mentioned in this book.

History and authorship

When The LaTeX Graphics Companion was in its early stages, Sebastian Rahtz and Michel Goossens intended to include coverage of the Portable Document Format, SGML, and the Web in that book. It became apparent, however, that the hypertext and SGML material would require a whole book of their own, so as soon as the Graphics Companion was completed, work started on this Web Companion. Even more than is the case with most TeX work, the packages and programs related to the Web and TeX were changing very rapidly; it was decided, therefore, to ask the authors of three of the most important packages to work with Rahtz and Goossens, to make sure that the chapters would be up-to-date and accurate. The chapter on LaTeX2HTML is primarily the work of Moore; that on TeX4ht the work of Gurari; and that on IBM techexplorer and WebEQ that of Sutor; Goossens and Rahtz shared the remaining chapters between them. Gurari, Moore, and Sutor also contributed significantly to the rest of the book by commenting on material, contributing sections, and discussing the issues involved. It is, perhaps, a tribute to the Internet that the five authors never met in person as a group during the entire writing and editing process. The nearest they came was a pleasant dinner in St. Malo at the 1998 EuroTeX meeting, where all but Eitan Gurari were present.

Using, and finding, all those packages and programs

Unless explicitly mentioned otherwise, all packages and programs described in this book are freely available in public software archives; some are in the public domain, while others are protected by copyright.
Some programs are available only in source form or work only on certain computer platforms, and you should be prepared for a certain amount of "getting your hands dirty" in some cases. We also cannot guarantee that later versions of packages or programs will give results identical to those in our book. Many of them are under active development, and new or changed versions appear several times a year; we completed this book in the winter of 1998-1999, and tested the examples with versions current at that time.

As regular users of the World Wide Web will know, keeping track of URLs is a tricky, error-prone process as sites continually disappear or change their structure. In this book, therefore, we do not give formal URLs in the text, but rather give pointers (typeset like W3C) to a catalog of URLs in the Appendix. This catalog will be kept up to date and will be available in the CTAN directory mentioned earlier. We have also tried to clear up some of the fog of acronyms by providing a glossary of terms.

Colophon

This book was prepared using LaTeX. The main text font is Adobe Janson, the sans serif font is Y&Y's European Modern Sans, the math is set in Y&Y MathTime Plus, and the literal typewriter text is set in Y&Y's European Modern typewriter. The LaTeX style was refined and generalized by Frank Mittelbach from that developed by him and Sebastian Rahtz for The LaTeX Graphics Companion, which, in turn, was derived from the style by Frank Mittelbach and Michel Goossens for The LaTeX Companion.

Acknowledgments

We are grateful to Nelson Beebe (University of Utah), Tim Bray (Textuality), Mimi Burbank (Florida State University), David Carlisle (NAG), Hans Hagen (Pragma), Hàn Thế Thành (Masaryk University, Brno), T. V. Raman (Adobe Systems), D. P. Story (University of Akron), Michael Downes (American Mathematical Society), Peter Flynn (University College, Cork), Chris Maden (O'Reilly), Thomas Merz (Munich), and Chris Rowley (Open University) for advice, encouragement, and comments on draft chapters.

Sebastian Rahtz would like to take this opportunity to thank Tanmoy Bhattacharya, David Carlisle, Patrick Daly, Yannis Haralambous, and many others, for their help with the hyperref package, and Berthold Horn (Y&Y) for sponsoring part of the development.

Eitan M. Gurari is very thankful to Gertjan Klein and Sebastian Rahtz for their contribution to the development of TeX4ht. Gertjan's help came at early stages of the project, offering important code and advice for making TeX4ht a portable tool and providing numerous detailed comments and suggestions for configuring the output. Sebastian got involved in the project at later stages, providing an enormous amount of feedback, setting up challenging objectives, collaborating in the development of interesting configuration files, aggressively promoting the system, and heavily editing my contribution to this book. Aside and beyond the professional aspects, Gertjan and Sebastian were great Net associates!

Robert Sutor wants to express his gratitude to Bill Pulleyblank, Marshall Schor, and Dick Jenks of the IBM Research Division for their support during the time techexplorer was developed.

Ross Moore would like to acknowledge first Nikos Drakos, for his foresight in designing a translator such as LaTeX2HTML and establishing its basic design principles. There is insufficient space here to list all those who have made significant contributions; we thank them all. Among them we especially wish to acknowledge Marcus Hennecke and Herb Swan, who were the most significant contributors when Nikos could no longer be involved.
We also wish to acknowledge Jens Lippman, Scott Nelson, and Marek Rouchal, who continue to supply the support necessary to develop, maintain, and distribute the latest revisions of the LaTeX2HTML program. Second, Ross wants to thank Michel Goossens, Mimi Jett, Jerold Marsden, Robert Miner, and Kristoffer Rose for supporting visits to various places around the world, where ideas for extensions to LaTeX2HTML were discussed and/or developed; some of these visits have directly affected the contents of this book.

On the publishing side, Frank Mittelbach (series editor) did an excellent job of trying to keep us on the straight and narrow path, and Peter Gordon (Addison Wesley Longman, Inc.) provided all the encouragement, support, jokes, and help any authors could want. When it came to production, Helen Goldstein and Maureen Willard were very patient with our idiosyncrasies, steered us safely to completion, and edited our dubious prose into real English.

Feedback

We would like to ask you, dear reader, for your collaboration. We kindly invite you to send your comments, suggestions, or remarks to any of the authors. We will be glad to correct any mistakes or oversights in a future edition and are open to suggestions for improvements or the inclusion of important developments we may have overlooked. We will maintain a list of errata in a file called webcomp.err in the LaTeX distribution, and this will contain current addresses for the authors.

Excerpted from The LaTeX Web Companion: Integrating TeX, HTML, and XML by Michel Goossens, Eitan M. Gurari, Ross Moore, Sebastian Rahtz, Robert S. Sutor. All rights reserved by the original copyright owners. Excerpts are provided for display purposes only and may not be reproduced, reprinted or distributed without the written permission of the publisher.

Table of Contents

List of Figures
List of Tables
Preface
1 The Web, its documents, and LaTeX
The Web, a window on the Internet
The Hypertext Transport Protocol
Universal Resource Locators and Identifiers
The Hypertext Markup Language
LaTeX in the Web environment
Overview of document formats and strategies
Staying with DVI
PDF for typographic quality
Down-translation to HTML
Java and browser plug-ins
Other LaTeX-related approaches to the Web
Is there an optimal approach?
Conclusion
2 Portable Document Format
What is PDF?
Generating PDF from TeX
Creating and manipulating PDF
Setting up fonts
Adding value to your PDF
Rich PDF with LaTeX: The hyperref package
Implicit behavior of hyperref
Configuring hyperref
Additional user macros for hyperlinks
Acrobat-specific commands
Special support for other packages
Creating PDF and HTML forms
Validating form fields
Designing PDF documents for the screen
Catalog of package options
Generating PDF directly from TeX
Setting up pdfTeX
New primitives
Graphics and color
3 The LaTeX2HTML translator
Introduction
A few words on history
Principles for Web document generation
Required software and customization
Running LaTeX2HTML on a LaTeX document
Installation
Customizing the local installation
Extension mechanisms and LaTeX packages
Mathematics modes with LaTeX2HTML
An overview of LaTeX2HTML's math modes
Advanced mathematics with the math extension
Unicode fonts and named entities, in expert mode
HTML 4.0 and style sheets
Large images and HTML 2.0
Future use of MathML
Support for different languages
Titles and keywords
Character-set encodings
Multilingual documents using babel
Images using special fonts
Converting transliterations using preprocessors
Extending LaTeX sources with hypertext commands using the html package
Hyperlinks to external documents
Enhancements appropriate for HTML
Alternative text for hyperlinks
Conditional environments
Navigation and layout of HTML pages
Example of linking various external documents
Advanced features
4 Translating LaTeX to HTML using TeX4ht
Using TeX4ht
Package options
Picture representation of special content
A complete example
Manual creation of hypertext elements
Raw hypertext code
Hypertext pages
Hypertext links
Cascading Style Sheets
How TeX4ht works
From LaTeX to DVI
From DVI to HTML
Other matters
Extended customization of TeX4ht
Configuration files
Tables of contents
Parts, chapters, sections, and so on
Defining sectioning commands
Lists
Environments
Tables
Small details
The inner workings of TeX4ht
The translation process
Running LaTeX
Running the tex4ht program
A look at t4ht
From DVI to GIF
A taste