Yuan-ze Institute of Technology, Taiwan, ROC (YUAN-ZE)
Project YAO Materials
Version 1.0 Alpha 5.0

Contents: This document consists of five parts:

    1. Descriptive outline of Project YAO
    2. Credits
    3. Implementation considerations and programming style
    4. Release status
    5. Release schedule


1. DESCRIPTIVE OUTLINE OF PROJECT YAO

NOTE: Some features have not been implemented

ObjectSGML and POEM
    What are they?
    Who supports them?
    How do they work?

What is ObjectSGML?
    Object-oriented SGML parser
    Hooks for native HyTime support
        Architectural form validation
        Location addressing
        HyQ queries and properties
    It's free to anyone!

What is POEM?
    Portable object-oriented entity manager
    Platform-independent interface to real storage
    Supports ISO/IEC 9070 public identifiers for universal object
      identification
    Supports ISO/IEC 10744 Standard Bento (SBENTO)
    Supports ISO 9069 SGML Document Interchange Format (SDIF)

Project YAO
    YAO: A Chinese word with several auspicious meanings for our
      project, including "distant, widespread" and "medicine"
    International consortium for free SGML software development
      kits (SDK)

Original Developers
    Yuan-ze Institute of Technology (Taiwan, ROC)
    IBM (U.S.)
    Petrus Services and Consulting, Ltd. (France)
    TechnoTeacher, Inc. (U.S.)
    NOTE: Petrus no longer participates
    Developers have written four SGML parsers

SDK Implementation
    C++ class library
    Platform-independent
    ObjectSGML
        Based on YASP 1.35

ObjectSGML Architecture
    Low-level "parser event" API
    Variable-persistence cache
    High-level "information object" API
    Entity manager

Parser Event API
    Recognizes "parser event" types selected by application
    Saves and restores parse state for incremental parsing
    Supports multiple concurrent parsing contexts
        Different SGML declarations
        Different DTDs/LPDs
        Architectural form "meta-DTDs"

Variable-persistence Cache
    Maintained by application in proprietary format
    Allows rapid access to information found by parsing context API
    Optimized for application (e.g., word processor, data base)
    Avoids re-parsing
    Persistence values:
        next event
        element
        entity
        parsing context
        session
        permanent
        "same-as"

Information Object API
    Element structure and entity structure
    Addressable by HyTime location addressing, "proploc", and HyQ
    Supported by cache (if present) or incremental parsing by
      low-level API

POEM Architecture
    Full entity structure support
        Multiple entities per storage object ("archives")
        Multiple storage objects per entity
        Complete or partial storage objects ("substrings")
    Converts line terminators to SGML record boundaries
    Portable entity manager with local enrollable "storage managers"
    Public identifier catalog locates storage objects
    Maintains location in entity and storage objects


2. CREDITS

-------------------------------------------------------------------------

Project YAO
Presents ...

ObjectSGML
    an object-oriented SGML parser with native HyTime support

POEM
    a Portable Object-oriented Entity Manager

Original Sponsors:
    Yuan-ze Institute of Technology, Taiwan, R.O.C.
        Information Technology Research Center,
        Prof. Shy-ming Ju, Director
    IBM Corporation, Armonk, N.Y., U.S.A.
        Research Division
    NOTE: IBM is no longer a sponsor.

Project Leader:
    Charles F. Goldfarb

Interface Design:
    Charles F. Goldfarb
    Neill Kipp (TechnoTeacher, Inc.)
    Erik Naggum (Naggum Software)
    Pierre Richard

ObjectSGML Program Design:
    Pierre Richard

ObjectSGML Programming:
    Pierre Richard

POEM Program Design:
    Erik Naggum

POEM Programming:
    Erik Naggum

Quality Assurance Leader:
    Wayne Wohler

Porting and Testing:
    Wayne Wohler
    David Knight
    David Chang

Derived from:
    "Yorktown Advanced SGML Parser (YASP)" by Pierre Richard
    "Almaden Research Center SGML Parser (ARCSGML)" by Charles F. Goldfarb


3. IMPLEMENTATION CONSIDERATIONS AND PROGRAMMING STYLE

SGML constraint:

    1. Character 0 must be SHUNCHAR and UNUSED

Compiler and system considerations:

    1. Support for C++ language, version 3.0 (Bjarne Stroustrup,
       2nd ed.).
    2. Testing with:
           IBM C/C++ Tools version 2.0 on OS/2 2.1
           Borland C++ 4.0 on DOS 5.0/Windows 3.1
    3. Must be able to tell compiler (and library) that all char are
       unsigned.
    4. We do not use any compiler-defined language extensions.
    5. We do not rely on "undefined behavior" or
       "implementation-defined behavior" (as defined by the C++
       language definition).
    6. int is assumed to hold no more than 15 bits (that is, any
       variable whose value is expected to exceed the capacity of
       15 bits is declared as a long, except that if 16 bits will
       suffice, it is declared as unsigned int).
    7. File names follow the DOS restrictions (8.3) and are therefore
       all lowercase; the extensions are as required by the platform.
    8. We do not use floating point.
    9. Maximum length of names with external linkage is the minimum
       required of all C++ implementations: 31.
   10. We stick to the most common features of C++ to maximize
       reliability and portability. We do not use (inter alia)
       templates or exceptions.


4. RELEASE STATUS

*** IMPORTANT RELEASE NUMBERING NOTICE ***

This release is essentially a bug fix to 10A40. The headers in the code
files have NOT been revised to show the new release numbers; the new
numbers do appear in the *.txt and *.exe files.

An initial implementation of the POEM services module, with a test
driver, is included.
It has been tested under UNIX, but there have been intermittent
execution failures under Windows (these may, however, be due to user
error, such as not leaving enough stack space). It does compile and
link cleanly under Borland C++ 4.0.

4.1 ObjectSGML

4.1.1 Function

Supports parser event API with additional prolog events. Basic code is
tested code from YASP 1.35 (IBM product code), but restructured for C++.

ESIS events now have an initial property-based, state-driven,
object-oriented interface, requiring minimal knowledge of parser
objects.

RAST handles correct test cases according to the conformance testing
standard. However, for incorrect test cases, it currently issues an
informative parser error message instead of the required "#ERROR".
This ought to be fixed, but should not impede testing.

4.1.2 Coding

Converted to C++ and largely restructured to take advantage of it.
Does not yet conform to all conventions for style, comments, name
space protection, etc.

No compilation warnings in OS/2; the only warnings in Borland are for
functions that were not compiled 'inline' because they contained IF,
DO, or other flow control structures.

4.1.3 Environments

Coded and tested in OS/2 2.1 32-bit environment and in Borland C++ 4.0
16-bit DOS and Windows environments. (Bugs in Borland 3.1 enum handling
have also been worked around.) Should port to 32-bit UNIX environment
without difficulty.

4.1.4 YAO knowledge base

Some problems were detected when running industry-standard test suites.
They are described in YAOKB.TXT. Because of resource constraints, it
was not possible to analyze or repair them. As no support is currently
committed for Project YAO software, please discuss problems and
solutions with fellow users on comp.text.sgml.

4.2 POEM

4.2.1 Function

A preliminary version of the POEM services module, tentatively known as
the Virtual Entity to Real Storage Environment (VERSE) server, has been
implemented. However, external identifier mapping to VERSE services is
not yet implemented.

4.2.2 Coding

Written from scratch in C++, with no attempt to use virtuoso features.

4.2.3 Environments

Coded with the GNU compiler under UNIX and ported to Borland C++ 4.0
under Windows. Compiles and links cleanly under Borland, but there are
intermittent execution problems. Should port to OS/2 without difficulty.


5. RELEASE SCHEDULE

There is no commitment to future releases at the present time. Users of
the code may wish to investigate the following enhancements:

    1. Improved error handling, with message processing and
       specification of recovery options in domain of application
       (probably by enrolling an error handler when a parsing context
       is constructed). Revision of RAST to handle error cases with
       standard message.
    2. Implementation of POEM and revision of ObjectSGML to be a POEM
       client.
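
Enhancement 1 suggests enrolling an error handler when a parsing
context is constructed. The following is only a rough sketch of what
that might look like; it is not taken from the ObjectSGML sources, and
every name in it (Recovery, ErrorHandler, CountingHandler,
ParsingContext, report) is hypothetical. It avoids templates,
exceptions, and floating point, and it uses long for line numbers
because they can exceed 15 bits.

```cpp
#include <cassert>
#include <cstdio>

// All names below are hypothetical; none come from the ObjectSGML sources.

// Recovery options the application may select for a reported error.
enum Recovery { RECOVER_CONTINUE, RECOVER_ABORT };

// Interface the application implements to receive parser errors.
class ErrorHandler {
public:
    virtual ~ErrorHandler() {}
    // Called once per error; the return value tells the parser how to recover.
    virtual Recovery handle(long line, const char *message) = 0;
};

// Example handler: prints each message, counts it, and asks the parser
// to abort once the count reaches the limit chosen by the application.
class CountingHandler : public ErrorHandler {
public:
    explicit CountingHandler(unsigned int limit) : count_(0), limit_(limit) {}
    virtual Recovery handle(long line, const char *message) {
        ++count_;
        std::printf("line %ld: %s\n", line, message);
        return count_ >= limit_ ? RECOVER_ABORT : RECOVER_CONTINUE;
    }
    unsigned int count() const { return count_; }
private:
    unsigned int count_;
    unsigned int limit_;
};

// Stand-in for a parsing context: the handler is enrolled at construction.
class ParsingContext {
public:
    explicit ParsingContext(ErrorHandler *handler) : handler_(handler) {
        assert(handler_ != 0);  // this sketch requires an enrolled handler
    }
    // A parser would call this when it detects an error in the document.
    Recovery report(long line, const char *message) {
        return handler_->handle(line, message);
    }
private:
    ErrorHandler *handler_;  // not owned by the context
};
```

In this sketch an application would construct a CountingHandler with
its chosen limit, pass its address to the ParsingContext constructor,
and act on the Recovery value returned by each call to report. Keeping
the recovery decision in the handler leaves the policy in the domain of
the application, which is the point of the enhancement.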