Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 22 Feb 89 17:32:16 EST Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Feb 89 13:30:31 PST Date: Wed, 22 Feb 89 00:14:09 PST From: Thom Linden To: Common Lisp mailing Message-ID: <890222.001409.baggins@almvma> Subject: cs proposal part 2 %---------------------------------------------------------------------- %---------------------------------------------------------------------- \newcommand{\edithead}{\begin{tabular}{l p{3.95in}} \multicolumn{2}{l} } \newcommand{\csdag}{\bf$\Rightarrow$\ddag} \newcommand{\editstart}{} \newcommand{\editend}{\\ & \end{tabular}} %---------------------------------------------------------------------- %---------------------------------------------------------------------- \appendix \chapter{Editorial Modifications to CLtL} The following sections specify the editorial changes needed in CLtL to support the proposal. Section/subsection numbers and titles match those found in \cite{steele84}. The notation {\csdag x (pn, function)} denotes a reference to paragraph x within the subsection (we count each individual example or metastatement as one paragraph of text). Also, {\bf (pn, function)}, or simply {\bf (pn)}, is included as an additional aid to the reader, indicating the page number and function modified. When an entire paragraph is deleted, the first few words of the paragraph are noted. If a section or paragraph of CLtL is {\em not} referenced, no editorial changes are required to support this proposal. \footnote{This may be an overoptimistic statement since the changes are fairly pervasive. 
The editor should take the sense of Chapter 1 into account in resolving any discrepancies.} %---------------------------------------------------------------------- \setcounter{section}{1} \section{Data Types} % 2 %---------------------------------------------------------------------- \edithead {\csdag 8 (p12)} \editstart \\ \bf replace & \cltxt provides for a rich character set, including ways to represent characters of various type styles. \\ \bf with & \cltxt provides support for international language characters as well as characters used in specialized arenas, e.g., mathematics. \editend \setcounter{subsection}{1} \subsection{Characters} % 2.2. \edithead {\csdag 1 (p20)} \editstart \\ \bf replace & \cltxt Characters are represented as data objects of type {\clkwd character}. There are two subtypes of interest, called {\clkwd standard-char} and {\clkwd string-char}. \\ \bf with & \cltxt Characters are represented as data objects of type {\clkwd character}. \editend \\ \edithead {\csdag 2 (p20)} \editstart \\ \bf replace & \cltxt This works well enough for printing characters. Non-printing characters \\ \bf with & \cltxt This works well enough for graphic characters. Non-graphic characters \editend \subsubsection{Standard Characters} % 2.2.1. \edithead {\csdag 1 before (p20)} \editstart \\ \bf insert & \cltxt A {\em character repertoire} defines a collection of characters independent of their specific rendered image or font. Character repertoires are specified independently of coding, and their characters are identified only by a unique label, a graphic symbol, and a character description. A {\em coded character set} is a character repertoire plus an {\em encoding} providing a unique mapping between each character and a number which serves as the character representation. \\ & Common LISP requires all implementations to support a {\em standard} character subrepertoire. 
Typically, an implementation incorporates the standard characters as a subset of a larger repertoire corresponding to a frequently used set of characters, or base coded character set. The term {\em base character repertoire} refers to the collection of characters represented by the base coded character set. \editend \\ \edithead {\csdag 1 before (p20)} \editstart \\ \bf insert & \cltxt The {\clkwd base-character} type is defined as a subtype of {\clkwd character}. A {\clkwd base-character} object can contain any member of the base character repertoire. Objects of type {\clkwd (and character (not base-character))} are referred to as {\em extended characters}. \editend \\ \edithead {\csdag 1 (p20)} \editstart \\ \bf delete & \cltxt Common LISP defines a "standard character set" ... \editend \\ \edithead {\csdag 1 (p20)} \editstart \\ \bf new & \cltxt The Common LISP standard character subrepertoire consists of a newline \#$\backslash${\clkwd Newline}, the graphic space character \#$\backslash${\clkwd Space}, and the following additional ninety-four graphic characters or their equivalents: \editend \\ \edithead {\csdag 2 (p21)} \editstart \\ \bf delete & \cltxt ! " \# ... \editend \\ \edithead {\csdag 2 new (p21)} \editstart \\ & {\bf Common LISP Standard Character Subrepertoire} \editend \footnote{\cltxt \#$\backslash${\clkwd Space} and \#$\backslash${\clkwd Newline} are omitted. Graphic labels and descriptions are from ISO 6937/2. 
The first letter of the graphic label categorizes the character as follows: L - Latin, N - Numeric, S - Special .} \\ {\small \begin{tabular}{||l|c|l||l|c|l||} \hline Label & Glyph & Name or description & Label & Glyph & Name or description \\ \hline LA01 & a & small a & ND01 & 1 & digit 1 \\ \hline LA02 & A & capital A & ND02 & 2 & digit 2 \\ \hline LB01 & b & small b & ND03 & 3 & digit 3 \\ \hline LB02 & B & capital B & ND04 & 4 & digit 4 \\ \hline LC01 & c & small c & ND05 & 5 & digit 5 \\ \hline LC02 & C & capital C & ND06 & 6 & digit 6 \\ \hline LD01 & d & small d & ND07 & 7 & digit 7 \\ \hline LD02 & D & capital D & ND08 & 8 & digit 8 \\ \hline LE01 & e & small e & ND09 & 9 & digit 9 \\ \hline LE02 & E & capital E & ND10 & 0 & digit 0 \\ \hline LF01 & f & small f & SC03 & \$ & dollar sign \\ \hline LF02 & F & capital F & SP02 & ! & exclamation mark \\ \hline LG01 & g & small g & SP04 & " & quotation mark \\ \hline LG02 & G & capital G & SP05 & \apostrophe & apostrophe \\ \hline LH01 & h & small h & SP06 & ( & left parenthesis \\ \hline LH02 & H & capital H & SP07 & ) & right parenthesis \\ \hline LI01 & i & small i & SP08 & , & comma \\ \hline LI02 & I & capital I & SP09 & \_ & low line \\ \hline LJ01 & j & small j & SP10 & - & hyphen or minus sign \\ \hline LJ02 & J & capital J & SP11 & . & full stop, period \\ \hline LK01 & k & small k & SP12 & / & solidus \\ \hline LK02 & K & capital K & SP13 & : & colon \\ \hline LL01 & l & small l & SP14 & ; & semicolon \\ \hline LL02 & L & capital L & SP15 & ? 
& question mark \\ \hline LM01 & m & small m & SA01 & + & plus sign \\ \hline LM02 & M & capital M & SA03 & $<$ & less-than sign \\ \hline LN01 & n & small n & SA04 & = & equals sign \\ \hline LN02 & N & capital N & SA05 & $>$ & greater-than sign \\ \hline LO01 & o & small o & SM01 & \# & number sign \\ \hline LO02 & O & capital O & SM02 & \% & percent sign \\ \hline LP01 & p & small p & SM03 & \& & ampersand \\ \hline LP02 & P & capital P & SM04 & * & asterisk \\ \hline LQ01 & q & small q & SM05 & @ & commercial at \\ \hline LQ02 & Q & capital Q & SM06 & [ & left square bracket \\ \hline LR01 & r & small r & SM07 & $\backslash$ & reverse solidus \\ \hline LR02 & R & capital R & SM08 & ] & right square bracket \\ \hline LS01 & s & small s & SM11 & \{ & left curly bracket \\ \hline LS02 & S & capital S & SM13 & $|$ & vertical bar \\ \hline LT01 & t & small t & SM14 & \} & right curly bracket \\ \hline LT02 & T & capital T & SD13 & \bq & grave accent \\ \hline LU01 & u & small u & SD15 & $\hat{ }$ & circumflex accent \\ \hline LU02 & U & capital U & SD19 & $\tilde{ }$ & tilde \\ \hline LV01 & v & small v & & & \\ \hline LV02 & V & capital V & & & \\ \hline LW01 & w & small w & & & \\ \hline LW02 & W & capital W & & & \\ \hline LX01 & x & small x & & & \\ \hline LX02 & X & capital X & & & \\ \hline LY01 & y & small y & & & \\ \hline LY02 & Y & capital Y & & & \\ \hline LZ01 & z & small z & & & \\ \hline LZ02 & Z & capital Z & & & \\ \hline \end{tabular} } \\ \edithead {\csdag 3 (p21)} \editstart \\ \bf delete & \cltxt @ A B C... \editend \\ \edithead {\csdag 4 (p21)} \editstart \\ \bf delete & \cltxt \bq a b c... \editend \\ \edithead {\csdag 5 (p21)} \editstart \\ \bf delete & \cltxt The Common LISP Standard character set is apparently ... 
\editend \\ \edithead {\csdag 6 (p21)} \editstart \\ \bf replace & \cltxt Of the ninety-four non-blank printing characters \\ \bf with & \cltxt Of the ninety-five graphic characters \editend \\ \edithead {\csdag 9 (p21)} \editstart \\ \bf delete & \cltxt The following characters are called ... \editend \\ \edithead {\csdag 10 (p21)} \editstart \\ \bf delete & \cltxt {\clkwd \#$\backslash$Backspace \#$\backslash$Tab } ... \editend \\ \edithead {\csdag 11 (p21)} \editstart \\ \bf delete & \cltxt Not all implementations of Common ... \editend \subsubsection{Line Divisions} % 2.2.2. \edithead {\csdag 6 (p22)} \editstart \\ \bf replace & \cltxt a two-character sequence, such as {\clkwd \#$\backslash$Return } and then {\clkwd \#$\backslash$Newline }, is not acceptable, \\ \bf with & \cltxt a two-character sequence is not acceptable, \editend \\ \edithead {\csdag 8 (p22)} \editstart \\ \bf delete & \cltxt Implementation note: If an implementation uses ... \editend \subsubsection{Non-standard Characters} % 2.2.3. \edithead {\csdag delete entire section (p23)} \editstart \editend \subsubsection{Character Attributes} % 2.2.4. \edithead {\csdag 0 section heading (p23)} \editstart \\ \bf replace & \cltxt Character Attributes \\ \bf with & \cltxt Character Identity \editend \\ \edithead {\csdag 1 through 8 (p23)} \editstart \\ \bf delete all paragraphs& \cltxt Every object of type {\clkwd character} ... \editend \\ \edithead {\csdag 1 (p23)} \editstart \\ \bf new & \cltxt Characters are uniquely distinguished by their codes, which are drawn from the set of non-negative integers. That is, within Common LISP a unique numerical code is assigned to each semantically different character. \\ & Common LISP characters are partitioned into a unique collection of repertoires called {\em character registries}. That is, each character is included in one and only one character registry. \\ & Character codes are composed from a character registry and a character label. 
The convention by which a character registry and character label compose a character code is implementation dependent. \editend \subsubsection{String Characters} % 2.2.5. \edithead {\csdag delete entire section (p23)} \editstart \editend \setcounter{subsection}{4} \subsubsection{Character Registries} % 2.2.5. \edithead {\csdag new section (p23)} \editstart \\ \bf new & \cltxt An implementation must document the registries it supports. Registries must be uniquely named using only {\clkwd standard-p} characters. For each registry supported, an implementation must define the individual characters supported including at least the following: \begin{itemize} \item Character Labels, Glyphs, and Descriptions. \item Reader Canonicalization. \item Effect of character predicates. \begin{itemize} \item {\clkwd alpha-char-p} \item {\clkwd lower-case-p} \item {\clkwd upper-case-p} \item {\clkwd both-case-p} \item {\clkwd graphic-char-p} \item {\clkwd alphanumericp} \end{itemize} \item Interaction with File I/O. In particular, the coded character set standards \footnote{For example, ISO8859/1-1987.} and external encoding schemes which are supported must be specified. \end{itemize} \editend \subsection{Symbols} % 2.3. \edithead {\csdag 12 (p25)} \editstart \\ \bf replace & \cltxt A symbol may have uppercase letters, lowercase letters, or both in its print name. \\ \bf with & \cltxt A symbol may have characters from any supported character registry in its print name. It may have uppercase letters, lowercase letters, or both. 
\editend \setcounter{subsection}{4} \subsection{Arrays} \subsubsection{Vectors} \edithead {\csdag 6 (p29)} \editstart \\ \bf replace & \cltxt All implementations provide specialized arrays for the cases when the components are characters (or rather, a special subset of the characters); \\ \bf with & \cltxt All implementations provide specialized arrays for the cases when the components are characters (or optionally, special subsets of the characters); \editend \subsubsection{Strings} \edithead {\csdag 1 (p30)} \editstart \\ \bf replace & \cltxt A string is simply a vector of characters. More precisely, a string is a specialized vector whose elements are of type {\clkwd string-char}. \\ \bf with & \cltxt A string is simply a vector of characters. More precisely, a string is a specialized vector whose elements are of type {\clkwd character} or a subtype of {\clkwd character}. \editend \setcounter{subsection}{14} \subsection{Overlap, Inclusion, and Disjointness of Types} % 2.15. \edithead {\csdag 14 (p34)} \editstart \\ \bf replace & \cltxt The type {\clkwd standard-char} is a subtype of {\clkwd string-char}; {\clkwd string-char} is a subtype of {\clkwd character}. \\ \bf with & \cltxt The type {\clkwd base-character} is a subtype of {\clkwd character}. The type {\clkwd string-char} is implementation defined as either {\clkwd base-character} or {\clkwd character}. \editend \\ \edithead {\csdag 15 (p34)} \editstart \\ \bf replace & \cltxt The type {\clkwd string} is a subtype of {\clkwd vector}, for {\clkwd string} means {\clkwd (vector string-char)}. \\ \bf with & \cltxt The type {\clkwd string} is a subtype of {\clkwd vector}; {\clkwd string} consists of vectors specialized by subtypes of {\clkwd character}. \editend \\ \edithead {\csdag 15 after (p34)} \editstart \\ \bf insert & \cltxt The type {\clkwd base-string} means {\clkwd (vector base-character)}. 
\editend \\ \edithead {\csdag 15 after (p34)} \editstart \\ \bf insert & \cltxt The type {\clkwd general-string} means {\clkwd (vector character)} and is a subtype of {\clkwd string}. \editend \\ \edithead {\csdag 20 (p34)} \editstart \\ \bf replace & \cltxt {\clkwd (simple-array string-char (*))}; \\ \bf with & \cltxt {\clkwd (and string simple-array)}; \editend \\ \edithead {\csdag 20 after (p34)} \editstart \\ \bf insert & \cltxt The type {\clkwd simple-base-string} means {\clkwd (simple-array base-character (*))} and is the most efficient string type that can hold the standard characters. {\clkwd simple-base-string} is a subtype of {\clkwd base-string}. \editend \\ \edithead {\csdag 20 after (p34)} \editstart \\ \bf insert & \cltxt The type {\clkwd simple-general-string} means {\clkwd (simple-array character (*))}. {\clkwd simple-general-string} is a subtype of {\clkwd general-string}. \editend \\ \edithead {\csdag 22 after (p34)} \editstart \\ \bf replace & \cltxt The type {\clkwd simple-string} is a subtype of {\clkwd string}. (Note that although {\clkwd string} is a subtype of {\clkwd vector}, {\clkwd simple-string} is not a subtype of {\clkwd simple-vector}.) \\ \bf with & \cltxt The type {\clkwd simple-string} is a subtype of {\clkwd string}; {\clkwd simple-string} consists of simple vectors specialized by subtypes of {\clkwd character}. (Note that although {\clkwd string} is a subtype of {\clkwd vector}, {\clkwd simple-string} is not a subtype of {\clkwd simple-vector}.) \editend %---------------------------------------------------------------------- \setcounter{section}{3} \section{Type Specifiers} % 4 %---------------------------------------------------------------------- \setcounter{subsection}{1} \subsection{Type Specifier Lists} % 4.2. 
\edithead {\csdag 8 Table 4-1 (alphabetic list) (p43)} \editstart \\ \bf remove & \\ & \cltxt {\clkwd standard-char} \\ & {\clkwd string-char} \editend \\ \edithead {\csdag 8 Table 4-1 (alphabetic list) (p43)} \editstart \\ \bf insert & \\ & \cltxt {\clkwd base-character} \\ & {\clkwd base-string} \\ & {\clkwd general-string} \\ & {\clkwd simple-base-string} \\ & {\clkwd simple-general-string} \editend \setcounter{subsection}{2} \subsection{Predicating Type Specifiers} % 4.3. \edithead {\csdag 2 (p43)} \editstart \\ \bf delete & \cltxt As an example, the entire ... \editend \\ \edithead {\csdag 3 delete example (p43)} \editstart \\ \bf delete & \cltxt {\clkwd (deftype string-char () } ... \editend \setcounter{subsection}{4} \subsection{Type Specifiers That Specialize} % 4.5. \edithead {\csdag 5 after (p46)} \editstart \\ \bf insert & \cltxt {\clkwd (character {\em repertoire})} \\ & This denotes a character type specialized to members of the specified repertoire. {\em Repertoire} may be {\clkwd :base}, {\clkwd :standard}, any supported character registry name, or a list of such names. \editend \setcounter{subsection}{5} \subsection{Type Specifiers That Abbreviate} % 4.6. \edithead {\csdag 20 (p49)} \editstart \\ \bf replace & \cltxt Means the same as {\clkwd (array string-char ({\em size}))}: the set of strings of the indicated size. \\ \bf with & \cltxt Means the union of the vector types of the indicated size that are specialized by subtypes of {\clkwd character}. For the purpose of object creation, it is equivalent to {\clkwd (general-string {\em size})}. \editend \\ \edithead {\csdag 23 (p49)} \editstart \\ \bf replace & \cltxt Means the same as {\clkwd (simple-array string-char ({\em size}))}: the set of simple strings of the indicated size. \\ \bf with & \cltxt Means the union of the simple vector types of the indicated size that are specialized by subtypes of {\clkwd character}. For the purpose of object creation, it is equivalent to {\clkwd (simple-general-string {\em size})}. 
\editend \\ \edithead {\csdag 23 after (p49)} \editstart \\ \bf insert & \cltxt {\clkwd (base-string {\em size})} \\ & Means the same as {\clkwd (array base-character ({\em size}))}: the set of base strings of the indicated size. \\ & {\clkwd (simple-base-string {\em size})} \\ & Means the same as {\clkwd (simple-array base-character ({\em size}))}: the set of simple base strings of the indicated size. \editend \\ \edithead {\csdag 23 after (p49)} \editstart \\ \bf insert & \cltxt {\clkwd (general-string {\em size})} \\ & Means the same as {\clkwd (array character ({\em size}))}: the set of general strings of the indicated size. \\ & {\clkwd (simple-general-string {\em size})} \\ & Means the same as {\clkwd (simple-array character ({\em size}))}: the set of simple general strings of the indicated size. \editend \setcounter{subsection}{7} \subsection{Type Conversion Function} % 4.8. \edithead {\csdag 6 (p51)} \editstart \\ \bf replace & \cltxt Some strings, symbols, and integers may be converted to characters. If {\em object} is a string of length 1, then the sole element of the string is returned. If {\em object} is a symbol whose print name is of length 1, then the sole element of the print name is returned. If {\em object} is an integer {\em n}, then {\clkwd (int-char } {\em n}{\clkwd )} is returned. See {\clkwd character}. \\ \bf with & \cltxt Some strings and symbols may be converted to characters. If {\em object} is a string of length 1, then the sole element of the string is returned. If {\em object} is a symbol whose print name is of length 1, then the sole element of the print name is returned. See {\clkwd character}. \editend \\ \edithead {\csdag 6 after (p52)} \editstart \\ \bf insert & \begin{itemize} \cltxt \item Any string subtype may be converted to any other string subtype, provided the new string can contain all actual elements of the old string. It is an error if it cannot. 
\end{itemize} \editend %---------------------------------------------------------------------- \setcounter{section}{5} \section{Predicates} % 6 %---------------------------------------------------------------------- \edithead {\csdag 2 (p71)} \editstart \\ \bf replace & \cltxt but {\clkwd standard-char} begets {\clkwd standard-char-p} \\ \bf with & \cltxt but {\clkwd bit-vector} begets {\clkwd bit-vector-p} \editend \setcounter{subsection}{1} \subsection{Data Type Predicates} % 6.2. \setcounter{subsubsection}{1} \subsubsection{Specific Data Type Predicates} % 6.2.2. \edithead {\csdag 36 (p75)} \editstart \\ \bf replace & \cltxt {\clkwd characterp} {\em object} \\ \bf with & \cltxt {\clkwd characterp} {\em object} \&{\clkwd optional} {\em repertoire} \editend \\ \edithead {\csdag 37 (p75)} \editstart \\ \bf replace & \cltxt {\clkwd characterp} is true if its argument is a character, and otherwise is false. \\ \bf with & \cltxt If {\em repertoire} is omitted, {\clkwd characterp} is true if its argument is a character object, and otherwise is false. If a {\em repertoire} argument is specified, {\clkwd characterp} is true if its argument is a character object and a member of the specified repertoire, and otherwise is false. For example, {\clkwd (characterp \#$\backslash$A} {\clkwd :Latin)} is true since \#$\backslash$A is a member of the Latin character registry. {\em repertoire} may be any supported character registry name or the names {\clkwd :base} or {\clkwd :standard}. {\clkwd (characterp x :base)} is true if its argument is a member of the base character repertoire and false otherwise. {\clkwd (characterp x :standard)} is true if its argument is a member of the standard character subrepertoire and false otherwise. 
\editend \\ \edithead {\csdag 38 (p75)} \editstart \\ \bf replace & \cltxt {\clkwd (characterp x) $\equiv$ (typep x \apostrophe character)} \\ \bf with & \cltxt {\clkwd (characterp x :standard) $\equiv$ (typep x \apostrophe (character :standard))} \editend \\ \edithead {\csdag 72 (p76)} \editstart \\ \bf replace & \cltxt See also {\clkwd standard-char-p, string-char-p, streamp,} \\ \bf with & \cltxt See also {\clkwd standard-char-p, streamp,} \editend \setcounter{subsubsection}{2} \subsubsection{Equality Predicates} % 6.2.3. \edithead {\csdag 75 (p81)} \editstart \\ \bf replace & \cltxt which ignores alphabetic case and certain other attributes of characters; \\ \bf with & \cltxt which ignores alphabetic case of characters; \editend %---------------------------------------------------------------------- \setcounter{section}{6} \section{Control Structure} % 7 %---------------------------------------------------------------------- \setcounter{subsection}{1} \subsection{Generalized Variables} % 7.2. \edithead {\csdag 19 modify table (p95)} \editstart \\ \bf replace & \cltxt char string-char \\ & schar string-char \\ \bf with & \cltxt char character \\ & schar character \editend \\ \edithead {\csdag 22 table entry (p96)} \editstart \\ \bf delete & \cltxt char-bit first set-char-bit \editend %---------------------------------------------------------------------- \setcounter{section}{9} \section{Symbols} % 10 %---------------------------------------------------------------------- \edithead {\csdag 3 (p163)} \editstart \\ \bf replace & \cltxt It is ordinarily not permitted to alter a symbol's print name. \\ \bf with & \cltxt It is an error to alter a symbol's print name. \editend \setcounter{subsection}{1} \subsection{The Print Name} % 10.2. 
\edithead {\csdag 5 (p168)} \editstart \\ \bf replace & \cltxt It is an extremely bad idea \\ \bf with & \cltxt It is an error and an extremely bad idea \editend %---------------------------------------------------------------------- \setcounter{section}{10} \section{Packages} % 11 %---------------------------------------------------------------------- \setcounter{subsection}{6} \subsection{Package System Functions and Variables} % 11.7. \edithead {\csdag 31 (p184, intern)} \editstart \\ \bf append & \cltxt All strings, base and extended, are acceptable {\em string} arguments. \editend %---------------------------------------------------------------------- \setcounter{section}{12} \section{Characters} % 13 %---------------------------------------------------------------------- \edithead {\csdag 6 after (p233)} \editstart \\ \bf insert & \cltxt {\clkwd char-code-limit} [{\clkwd Constant}] \\ & The value of {\clkwd char-code-limit} is a non-negative integer that is the upper exclusive bound on values produced by the function {\clkwd char-code}, which returns the {\em code} of a given character; that is, the values returned by {\clkwd char-code} are non-negative and strictly less than the value of {\clkwd char-code-limit}. There may be unassigned codes between 0 and {\clkwd char-code-limit} which are not legal arguments to {\clkwd code-char}. \\ & \cltxt {\clkwd *all-character-registry-names*} [{\clkwd Variable}] \\ & The value of {\clkwd *all-character-registry-names*} is a list of all character registry names supported by the implementation. \editend \setcounter{subsection}{0} \subsection{Character Attributes} % 13.1. \edithead {\csdag replace entire section (p233)} \editstart \\ \bf with & \cltxt Earlier versions of Common LISP incorporated {\em font} and {\em bits} as attributes of character objects. These are considered implementation-defined attributes and, if supported by an implementation, affect the action of selected functions. 
In particular, the following effects are noted: \\ & \begin{itemize} \item Attributes, such as those dealing with how the character is displayed or its typography, are not part of the character code. For example, bold-face, color or size are not considered part of the character code. \item If two characters differ in any attribute, then they are not {\clkwd char=}. \item If two characters have identical attributes, then their ordering by {\clkwd char}$<$ is consistent with the numerical ordering by the predicate $<$ on their code attributes. (Similarly for {\clkwd char}$>$, {\clkwd char}$>=$ and {\clkwd char}$<=$.) \item The effect, if any, on {\clkwd char-equal} of each attribute has to be specified as part of the definition of that attribute. \item The effect of {\clkwd char-upcase} and {\clkwd char-downcase} is to preserve attributes. \item The function {\clkwd char-int} is equivalent to {\clkwd char-code} if no attributes are associated with the character object. \item The function {\clkwd int-char} is equivalent to {\clkwd code-char} if no attributes are associated with the character object. \item It is implementation dependent whether characters within double quotes have attributes removed. \item It is implementation dependent whether attributes are removed from symbol names by {\clkwd read}. \end{itemize} \editend \setcounter{subsection}{1} \subsection{Predicates on Characters} % 13.2. \edithead {\csdag 3 (p234)} \editstart \\ \bf replace & \cltxt argument is a "standard character" that is, an object of type {\clkwd standard-char}. Note that any character with a non-zero {\em bits} or {\em font} attribute is non-standard. \\ \bf with & \cltxt argument is a member of the Common LISP standard character subrepertoire. \editend \\ \edithead {\csdag 4 (p234)} \editstart \\ \bf delete & \cltxt Note that any character with non-zero ... 
\editend \\ \edithead {\csdag 6 (p235)} \editstart \\ \bf replace & \cltxt Of the standard characters all but \#$\backslash${\clkwd Newline} are graphic. The semi-standard characters \#$\backslash${\clkwd Backspace}, \#$\backslash${\clkwd Tab}, \#$\backslash${\clkwd Rubout}, \#$\backslash${\clkwd Linefeed}, \#$\backslash${\clkwd Return}, and \#$\backslash${\clkwd Page} are not graphic. \\ \bf with & \cltxt Of the standard characters all but \#$\backslash${\clkwd Newline} are graphic. \editend \\ \edithead {\csdag 7 (p235)} \editstart \\ \bf delete & \cltxt Programs may assume that graphic ... \editend \\ \edithead {\csdag 8 (p235)} \editstart \\ \bf delete & \cltxt Any character with a non-zero bits ... \editend \\ \edithead {\csdag 9 (p235)} \editstart \\ \bf delete & \cltxt {\clkwd string-char-p} ... \editend \\ \edithead {\csdag 10 (p235)} \editstart \\ \bf delete & \cltxt The argument {\em char} must be ... \editend \\ \edithead {\csdag 13 (p235)} \editstart \\ \bf replace & \cltxt If a character is alphabetic, then it is perforce graphic. Therefore any character with a non-zero bits attribute cannot be alphabetic. Whether a character is alphabetic may depend on its font number. \\ \bf with & \cltxt If a character is alphabetic, then it is perforce graphic. \editend \\ \edithead {\csdag 22 (p236)} \editstart \\ \bf replace & \cltxt If a character is either uppercase or lowercase, it is necessarily alphabetic (and therefore is graphic, and therefore has a zero bits attribute). However, it is permissible in theory for an alphabetic character to be neither uppercase nor lowercase (in a non-Roman font, for example). \\ \bf with & \cltxt If a character is either uppercase or lowercase, it is necessarily alphabetic (and therefore is graphic). \editend \\ \edithead {\csdag 25 (p236)} \editstart \\ \bf replace & \cltxt The argument {\em char} must be a character object, and {\em radix} must be a non-negative integer. 
If {\em char} is not a digit of the radix specified \\ \bf with & \cltxt The argument {\em char} must be in the standard character subrepertoire and {\em radix} must be a non-negative integer. If {\em char} is not a standard character or is not a digit of the radix specified \editend \\ \edithead {\csdag 51 (p237)} \editstart \\ \bf delete & \cltxt If two characters have the same bits ... \editend \\ \edithead {\csdag 52 (p237)} \editstart \\ \bf replace & \cltxt If two characters differ in any attribute (code, bits, or font), then they are different. \\ \bf with & \cltxt If the codes of two characters differ, then they are different. \editend \\ \edithead {\csdag 94 (p239)} \editstart \\ \bf replace & \cltxt The predicate {\clkwd char-equal} is like {\clkwd char=}, and similarly for the others, except according to a different ordering such that differences of bits attributes and case are ignored, and font information is taken into account in an implementation dependent manner. \\ \bf with & \cltxt The predicate {\clkwd char-equal} is like {\clkwd char=}, and similarly for the others, except according to a different ordering such that differences of case are ignored. \editend \\ \edithead {\csdag 97 example (p239)} \editstart \\ \bf delete & \cltxt {\clkwd (char-equal \#$\backslash$A \#$\backslash$Control-A) is true} \editend \\ \edithead {\csdag 98 (p239)} \editstart \\ \bf delete & \cltxt The ordering may depend on the font ... \editend \setcounter{subsection}{2} \subsection{Character Construction and Selection} % 13.3. \edithead {\csdag 3 (p239)} \editstart \\ \bf replace & \cltxt The argument {\em char} must be a character object. {\clkwd char-code} returns the {\em code} attribute of the character object; this will be a non-negative integer less than the (normal) value \\ \bf with & \cltxt The argument {\em char} must be a character object. 
{\clkwd char-code} returns the {\em code} of the character object; this will be a non-negative integer less than the value \editend \\ \edithead {\csdag 4 (p240)} \editstart \\ \bf delete & \cltxt {\clkwd char-bits } ... \editend \\ \edithead {\csdag 5 (p240)} \editstart \\ \bf delete & \cltxt The argument {\em char} must be ... \editend \\ \edithead {\csdag 6 (p240)} \editstart \\ \bf delete & \cltxt {\clkwd char-font } ... \editend \\ \edithead {\csdag 7 (p240)} \editstart \\ \bf delete & \cltxt The argument {\em char} must be ... \editend \\ \edithead {\csdag 8 (p240)} \editstart \\ \bf replace & \cltxt {\clkwd code-char {\em code} \&optional {\em (bits 0) (font 0)} [{\em Function}]} \\ \bf with & \cltxt {\clkwd code-char {\em code} [{\em Function}]} \editend \\ \edithead {\csdag 9 (p240)} \editstart \\ \bf replace & \cltxt All three arguments must be non-negative integers. If it is possible in the implementation to construct a character object whose code attribute is {\em code}, whose bits attribute is {\em bits}, and whose font attribute is {\em font}, then such an object is returned; \\ \bf with & \cltxt The argument must be a non-negative integer. If it is possible in the implementation to construct a character object identified by {\em code}, then such an object is returned; \editend \\ \edithead {\csdag 10 (p240)} \editstart \\ \bf replace & \cltxt For any integers, {\em c, b,} and {\em f}, if {\clkwd (code-char {\em c b f})} is \\ \bf with & \cltxt For any integer, {\em c}, if {\clkwd (code-char {\em c})} is \editend \\ \edithead {\csdag 12 (p240)} \editstart \\ \bf delete & \cltxt {\clkwd (char-bits (code-char } ... \editend \\ \edithead {\csdag 13 (p240)} \editstart \\ \bf delete & \cltxt {\clkwd (char-font (code-char } ... \editend \\ \edithead {\csdag 14 (p240)} \editstart \\ \bf delete & \cltxt If the font and bits attributes ... 
\editend \\
\edithead {\csdag 15 (p240)} \editstart \\ \bf delete & \cltxt {\clkwd (char= (code-char (char-code ...} \editend \\
\edithead {\csdag 16 (p240)} \editstart \\ \bf delete & \cltxt is true. \editend \\
\edithead {\csdag 17 (p240)} \editstart \\ \bf delete & \cltxt {\clkwd make-char} ... \editend \\
\edithead {\csdag 18 (p240)} \editstart \\ \bf delete & \cltxt The argument {\em char} must be ... \editend \\
\edithead {\csdag 19 (p240)} \editstart \\ \bf delete & \cltxt If {\em bits} or {\em font} are zero ... \editend \\
\edithead {\csdag 19 (p240)} \editstart \\ \bf append & \cltxt {\clkwd find-char} {\em label registry} [{\em Function}] \\ & {\clkwd find-char} returns a character object. The arguments {\em label} and {\em registry} are the names (objects coercible to strings as if by the function {\clkwd string}) of a character label and a character registry, respectively. {\em label} uniquely identifies a character within the character registry named {\em registry}. If the implementation does not support the specified character, {\clkwd nil} is returned. \editend
\setcounter{subsection}{3}
\subsection{Character Conversions} % 13.4.
\edithead {\csdag 8 (p241)} \editstart \\ \bf replace & \cltxt {\clkwd char-upcase} returns a character object with the same font and bits attributes as {\em char}, but with possibly a different code attribute. \\ \bf with & \cltxt {\clkwd char-upcase} returns a character object with possibly a different code. \editend \\
\edithead {\csdag 10 (p241)} \editstart \\ \bf replace & \cltxt Similarly, {\clkwd char-downcase} returns a character object with the same font and bits attributes as {\em char}, but with possibly a different code attribute. \\ \bf with & \cltxt Similarly, {\clkwd char-downcase} returns a character object with possibly a different code. \editend \\
\edithead {\csdag 12 (p241)} \editstart \\ \bf delete & \cltxt Note that the action of ...
\editend \\ \edithead {\csdag 13 (p241)} \editstart \\ \bf replace & \cltxt {\clkwd digit-char {\em weight} \&optional ({\em radix} 10) ({\em font} 0) [{\em Function}]} \\ \bf with & \cltxt {\clkwd digit-char {\em weight} \&optional ({\em radix} 10) [{\em Function}]} \editend \\ \edithead {\csdag 14 (p241)} \editstart \\ \bf replace & \cltxt All arguments must be integers. {\clkwd digit-char} determines whether or not it is possible to construct a character object whose font attribute is {\em font}, and whose {\em code} \\ \bf with & \cltxt All arguments must be integers. {\clkwd digit-char} determines whether or not it is possible to construct a character object whose {\em code} \editend \\ \edithead {\csdag 15 (p242)} \editstart \\ \bf replace & \cltxt {\clkwd digit-char} cannot return {\clkwd nil} if {\em font} is zero, {\em radix} \\ \bf with & \cltxt {\clkwd digit-char} cannot return {\clkwd nil}. {\em radix} \editend \\ \edithead {\csdag 22 (p242)} \editstart \\ \bf delete & \cltxt Note that no argument is provided for ... \editend \\ \edithead {\csdag 23 through 30 (p242, char-int, int-char)} \editstart \\ \bf delete & \cltxt {\clkwd char-int} {\em char} \editend \\ \edithead {\csdag 32 (p242)} \editstart \\ \bf replace & \cltxt All characters that have zero font and bits attributes and that are non-graphic \\ \bf with & \cltxt All characters that are non-graphic \editend \\ \edithead {\csdag 33 (p243)} \editstart \\ \bf replace & \cltxt The standard newline and space characters have the respective names {\clkwd Newline} and {\clkwd Space}. The semi-standard characters have the names {\clkwd Tab, Page, Rubout, Linefeed, Return,} and {\clkwd Backspace}. \\ \bf with & \cltxt The standard newline and space characters have the respective names {\clkwd Newline} and {\clkwd Space}. \editend \\ \edithead {\csdag 35 (p243)} \editstart \\ \bf delete & \cltxt {\clkwd char-name} will only locate "simple" ... 
\editend \\
\edithead {\csdag 36 (p243)} \editstart \\ \bf append & \cltxt {\clkwd name-char} may accept other names for characters in addition to those returned by {\clkwd char-name}. \editend \\
\edithead {\csdag 36 (p243)} \editstart \\ \bf append & \cltxt {\clkwd char-registry-name} {\em char} [{\em Function}] \\ & {\clkwd char-registry-name} returns a string representing the character registry to which {\em char} belongs. \editend \\
\edithead {\csdag 36 (p243)} \editstart \\ \bf append & \cltxt {\clkwd char-label} {\em char} [{\em Function}] \\ & {\clkwd char-label} returns a string representing the character label of {\em char}. \editend \\
\edithead {\csdag 36 (p243)} \editstart \\ \bf append & \cltxt {\clkwd char-ccs-value} {\em char name} [{\em Function}] \\ & {\clkwd char-ccs-value} returns the non-negative integer representing the encoding of the character {\em char} in the coded character set named by {\em name}. If the implementation does not support the specified coded character set, {\clkwd nil} is returned. If the named coded character set does not contain the character, {\clkwd nil} is returned. \editend
\setcounter{subsection}{4}
\subsection{Character Control-Bit Functions} % 13.5.
\edithead {\csdag delete entire section (p243)} \editstart \editend
%----------------------------------------------------------------------
\setcounter{section}{13}
\section{Sequences} % 14
%----------------------------------------------------------------------
\setcounter{subsection}{0}
\subsection{Simple Sequence Functions} % 14.1
\edithead {\csdag 21 (p249,make-sequence)} \editstart \\ \bf append & \cltxt If type {\clkwd string} is specified, the result is equivalent to {\clkwd make-string}.
\editend %---------------------------------------------------------------------- \setcounter{section}{17} \section{Strings} % 18 %---------------------------------------------------------------------- \edithead {\csdag 1 (p299)} \editstart \\ \bf replace & \cltxt Specifically, the type {\clkwd string} is identical to the type {\clkwd (vector string-char),} which in turn is the same as {\clkwd (array string-char (*))}. \\ \bf with & \cltxt Specifically, the type {\clkwd string} is a subtype of {\clkwd vector} and consists of vectors specialized by subtypes of {\clkwd character}. \editend \setcounter{subsection}{0} \subsection{String Access} % 18.1. \edithead {\csdag 4 (p300)} \editstart \\ \bf replace & \cltxt character object. (This character will necessarily satisfy the predicate {\clkwd string-char-p}). \\ \bf with & \cltxt character object. \editend \\ \edithead {\csdag 9 (p300)} \editstart \\ \bf replace & \cltxt {\clkwd setf} may be used with {\clkwd char} to destructively replace a character within a string. \\ \bf with & \cltxt {\clkwd setf} may be used with {\clkwd char} to destructively replace a character within a string. The new character must be of a type which can be stored in the string; it is an error otherwise. \editend \setcounter{subsection}{2} \subsection{String Construction and Manipulation} % 18.3. \edithead {\csdag 2 (p302)} \editstart \\ \bf replace & \cltxt {\clkwd make-string {\em size} \&key :initial-element [{\em Function}]} \\ \bf with & \cltxt {\clkwd make-string {\em size} \&key :initial-element :element-type [{\em Function}]} \editend \\ \edithead {\csdag 3 (p302,make-string)} \editstart \\ \bf replace & \cltxt This returns a string (in fact a simple string) of length {\em size}, each of whose characters has been initialized to the {\clkwd :initial-element} argument. If an {\clkwd :initial-element} argument is not specified, then the string will be initialized in an implementation-dependent way. 
\\ \bf with & \cltxt This returns a string of length {\em size}, each of whose characters has been initialized to the {\clkwd :initial-element} argument. If an {\clkwd :initial-element} argument is not specified, then the string will be initialized in an implementation-dependent way. The {\clkwd :element-type} argument names the type of the elements of the string; a string is constructed of the most specialized type that can accommodate elements of the given type. If {\clkwd :element-type} is omitted, the type {\clkwd character} is the default. \editend \\ \edithead {\csdag 5 (p302,make-string)} \editstart \\ \bf replace & \cltxt A string is really just a one-dimensional array of "string characters" (that is, those characters that are members of type {\clkwd string-char}). More complex character arrays may be constructed using the function {\clkwd make-array}. \\ \bf with & \cltxt More complex character arrays may be constructed using the function {\clkwd make-array}. \editend \\ \edithead {\csdag 29 (p304,make-string)} \editstart \\ \bf replace & \cltxt If {\em x} is a string character (a character of type {\clkwd string-char}), then \\ \bf with & \cltxt If {\em x} is a character, then \editend %---------------------------------------------------------------------- \setcounter{section}{21} \section{Input/Output} % 22 \setcounter{subsection}{0} \subsection{Printed Representation of LISP Objects} % 22.1. \setcounter{subsubsection}{0} \subsubsection{What the Read Function Accepts} % 22.1.1. \edithead {\csdag Table 22-1: Standard Character Syntax Types (p336)} \editstart \\ \bf delete entry & \cltxt {\clkwd } {\em whitespace} \\ & {\clkwd } {\em whitespace} \\ & {\clkwd } {\em constituent} \\ & {\clkwd } {\em whitespace} \\ & {\clkwd } {\em constituent} \\ & {\clkwd } {\em whitespace} \editend \setcounter{subsubsection}{1} \subsubsection{Parsing of Numbers and Symbols} % 22.1.2. 
\edithead {\csdag Table 22-3: Standard Constituent Character Attributes (p340)} \editstart \\ \bf delete entry & \cltxt {\clkwd } {\em illegal} \\ & {\clkwd } {\em illegal} \\ & {\clkwd } {\em illegal} \\ & {\clkwd } {\em illegal} \\ & {\clkwd } {\em illegal} \\ & {\clkwd } {\em illegal} \editend \setcounter{subsubsection}{3} \subsubsection{Standard Dispatching Macro Character Syntax} % 22.1.4. \edithead {\csdag Table 22-4: Standard \# Macro Character Syntax (p352)} \editstart \\ \bf delete entry & \cltxt {\clkwd \#} {\em signals error} \\ & {\clkwd \#} {\em signals error} \\ & {\clkwd \#} {\em signals error} \\ & {\clkwd \#} {\em signals error} \\ & {\clkwd \#} {\em signals error} \\ & {\clkwd \#} {\em undefined} \editend \\ \edithead {\csdag 8 (p353)} \editstart \\ \bf replace & \cltxt The following names are standard across all implementations: \\ \bf with & \cltxt All non-graphic characters, including extended characters, are uniquely named in an implementation-dependent manner. In particular, an implementation may support names of the form {\em label:registry}. The following names are standard across all implementations: \editend \\ \edithead {\csdag 11 through 18 inclusive delete (p353)} \editstart \\ \bf delete & \cltxt The following names are semi-standard; ... \editend \\ \edithead {\csdag 20 through 26 inclusive delete (p354)} \editstart \\ \bf delete & \cltxt The following convention is used in implementations ... \editend \\ \edithead {\csdag 108 (p360)} \editstart \\ \bf replace & \cltxt {\clkwd \#, \#, \#, \#, \#} \\ \bf with & \cltxt {\clkwd \#, \#} \editend \setcounter{subsubsection}{4} \subsubsection{The Readtable} % 22.1.5. \edithead {\csdag 3 (p360)} \editstart \\ \bf replace & \cltxt Even if an implementation supports characters with non-zero {\em bits} and {\em font} attributes, it need not (but may) allow for such characters to have syntax descriptions in the readtable. 
However, every character of type {\clkwd string-char} must be represented in the readtable. \\ \bf with & \cltxt All base and extended characters are representable in the readtable. \editend \setcounter{subsubsection}{5} \subsubsection{What the Print Function Produces} % 22.1.6. \edithead {\csdag 13 (p366)} \editstart \\ \bf replace & \cltxt is used. For example, the printed representation of the character \#$\backslash$A with control and meta bits on would be \#$\backslash${\clkwd CONTROL-META-A}, and that of \#$\backslash$a with control and meta bits on would be \#$\backslash${\clkwd CONTROL-META-$\backslash$a}. \\ \bf with & \cltxt is used (see 22.1.4). \editend \setcounter{subsection}{2} \subsection{Output Functions} % 22.3. \setcounter{subsubsection}{0} \subsubsection{Output to Character Streams} % 22.3.1. \edithead {\csdag 26 (p384)} \editstart \\ \bf replace & \cltxt ({\em not} the substring delimited by {\clkwd :start} and {\clkwd :end}). \\ \bf with & ({\em not} the substring delimited by {\clkwd :start} and {\clkwd :end}). Only characters which are members of the coded character set(s) associated with the output stream or \#$\backslash${\clkwd Newline} are valid to be written; it is an error otherwise. All character streams must provide appropriate line division behavior for \#$\backslash${\clkwd Newline}. \editend \\ \edithead {\csdag 27 after (p384)} \editstart \\ \bf insert & \cltxt {\clkwd external-coded-string-length} {\em object} \&{\clkwd optional} {\em output-stream} [{\em Function}] \\ & {\clkwd external-coded-string-length} returns the number of implementation defined units required for the object on the output-stream. If not applicable to the output stream, the function returns {\clkwd nil}. This number corresponds to the current state of the stream and may change if there has been intervening output. If the output stream is not specified {\clkwd *standard-output*} is the default. 
\editend \setcounter{subsubsection}{2} \subsubsection{Formatted Output to Character Streams} % 22.3.3. \edithead {\csdag 23 delete example (p387)} \editstart \\ \bf delete & \cltxt {\clkwd (format nil "Type} $\tilde{ }$ {\clkwd :C to $\tilde{ }$ :A."} . . . \editend \\ \edithead {\csdag 66 (p389)} \editstart \\ \bf replace & \cltxt $\tilde{ }${\clkwd :C} spells out the names of the control bits and represents non-printing characters by their names: {\clkwd Control-Meta-F, Control-Return, Space}. This is a "pretty" format for printing characters. \\ \bf with & \cltxt $\tilde{ }${\clkwd :C} represents non-printing characters by their names: {\clkwd Newline, Space}. This is a "pretty" format for printing characters. \editend %---------------------------------------------------------------------- %---------------------------------------------------------------------- \setcounter{section}{22} \section{File System Interface} % 23 \setcounter{subsection}{1} \subsection{Opening and Closing Files} % 23.2. \edithead {\csdag 2 (p418)} \editstart \\ \bf replace & \cltxt {\clkwd open {\em filename} \&key :direction :element-type} {\clkwd :if-exists :if-does-not-exist} [{\em Function}] \\ \bf with & \cltxt {\clkwd open {\em filename} \&key :direction :element-type} {\clkwd :external-coded-character-format} {\clkwd :if-exists :if-does-not-exist} [{\em Function}] \editend \\ \edithead {\csdag 11 (p419)} \editstart \\ \bf replace & \cltxt {\clkwd string-char} \\ & The unit of transaction is a string-character. The functions {\clkwd read-char} and/or {\clkwd write-char} may be used on the stream. \\ \bf with & \cltxt The default value of {\clkwd :element-type} is implementation-defined as character or a subtype of character. \\ & {\clkwd base-character} \\ & The unit of transaction is a base character. The functions {\clkwd read-char} and/or {\clkwd write-char} may be used on the stream. 
\editend \\
\edithead {\csdag 16 (p419)} \editstart \\ \bf replace & \cltxt {\clkwd character} \\ & The unit of transaction is any character, not just a string-character. The functions {\clkwd read-char} and/or {\clkwd write-char} may be used on the stream. \\ \bf with & \cltxt {\clkwd character} \\ & The unit of transaction is any character. The functions {\clkwd read-char} and/or {\clkwd write-char} may be used on the stream. \editend \\
\edithead {\csdag 19 after (p420)} \editstart \\ \bf insert & \cltxt {\clkwd :external-coded-character-format} \\ & This argument specifies a name or list of names indicating an implementation-recognized scheme for representing one or more coded character sets with non-homogeneous codes. \\ & The default value is {\clkwd :default} and is implementation-defined but must include the base characters. \\ & As many coded character set names must be provided as the implementation requires for that external coding convention. \\ & References to standard ISO coded character set names must include the full ISO reference number and approval year. The following are valid ISO reference names: :ISO8859/1-1987, :ISO6937/2-1983, :ISO646-1983, etc. All implementation-recognized schemes are formed from {\clkwd standard-p} characters. \editend
%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\begin{thebibliography}{wwwwwwww 99}
\bibitem[Ida87]{ida87} M. Ida, et al., {\em JEIDA Common LISP Committee Proposal on Embedding Multi-Byte Characters }, ANSI X3J13 document 87-022, (1987).
\bibitem[ISO 646]{iso646} ISO, {\em Information processing -- ISO 7-bit coded character set for information interchange }, ISO (1983).
\bibitem[ISO 4873]{iso4873} ISO, {\em Information processing -- ISO 8-bit code for information interchange -- Structure and rules for implementation }, ISO (1986).
\bibitem[ISO 6937/1]{iso6937/1} ISO, {\em Information processing -- Coded character sets for text communication -- Part 1: General introduction }, ISO (1983). \bibitem[ISO 6937/2]{iso6937/2} ISO, {\em Information processing -- Coded character sets for text communication -- Part 2: Latin alphabetic and non-alphabetic graphic characters }, ISO (1983). \bibitem[ISO 8859/1]{iso8859/1} ISO, {\em Information processing -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1 }, ISO (1987). \bibitem[ISO 8859/2]{iso8859/2} ISO, {\em Information processing -- 8-bit single-byte coded graphic character sets -- Part 2: Latin alphabet No. 2 }, ISO (1987). \bibitem[ISO 8859/6]{iso8859/6} ISO, {\em Information processing -- 8-bit single-byte coded graphic character sets -- Part 6: Latin/Arabic alphabet }, ISO (1987). \bibitem[ISO 8859/7]{iso8859/7} ISO, {\em Information processing -- 8-bit single-byte coded graphic character sets -- Part 7: Latin/Greek alphabet }, ISO (1987). \bibitem[Kerns87]{kerns87} R. Kerns, {\em Extended Characters in Common LISP }, X3J13 Character Subcommittee document, Symbolics Inc (1987). \bibitem[Kurokawa88]{kurokawa88} T. Kurokawa, et al., {\em Technical Issues on International Character Set Handling in Lisp }, ISO/IEC SC22 WG16 document N33, (1988). \bibitem[Linden87]{linden87} T. Linden, {\em Common LISP - Proposed Extensions for International Character Set Handling }, Version 01.11.87, IBM Corporation (1987). \bibitem[Steele84]{steele84} G. Steele Jr., {\em Common LISP: the Language }, Digital Press (1984). \bibitem[Xerox87]{xerox87} Xerox, {\em Character Code Standard, Xerox System Integration Standard }, Xerox Corp. (1987). \end{thebibliography} \end{document} % End of document.  
Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 22 Feb 89 17:06:06 EST Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Feb 89 13:28:49 PST Date: Wed, 22 Feb 89 00:13:28 PST From: Thom Linden To: Common Lisp mailing Message-ID: <890222.001328.baggins@almvma> Subject: cs proposal part1 \documentstyle{report} % Specifies the document style. \pagestyle{headings} \title{\bf Extensions to Common LISP to Support International Character Sets} \author{ Michael Beckerle\thanks{Gold Hill Computers} \and Paul Beiser\thanks{Hewlett-Packard} \and Jerry Duggan\thanks{Hewlett-Packard} \and Robert Kerns\thanks{Independent consultant} \and Kevin Layer\thanks{Franz, Inc.} \and Thom Linden\thanks{IBM Research, Subcommittee Chair} \and Larry Masinter\thanks{Xerox Research} \and David Unietis\thanks{Lucid, Inc.} } \date{February 21, 1989} % Deleting this command produces today's date. \begin{document} \maketitle % Produces the title. \setcounter{secnumdepth}{4} \setcounter{tocdepth}{4} \tableofcontents %---------------------------------------------------------------------- %---------------------------------------------------------------------- \newfont{\cltxt}{cmr10} \newfont{\clkwd}{cmtt10} \newcommand{\apostrophe}{\clkwd '} \newcommand{\bq}{\clkwd\symbol{'22}} %---------------------------------------------------------------------- %---------------------------------------------------------------------- \chapter{Introduction} This is a proposal to the X3 J13 committee for both extending and modifying the Common LISP language definition to provide a standard basis for Common LISP support of the variety of characters used to represent the native languages of the international community. This proposal was created by the Character Subcommittee of X3 J13. We would like to acknowledge discussions with T. 
Yuasa and other members of the JIS Technical Working Group, comments from members of X3 J13, and the proposals \cite{ida87}, \cite{linden87}, \cite{kerns87}, and \cite{kurokawa88} for providing the motivation and direction for these extensions. As all these documents and discussions were created expressly for LISP standardization usage, we have borrowed freely from their ideas as well as the texts themselves. This document is separated into two parts. The first part explains the major language changes and their motivations. While intended as commentary to a general audience, and not explicitly as part of the standard document, the X3 J13 editor may include sections at her/his discretion. The second part, Appendix A, provides the page by page set of editorial changes to \cite{steele84}. \section{Objectives} The major objectives of this proposal are: \begin{itemize} \item To provide a consistent, well-defined scheme allowing support of both very large character sets and multiple character sets. \footnote{The distinction between the terms {\em character repertoire} and {\em coded character set} is made later. The usage of the term {\em character set}, avoided after this introduction, encompasses both terms.} Many software applications are intended for international use, or have requirements for incorporation of language elements of multiple native languages within a single application. Also, many applications require specialized languages including, for example, scientific and typesetting symbols. In order to ensure some portability of these applications, data expressed in a mixture of these languages must be treated uniformly by the software language. All character and string manipulations should operate uniformly, regardless of the character set(s) of the character objects. This applies to array indexing, readtable definitions, read symbol construction and I/O operations. \item To ensure efficient performance of string and character operations. 
Many native languages, such as Japanese and Chinese, use character sets which contain more characters than the Latin alphabet. Supporting larger sized character sets frequently means employing larger data fields to uniquely encode each character. Common LISP implementations using larger sized character sets can incur performance penalties in terms of space, time, or both. The use of large and/or multiple character sets by an implementation implies the need for a more complex character type representation. Given a more complex character representation, the efficiency of language operations on characters (e.g. string operations) could be affected. \item To assure forward compatibility of the proposed model and definition with existing Common LISP implementations. Developers should not be required to re-write large amounts of either LISP code or data representations in order to apply the proposed changes to existing implementations. The proposed changes should provide an easy portability path for existing code to many possible implementations. \end{itemize} There are a number of issues, some under the general rubric of internationalization, which this proposal does {\em not} cover. Among these issues are: \begin{itemize} \item Time and date formats \item Monetary formats \item Numeric punctuation \item Fonts \item Lexicographic orderings \item Right-to-left and bidirectional languages \end{itemize} %---------------------------------------------------------------------- %---------------------------------------------------------------------- %---------------------------------------------------------------------- %---------------------------------------------------------------------- \chapter{Overview} We use several terms within this document which are new in the context of Common LISP. Definitions for the following prominent terms are provided for the reader's convenience. 
A {\em character repertoire} defines a collection of characters independent of their specific rendered image or font. This corresponds to the mathematical notion of a {\em set} \footnote{We avoid the term {\em character set} as it has been (over)used in the context of character repertoire as well as in the context of coded character set.}. Character repertoires are specified independent of coding and their characters are only identified with a unique {\em character label}, a graphic symbol, and a character description. A {\em coded character set} is a character repertoire plus an {\em encoding} providing a unique mapping between each character and a number which serves as the character representation. There are numerous internationally standardized coded character sets; for example, \cite{iso8859/1} and \cite{iso646}. A character may be included in one or more character repertoires. Similarly, a character may be included in one or more coded character sets. For example, the Latin letter "A" is contained in the coded character set standards: ISO 8859/1, ISO 8859/2, ISO 6937/2, and others. To universally identify each character, we define a unique collection of repertoires called {\em character registries} as a partitioning of all characters. That is, each character is included in one and only one character registry. In Common LISP a {\em character} data object is identified by its {\em character code}, a unique numerical code. Each character code is composed from a character registry and a character label. Character data objects which are classified as {\em graphic}, or displayable, are each associated with a {\em glyph}. The glyph is the visual representation of the character. The primary purpose of introducing these terms is to provide a consistent naming to Common LISP concepts which are related to those found in ISO standardization of coded character sets. 
\footnote{The bibliography includes several relevant ISO coded character set standards.} They also serve as a demarcation between these standardization activities. For example, while Common LISP is free to define unique manipulation facilities for characters, registries and coded character sets, it should not define standard coded character sets nor standard character registries. A secondary purpose is to detach the language specification from underlying hardware representation. From a language specification viewpoint it is inconsequential whether characters occupy one or more (8-bit) bytes or whether a Common LISP implementation's internal representation for characters is distinct from or identical to any of the numerous external representations (for example, the text interchange representation \cite{iso6937/2}). We specifically do not propose any standard coded character sets. %---------------------------------------------------------------------- \section{Character Identity} Characters are uniquely distinguished by their codes, which are drawn from the set of non-negative integers. That is, within Common LISP a unique numerical code is assigned to each semantically different character. It is important to separate the notion of glyph from the notion of character data object when defining a scheme under which issues of identity can be rigorously decided by a computer language. Glyphs are the visual aspects of characters, writable on surfaces, and sometimes called 'graphics'. A language specification valid for more than a narrow range of systems can only make assumptions about the existence of {\em abstract} glyphs (for example, the Latin letter A) and not about glyph variants (for example, the italicized Latin letter {\em A}) or characteristics of display devices. Thus, an important element of this proposal is the removal of the {\em font} and {\em bits} attributes from the language specification. 
\footnote{These and other attributes may still be supported as implementation-defined extensions.}
All functions dealing with the {\em bits} and {\em font} attributes are either removed or modified by this proposal. The deleted functions and constants include: {\em char-font-limit, char-bits-limit, int-char, char-int, char-bits, char-font, make-char, char-control-bit, char-meta-bit, char-super-bit, char-hyper-bit, char-bit, set-char-bit}. The definition in \cite{steele84} of semi-standard characters has been eliminated. This is replaced by a more uniform approach to character naming with the introduction of character registries (see below).
%----------------------------------------------------------------------
\section{Character Naming}
A Common LISP program must be able to name, compose and decompose characters in a uniform, portable manner, independent of any underlying representation. One possible composition is by the pair $<$ coded character set standard, decimal representation $>$ \footnote{This syntax is for illustration only and is not being proposed.}. Thus, for example, one might compose the Latin 'A' with any of the pairs $<$ ISO8859/2-1987, 65 $>$, $<$ ISO8859/6-1987, 65 $>$, or $<$ ISO646-1983, 65 $>$. The difficulty here is two-fold. First, there are several ways to compose the same character; second, there may be multiple answers to the question: {\em To what coded character set does character object x belong?}\footnote{Even worse, the answer might change yearly.} The identical problems occur if the pair $<$ character repertoire standard, decimal representation $>$ is used. \footnote{Existing ISO repertoires seem to be defined exclusively in the context of coded character sets and not as standards in their own right.} The concept of character registry is introduced by this proposal to resolve the problem of character naming, composition and decomposition.
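The contrast between the two naming schemes can be sketched in portable Common LISP. This is an illustration only and is not being proposed: the table, the helper names prefixed {\clkwd sketch-}, and the use of the label "A" in a registry named "Latin" are all hypothetical, since registry names and labels are left to a future standard. The sketch shows three coded-character-set spellings of one character against a single registry/label spelling.

\begin{verbatim}
;; Hypothetical sketch; none of these names are defined by the proposal.

;; Several <coded character set, code> pairs can name the same character
;; (the three pairs below are the ones used in the text above).
(defparameter *sketch-ccs-spellings*
  '(("ISO8859/2-1987" . 65)
    ("ISO8859/6-1987" . 65)
    ("ISO646-1983"    . 65)))

;; A <registry, label> pair is meant to name each character exactly once.
;; Keys are (registry-name . label) conses of strings, compared with EQUAL.
(defparameter *sketch-registry-table* (make-hash-table :test #'equal))

(defun sketch-register-char (label registry char)
  "Record CHAR under LABEL within REGISTRY (both string designators)."
  (setf (gethash (cons (string registry) (string label))
                 *sketch-registry-table*)
        char))

(defun sketch-find-char (label registry)
  "Return the character named by LABEL within REGISTRY, or NIL if absent."
  (values (gethash (cons (string registry) (string label))
                   *sketch-registry-table*)))

;; Assumed label and registry name, chosen only for this example.
(sketch-register-char "A" "Latin" #\A)
\end{verbatim}

With this table, {\clkwd (sketch-find-char "A" "Latin")} yields the one registered character, and a pair that was never registered yields {\clkwd nil}, mirroring the behavior described for {\clkwd find-char} in Appendix A.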
Each character is universally defined by the pair $<$ character registry name, character label $>$. For this to be a portable definition, it must have a standard meaning. Thus we propose the formation of an ISO Working Group to define an international {\em Character Registry Standard}. At this writing there is no existing Character Registry Standard nor ISO Working Group organized to define such a standard. \footnote{It is the intention of X3 J13 to promote and adopt an eventual ANSI or ISO Character Registry Standard. In particular, we acknowledge that X3 J13 is {\em not} the appropriate forum to define the standard. We believe it is a required component of all programming languages providing support for international characters.} Common LISP character codes are composed from a character registry and a character label. The convention by which a character label and character registry compose a character code is implementation dependent. We introduce new functions {\clkwd find-char, char-registry-name,} and {\clkwd char-label} to compose and decompose character objects. We also extend the {\clkwd characterp} predicate to support testing membership of a character in a given character registry. \footnote{ For example, testing membership in the Japanese Katakana character registry. } A global variable {\clkwd *all-character-registry-names*} is added to support application determination of supported character registries. The naming and content of the standard character registries is left unspecified by this proposal. 
\footnote{The only constraint is that character registries be named using only {\clkwd standard-p} characters.} Below are some candidate character registry names: \begin{itemize} \item Arabic \item Armenian \item Bo-po-mo-fo \item Control (meaning the collection of standard text communication control codes) \item Cyrillic \item Georgian \item Greek \item Hangul \item Hebrew \item Hiragana \item Japanese-Punctuation \item Kanji \item Katakana \item Latin \item Latin-Punctuation \item Mathematical \item Pattern \item Phonetic \item Technical \end{itemize} The list above is provided as a starting point for discussion and is not intended to be representative nor exhaustive. The Common LISP language definition does not depend on these names nor any specific content (for example: Where should the plus sign appear?). It is application programs which require a reliable definition of the registry names and their constituents. The Common LISP language definition imposes the framework for constructing and manipulating character objects. The proposed ISO Character Registry Standard is fixed; an implementation may not extend a standard registry's constituent set of characters beyond the standard definition. An implementation may provide support for all or part of any character registry and may provide new character registries which include characters having unique semantics (i.e. not defined in any standard character registry). Implementation registries must be uniquely named using only {\clkwd standard-p} characters. An implementation must document the registries it supports. For each registry supported the documentation must include at least the following: \begin{itemize} \item Character Labels, Glyphs, and Descriptions. \item Reader Canonicalization. \item Effect of character predicates. 
In particular, \begin{itemize} \item {\clkwd alpha-char-p} \item {\clkwd lower-case-p} \item {\clkwd upper-case-p} \item {\clkwd both-case-p} \item {\clkwd graphic-char-p} \item {\clkwd alphanumericp} \end{itemize} \item Interaction with File I/O. In particular, the coded character sets \footnote{For example, ISO8859/1-1987.} and external encoding schemes \footnote{For example, {\em Xerox System Integration Character Code Standard}\cite{xerox87}.} supported are documented. \end{itemize} Which coded character sets and encoding schemes are supported by the overall computing system, and the details of the mapping of glyphs to characters to character codes, are left unspecified by Common LISP. The diversity of glyph sets and coded character set conventions in use worldwide and the desirability of allowing Common LISP applications to portably manipulate symbolic elements from many languages, perhaps simultaneously, mandate such a flexible approach.
%----------------------------------------------------------------------
\section{Hierarchy of Types}
Providing support for extensive character repertoires may impact Common LISP implementation performance in terms of space, time, or both. \footnote{This does not apply to all implementations. Unique hardware support and user community requirements must be taken into consideration.} In particular, many existing implementations support variants of the ISO 8859/1 standard. Supporting large repertoires argues for a multi-byte internal representation for each character, even if an application primarily (or exclusively) uses the ISO 8859/1 characters. This proposal extends the definition of the character and string type hierarchy to include specialized subtypes of character and string. An implementation is free to associate a compact internal representation with each subtype.
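The space argument above can be made concrete with a small model. The following Python sketch is not part of the proposal: the names `storage_typecode` and `pack`, and the choice of a 256-code base repertoire, are assumptions made purely for illustration. It picks the narrowest array element width able to hold every character code in a string, which is roughly the freedom the proposal gives an implementation for each string subtype.

```python
from array import array

# Assumption for this sketch: the base repertoire is the 256 codes of an
# ISO 8859/1-sized set; any character with a larger code is "extended".
BASE_CODE_LIMIT = 256


def storage_typecode(text):
    """Return 'B' (one byte per element) for all-base text, else 'L' (at least four bytes)."""
    return "B" if all(ord(ch) < BASE_CODE_LIMIT for ch in text) else "L"


def pack(text):
    """Store the character codes of text in the narrowest array that holds them all."""
    return array(storage_typecode(text), [ord(ch) for ch in text])


base = pack("lambda")              # every character fits in one byte
general = pack("\u03bb-calculus")  # the Greek lambda forces the wide representation
print(base.itemsize, general.itemsize)
```

A single extended character widens every element of the second array, which is why an application that uses only ISO 8859/1 characters would pay for a multi-byte representation if no compact subtype were available.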
The {\clkwd string} type specifier, when used for object creation, for example in {\clkwd make-sequence}, is defined to mean the most general string subtype supported by the implementation (similarly for the {\clkwd simple-string} type specifier). This definition emphasizes portability of existing Common LISP applications to international character environments over performance. Applications emphasizing efficiency of text processing in non-international environments will require some modification to utilize subtypes with compact internal representations. It has been suggested that either a single type is sufficient to support international characters, or that a hierarchy of types could be used, in a manner transparent to the user. A desire to provide flexibility which encourages implementations to support international characters without compromising application efficiency led us to accept the need for more than one type. We believe that these choices reflect a minimal modification of this aspect of the type system, and that exposing the types for string and character construction while requiring uniform treatment for characters otherwise is the most reasonable approach. \subsection{Character Type} The following type specifier is added as a subtype of {\clkwd character}: \begin{itemize} \item {\clkwd base-character} \end{itemize} An implementation may support additional subtypes of {\clkwd character} which may or may not be supertypes of {\clkwd base-character}. In addition, an implementation may define {\clkwd base-character} as equivalent to {\clkwd character}. Characters of type {\clkwd base-character} are referred to as {\em base characters}. Characters of type {\clkwd (and character (not base-character))} are referred to as {\em extended characters}. The base characters are distinguished in the following respects: \begin{itemize} \item The standard characters are a subrepertoire of the base characters.
The selection of base characters which are not standard characters is implementation defined. \item Only members of the base character repertoire can be elements of a base string. \item The base characters are, in general, the default characters for I/O operations. \end{itemize} No upper bound is specified for the number of glyphs in the base character repertoire--that is implementation dependent. The lower bound is 96, the number of standard characters defined for Common LISP. \footnote{Or, in contrast, the base repertoire may include all implementation supported characters.} The distinction of base characters is largely a pragmatic choice. It permits efficient handling of common situations, is in some sense privileged for host system I/O, and can serve as an intermediate basis for portability, less general than the standard characters, but possibly more useful across a narrower range of implementations. Many computers have some "base" character representation which is a function of hardware instructions for dealing with characters, as well as the organization of the file system. The base character representation is likely to be the smallest transaction unit permitted for text file and terminal I/O operations. On a system with a record based I/O paradigm, the base character representation is likely to be the smallest record quantum. On many computer systems, this representation is a byte. However, there are often multiple coded character sets supportable on a computer, through the use of special display and entry hardware, which are varying interpretations of the basic system character representation. For example, ISO 8859/1 and ISO 6937/2 are two different interpretations of the same 1-byte code representations. Many countries have their own glyph-to-code mappings for 1-byte character codes addressing the special requirements of national languages. 
Differentiating between these, without reference to display hardware, is a matter of convention, since they all use the same set of code representations. When a single byte is not enough, two or more bytes are sometimes used for character encoding. This makes character handling even more difficult on machines where the natural representation size is a byte, since not only is the semantic value of a character code a matter of convention, which may vary within the same computing system, but so is the identification of a set of bits as a complete character code. It is the intention of this proposal that the composition of base characters is typically determined by the code capacity of the natural file system and I/O transaction representations, and the assumed display glyphs should be those of the terminals most commonly employed. There are several advantages to this scheme. Internal representation of strings of just base characters can be more compact than strings including extended characters. Source programs are likely to consist predominantly of base characters since the standard characters are a subset of the base character repertoire. Parsing of pure base character text can be more efficient than parsing of text including extended characters. I/O can be performed more simply with base characters. The standard characters are the 96 characters used in the Common LISP definition {\bf or their equivalents}. This was the Common LISP \cite{steele84} definition, but {\em equivalents} is a vague term. The standard characters are not defined by their glyphs, but by their roles within the language. There are two aspects to the roles of the standard characters: one is their role in reader and format control string syntax; the second is their role as components of the names of all Common LISP functions, macros, constants, and global variables. 
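As a small illustration of the count of 96, the Python sketch below builds one possible standard repertoire and tests whether a piece of text is written wholly in it. It assumes the standard characters are mapped onto ASCII (the 95 printing characters from space through tilde, plus Newline); the proposal itself ties the standard characters to their roles rather than to any particular code or glyph, and `is_standard_text` is a name invented for this example.

```python
# Assumption: the standard characters are mapped onto ASCII, i.e. the 95
# printing characters from space through tilde, plus Newline.
STANDARD_CHARS = frozenset(chr(code) for code in range(0x20, 0x7F)) | {"\n"}


def is_standard_text(text):
    """True if text is written wholly in the standard characters."""
    return all(ch in STANDARD_CHARS for ch in text)


print(len(STANDARD_CHARS))  # 96
print(is_standard_text("(defun square (x) (* x x))\n"))
```

Under this mapping a Tab or an accented letter makes `is_standard_text` return false, since neither is among the 96.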
As long as an implementation chooses 96 glyphs and treats those 96 in a manner consistent with the language's specification for the standard characters (e.g. the naming of functions), it doesn't matter what glyphs the I/O hardware uses to represent those characters: they are the standard characters. Any program or data text written wholly in those characters is portable through simple code conversion. \footnote{For example, the currency glyph, \$ , might be replaced uniformly by the currency glyph available on a particular display.} Additional mechanisms, such as in \cite{linden87}, which support establishment of equivalency between otherwise distinct characters are not excluded by this proposal. \footnote{We believe this is an important issue but it requires additional implementation experience. We also encourage new proposals from JIS and ISO LISP Working Groups on this issue.} \subsection{String Type} The {\clkwd string} type is defined as a vector of characters. More precisely, a string is a specialized vector whose elements are of type {\clkwd character} or a subtype of character. Similarly, a simple string is a specialized simple vector whose elements are of type {\clkwd character} or a subtype of character. The following string subtypes are distinguished with standardized names: {\clkwd base-string}, {\clkwd general-string}, {\clkwd simple-base-string}, and {\clkwd simple-general-string}. All strings which are not base strings are referred to as {\em extended strings}. A base string can only contain base characters. {\clkwd general-string} is equivalent to {\clkwd (vector character)} and can contain any implementation supported base or extended characters, in any mixture. All Common LISP functions defined to operate on strings treat base and extended strings uniformly with the following caveat: for any function which inserts a character into a string, it is an error to insert an extended character into a base string. 
\footnote{An implementation may, optionally, provide automatic coercion to an extended string.} An implementation may support string subtypes in addition to {\clkwd base-string} and {\clkwd general-string}. For example, a hypothetical implementation supporting the Arabic and Cyrillic character registries as extended characters might provide: \begin{itemize} \item {\clkwd general-string} -- may contain Arabic, Cyrillic or base characters in any mixture. \item {\clkwd region-specialized-string} -- may contain an installation-selected repertoire (Arabic/Cyrillic) or base characters in any mixture. \item {\clkwd base-string} -- may contain base characters only. \end{itemize} Though, clearly, portability of applications using {\clkwd region-specialized-string} is limited, a performance advantage might argue for its use. \footnote{{\clkwd region-specialized-string} is used here for illustration only; it is not being proposed as a standardized string subtype.} Alternatively, an implementation supporting a large base character repertoire including, say, Japanese Kanji may define {\clkwd base-character} as equivalent to {\clkwd character}. We expect that applications sensitive to the performance of character handling in some host environments will utilize the string subtypes to provide performance improvement. Applications with emphasis on international portability will likely utilize only {\clkwd general-string}s. The {\clkwd coerce} function is extended to allow for explicit coercion between base strings and extended strings. It is an error to coerce an extended character to a base character. During reader construction of symbols, if all the characters in the symbol's name are of type {\clkwd base-character}, then the name of the symbol may be stored as a base string. Otherwise it will be stored as an extended string. The base string type allows for more compact representation of strings of base characters, which are likely to predominate in any system.
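The rules for base strings can be modelled in a few lines. The Python sketch below is an illustration only: the class `BaseString`, the helper `symbol_name_storage`, and the 256-code base repertoire are all assumptions, not anything defined by the proposal. Where the proposal says "it is an error", this model chooses to raise an exception; a real implementation would be free to behave differently, for instance by coercing to an extended string.

```python
BASE_CODE_LIMIT = 256  # assumption: an 8-bit base repertoire


def is_base_char(ch):
    """True if ch belongs to the assumed base repertoire."""
    return ord(ch) < BASE_CODE_LIMIT


class BaseString:
    """Mutable string holding base characters only, one byte per character."""

    def __init__(self, text=""):
        self._codes = bytearray()
        for ch in text:
            self.append(ch)

    def append(self, ch):
        # Inserting an extended character into a base string is an error.
        if not is_base_char(ch):
            raise TypeError(f"{ch!r} is not a base character")
        self._codes.append(ord(ch))

    def __str__(self):
        # latin-1 maps byte values 0..255 straight back to code points 0..255.
        return self._codes.decode("latin-1")


def symbol_name_storage(name):
    """Mirror the reader rule: an all-base name may be kept as a base string,
    any other name is kept as a general (here: ordinary Python) string."""
    try:
        return BaseString(name)
    except TypeError:
        return name
```

Constructing `BaseString(text)` plays the part of an explicit coercion from a general string to a base string: it succeeds only when every character is a base character.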
Note that in any particular implementation the base characters need not be the most compactly representable, since others might have a smaller repertoire. However, in most implementations base strings are likely to be more space efficient than extended strings.
%----------------------------------------------------------------------
\section{Streams and System I/O}
A lot of the work of ensuring that a Common LISP implementation operates correctly in a multiple coded character set environment must be performed by the I/O interface. The system I/O interface, abstracted in Common LISP as streams, is responsible for ensuring that text input from outside LISP is properly mapped into character objects internally, and that the inverse mapping is performed on output. It is beyond the scope of a language definition to specify the details of this operation, but options are specified which allow runtime indication from the user as to what coded character sets a stream uses, and how the mappings should be done. It is expected that implementations will provide reasonable defaults and invocation options to accommodate desired use at an installation. One keyword argument is proposed as an addition to {\clkwd open}: \begin{itemize} \item {\clkwd :external-coded-character-format} whose value would be: \begin{itemize} \item A name or list of names indicating an implementation recognized scheme for representing 1 or more coded character sets. \footnote{ For example, the so/si convention used by IBM on 370 machines could be selected by a list including the name {\clkwd :ibm-shift-delimited}. The run-encoding convention defined by XEROX could be selected by {\clkwd :xerox-run-encoded}. The convention based on ASCII which uses leading bit patterns to distinguish two-byte codes from one-byte codes could be selected by {\clkwd :ascii-high-byte-delimited}. } As many coded character set names must be provided as the implementation requires for that external coding convention.
\footnote{ For example, if {\clkwd :ibm-shift-delimited} were the argument, two coded character set specifiers would have to be provided. } \end{itemize} \end{itemize} These arguments are provided for input, output, and bidirectional streams. It is an error to try to write a character other than a member of the specified coded character sets to a stream. (This excludes the \#$\backslash${\clkwd Newline} character. Implementations must provide appropriate line division behavior for all character streams.) An implementation supporting multiple coded character sets must allow for the external representation of characters to be separately (and perhaps multiply) specified to {\clkwd open}, since there can be circumstances under which more than one external representation for characters is in use, or more than one coded character set is mixed together in an external representation convention. In addition to supporting conversion at the system interface, the language must allow user programs to determine how much space data objects will require when output in whichever external representations are available. The new function {\clkwd external-coded-string-length} takes a character or string object as its required argument. It also takes an optional {\em output-stream}. It returns the number of implementation-defined representation units \footnote{ Often the same as the storage width of a base character, usually a byte. } required to externally store that object, using the representation convention associated with the stream. If the object cannot be represented in that convention, the function returns {\clkwd nil}. This function is necessary to determine if strings can be written to fixed length fields in databases or terminal screen templates. Note that this function does not address the problem of calculating screen width of strings printed in proportional fonts. 
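The behaviour described for {\clkwd external-coded-string-length} can be approximated as follows. This Python sketch is a model under stated assumptions, not the proposed function: it takes an encoding name where the proposal takes an optional output stream, it counts bytes as the representation unit, and it ignores any shift state a stream might carry. The helper `fits_field` is hypothetical and shows the fixed-length-field use mentioned above.

```python
def external_coded_string_length(obj, encoding):
    """Return the number of bytes needed to store obj in the given encoding,
    or None if obj cannot be represented in that encoding."""
    try:
        return len(obj.encode(encoding))
    except UnicodeEncodeError:
        return None


def fits_field(text, width, encoding):
    """True if text can be written to a fixed-length field of width bytes."""
    needed = external_coded_string_length(text, encoding)
    return needed is not None and needed <= width


print(external_coded_string_length("na\u00efve", "latin-1"))  # 5
print(external_coded_string_length("na\u00efve", "utf-8"))    # 6
print(external_coded_string_length("na\u00efve", "ascii"))    # None
```

The same five-character string needs a different number of units in each external representation, and has no representation at all in one of them, which is the case the proposed function reports by returning {\clkwd nil}.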
Related to the I/O interface, we also introduce the function {\clkwd char-ccs-value} which takes a character object and a coded character set name (eg. {\clkwd :ISO8859/1-1987}) and returns the encoding of the character within the coded character set. %---------------------------------------------------------------------- %----------------------------------------------------------------------  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 22 Feb 89 17:00:34 EST Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Feb 89 13:32:43 PST Date: Wed, 22 Feb 89 02:09:18 PST From: Thom Linden To: Common Lisp mailing Message-ID: <890222.020918.baggins@almvma> Subject: Jan 1 cs proposal comments >> From: "David A. Moon" >> Subject: Comments on the Character proposal dated January 1, 1989 >> >> Page 6 -- *all-registry-names* should be renamed to >> *all-character-registry-names*; the word "registry" by itself >> is too general. I made this change to the latest version of the proposal. >> >> Page 9 -- the fourth bullet requires a defined total ordering of all >> characters. This seems unnecessary, and is impossible to implement in any >> system (such as Symbolics Genera) that allows dynamic addition of character >> registries by third-party software vendors and by users; in such a system >> character codes have to be allocated dynamically and therefore their order >> cannot be fixed ahead of time. You are quite right. This bullet is removed. >> >> Page 9 -- This says an implementation must define the result of >> standard-char-p on the characters it supports. I think that is incorrect. >> Common Lisp fully defines the result of standard-char-p, which is NIL >> for all characters added by an implementation. Right. This bullet is removed. >> >> Page 14 -- This EXTERNAL-WIDTH function probably should be part of a >> database facility or a terminal screen template facility; I'm not sure it >> is useful by itself. 
Also note that its result is only meaningful with >> respect to a specific state of the stream. To give two examples, with the >> SO/SI encoding the answer can vary by 1 depending on whether the stream is >> already shifted into the correct state for the first character; with the >> universal encoding Symbolics uses, the answer can vary by a lot depending on >> whether the character repertoires appearing in the string have been used >> earlier on the same stream (and hence have been assigned encoding numbers). >> Because of this dependence on the state of the stream, I cannot think of >> any correct use of EXTERNAL-WIDTH that does not involve immediately >> outputting the string to the stream. Therefore I believe the same effect >> can be achieved without adding any new functions, by calling FILE-POSITION, >> outputting to the stream, calling FILE-POSITION again, and subtracting. If >> you still want to propose this feature, you should change the name: use >> "length" instead of "width", since that's the word Common Lisp always uses, >> and use a name that relates to the :EXTERNAL-CODE-FORMAT option to OPEN; >> for example, STRING-LENGTH-IN-EXTERNAL-CODE-FORMAT or >> EXTERNAL-CODED-STRING-LENGTH. I changed the name to EXTERNAL-CODED-STRING-LENGTH. The description already contained a comment regarding current state. Actually, I favored the STREAM-INFO proposal which was voted down. This is much less ambitious but I still feel more useful than actually forcing I/O, backing up and rewriting. It's also not clear that your alternative has the same effect since it seems that some unwanted side-effects would occur such as premature appearance on a display screen. >> >> Page 24 -- I can't figure out what you intend the meaning of SIMPLE-STRING >> to be. Your report mostly does not mention it, but it doesn't say to >> remove it either. If I have correctly correlated page 24 back to CLtL, you >> are defining SIMPLE-STRING to be synonymous with SIMPLE-GENERAL-STRING. 
>> Maybe what you really meant, though, was what you said in November you >> would do, which was to make SIMPLE-STRING mean (AND STRING SIMPLE-ARRAY), >> in other words a union of several subtypes. This is particular confusing >> because Common Lisp uses the name SIMPLE-VECTOR to mean what you might call >> a simple general vector, that is, (SIMPLE-ARRAY T 1) rather than >> (SIMPLE-ARRAY * 1). Here are my suggestions for what to do with the >> various names for string subtypes: >> >> STRING As a union of all strings, this is fine. >> GENERAL-STRING I think (VECTOR CHARACTER) is just as good. >> BASE-STRING I think (VECTOR BASE-CHARACTER) is just as good. >> SIMPLE-STRING Should mean (SIMPLE-ARRAY CHARACTER 1). >> SIMPLE-BASE-STRING This is fine. >> SIMPLE-GENERAL-STRING This name is horrible, use SIMPLE-STRING. >> >> My rationale for these suggestions largely comes from thinking about >> which of these names would ever be used in type declarations and about >> how these names relate to the other names already in Common Lisp. To >> repeat older comments: >> >> Pages 19 and 20 introduce a new type named simple-base-string, in addition >> to simple-string. If you think about how simple-string would be used for >> compiler optimization, it makes sense for simple-string to be the name for >> the single simplest representation, rather than a name for a whole family >> of representations that would have to be discriminated at run time. Thus >> what you call simple-base-string should be called simple-string, and what >> you call simple-string should just be called (simple-array character (*)). >> This would not be an incompatible change in the meaning of simple-string. >> Simple-string would be analogous to simple-vector. >> >> I changed my mind slightly on that and now claim that while SIMPLE-STRING >> should still be a single representation, not a union, it should be the >> representation that can hold all characters. 
This is both because of the >> principle that correct programs should be easier to write than >> extra-efficient programs, and because of the powerful analogy with the name >> SIMPLE-VECTOR. Then the name SIMPLE-BASE-STRING is also needed for >> convenient type declarations of the more efficient but less functional >> string representation. That name is good, by analogy to BASE-CHARACTER. >> >> Adopting the above suggestions helps you decide what to do about the >> SCHAR, SBCHAR, and SGCHAR mess. First of all, you only need two functions, >> not three, because there are only two specified specialized representations. >> SCHAR should be for what I've called SIMPLE-STRING, SBCHAR should be >> for SIMPLE-BASE-STRING, and SGCHAR is not needed. (In fact I would prefer >> to remove all of the specialized versions of AREF from the language, in >> favor of THE or type declarations, but I know that would only pass over >> some peoples' dead bodies so I won't push it.) >> >> In case you are wondering, I have no quarrel with the name BASE-CHARACTER >> and would not want to see it removed. I guess I differ from Larry here, >> unless I erred when I wrote down his comments during the meeting. The statement on p24 making SIMPLE-STRING == (SIMPLE-ARRAY CHARACTER (*)) was in error. P25 had it right. Since we changed SCHAR to accept all simple strings there is no reason for SGCHAR and SBCHAR and these are eliminated. String and simple-string are (more clearly I hope) defined as union types. I've changed the terminology from 'for the purpose of declaration' to 'for object creation'. Perhaps there is a better term but the effect seems to be identical to what you suggest. That is, correct, portable programs are easier to write, one simply uses string and simple-string. More efficient, less portable programs need to specify the specialized subtype(s) explicitly. Having both string and simple-string defined as union types seems desirable on the basis of uniformity. 
Of the type abbreviations I think BASE-CHARACTER is the most useful and GENERAL-STRING, SIMPLE-BASE-STRING and SIMPLE-GENERAL-STRING less so. I don't believe that any of these really complicate the language. >> >> Page 25 -- The discussion of STRING and SIMPLE-STRING thinks that there >> is a distinction between declaration and discrimination, but Common Lisp >> no longer has such a distinction. Even when Common Lisp did have such >> a distinction, the meanings for declaration stated here were incorrect. I changed this to 'object creation'. Perhaps there is a better term. >> >> Page 29 -- *all-character-registry-names* has to be a variable, not a >> constant, to accomodate systems (such as Symbolics Genera) that allows >> dynamic addition of character registries by third-party software vendors >> and by users. Right, I made this change. >> >> Page 35 -- CHAR-REGISTRY should be renamed to CHAR-REGISTRY-NAME, so that >> if at some later time character registry objects are added, there is no >> possibility of confusion about whether this function returns a name or >> an object. Right, I made this change. >> >> Page 40 -- the default :ELEMENT-TYPE for OPEN cannot be BASE-CHARACTER. I >> think this was discussed at the X3J13 meeting. The report suffers from a >> confusion between two meanings of BASE-CHARACTER: the character type >> implemented most efficiently by the Lisp, and the character type most >> natural to the file system. These are not always the same. Furthermore, >> in a network-based system that supports multiple file systems equally >> (Symbolics Genera is an example), each file system might have a different >> natural character type. BASE-CHARACTER should just mean the character type >> implemented most efficiently by the Lisp. The default for :ELEMENT-TYPE >> has two viable choices that I can see, and maybe you should just propose >> both and let people vote: >> >> (1) CHARACTER. 
This matches the behavior of MAKE-STRING and friends, >> adheres to the principle that writing correct programs should be easier >> than writing extra-efficient programs (since making a program correct >> requires making every part of it correct, while making a program >> efficient only requires improving the bottlenecks), and doesn't cost >> anything in implementations that don't have extended characters. >> >> (2) The most natural type for the particular pathname being opened. >> In some systems this would be a constant, and in a subset of those >> systems this would be BASE-CHARACTER, however in general this might >> depend on the host, device, or even type fields of the pathname, >> and might also depend on information stored in the file system. >> In general this would always be an (improper) supertype of >> BASE-CHARACTER, but it's probably a bad idea to make that a requirement, >> as some file systems might not be able to implement it conveniently. >> Again this doesn't cost anything in implementations that don't have >> extended characters. The discussion on p16 about the base coded character set efficiency has been removed. The default element-type now states that it is implementation defined as character or a subtype of character. >> >> The relationship of option 2 to :ELEMENT-TYPE :DEFAULT (a feature that >> already exists in Common Lisp) needs to be clarified. Perhaps they >> are the same. The same? I don't understand. For example, I can imagine the element-type default as base-character and the external format defaulted to either an ASCII or EBCDIC encoding. >> >> Also the following promise from 14 November did not show up in the report: >> >> >> There should be a name for the "natural" encoding and there should be a >> >> specification of the properties of the natural encoding that a programmer >> >> can rely on. Suggestions for the name include :BASE, :NATURAL, and >> >> :INTERCHANGE. 
The definition probably involves the concept of data >> >> interchange with non-Lisp programs on the same system. >> >> This will be added to the revision. I lied. No one came up with the 'properties' of such an encoding. Do you have some text to suggest? >> >> Appendix B -- I disagree with the way you've used deprecation. I'll >> comment on each individual point: >> - I see no justification for deprecating STANDARD-CHAR. >> - I agree that STRING-CHAR should be deprecated, not deleted nor kept. >> - I think fonts and bits should be removed outright, not deprecated, >> because no portable program could possibly be using them. >> - I think the CHAR-INT function needs to be kept, although the INT-CHAR >> function should go away. This is for hashing. See comments below >> on character attributes. I've removed Appendix B and mention of deprecation. STANDARD-CHAR is simply (characterp :standard). String-char is back in as implementation-defined either character or base-character (and maybe should be voted as a deprecated type). >> >> No particular page -- the use of strings for naming registries, labelling >> characters, and naming external code formats is objectionable. Nothing >> else in Common Lisp is named by strings. Use of strings might lead to >> efficiency problems. We feel that keyword symbols are the appropriate >> objects to use for these three kinds of names. I changed these back to symbols. >> >> No particular page -- We agree with the deprecation or deletion of the two >> particular character attributes defined by CLtL, but not with the >> deprecation of the whole concept of character attributes. In fact on page >> 20 you say "characters are uniquely distinguished by their codes," which >> makes it impossible to have character attributes at all. The language must >> define how conforming programs should be written so that they will work >> both in implementations with character attributes and in implementations >> without them. 
For example, the value of (eql x (code-char (char-code x))) >> is unspecified. Another thing that needs to be said is that the exact >> character operations (char=, string=, etc.) respect all character >> attributes, while the inexact character operations (char-equal, >> string-equal, etc.) respect or ignore each character attribute in an >> implementation-defined but consistent fashion. Some of what you say on >> page 44 about attributes in general needs to be part of the spec, not >> deprecated. I would retain everything on that page except for INT-CHAR and >> the last bullet (referring to bits and fonts), and I would add a remark >> that FIND-SYMBOL and INTERN respect character attributes. If you want, >> perhaps I or someone else at Symbolics can provide exact text for what >> to say about character attributes that you could insert into your report. I moved the attribute list previously in Appendix B back into the description of characters. Let me know what text you would like to see for FIND-SYMBOL and INTERN and I'll add it to the list. >> No particular page -- On the subject of defining character registries in a >> separate document, and relating them to ISO standards for character >> encoding: I think that's fine. I don't see anything wrong with introducing >> the concept of character registry and the requirement that each character >> object relates to exactly one registry. However, I think the somewhat >> random list of character registries on pages 7-8 and again on page 21 does >> not belong in the language specification. Even the names of the Right. They are not part of the Common LISP standard. The revised document is considerably clearer in this regards. >> standardized character registries belong in the character registry >> standard, not in the Common Lisp language standard. I'm confused about the >> meaning of BASE, STANDARD, and CONTROL as character registry names; these >> are mentioned in your report but not explained very well. 
If these are >> character registries that are required to exist in all Common Lisp >> implementations, then unlike the others they do belong in the Common Lisp >> language standard, not in the character registry standard. By CONTROL, I meant a registry which contains the various control codes mentioned in the various ISO coded character set standards. BASE and STANDARD are no longer mentioned here. They are allowed as Common LISP repertoire names in characterp and the character type specifier. >> >> At the meeting there was some discussion about the issue of enumerating all >> characters in a character registry. People claimed incorrectly that it was >> impossible. In fact it's possible to do this, with questionable >> efficiency, by the following program:
>>
>> (dotimes (code char-code-limit)
>>   (let ((char (code-char code)))
>>     (when char
>>       (when (eq (char-registry-name char) desired-registry-name)
>>         ... process this char ...))))
>>
>> Of course you have to change the EQ to EQUALP if you continue to use >> strings to name character registries. For more efficiency, you could add >> a way to iterate over all the codes in one character registry, but I think >> that is unnecessary. >> >> >> TYPOS: Right. I've made these corrections. >> >> 25 -- base-string is missing from the Table 4-1 amendment. >> >> 26 -- general-string is not an array of BASE characters, also the first >> two paragraphs under A.4.8 are garbled (the two separate sentences for >> strings for symbols got smushed together). >> >> 37 -- This says the default for the :ELEMENT-TYPE option to MAKE-STRING >> is SIMPLE-STRING. Actually it's CHARACTER.
>>  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 22 Feb 89 16:57:27 EST Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Feb 89 13:33:35 PST Date: Wed, 22 Feb 89 03:48:56 PST From: Thom Linden To: Common Lisp mailing Message-ID: <890222.034856.baggins@almvma> Subject: cs proposal comments >> From: sandra%defun@cs.utah.edu (Sandra J Loosemore) >> Subject: comments on character proposal >> >> Getting rid of bits and fonts (section 2.1) seems like a very good >> idea to me. I would argue for deleting these "features" completely >> instead of merely deprecating them, because there now seems to be >> general agreement that the whole idea was brain-damaged in the first >> place, plus it's just about impossible to use them portably anyway >> (since implementations are free not to support them). Deprecating the >> features would simply perpetuate the current sad state of affairs in >> to the ANSI standard. I deleted Appendix B from the proposal. The attribute check list is incorporated into the character chapter as implementation dependent. >> >> I am not at all sure why we need to standardize the idea of character >> registries at all, much less state that a character can only belong to >> one registry, or define a standard set of registries. What does having >> registries buy the user, other than perhaps a way to test whether a >> character belongs to one or not? Why isn't it sufficient just to say >> that implementations can support extended characters, and leave it at >> that? The registries are introduced to allow an application a portable way to name, compose and decompose characters. Currently, there is no way to do this in any programming language. There are other possiblities. For example, simply labeling all characters uniquely; another to define a universal coded character set and use these numeric codes to 'name' characters. 
I don't think using numbers for naming characters is useful since I'll always forget what character 34539 actually is! Registries seem to provide a framework for useful categorization of characters. It also avoids the current mess that the coded character set standards are in. >> >> I'm confused about how you propose to handle characters that appear in >> more than one character repetoire, and whether characters with accent >> marks are considered distinct from characters without accents. For >> example, is the French "C" with a cedilla distinct from a normal >> French "C", and is that distinct from the standard-char "C"? We handle characters that appear in more than one repertoire by using registries. No character appears in more than one registry. The constituents of the registries are not defined by Common LISP. I believe that in most environments today, it is recognized that characters with accents are distinct from their vanilla cousins. As we have proposed registries, they contain semantically distinct characters. >> >> The way the document describes things now, it seems like the Common >> Lisp standard would have to include a statement of exactly what >> characters belong in each of the standard registries listed in section >> 2.2. Otherwise, implementors might go off and define their own >> character registries that happen to include some characters that ought >> to belong in one of these standard registries. For instance, the machine >> I happen to be sitting in front of right now supports an 8-bit native >> character set, and it seems perfectly reasonable for a Lisp runnning on >> this machine to include all 256 characters in its base character set, >> but some of those might actually be supposed to live off in some other >> registry. The registries are independent of any coded character sets. In particular, coded character sets are not registries. Your base repertoire (set of 256 characters) are possibly drawn from several registries. 
You are correct that lacking an international standard (or ANSI one), for character registries an implementation could define the a single registry containing all supported characters. It could also define NO registries and use only the conventional naming of characters. I expect an implementation taking the no-cost way would choose the second approach. On the other hand, an implementation supporting text processing across international boundaries is more likely to define some reasonable registries eg. Latin, Greek, etc.. >> >> Also in section 2.2, why is it necessary for there to be a total >> ordering, or even a partial ordering, of all characters? It seems >> like CHAR< and friends are not very useful except when comparing base >> characters anyway. It seems like it would difficult to get things >> like the Spanish N-with-twiddle character to collate correctly anyway, >> given the constraints you have put on how character codes are derived >> and the requirement that CHAR< be just like < on the char-codes. Right. This is now removed. >> >> It doesn't seem like STANDARD-CHAR-P belongs in the list of character >> predicates on p. 9, since no extended characters can possibly be >> STANDARD-CHAR-P anyway. Right. This is now removed. >> >> The stuff in section 2.3 seems mostly reasonable to me. It's not really >> clear why you need GENERAL-STRING (as distinct from STRING) and >> SIMPLE-GENERAL-STRING (as distinct from SIMPLE-STRING). Again, some >> rationale would be helpful. GENERAL-STRING means (VECTOR CHARACTER). This is not the meaning of STRING (a union type). I agree that GENERAL-STRING is not much of an abbreviation over (VECTOR CHARACTER). It still seems somewhat more mnemonic. >> >> In section 2.4, the general idea of specifying an external character >> encoding to OPEN seems reasonable. However, I'm confused by the >> business about having more than one coded character set mixed >> together. 
If a character appears in more than one coded character >> set, which encoding takes precedence? It seems like this has not been >> well thought-out. Also, seeing as though we have just voted down a >> proposal to add an EXTERNAL-WIDTH function, it seems like a very bad >> idea to lump it in here. Some encoding schemes allow disjoint coded characters sets to coexist. That is, a given character would appear on one but not the other. For example, a ISO8859/1 coded character set could coexist with a coded character set for Chinese. As for External-width, it was part of our subcommittee discussions long before the recent stream proposal. It will be a separate item in the list of character votes. >> >> Now for the general comments. >> >> One thing that is not clear to me from reading this document is how >> much of it has already been standardized by ISO. I share Larry's >> concern that we might standardize one thing, and then have ISO go off >> and standardize something completely different. I think it's a >> mistake to try to second-guess what ISO might do. The revision might make this clearer. I think this is a red herring anyhow. As a programming language committee we need to specify what is useful in the context of LISP. We can't expect a coded character set committee to figure it out. On the other hand, we can influence what gets standardized by defining our framework. The ISO Prolog std committee is interested in what we define. >> >> I am also concerned about trying standardize things that have not yet >> been implemented. I think it's a mistake to try to do language design >> in a standards committee. >> >> Finally, I have some problems with the presentation of your proposal. >> One problem, as I mentioned at the meeting, is that you've made it an >> all-or-nothing package, and I can't vote for the whole thing because >> there are some parts of it that do not seem appropriate, even though I >> would support some of the other changes individually. 
The other >> problem is that Appendix A is virtually unreadable. Some of the >> conceptual changes involve wording changes to several passages, and I >> know that there are some other changes in the appendix that are not >> mentioned in the introductory blurb at all. Is it totally impossible >> to recast the changes in standard cleanup format proposals? The >> advantage of that format is that it presents more context, including a >> clear statement of why the existing CLtL behavior is "broken" and a >> rationale for the proposed change. There will be several votes regarding this proposal. I don't intend to rewrite the document in a cleanup format. >> >> I know that we adopted things like the CLOS document that were >> presented as single mega-proposals, but those were primarily additions >> to the language and what you are proposing is essentially a large >> number of incompatible changes. I'm having a hard time identifying >> what all of those changes are. >> Actually, I don't think it's as large a number of changes as you imply. In any case, the vote split should help this out.  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 22 Feb 89 16:51:11 EST Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Feb 89 13:34:09 PST Date: Wed, 22 Feb 89 04:51:15 PST From: Thom Linden To: Common Lisp mailing cc: David Gray Message-ID: <890222.045115.baggins@almvma> Subject: cs proposal comments >> From: David N Gray >> Subject: characters proposal >> >> I have read the documented titled "Extensions to Common LISP to Support >> International Character Sets" dated January 1, 1989, and feel that it is >> not much of an improvement over what we saw in October. Following are >> some random comments about things I happened to notice; this is not >> intended to be a comprehensive analysis. >> >> First, documents such as this ought to be labelled with an X3J13 >> document number so that they can be referred to conveniently and >> unambiguously. 
>> >> "Appendix A" and "Appendix B" really should be chapters 3 and 4 since >> they are an essential part of the proposal, rather than being an >> appendage to it. Appendix B is now eliminated. Appendix A is really quite unlike chapters 1 and 2 in structure. >> >> Page 7 says that the definition of semi-standard-characters "is replaced >> by a more uniform approach with introduction of the Control Character >> Registry". Do you really mean that it _will_be_ replaced when the >> Control Character Registry is defined in some subsequent document? I >> certainly don't see anything in this document that could be considered a >> replacement. Yes. The revision is clearer on this. This document does not define names for character registries nor their constituents. >> >> This whole concept of registries seems rather strange. Is the intent >> that the alphabetic characters of the standard characters are to be in >> the "Latin" registry while characters such as period and comma are in >> "Latin-Punctuation"? Is #\NEWLINE in the "Control" registry? Where do >> the digits go -- "Mathematical"?. Is #\- a "Latin-Punctuation" or a >> "Mathematical"? Which registry is #\SPACE in? Now tell me what to do >> with the extra non-Latin alphabetic characters used in Sweedish? Does >> that require a separate registry for just those additional characters? >> Now we have simple text in a single language using characters from at >> least four different registries. Do you really think it possible to >> agree on a "fixed", non-extensible, set of "Mathematical" or "Pattern" >> characters? Actually, I believe the simplicity of the registry framework will make agreement easy. Currently, members of the coded character set committees spend vast amounts of time lobbying for inclusion of their favorite character(s) in the 'popular' coded character set standard. The effect of not being included means fewer installations will support their native language properly. 
I think a new group, hopefully formed within programming languages, should define the registries rather than the existing coded character set committees. There is no competition between registries, ie. no advantage of one over another. What this committee has to agree upon is 1) a useful set of registry names and 2) definition of the constituents of each registry. The only argument I would anticipate is "are the semantics of my alpha the same or different from your alpha" type debates. By the way, the registries are fixed only in that a Common LISP implementation cannot modify the standard definitions. This guarantees an application program can portably rely on the composition and decomposition functions to establish the availability of any given character. >> >> Page 9 says that an implementation needs to specify the total ordering >> of characters within each registry, but what about the ordering of >> characters in different registries? Is that completely undefined? There is no ordering of characters within registries. As mentioned in Hawaii, the character index (a number) was changed to character label (a symbol) throughout the proposal. >> >> Page 25 section A.4.5 doesn't specify the syntax of a registry name; did >> you intend it to be a string? These have been changed to be symbols. >> >> Page 27 has an example using (typep x '(character "standard")) but >> page 25 said that had to be a registry name; "standard" is not a >> registry name. The revision is clearer on this. character and characterp can take registry names, :base or :standard. The meaning of :base and :standard is defined by Common LISP as the base character repertoire and standard character repertoire respectively. >> >> Page 29 - *ALL-REGISTER-NAMES* -- a list of strings? Now a list of symbols. >> >> Page 33 -- FIND-CHAR -- does the index value within a registry have any >> portable meaning? Is that intended to be specified for the standard >> registries? 
Is "base" supposed to be accepted here? If not, how can
>> you access the base codes? If I were going to construct a character
>> from its index value, it would be more meaningful to use an index
>> relative to some coded character set rather than these registries.

FIND-CHAR takes a character label and registry. These are specified by the registry standard. Base is not a registry name. We have introduced a new function CHAR-CCS-VALUE which takes a character object and a coded character set name (a symbol) and returns the encoding of the character in the coded character set.

>>
>> Page 36, the last sentence doesn't make sense. The default for
>> :ELEMENT-TYPE would have to be either CHARACTER or BASE-CHARACTER.

Right. I've made this change.

>>
>> Page 37, section A.22.1.1 -- the part being deleted specifies the
>> meaning of including tab and form-feed characters in a Common Lisp
>> source file; do you really intend that to not have any standard meaning?
>> If my editor uses tabs for indenting, does that mean that the resulting
>> source file is not a standard-conforming program?

That really depends on the definition of a conforming program. Is this defined yet?

>>
>> Page 38, the first reference to p360 of CLtL should be p353; the
>> deletion here says that there shall not be any standard name for the
>> commonly used control characters such as tab and form-feed. That still
>> seems wrong to me.
>>
>> Page 41, what's the point of appending "ccs" to the name of the
>> standard? Presumably that stands for "coded character set", but isn't
>> that adequately implied by the fact that this string will follow the
>> keyword :EXTERNAL-CODE-FORMAT ? The use of "default" seems odd since
>> :DEFAULT is used everywhere else.

This was to distinguish from someone referring to the set of characters (repertoire) represented in a given coded character set. Ie. to distinguish ISO8859/6-1987 coded character set from the ISO8859/6-1987 repertoire.
In fact, the ISO coded character set standards never refer to repertoires in isolation (ie. without the codes), so I've dropped the 'ccs'. Also, "default" is now :DEFAULT as elsewhere.

>>
>> I agree with Moon that the excising of bits and fonts has not been done
>> carefully enough for them to be compatible extensions.
>>

I think the new revision takes care of this by incorporating the attribute list as part of the language proper (ie. not deprecated).


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 22 Feb 89 16:47:02 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Feb 89 13:32:18 PST
Date: Wed, 22 Feb 89 00:36:12 PST
From: Thom Linden
To: Common Lisp mailing
Message-ID: <890222.003612.baggins@almvma>
Subject: cs proposal revisions

I've sent out a revised cs document for your review. It reflects a number of your comments from the Hawaii meeting and over the net. The larger changes were:

-- The 'deprecated' appendix is eliminated. I re-introduced the list of implementation-dependent attribute support items into the document proper. The other items in appendix B were simply eliminated.

-- The functions sbchar and sgchar are eliminated. In general, the comments indicate that case discrimination by schar does not introduce a substantial performance penalty.

-- Character registry names and constituents are NOT defined by Common LISP. The proposal defines only the framework for composition and decomposition of characters. The naming of registries and definition of their constituents are left completely as an ISO standard activity.

Please send comments to the X3J13 mailing list.
If time allows and it seems needed, I will send out another revision in time to allow for an actual vote at the March meeting. A straw vote list will follow shortly. Regards, Thom  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 16 Feb 89 17:08:35 EST Received: from NMFECC.ARPA by SAIL.Stanford.EDU with TCP; 16 Feb 89 14:02:13 PST Received: from tuva.sainet.mfenet by ccc.mfenet with Tell via MfeNet ; Thu, 16 Feb 89 13:59:35 PST Date: Thu, 16 Feb 89 13:59:35 PST From: POTHIERS%TUVA.SAINET.MFENET@NMFECC.ARPA Message-Id: <890216135935.20800216@NMFECC.ARPA> To: common-lisp@sail.stanford.edu Subject: WANTED: Code Profiler Date: Thu, 16-FEB-1989 14:57 MST X-VMS-Mail-To: ARPA%"common-lisp%sail.stanford.edu@nmfecc.arpa" Does any have (or know where I can get) a Common Lisp code profiler? I'm interested in something that will give be number of invocations &/or caller &/or timing information for all the user written functions in my system. I would really like to profile some of our stuff that uses PCL too. I don't mind having to hack at the code some to make it suit my puposes. Please direct any advice to me directly at: pothiers%tuva.sainet@nmfecc.arpa Thanks, Steve Pothier Science Applications International Corporation Tucson  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Feb 89 17:00:27 EST Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 12 Feb 89 13:53:21 PST Received: from BOBOLINK.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 537730; Sun 12-Feb-89 16:51:03 EST Date: Sun, 12 Feb 89 16:50 EST From: Kent M Pitman Subject: File I/O To: dg1v+@andrew.cmu.edu cc: Common-Lisp@SAIL.Stanford.EDU In-Reply-To: Message-ID: <890212165052.3.KMP@BOBOLINK.SCRC.Symbolics.COM> There's not a separate function. Most reader functions (eg, READ and READ-LINE) take an eof-p argument that says whether to signal an error if you read past the end of a file. 
The default is T, but if you specify NIL then you can specify a value to be returned when you have read past the end of file. Here are some examples:

  (DEFUN SHOW-FILE (FILE)
    (WITH-OPEN-FILE (STREAM FILE)
      (DO ((LINE (READ-LINE STREAM NIL NIL) (READ-LINE STREAM NIL NIL)))
          ((NOT LINE))
        (WRITE-LINE LINE))))

  (DEFUN GET-LISP-FORMS-FROM-FILE (FILE)
    (WITH-OPEN-FILE (STREAM FILE)
      (LET ((UNIQUE (LIST NIL)))
        (DO ((FORM (READ STREAM NIL UNIQUE) (READ STREAM NIL UNIQUE))
             (RESULT '() (CONS FORM RESULT)))
            ((EQ FORM UNIQUE) (NREVERSE RESULT))))))

By the way, the Common-Lisp list is -very- large (probably many hundreds of recipients) and probably overkill for this kind of simple `how to' question. Contacting your vendor or individually contacting just about any one of the people you see contributing to this list would probably have gotten you the same answer at lower cost to the community.


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Feb 89 16:35:36 EST
Received: from po2.andrew.cmu.edu by SAIL.Stanford.EDU with TCP; 12 Feb 89 13:26:36 PST
Received: by po2.andrew.cmu.edu (5.54/3.15) id for common-lisp@sail.stanford.edu; Sun, 12 Feb 89 16:21:55 EST
Received: via switchmail; Sun, 12 Feb 89 16:21:35 -0500 (EST)
Received: from kennettsq.andrew.cmu.edu via qmail ID ; Sun, 12 Feb 89 16:17:52 -0500 (EST)
Received: from kennettsq.andrew.cmu.edu via qmail ID ; Sun, 12 Feb 89 16:16:30 -0500 (EST)
Received: from Version.6.25.N.CUILIB.3.45.SNAP.NOT.LINKED.kennettsq.andrew.cmu.edu.rt.r3 via MS.5.6.kennettsq.andrew.cmu.edu.rt_r3; Sun, 12 Feb 89 16:16:29 -0500 (EST)
Message-Id:
Date: Sun, 12 Feb 89 16:16:29 -0500 (EST)
From: David Greene
X-Andrew-Message-Size: 402+0
To: +dist+/afs/andrew.cmu.edu/usr0/postman/DistLists/Andrew-Hints.dl@andrew.cmu.edu, bb+andrew.programming.lisp@andrew.cmu.edu, common-lisp@sail.stanford.edu, Outbound News
Subject: File I/O

I am trying to read various types of ascii data files into a standard common LISP program (Ibuki Common Lisp).
There are a number of ways to create streams and such, but how can I test for an End Of File so that my read won't return an error? I have gone through Steele, but apparently the appropriate function has eluded me. Thanks for any help. -David dg1v@andrew.cmu.edu dpg@isl1.ri.cmu.edu  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 7 Feb 89 21:10:27 EST Received: from vaxa.isi.edu by SAIL.Stanford.EDU with TCP; 7 Feb 89 17:58:33 PST Posted-Date: Tue, 07 Feb 89 17:55:58 PST Message-Id: <8902080156.AA04251@vaxa.isi.edu> Received: from LOCALHOST by vaxa.isi.edu (5.59/5.51) id AA04251; Tue, 7 Feb 89 17:56:01 PST To: common-lisp@sail.stanford.edu From: goldman@vaxa.isi.edu Subject: &environment extent Date: Tue, 07 Feb 89 17:55:58 PST Sender: goldman@vaxa.isi.edu Can someone tell me whether the ENVIRONMENT object passed as the second parameter to a macro-expander function is specified to have DYNAMIC or INDEFINITE extent? Thanks, Neil  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 24 Jan 89 16:30:54 EST Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 24 Jan 89 13:16:41 PST Date: Tue, 24 Jan 89 11:16:13 PST From: Thom Linden To: Common Lisp mailing Message-ID: <890124.111613.baggins@almvma> Subject: character proposal Below are the minimum changes going into the character proposal. This list was presented on a foil at the Hawaii meeting. -- some minor corrections (bugs) -- the registry document will: -- be an appendix to the standard, not required -- reference appropriate ISO standards (only) -- character 'index' will be changed to character 'label' throughout (labels are strings, not numeric values) -- add the function char-ccs-value which takes a character object and coded character set name and returns the value of the character within that encoding. -- add the function sgchar which is similar to sbchar but takes a general-string object. 
-- modify char-name, name-char, and #\name to accept character names of the form 'registry:label' As decided at the Hawaii meeting, the proposal will be voted on at the March meeting (rather than by mail). In particular, there were requests to partition the vote. If you have any specific partition you would favor (eg. vote on external-width separately), please let us know. (Note, the ballot is being split, not the document). I'll probably send out a few informal ballots to get a feeling for the partitioning as well identifing the controversial items. I will be revising the document and encourage any comments to be sent immediately. I hope to send out a revision at the end of this week. If there are additional comments (on the revision) I will repeat this process if necessary to obtain a 'clean' version for the March vote.  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 23 Jan 89 12:57:49 EST Received: from Think.COM by SAIL.Stanford.EDU with TCP; 23 Jan 89 09:45:51 PST Received: from fafnir.think.com by Think.COM; Mon, 23 Jan 89 12:22:12 EST Return-Path: Received: from verdi.think.com by fafnir.think.com; Mon, 23 Jan 89 12:42:29 EST Received: by verdi.think.com; Mon, 23 Jan 89 12:41:17 EST Date: Mon, 23 Jan 89 12:41:17 EST From: Guy Steele Message-Id: <8901231741.AA12978@verdi.think.com> To: krulwich-bruce@yale.arpa Cc: Common-Lisp@sail.stanford.edu In-Reply-To: Bruce Krulwich's message of Thu, 12 Jan 89 12:49:19 EST <8901121749.AA18587@ATHENA.CS.YALE.EDU> Subject: Order of "processing" of arguments Date: Thu, 12 Jan 89 12:49:19 EST From: Bruce Krulwich Michael Greenwald said: >Actually, CLtL pg 61 says that the arguments and parameters are >processed in order, from left to right. I don't know if "processed" >implies "evaluated", but I always assumed (perhaps incorrectly) it did. 
Guy Steele replied: >I interpret this as referring to how the (fully evaluated) arguments >are processed during lambda-binding, not to the order in which argument >forms in a function call are evaluated. After all, the arguments referred >to on page 61 might have come from a list given to APPLY, rather then >from EVAL on a function call. This seems vacuous to me. Does this mean that an implementation in which a procedure entry point knows how many arguments its receiving (through a link table, for instance, or simply by counting its arguments) and constructs a REST-arg list before doing the binding of the required args is in violation of CLtL because it processes the rightmost argument before the leftmost one?? I hope not. It seems to me that as long as actuals and formals are matched up correctly there is no reason for the language specification to specify the order of the "processing" of the arguments during lambda-binding. Bruce Krulwich krulwich@cs.yale.edu The implementation need only behave "as if" it processed them in that way. It is always permissible to dye one's whiskers green and then to use so large a fan that they cannot be seen. --Guy  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 23 Jan 89 10:49:06 EST Received: from crash.cs.umass.edu ([128.119.40.235]) by SAIL.Stanford.EDU with TCP; 23 Jan 89 07:40:00 PST Received: from vax3.cs.umass.edu by crash.cs.umass.edu (5.59/Ultrix2.0-B) id AA01738; Mon, 23 Jan 89 10:40:36 est Message-Id: <8901231540.AA01738@crash.cs.umass.edu> Date: Sun, 22 Jan 89 12:18 EST From: ELIOT@cs.umass.EDU Subject: Logical Operations on Numbers To: Common-Lisp@sail.stanford.EDU X-Vms-To: IN%"Common-Lisp@sail.stanford.edu" From: IN%"seb1525@draper.COM" 20-JAN-1989 12:12 Subj: LOGICAL OPERATIONS ON NUMBERS From: SEB1525@mvs.draper.COM To: common-lisp@SAIL.STANFORD.EDU Isn't SUBSETP of A and B, where A and B are integers, implementable by (eql B (logior A B))? Yes. It is also (zerop (logandc2 A B)). 
However, these expressions are not efficient. Suppose that the sets are large, hundreds or thousands of elements. In this case A and B are going to be 'bignums', certainly not FIXNUMS. Assuming that bignums are implemented so they can be operated on as a series of chunks we have:

    A = a1'a2'a3'...'an
    B = b1'b2'b3'...'bn

SUBSET implemented directly is:

    (AND[i=1..n] (%subset ai bi))

Where %subset operates on a single chunk. AND[i=1..n] is a short circuit logical 'AND' operation. This requires n operations, and allocates NO new memory.

SUBSET implemented as (eql B (logior A B)) requires n operations to compute the logior, perhaps some overhead to normalize the new bignum, plus n more operations to compute EQL, plus it allocates memory to store max(A, B).

SUBSET implemented as (zerop (logandc2 A B)) requires n operations to compute the logandc2, perhaps some overhead to normalize the new bignum, and 1 operation to compute ZEROP, plus it allocates memory to store the intermediate result. This is slightly more efficient, because ZEROP is microscopically more efficient than EQL. (ZEROP is FALSE for all bignums. EQL has to look at them.) Furthermore the intermediate result may be smaller than the intermediate result in the logior construct.

I draw three conclusions from this.

(1) A naive computation of subset in Common Lisp requires approximately twice as many operations as it should, due to missing primitives.

(2) An optimizing compiler should try to recognize the SUBSET operation and compile it efficiently. This may be difficult, because there are at least two (and probably many) ways to encode this operation using the existing Common Lisp primitives.

(3) For logical completeness, clarity and consistency of source programs and efficient implementation of some algorithms Common Lisp should be extended to include a logical subset operation for integers.
The name subsetp is already used (CLtL P.279) so I propose LOGSUBSETP with semantics equivalent to: (defun logsubsetp (a b) (zerop (logandc2 a b)))  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 20 Jan 89 15:53:11 EST Received: from decwrl.dec.com by SAIL.Stanford.EDU with TCP; 20 Jan 89 12:34:07 PST Received: from decvax.dec.com by decwrl.dec.com (5.54.5/4.7.34) for common-lisp@sail.stanford.edu; id AA14762; Fri, 20 Jan 89 12:31:42 PST Received: from thor.prime.com by cvbnet.prime.com (3.2/SMI-3.2) id AA04473; Fri, 20 Jan 89 15:28:47 EST Received: from giants.uucp by thor.prime.com (3.2/3.14) id AA08730; Fri, 20 Jan 89 15:22:47 EST Return-Path: Received: by giants.uucp (3.2/SMI-3.0DEV3) id AA04231; Fri, 20 Jan 89 15:23:00 EST Date: Fri, 20 Jan 89 15:23:00 EST From: decvax!cvbnet!giants.prime.com!tbardasz@decwrl.dec.com (Ted Bardasz) Message-Id: <8901202023.AA04231@giants.uucp> To: cvbnet!decvax!decwrl!SAIL.STANFORD.EDU!common-lisp@decwrl.dec.com Subject: New Mail Address Please change my mail address to: decvax!tbardasz@cvbnet.prime.com Thanks, Ted Bardasz  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 20 Jan 89 11:52:45 EST Received: from RELAY.CS.NET by SAIL.Stanford.EDU with TCP; 20 Jan 89 08:34:43 PST Received: from relay2.cs.net by RELAY.CS.NET id aj08866; 20 Jan 89 8:51 EST Received: from draper.com by RELAY.CS.NET id aa25245; 20 Jan 89 8:46 EST Return-path: seb1525@mvs.draper.com Received: from MVS.DRAPER.COM by DRAPER.COM via TCP; Fri Jan 20 08:16 EST Received: by MVS.DRAPER.COM with NETMAIL; FRI, 20 JAN 89 08:16 EST Date: FRI, 20 JAN 89 08:13 EST From: SEB1525@mvs.draper.com Subject: LOGICAL OPERATIONS ON NUMBERS To: common-lisp@SAIL.STANFORD.EDU Reply-to: seb1525@draper.com X-MVS-to: common-lisp@sail.stanford.edu Message-Id: Isn't SUBSETP of A and B, where A and B are integers, implementable by (eql B (logior A B)) ?  
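
The integer subset tests discussed in this thread can be compared side by side. What follows is only a minimal sketch, assuming non-negative integers used as bit sets. LOGSUBSETP is the definition proposed above; the names LOGSUBSETP-VIA-LOGIOR and LOGSUBSETP-CHUNKED, and the CHUNK-BITS parameter, are hypothetical and introduced here for illustration. The chunked version only mimics the short-circuit AND[i=1..n] idea in portable code; it is not a claim about how any implementation represents bignums, and it is not expected to be faster than the one-line forms.

```lisp
;; All three functions answer: is every bit set in A also set in B?
;; Assumption: A and B are non-negative integers.

(defun logsubsetp (a b)
  "LOGANDC2 form proposed in the thread: A AND (NOT B) must be zero."
  (zerop (logandc2 a b)))

(defun logsubsetp-via-logior (a b)
  "LOGIOR form from the original question (hypothetical name)."
  (eql b (logior a b)))

(defun logsubsetp-chunked (a b &optional (chunk-bits 16))
  "Illustrative chunk-at-a-time test that stops at the first failing chunk.
CHUNK-BITS is a hypothetical parameter, not a real bignum digit size."
  (do ((position 0 (+ position chunk-bits))
       (limit (integer-length a)))
      ;; Past the highest set bit of A there is nothing left to check.
      ((>= position limit) t)
    (let ((spec (byte chunk-bits position)))
      ;; A chunk fails when A has a bit in it that B lacks.
      (unless (zerop (logandc2 (ldb spec a) (ldb spec b)))
        (return nil)))))
```

For example, (logsubsetp #b0101 #b1101) is true, while (logsubsetp #b0111 #b1101) is false because bit 1 is set only in the first argument.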
Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 19 Jan 89 13:12:10 EST Received: from crash.cs.umass.edu ([128.119.40.235]) by SAIL.Stanford.EDU with TCP; 19 Jan 89 09:52:58 PST Received: from vax3.cs.umass.edu by crash.cs.umass.edu (5.59/Ultrix2.0-B) id AA06071; Thu, 19 Jan 89 12:53:38 est Message-Id: <8901191753.AA06071@crash.cs.umass.edu> Date: Thu, 19 Jan 89 12:53 EST From: ELIOT@cs.umass.EDU Subject: Logical Operations on Numbers To: Common-Lisp@sail.stanford.EDU X-Vms-To: IN%"Common-Lisp@sail.stanford.edu" Rather than duplicating the subset operations on both numbers and bitvectors why not make the generic arithmetic routines accept bitvectors as non-negative integers? The generic arithmetic routines already handle so many types that one more can't make a big difference. Many numeric routines make sense and extend the functionality if they could be applied to bitvector For example, ZEROP (null set), =, /=, logXXX, boole,lognot, logtest, logcount, integer-length. However, bitvectors have never been very useful to me because of the restriction that the bit-XXX operations can only work on arrays of the same DIMENSIONS. If this were relaxed and the smaller array was treated as being extended with zeros I think they would be much more useful. Chris Eliot  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 16 Jan 89 20:08:04 EST Received: from ALDERAAN.SCRC.Symbolics.COM ([128.81.41.109]) by SAIL.Stanford.EDU with TCP; 16 Jan 89 16:50:50 PST Received: from GANG-GANG.SCRC.Symbolics.COM by ALDERAAN.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 260510; Mon 16-Jan-89 19:48:35 EST Date: Mon, 16 Jan 89 19:48 EST From: Glenn S. 
Burke Subject: Logical Operations on Numbers To: jonl@lucid.com, ELIOT@cs.umass.EDU cc: common-lisp@sail.stanford.EDU In-Reply-To: <8901150357.AA10940@bhopal> Message-ID: <19890117004819.7.GSB@ANNISQUAM.SCRC.Symbolics.COM> Date: Sat, 14 Jan 89 19:57:36 PST From: Jon L White For what it's worth, Johan DeKleer at Xerox PARC asked for just such functionality back in 1984. I don't remember what the public response was then -- I seem to remember everyone trying to write clever, short code sequences that would "do the trick". But the gaping hole still stands. If just one more person seems to think it is a good idea, then that should carry much force with the X3J13 committee. -- JonL -- Logical subsetp is in the critical path of a peephole optimizer I just wrote. For efficiency reasons, though, the code was reorganized so that in any given instantiation the size was fixed, and some complicated macrology ends up turning things into manipulation of lists of fixnums. (Here's an application for the fixnum type which can enhance portability...) I could see having this kind of predicate for both integers and bitvectors, and could imagine a sufficiently powerful compiler handling it (and other bit and logical operations) efficiently.  Received: from MCC.COM (TCP 1200600076) by AI.AI.MIT.EDU 15 Jan 89 14:50:53 EST Received: from AMMON.ACA.MCC.COM by MCC.COM with TCP/SMTP; Fri 13 Jan 89 12:40:11-CST Date: Fri, 13 Jan 89 12:59 CST From: Clive B. Dawson Subject: Test message Message-ID: <19890113185930.2.CLIVE@AMMON.ACA.MCC.COM> bcc: CLisp-Dis@MCC.COM This message is just a test of a future common lisp mail distribution point from MCC.COM. Please disregard this message.  
Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 14 Jan 89 23:21:14 EST Received: from lucid.com by SAIL.Stanford.EDU with TCP; 14 Jan 89 19:59:57 PST Received: from bhopal ([192.9.200.13]) by heavens-gate.lucid.com id AA03650g; Sat, 14 Jan 89 19:55:17 PST Received: by bhopal id AA10940g; Sat, 14 Jan 89 19:57:36 PST Date: Sat, 14 Jan 89 19:57:36 PST From: Jon L White Message-Id: <8901150357.AA10940@bhopal> To: ELIOT@cs.umass.EDU Cc: common-lisp@sail.stanford.EDU In-Reply-To: ELIOT@cs.umass.EDU's message of Thu, 12 Jan 89 15:31 EST <8901122046.AA00579@crash.cs.umass.edu> Subject: Logical Operations on Numbers For what it's worth, Johan DeKleer at Xerox PARC asked for just such functionality back in 1984. I don't remember what the public response was then -- I seem to remember everyone trying to write clever, short code sequences that would "do the trick". But the gaping hole still stands. If just one more person seems to think it is a good idea, then that should carry much force with the X3J13 committee. -- JonL --  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 13 Jan 89 20:24:21 EST Received: from fs3.cs.rpi.edu by SAIL.Stanford.EDU with TCP; 13 Jan 89 17:09:41 PST Received: by fs3.cs.rpi.edu (5.54/1.2-RPI-CS-Dept) id AA11907; Fri, 13 Jan 89 20:05:15 EST Date: Fri, 13 Jan 89 17:30:43 EST From: harrisr@turing.cs.rpi.edu (Richard Harris) Received: by turing.cs.rpi.edu (4.0/1.2-RPI-CS-Dept) id AA05864; Fri, 13 Jan 89 17:30:43 EST Message-Id: <8901132230.AA05864@turing.cs.rpi.edu> To: RWK%FUJI.ILA.Dialnet.Symbolics.Com@riverside.scrc.symbolics.com, common-lisp@sail.stanford.edu Subject: Re: commonlisp types Date: Mon, 9 Jan 89 21:42 EST From: Robert W. Kerns OK, next question: Does it open-code or otherwise optimize TYPEP, or just call TYPEP on the list? KCL just calls TYPEP on the list. One of the patches that I have made to KCL is a version of TYPEP that open-codes when the type is a constant, but my patch has the bug. 
Richard Harris  Received: from life.ai.mit.edu (TCP 20015020120) by AI.AI.MIT.EDU 13 Jan 89 03:35:52 EST Received: from ai.ai.mit.edu by life.ai.mit.edu; Fri, 13 Jan 89 03:25:00 EST Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Jan 89 23:41:27 PST Date: Thu, 12 Jan 89 22:17:09 PST From: Thom Linden To: Common Lisp mailing Message-Id: <890112.221709.baggins@almvma> Subject: cs proposal part 3 of 3 %---------------------------------------------------------------------- \setcounter{section}{9} \section{Symbols} % 10 %---------------------------------------------------------------------- \edithead {\csdag 3 (p163)} \editstart \\ \bf replace & \cltxt It is ordinarily not permitted to alter a symbol's print name. \\ \bf with & \cltxt It is an error to alter a symbol's print name. \editend \setcounter{subsection}{1} \subsection{The Print Name} % 10.2. \edithead {\csdag 5 (p168)} \editstart \\ \bf replace & \cltxt It is an extremely bad idea \\ \bf with & \cltxt It is an error and an extremely bad idea \editend %---------------------------------------------------------------------- \setcounter{section}{10} \section{Packages} % 11 %---------------------------------------------------------------------- \setcounter{subsection}{6} \subsection{Package System Functions and Variables} % 11.7. \edithead {\csdag 31 (p184,intern)} \editstart \\ \bf append & \cltxt All strings, base and extended, are acceptable {\em string} arguments. 
\editend %---------------------------------------------------------------------- \setcounter{section}{12} \section{Characters} % 13 %---------------------------------------------------------------------- \edithead {\csdag 6 after (p233)} \editstart \\ \bf insert & \cltxt {\clkwd char-code-limit} [{\clkwd Constant}] \\ & The value of {\clkwd char-code-limit} is a non-negative integer that is the upper exclusive bound on values produced by the function {\clkwd char-code}, which returns the {\em code} of a given character; that is, the values returned by {\clkwd char-code} are non-negative and strictly less than the value of {\clkwd char-code-limit}. There may be unassigned codes between 0 and {\clkwd char-code-limit} which are not legal arguments to {\clkwd code-char}. \\ & \cltxt {\clkwd char-index-limit {\em registry}} [{\clkwd Function}] \\ & This function returns a non-negative integer that is the upper exclusive bound on values produced by the function {\clkwd char-index} for the specified {\em registry}. There may be unsupported index values between 0 and {\clkwd char-index-limit}, i.e. {\clkwd (find-char {\em registry index})} may return {\clkwd nil}. \\ & \cltxt {\clkwd *all-registry-names*} [{\clkwd Constant}] \\ & The value of {\clkwd *all-registry-names*} is a list of all character registry names supported by the implementation. Only Common LISP Character Registry names or implementation defined character registries may be included in this list. In particular, "base" and "standard" are not character registry names and must not be included. \editend \setcounter{subsection}{0} \subsection{Character Attributes} % 13.1. \edithead {\csdag delete entire section (p233)} \editstart \editend \setcounter{subsection}{1} \subsection{Predicates on Characters} % 13.2. \edithead {\csdag 3 (p234)} \editstart \\ \bf replace & \cltxt argument is a "standard character" that is, an object of type {\clkwd standard-char}. 
Note that any character with a non-zero {\em bits} or {\em font} attribute is non-standard. \\ \bf with & \cltxt argument is one of the Common LISP standard character subrepertoire. \editend \\ \edithead {\csdag 4 (p234)} \editstart \\ \bf delete & \cltxt Note that any character with non-zero ... \editend \\ \edithead {\csdag 6 (p235)} \editstart \\ \bf replace & \cltxt Of the standard characters all but \#$\backslash${\clkwd Newline} are graphic. The semi-standard characters \#$\backslash${\clkwd Backspace}, \#$\backslash${\clkwd Tab}, \#$\backslash${\clkwd Rubout}, \#$\backslash${\clkwd Linefeed}, \#$\backslash${\clkwd Return}, and \#$\backslash${\clkwd Page} are not graphic. \\ \bf with & \cltxt Of the standard characters all but \#$\backslash${\clkwd Newline} are graphic. \editend \\ \edithead {\csdag 7 (p235)} \editstart \\ \bf delete & \cltxt Programs may assume that graphic ... \editend \\ \edithead {\csdag 8 (p235)} \editstart \\ \bf delete & \cltxt Any character with a non-zero bits... \editend \\ \edithead {\csdag 9 (p235)} \editstart \\ \bf delete & \cltxt {\clkwd string-char-p} ... \editend \\ \edithead {\csdag 10 (p235)} \editstart \\ \bf delete & \cltxt The argument {\em char} must be ... \editend \\ \edithead {\csdag 13 (p235)} \editstart \\ \bf replace & \cltxt If a character is alphabetic, then it is perforce graphic. Therefore any character with a non-zero bits attribute cannot be alphabetic. Whether a character is alphabetic may depend on its font number. \\ \bf with & \cltxt If a character is alphabetic, then it is perforce graphic. \editend \\ \edithead {\csdag 22 (p236)} \editstart \\ \bf replace & \cltxt If a character is either uppercase or lowercase, it is necessarily alphabetic (and therefore is graphic, and therefore has a zero bits attribute). However, it is permissible in theory for an alphabetic character to be neither uppercase nor lowercase (in a non-Roman font, for example). 
\\ \bf with & \cltxt If a character is either uppercase or lowercase, it is necessarily alphabetic (and therefore is graphic). \editend \\ \edithead {\csdag 25 (p236)} \editstart \\ \bf replace & \cltxt The argument {\em char} must be a character object, and {\em radix} must be a non-negative integer. If {\em char} is not a digit of the radix specified \\ \bf with & \cltxt The argument {\em char} must be in the standard character subrepertoire and {\em radix} must be a non-negative integer. If {\em char} is not a standard character or is not a digit of the radix specified \editend \\ \edithead {\csdag 51 (p237)} \editstart \\ \bf delete & \cltxt If two characters have the same bits ... \editend \\ \edithead {\csdag 52 (p237)} \editstart \\ \bf replace & \cltxt If two characters differ in any attribute (code, bits, or font), then they are different. \\ \bf with & \cltxt If the codes of two characters differ, then they are different. \editend \\ \edithead {\csdag 94 (p239)} \editstart \\ \bf replace & \cltxt The predicate {\clkwd char-equal} is like {\clkwd char=}, and similarly for the others, except according to a different ordering such that differences of bits attributes and case are ignored, and font information is taken into account in an implementation dependent manner. \\ \bf with & \cltxt The predicate {\clkwd char-equal} is like {\clkwd char=}, and similarly for the others, except according to a different ordering such that differences of case are ignored. \editend \\ \edithead {\csdag 97 example (p239)} \editstart \\ \bf delete & \cltxt {\clkwd (char-equal \#$\backslash$A \#$\backslash$Control-A) is true} \editend \\ \edithead {\csdag 98 (p239)} \editstart \\ \bf delete & \cltxt The ordering may depend on the font ... \editend \setcounter{subsection}{2} \subsection{Character Construction and Selection} % 13.3. \edithead {\csdag 3 (p239)} \editstart \\ \bf replace & \cltxt The argument {\em char} must be a character object. 
{\clkwd char-code} returns the {\em code} attribute of the character object; this will be a non-negative integer less than the (normal) value \\ \bf with & \cltxt The argument {\em char} must be a character object. {\clkwd char-code} returns the {\em code} of the character object; this will be a non-negative integer less than the value \editend \\ \edithead {\csdag 4 (p240)} \editstart \\ \bf delete & \cltxt {\clkwd char-bits } ... \editend \\ \edithead {\csdag 5 (p240)} \editstart \\ \bf delete & \cltxt The argument {\em char} must be ... \editend \\ \edithead {\csdag 6 (p240)} \editstart \\ \bf delete & \cltxt {\clkwd char-font } ... \editend \\ \edithead {\csdag 7 (p240)} \editstart \\ \bf delete & \cltxt The argument {\em char} must be ... \editend \\ \edithead {\csdag 8 (p240)} \editstart \\ \bf replace & \cltxt {\clkwd code-char {\em code} \&optional {\em (bits 0) (font 0)} [{\em Function}]} \\ \bf with & \cltxt {\clkwd code-char {\em code} [{\em Function}]} \editend \\ \edithead {\csdag 9 (p240)} \editstart \\ \bf replace & \cltxt All three arguments must be non-negative integers. If it is possible in the implementation to construct a character object whose code attribute is {\em code}, whose bits attribute is {\em bits}, and whose font attribute is {\em font}, then such an object is returned; \\ \bf with & \cltxt The argument must be a non-negative integer. If it is possible in the implementation to construct a character object identified by {\em code}, then such an object is returned; \editend \\ \edithead {\csdag 10 (p240)} \editstart \\ \bf replace & \cltxt For any integers, {\em c, b,} and {\em f}, if {\clkwd (code-char {\em c b f})} is \\ \bf with & \cltxt For any integer, {\em c}, if {\clkwd (code-char {\em c})} is \editend \\ \edithead {\csdag 12 (p240)} \editstart \\ \bf delete & \cltxt {\clkwd (char-bits (code-char } ... \editend \\ \edithead {\csdag 13 (p240)} \editstart \\ \bf delete & \cltxt {\clkwd (char-font (code-char } ... 
\editend \\ \edithead {\csdag 14 (p240)} \editstart \\ \bf delete & \cltxt If the font and bits attributes ... \editend \\ \edithead {\csdag 15 (p240)} \editstart \\ \bf delete & \cltxt {\clkwd (char= (code-char (char-code ...} \editend \\ \edithead {\csdag 16 (p240)} \editstart \\ \bf delete & \cltxt is true. \editend \\ \edithead {\csdag 17 (p240)} \editstart \\ \bf delete & \cltxt {\clkwd make-char} ... \editend \\ \edithead {\csdag 18 (p240)} \editstart \\ \bf delete & \cltxt The argument {\em char} must be ... \editend \\ \edithead {\csdag 19 (p240)} \editstart \\ \bf delete & \cltxt If {\em bits} or {\em font} are zero ... \editend \\ \edithead {\csdag 19 (p240)} \editstart \\ \bf append & \cltxt {\clkwd find-char} {\em index registry} [{\em Function}] \\ & {\clkwd find-char} returns a character object. {\em index} is an integer value uniquely identifying a character within the character registry name {\em registry}. If the implementation does not support the specified character, {\clkwd nil} is returned. \editend \setcounter{subsection}{3} \subsection{Character Conversions} % 13.4. \edithead {\csdag 8 (p241)} \editstart \\ \bf replace & \cltxt {\clkwd char-upcase} returns a character object with the same font and bits attributes as {\em char}, but with possibly a different code attribute. \\ \bf with & \cltxt {\clkwd char-upcase} returns a character object with possibly a different code. \editend \\ \edithead {\csdag 10 (p241)} \editstart \\ \bf replace & \cltxt Similarly, {\clkwd char-downcase} returns a character object with the same font and bits attributes as {\em char}, but with possibly a different code attribute. \\ \bf with & \cltxt Similarly, {\clkwd char-downcase} returns a character object with possibly a different code. \editend \\ \edithead {\csdag 12 (p241)} \editstart \\ \bf delete & \cltxt Note that the action of ... 
\editend \\ \edithead {\csdag 13 (p241)} \editstart \\ \bf replace & \cltxt {\clkwd digit-char {\em weight} \&optional ({\em radix} 10) ({\em font} 0) [{\em Function}]} \\ \bf with & \cltxt {\clkwd digit-char {\em weight} \&optional ({\em radix} 10) [{\em Function}]} \editend \\ \edithead {\csdag 14 (p241)} \editstart \\ \bf replace & \cltxt All arguments must be integers. {\clkwd digit-char} determines whether or not it is possible to construct a character object whose font attribute is {\em font}, and whose {\em code} \\ \bf with & \cltxt All arguments must be integers. {\clkwd digit-char} determines whether or not it is possible to construct a character object whose {\em code} \editend \\ \edithead {\csdag 15 (p242)} \editstart \\ \bf replace & \cltxt {\clkwd digit-char} cannot return {\clkwd nil} if {\em font} is zero, {\em radix} \\ \bf with & \cltxt {\clkwd digit-char} cannot return {\clkwd nil}. {\em radix} \editend \\ \edithead {\csdag 22 (p242)} \editstart \\ \bf delete & \cltxt Note that no argument is provided for ... \editend \\ \edithead {\csdag 23 through 30 (p242, char-int, int-char)} \editstart \\ \bf delete & \cltxt {\clkwd char-int} {\em char} \editend \\ \edithead {\csdag 32 (p242)} \editstart \\ \bf replace & \cltxt All characters that have zero font and bits attributes and that are non-graphic \\ \bf with & \cltxt All characters that are non-graphic \editend \\ \edithead {\csdag 33 (p243)} \editstart \\ \bf replace & \cltxt The standard newline and space characters have the respective names {\clkwd Newline} and {\clkwd Space}. The semi-standard characters have the names {\clkwd Tab, Page, Rubout, Linefeed, Return,} and {\clkwd Backspace}. \\ \bf with & \cltxt The standard newline and space characters have the respective names {\clkwd Newline} and {\clkwd Space}. \editend \\ \edithead {\csdag 35 (p243)} \editstart \\ \bf delete & \cltxt {\clkwd char-name} will only locate "simple" ... 
\editend \\ \edithead {\csdag 36 (p243)} \editstart \\ \bf append & \cltxt {\clkwd name-char} may accept other names for characters in addition to those returned by {\clkwd char-name}. \editend \\ \edithead {\csdag 36 (p243)} \editstart \\ \bf append & \cltxt {\clkwd char-registry} {\em char} [{\em Function}] \\ & {\clkwd char-registry} returns a string value representing the character registry to which {\em char} belongs. \editend \\ \edithead {\csdag 36 (p243)} \editstart \\ \bf append & \cltxt {\clkwd char-index} {\em char} [{\em Function}] \\ & {\clkwd char-index} returns an integer value representing the character (registry) index of {\em char}. \editend \setcounter{subsection}{4} \subsection{Character Control-Bit Functions} % 13.5. \edithead {\csdag delete entire section (p243)} \editstart \editend %---------------------------------------------------------------------- \setcounter{section}{13} \section{Sequences} % 14 %---------------------------------------------------------------------- \setcounter{subsection}{0} \subsection{Simple Sequence Functions} % 14.1 \edithead {\csdag 21 (p249,make-sequence)} \editstart \\ \bf append & \cltxt If type {\clkwd string} is specified, the result is equivalent to {\clkwd make-string}. \editend %---------------------------------------------------------------------- \setcounter{section}{17} \section{Strings} % 18 %---------------------------------------------------------------------- \edithead {\csdag 1 (p299)} \editstart \\ \bf replace & \cltxt Specifically, the type {\clkwd string} is identical to the type {\clkwd (vector string-char),} which in turn is the same as {\clkwd (array string-char (*))}. \\ \bf with & \cltxt Specifically, the type {\clkwd string} is a subtype of {\clkwd vector} and consists of vectors specialized by subtypes of {\clkwd character}. \editend \setcounter{subsection}{0} \subsection{String Access} % 18.1. 
\edithead {\csdag 3 (p300)} \editstart \\ \bf insert & \cltxt {\clkwd sbchar} {\em simple-base-string index} [{\em Function}] \editend \\ \edithead {\csdag 4 (p300)} \editstart \\ \bf replace & \cltxt character object. (This character will necessarily satisfy the predicate {\clkwd string-char-p}). \\ \bf with & \cltxt character object. \editend \\ \edithead {\csdag 9 (p300)} \editstart \\ \bf replace & \cltxt {\clkwd setf} may be used with {\clkwd char} to destructively replace a character within a string. \\ \bf with & \cltxt {\clkwd setf} may be used with {\clkwd char} to destructively replace a character within a string. The new character must be of a type which can be stored in the string; it is an error otherwise. \editend \\ \edithead {\csdag 10 (p300)} \editstart \\ \bf insert & \cltxt For {\clkwd sbchar}, the string must be a simple base string. The new character must be of a type which can be stored in the string; it is an error otherwise. \editend \setcounter{subsection}{2} \subsection{String Construction and Manipulation} % 18.3. \edithead {\csdag 2 (p302)} \editstart \\ \bf replace & \cltxt {\clkwd make-string {\em size} \&key :initial-element [{\em Function}]} \\ \bf with & \cltxt {\clkwd make-string {\em size} \&key :initial-element :element-type [{\em Function}]} \editend \\ \edithead {\csdag 3 (p302,make-string)} \editstart \\ \bf replace & \cltxt This returns a string (in fact a simple string) of length {\em size}, each of whose characters has been initialized to the {\clkwd :initial-element} argument. If an {\clkwd :initial-element} argument is not specified, then the string will be initialized in an implementation-dependent way. \\ \bf with & \cltxt This returns a string of length {\em size}, each of whose characters has been initialized to the {\clkwd :initial-element} argument. If an {\clkwd :initial-element} argument is not specified, then the string will be initialized in an implementation-dependent way. 
The {\clkwd :element-type} argument names the type of the elements of the string; a string is constructed of the most specialized type that can accommodate elements of the given type. If {\clkwd :element-type} is omitted, the type {\clkwd simple-string} is the default. \editend \\ \edithead {\csdag 5 (p302,make-string)} \editstart \\ \bf replace & \cltxt A string is really just a one-dimensional array of "string characters" (that is, those characters that are members of type {\clkwd string-char}). More complex character arrays may be constructed using the function {\clkwd make-array}. \\ \bf with & \cltxt More complex character arrays may be constructed using the function {\clkwd make-array}. \editend \\ \edithead {\csdag 29 (p304,make-string)} \editstart \\ \bf replace & \cltxt If {\em x} is a string character (a character of type {\clkwd string-char}), then \\ \bf with & \cltxt If {\em x} is a character, then \editend %---------------------------------------------------------------------- \setcounter{section}{21} \section{Input/Output} % 22 \setcounter{subsection}{0} \subsection{Printed Representation of LISP Objects} % 22.1. \setcounter{subsubsection}{0} \subsubsection{What the Read Function Accepts} % 22.1.1. \edithead {\csdag Table 22-1: Standard Character Syntax Types (p336)} \editstart \\ \bf delete entry & \cltxt {\clkwd } {\em whitespace} \\ & {\clkwd } {\em whitespace} \\ & {\clkwd } {\em constituent} \\ & {\clkwd } {\em whitespace} \\ & {\clkwd } {\em constituent} \\ & {\clkwd } {\em whitespace} \editend \setcounter{subsubsection}{1} \subsubsection{Parsing of Numbers and Symbols} % 22.1.2. 
\edithead {\csdag Table 22-3: Standard Constituent Character Attributes (p340)} \editstart \\ \bf delete entry & \cltxt {\clkwd } {\em illegal} \\ & {\clkwd } {\em illegal} \\ & {\clkwd } {\em illegal} \\ & {\clkwd } {\em illegal} \\ & {\clkwd } {\em illegal} \\ & {\clkwd } {\em illegal} \editend \setcounter{subsubsection}{3} \subsubsection{Standard Dispatching Macro Character Syntax} % 22.1.4. \edithead {\csdag Table 22-4: Standard \# Macro Character Syntax (p352)} \editstart \\ \bf delete entry & \cltxt {\clkwd \#} {\em signals error} \\ & {\clkwd \#} {\em signals error} \\ & {\clkwd \#} {\em signals error} \\ & {\clkwd \#} {\em signals error} \\ & {\clkwd \#} {\em signals error} \\ & {\clkwd \#} {\em undefined} \editend \\ \edithead {\csdag 8 (p353)} \editstart \\ \bf replace & \cltxt The following names are standard across all implementations: \\ \bf with & \cltxt All non-graphic characters, including extended characters, are uniquely named in an implementation-dependent manner. The following names are standard across all implementations: \editend \\ \edithead {\csdag 11 through 18 inclusive delete (p353)} \editstart \\ \bf delete & \cltxt The following names are semi-standard; ... \editend \\ \edithead {\csdag 20 through 26 inclusive delete (p354)} \editstart \\ \bf delete & \cltxt The following convention is used in implementations ... \editend \\ \edithead {\csdag 108 (p360)} \editstart \\ \bf replace & \cltxt {\clkwd \#, \#, \#, \#, \#} \\ \bf with & \cltxt {\clkwd \#, \#} \editend \setcounter{subsubsection}{4} \subsubsection{The Readtable} % 22.1.5. \edithead {\csdag 3 (p360)} \editstart \\ \bf replace & \cltxt Even if an implementation supports characters with non-zero {\em bits} and {\em font} attributes, it need not (but may) allow for such characters to have syntax descriptions in the readtable. However, every character of type {\clkwd string-char} must be represented in the readtable. 
\\ \bf with & \cltxt All base and extended characters are representable in the readtable. \editend \setcounter{subsubsection}{5} \subsubsection{What the Print Function Produces} % 22.1.6. \edithead {\csdag 13 (p366)} \editstart \\ \bf replace & \cltxt is used. For example, the printed representation of the character \#$\backslash$A with control and meta bits on would be \#$\backslash${\clkwd CONTROL-META-A}, and that of \#$\backslash$a with control and meta bits on would be \#$\backslash${\clkwd CONTROL-META-$\backslash$a}. \\ \bf with & \cltxt is used (see 22.1.4). \editend \setcounter{subsection}{2} \subsection{Output Functions} % 22.3. \setcounter{subsubsection}{0} \subsubsection{Output to Character Streams} % 22.3.1. \edithead {\csdag 26 (p384)} \editstart \\ \bf replace & \cltxt ({\em not} the substring delimited by {\clkwd :start} and {\clkwd :end}). \\ \bf with & ({\em not} the substring delimited by {\clkwd :start} and {\clkwd :end}). Only characters which are members of the coded character set(s) associated with the output stream or \#$\backslash${\clkwd Newline} are valid to be written; it is an error otherwise. All character streams must provide appropriate line division behavior for \#$\backslash${\clkwd Newline}. \editend \\ \edithead {\csdag 27 after (p384)} \editstart \\ \bf insert & \cltxt {\clkwd external-width} {\em object} \&{\clkwd optional} {\em output-stream} [{\em Function}] \\ & {\clkwd external-width} returns the number of host system base character units required for the object on the output-stream. If not applicable to the output stream, the function returns {\clkwd nil}. This number corresponds to the current state of the stream and may change if there has been intervening output. If the output stream is not specified {\clkwd *standard-output*} is the default. 
\editend \footnote{ The X3 J13 proposal STREAM-INFO: ONE-DIMENSIONAL-FUNCTIONS modified to include these semantics is an acceptable alternative to the {\clkwd external-width} function proposed here.} \setcounter{subsubsection}{2} \subsubsection{Formatted Output to Character Streams} % 22.3.3. \edithead {\csdag 23 delete example (p387)} \editstart \\ \bf delete & \cltxt {\clkwd (format nil "Type} $\tilde{ }$ {\clkwd :C to $\tilde{ }$ :A."} . . . \editend \\ \edithead {\csdag 66 (p389)} \editstart \\ \bf replace & \cltxt $\tilde{ }${\clkwd :C} spells out the names of the control bits and represents non-printing characters by their names: {\clkwd Control-Meta-F, Control-Return, Space}. This is a "pretty" format for printing characters. \\ \bf with & \cltxt $\tilde{ }${\clkwd :C} represents non-printing characters by their names: {\clkwd Newline, Space}. This is a "pretty" format for printing characters. \editend %---------------------------------------------------------------------- %---------------------------------------------------------------------- \setcounter{section}{22} \section{File System Interface} % 23 \setcounter{subsection}{1} \subsection{Opening and Closing Files} % 23.2. \edithead {\csdag 2 (p418)} \editstart \\ \bf replace & \cltxt {\clkwd open {\em filename} \&key :direction :element-type} {\clkwd :if-exists :if-does-not-exist} [{\em Function}] \\ \bf with & \cltxt {\clkwd open {\em filename} \&key :direction :element-type} {\clkwd :external-code-format} {\clkwd :if-exists :if-does-not-exist} [{\em Function}] \editend \\ \edithead {\csdag 11 (p419)} \editstart \\ \bf replace & \cltxt {\clkwd string-char} \\ & The unit of transaction is a string-character. The functions {\clkwd read-char} and/or {\clkwd write-char} may be used on the stream. \\ \bf with & \cltxt The default value of {\clkwd :element-type} is an implementation-defined subtype of character. \\ & {\clkwd base-character} \\ & The unit of transaction is a base character. 
The functions {\clkwd read-char} and/or {\clkwd write-char} may be used on the stream. This is the default. \editend \\ \edithead {\csdag 16 (p419)} \editstart \\ \bf replace & \cltxt {\clkwd character} \\ & The unit of transaction is any character, not just a string-character. The functions {\clkwd read-char} and/or {\clkwd write-char} may be used on the stream. \\ \bf with & \cltxt {\clkwd character} \\ & The unit of transaction is any character. The functions {\clkwd read-char} and/or {\clkwd write-char} may be used on the stream. \editend \\ \edithead {\csdag 19 after (p420)} \editstart \\ \bf insert & \cltxt {\clkwd :external-code-format} \\ & This argument specifies a string or list of strings indicating an implementation recognized scheme for representing one or more coded character sets with non-homogeneous codes. \\ & The default value is "default" and is implementation defined but must include the base characters. \\ & As many coded character set names must be provided as the implementation requires for that external coding convention. \\ & References to standard ISO coded character set names must include the full ISO reference number and approval year followed by "ccs". The following are valid ISO reference names: "ISO8859/1-1987ccs", "ISO6937/2-1983ccs", "iso646-1983ccs", etc. All implementation recognized schemes are formed from {\clkwd standard-p} characters. Within scheme names, alphabetic case is ignored. \editend %---------------------------------------------------------------------- %---------------------------------------------------------------------- \chapter{Deprecated Language Features} The X3 J13 Character subcommittee proposal will cause certain areas of \cite{steele84} to become obsolete. We have included in this appendix potential additions to the standard document for areas we feel are important in the interest of compatibility. 
The character subcommittee recommends that the X3 J13 committee as a whole adopt a policy regarding obsolescence. This policy may be to keep the obsolete function in the interest of compatibility for existing applications, or to drop the obsolete function completely. One compromise is to document these functions in an appendix to the Common LISP Standard. The appendix would be for informational use only and not a part of the standard definition. %---------------------------------------------------------------------- \setcounter{section}{1} \section{Data Types} % 2 %---------------------------------------------------------------------- \setcounter{subsection}{14} \subsection{Overlap, Inclusion, and Disjointness of Types} % 2.15. \edithead {\csdag 14 (p34)} \editstart \\ \bf deprecated & \cltxt The type {\clkwd standard-char} is a subtype of {\clkwd base-character}; the type {\clkwd string-char} is implementation defined as either {\clkwd base-character} or {\clkwd character}. \editend %---------------------------------------------------------------------- \setcounter{section}{12} \section{Characters} % 13 %---------------------------------------------------------------------- \edithead {\csdag throughout} \editstart \\ \bf deprecated & \cltxt Earlier versions of Common LISP incorporated {\em font} and {\em bits} as attributes of character objects. There are several functions which were removed from the language or modified by this proposal. The deleted functions and constants include: \begin{itemize} \item char-font-limit \item char-bits-limit \item int-char \item char-int \item char-bits \item char-font \item make-char \item char-control-bit \item char-meta-bit \item char-super-bit \item char-hyper-bit \item char-bit \item set-char-bit \end{itemize} \editend \\ \edithead {\csdag (p233)} \editstart \\ \bf deprecated & \cltxt If supported by an implementation these attributes may affect the action of selected functions. 
In particular, the following effects are noted: \\ & \begin{itemize} \item Attributes, such as those dealing with how the character is displayed or its typography, are not part of the character code. For example, bold-face, color or size are not considered part of the character code. \item If two characters differ in any attributes, then they are not {\clkwd char=}. \item If two characters have identical attributes, then their ordering by {\clkwd char}$<$ is consistent with the numerical ordering by the predicate $<$ on their code attributes. (Similarly for {\clkwd char}$>$, {\clkwd char}$>=$ and {\clkwd char}$<=$.) \item The effect, if any, on {\clkwd char-equal} of each attribute has to be specified as part of the definition of that attribute. \item The effect of {\clkwd char-upcase} and {\clkwd char-downcase} is to preserve attributes. \item The function {\clkwd char-int} is equivalent to {\clkwd char-code} if no attributes are associated with the character object. \item The function {\clkwd int-char} is equivalent to {\clkwd code-char} if no attributes are associated with the character object. \item It is implementation dependent whether characters within double quotes have attributes removed. \item It is implementation dependent whether attributes are removed from symbol names by {\clkwd read}. \item Even if an implementation supports characters with non-zero {\em bits} and {\em font} attributes, it need not (but may) allow for such characters to have syntax descriptions in the readtable. \end{itemize} \editend %---------------------------------------------------------------------- \begin{thebibliography}{wwwwwwww 99} \bibitem[Ida87]{ida87} M. Ida, et al., {\em JEIDA Common LISP Committee Proposal on Embedding Multi-Byte Characters }, ANSI X3J13 document 87-022, (1987). \bibitem[ISO 646]{iso646} ISO, {\em Information processing -- ISO 7-bit coded character set for information interchange }, ISO (1983). 
\bibitem[ISO 4873]{iso4873} ISO, {\em Information processing -- ISO 8-bit code for information interchange -- Structure and rules for implementation }, ISO (1986). \bibitem[ISO 6937/1]{iso6937/1} ISO, {\em Information processing -- Coded character sets for text communication -- Part 1: General introduction }, ISO (1983). \bibitem[ISO 6937/2]{iso6937/2} ISO, {\em Information processing -- Coded character sets for text communication -- Part 2: Latin alphabetic and non-alphabetic graphic characters }, ISO (1983). \bibitem[ISO 8859/1]{iso8859/1} ISO, {\em Information processing -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1 }, ISO (1987). \bibitem[ISO 8859/2]{iso8859/2} ISO, {\em Information processing -- 8-bit single-byte coded graphic character sets -- Part 2: Latin alphabet No. 2 }, ISO (1987). \bibitem[ISO 8859/6]{iso8859/6} ISO, {\em Information processing -- 8-bit single-byte coded graphic character sets -- Part 6: Latin/Arabic alphabet }, ISO (1987). \bibitem[ISO 8859/7]{iso8859/7} ISO, {\em Information processing -- 8-bit single-byte coded graphic character sets -- Part 7: Latin/Greek alphabet }, ISO (1987). \bibitem[Kerns87]{kerns87} R. Kerns, {\em Extended Characters in Common LISP }, X3J13 Character Subcommittee document, Symbolics Inc (1987). \bibitem[Kurokawa88]{kurokawa88} T. Kurokawa, et al., {\em Technical Issues on International Character Set Handling in Lisp }, ISO/IEC SC22 WG16 document N33, (1988). \bibitem[Linden87]{linden87} T. Linden, {\em Common LISP - Proposed Extensions for International Character Set Handling }, Version 01.11.87, IBM Corporation (1987). \bibitem[Steele84]{steele84} G. Steele Jr., {\em Common LISP: the Language }, Digital Press (1984). \bibitem[Xerox87]{xerox87} Xerox, {\em Character Code Standard, Xerox System Integration Standard }, Xerox Corp. (1987). \end{thebibliography} \end{document} % End of document.  
Received: from life.ai.mit.edu (TCP 20015020120) by AI.AI.MIT.EDU 13 Jan 89 03:33:43 EST Received: from ai.ai.mit.edu by life.ai.mit.edu; Fri, 13 Jan 89 03:22:44 EST Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Jan 89 23:40:00 PST Date: Thu, 12 Jan 89 22:16:40 PST From: Thom Linden To: Common Lisp mailing Message-Id: <890112.221640.baggins@almvma> Subject: cs proposal part 2 of 3 %---------------------------------------------------------------------- % split into three parts this time as mailer had problems %---------------------------------------------------------------------- %---------------------------------------------------------------------- \newcommand{\edithead}{\begin{tabular}{l p{3.95in}} \multicolumn{2}{l} } \newcommand{\csdag}{\bf$\Rightarrow$\ddag} \newcommand{\editstart}{} \newcommand{\editend}{\\ & \end{tabular}} %---------------------------------------------------------------------- %---------------------------------------------------------------------- \appendix \chapter{Editorial Modifications to CLtL} The following sections specify the editorial changes needed in CLtL to support the proposal. Section/subsection numbers and titles match those found in \cite{steele84}. The notation {\csdag x (pn, function)} denotes a reference to paragraph x within the subsection (we count each individual example or metastatement as 1 paragraph of text). Also, {\bf (pn, function)}, or simply {\bf (pn)} is included as an additional aid to the reader indicating the page number and function modified. When an entire paragraph is deleted, the first few words of the paragraph are noted. If a section or paragraph of CLtL is {\em not} referenced, no editorial changes are required to support this proposal. \footnote{This may be an overly optimistic statement since the changes are fairly pervasive.
The editor should take the sense of Chapter 1 into account in resolving any discrepancies.} %---------------------------------------------------------------------- \setcounter{section}{1} \section{Data Types} % 2 %---------------------------------------------------------------------- \edithead {\csdag 8 (p12)} \editstart \\ \bf replace & \cltxt provides for a rich character set, including ways to represent characters of various type styles. \\ \bf with & \cltxt provides support for international language characters as well as characters used in specialized arenas, eg. mathematics. \editend \setcounter{subsection}{1} \subsection{Characters} % 2.2. \edithead {\csdag 1 (p20)} \editstart \\ \bf replace & \cltxt Characters are represented as data objects of type {\clkwd character}. There are two subtypes of interest, called {\clkwd standard-char} and {\clkwd string-char}. \\ \bf with & \cltxt Characters are represented as data objects of type {\clkwd character}. \editend \\ \edithead {\csdag 2 (p20)} \editstart \\ \bf replace & \cltxt This works well enough for printing characters. Non-printing characters \\ \bf with & \cltxt This works well enough for graphic characters. Non-graphic characters \editend \subsubsection{Standard Characters} % 2.2.1. \edithead {\csdag 0 section heading (p20)} \editstart \\ \bf replace & \cltxt Standard Characters \\ \bf with & \cltxt Base Characters \editend \\ \edithead {\csdag 1 before (p20)} \editstart \\ \bf insert & \cltxt A {\em character repertoire} defines a collection of characters independent of their specific rendered image or font. Character repertoires are specified independent of coding and their characters are only identified with a unique label, a graphic symbol, and a character description. A {\em coded character set} is a character repertoire plus an {\em encoding} providing a unique mapping between each character and a number which serves as the character representation. 
\\ & Many computers have some "base" coded character set (often a variant of ISO646-1983) which is a function of hardware instructions for dealing with characters, as well as the organization of the file system. This base character representation is likely to be the smallest transaction unit permitted for text stream I/O operations. \\ & The {\em base character repertoire} is used to refer to the collection of characters represented by the base coded character set. Common LISP does not define the base character encoding but does require all implementations to support a "standard" {\em subrepertoire} of the base character repertoire. \editend \\ \edithead {\csdag 1 before (p20)} \editstart \\ \bf insert & \cltxt The {\clkwd base-character} type is defined as a subtype of {\clkwd character}. A {\clkwd base-character} object can contain any member of the base character repertoire. Objects of type {\clkwd (and character (not base-character))} are referred to as {\em extended characters}. \editend \\ \edithead {\csdag 1 (p20)} \editstart \\ \bf delete & \cltxt Common LISP defines a "standard character set" ... \editend \\ \edithead {\csdag 1 (P20)} \editstart \\ \bf new & \cltxt As a subset of the base character repertoire, Common LISP defines a standard character subrepertoire for two purposes. Common LISP programs that are written in the standard character subrepertoire can be read by any Common LISP implementation; and Common LISP programs that use only standard characters as data objects are most likely to be portable. The standard characters are not defined by their glyphs, but by their roles within the language. There are two aspects to the roles of the standard characters: one is their role in reader and format control string syntax; the second is their role as components of the names of all Common LISP functions, macros, constants, and global variables. 
As long as an implementation chooses 96 glyphs and treats those 96 in a manner consistent with the language's specification for the standard characters (for example, the naming of functions), it doesn't matter what glyphs the I/O hardware uses to represent those characters: they are the standard characters. Any program or data text written wholly in those characters is portable through simple code conversion. The Common LISP standard character subrepertoire consists of a newline \#$\backslash${\clkwd Newline}, the graphic space character \#$\backslash${\clkwd Space}, and the following additional ninety-four graphic characters or their equivalents: \editend \\ \edithead {\csdag 2 (p21)} \editstart \\ \bf delete & \cltxt ! " \# ... \editend \\ \edithead {\csdag 2 new (p21)} \editstart \\ & {\bf Common LISP Standard Character Subrepertoire} \editend \footnote{\cltxt \#$\backslash${\clkwd Space} and \#$\backslash${\clkwd Newline} are omitted. graphic labels and descriptions are from ISO 6937/2. The first letter of the graphic label categorizes the character as follows: L - Latin, N - Numeric, S - Special .} \\ {\small \begin{tabular}{||l|c|l||l|c|l||} \hline ID & Glyph & Name or description & ID & Glyph & Name or description \\ \hline LA01 & a & small a & ND01 & 1 & digit 1 \\ \hline LA02 & A & capital A & ND02 & 2 & digit 2 \\ \hline LB01 & b & small b & ND03 & 3 & digit 3 \\ \hline LB02 & B & capital B & ND04 & 4 & digit 4 \\ \hline LC01 & c & small c & ND05 & 5 & digit 5 \\ \hline LC02 & C & capital C & ND06 & 6 & digit 6 \\ \hline LD01 & d & small d & ND07 & 7 & digit 7 \\ \hline LD02 & D & capital D & ND08 & 8 & digit 8 \\ \hline LE01 & e & small e & ND09 & 9 & digit 9 \\ \hline LE02 & E & capital E & ND10 & 0 & digit 0 \\ \hline LF01 & f & small f & SC03 & \$ & dollar sign \\ \hline LF02 & F & capital F & SP02 & ! 
& exclamation mark \\ \hline LG01 & g & small g & SP04 & " & quotation mark \\ \hline LG02 & G & capital G & SP05 & \apostrophe & apostrophe \\ \hline LH01 & h & small h & SP06 & ( & left parenthesis \\ \hline LH02 & H & capital H & SP07 & ) & right parenthesis \\ \hline LI01 & i & small i & SP08 & , & comma \\ \hline LI02 & I & capital I & SP09 & \_ & low line \\ \hline LJ01 & j & small j & SP10 & - & hyphen or minus sign \\ \hline LJ02 & J & capital J & SP11 & . & full stop, period \\ \hline LK01 & k & small k & SP12 & / & solidus \\ \hline LK02 & K & capital K & SP13 & : & colon \\ \hline LL01 & l & small l & SP14 & ; & semicolon \\ \hline LL02 & L & capital L & SP15 & ? & question mark \\ \hline LM01 & m & small m & SA01 & + & plus sign \\ \hline LM02 & M & capital M & SA03 & $<$ & less-than sign \\ \hline LN01 & n & small n & SA04 & = & equals sign \\ \hline LN02 & N & capital N & SA05 & $>$ & greater-than sign \\ \hline LO01 & o & small o & SM01 & \# & number sign \\ \hline LO02 & O & capital O & SM02 & \% & percent sign \\ \hline LP01 & p & small p & SM03 & \& & ampersand \\ \hline LP02 & P & capital P & SM04 & * & asterisk \\ \hline LQ01 & q & small q & SM05 & @ & commercial at \\ \hline LQ02 & Q & capital Q & SM06 & [ & left square bracket \\ \hline LR01 & r & small r & SM07 & $\backslash$ & reverse solidus \\ \hline LR02 & R & capital R & SM08 & ] & right square bracket \\ \hline LS01 & s & small s & SM11 & \{ & left curly bracket \\ \hline LS02 & S & capital S & SM13 & $|$ & vertical bar \\ \hline LT01 & t & small t & SM14 & \} & right curly bracket \\ \hline LT02 & T & capital T & SD13 & \bq & grave accent \\ \hline LU01 & u & small u & SD15 & $\hat{ }$ & circumflex accent \\ \hline LU02 & U & capital U & SD19 & $\tilde{ }$ & tilde \\ \hline LV01 & v & small v & & & \\ \hline LV02 & V & capital V & & & \\ \hline LW01 & w & small w & & & \\ \hline LW02 & W & capital W & & & \\ \hline LX01 & x & small x & & & \\ \hline LX02 & X & capital X & & & \\ \hline 
LY01 & y & small y & & & \\ \hline LY02 & Y & capital Y & & & \\ \hline LZ01 & z & small z & & & \\ \hline LZ02 & Z & capital Z & & & \\ \hline \end{tabular} } \\ \edithead {\csdag 3 (p21)} \editstart \\ \bf delete & \cltxt @ A B C... \editend \\ \edithead {\csdag 4 (p21)} \editstart \\ \bf delete & \cltxt \bq a b c... \editend \\ \edithead {\csdag 5 (p21)} \editstart \\ \bf delete & \cltxt The Common LISP Standard character set is apparently ... \editend \\ \edithead {\csdag 6 (p21)} \editstart \\ \bf replace & \cltxt Of the ninety-four non-blank printing characters \\ \bf with & \cltxt Of the ninety-five graphic characters \editend \\ \edithead {\csdag 9 (p21)} \editstart \\ \bf delete & \cltxt The following characters are called ... \editend \\ \edithead {\csdag 10 (p21)} \editstart \\ \bf delete & \cltxt {\clkwd \#$\backslash$Backspace \#$\backslash$Tab } ... \editend \\ \edithead {\csdag 11 (p21)} \editstart \\ \bf delete & \cltxt Not all implementations of Common ... \editend \subsubsection{Line Divisions} % 2.2.2. \edithead {\csdag 6 (p22)} \editstart \\ \bf replace & \cltxt a two-character sequence, such as {\clkwd \#$\backslash$Return } and then {\clkwd \#$\backslash$Newline }, is not acceptable, \\ \bf with & \cltxt a two-character sequence is not acceptable, \editend \\ \edithead {\csdag 8 (p22)} \editstart \\ \bf delete & \cltxt Implementation note: If an implementation uses ... \editend \subsubsection{Non-standard Characters} % 2.2.3. \edithead {\csdag delete entire section (p23)} \editstart \editend \subsubsection{Character Attributes} % 2.2.4. \edithead {\csdag 0 section heading (p23)} \editstart \\ \bf replace & \cltxt Character Attributes \\ \bf with & \cltxt Character Identity \editend \\ \edithead {\csdag 1 through 8 (p23)} \editstart \\ \bf delete all paragraphs& \cltxt Every object of type {\clkwd character} ... 
\editend \\ \edithead {\csdag 1 (p23)} \editstart \\ \bf new & \cltxt Common LISP characters are partitioned into a unique collection of repertoires called {\em Character Registries}. That is, each character is included in one and only one Character Registry. The label identifying each character within a Character Registry is a unique numerical value referred to as the {\em character index}. \\ & Characters are uniquely distinguished by their codes, which are drawn from the set of non-negative integers. That is, within Common LISP a unique numerical code is assigned to each semantically different character. Character codes are composed from a Character Registry and a character index. The convention by which a character index and Character Registry compose a character code is implementation dependent. \editend \subsubsection{String Characters} % 2.2.5. \edithead {\csdag delete entire section (p23)} \editstart \editend \setcounter{subsection}{4} \subsubsection{Character Registries} % 2.2.5. \edithead {\csdag new section (p23)} \editstart \\ \bf new & \cltxt Character registries provide portable specifications of character objects. Every character object is uniquely identified by a registry name and index. Character Registry names are strings formed from the Common LISP {\clkwd standard-p} characters. Within registry names, alphabetic case is ignored. \\ & Common LISP defines the following Character Registries: \footnote{This document defines a partial list of the Character Registry names. 
A subsequent document will define the complete Common LISP Character Registry Standard including the effect of the character predicates {\em alpha-char-p}, {\em lower-case-p}, etc.} \begin{itemize} \item Arabic \item Armenian \item Bo-po-mo-fo \item Control \item Cyrillic \item Georgian \item Greek \item Hangul \item Hebrew \item Hiragana \item Japanese-Punctuation \item Kanji-JIS-Level-1 \item Kanji-JIS-Level-2 \item Kanji-Gaiji \item Katakana \item Latin \item Latin-Punctuation \item Mathematical \item Pattern \item Phonetic \item Technical \end{itemize} \editend \\ \edithead {\csdag new section (p23)} \editstart \\ \bf new & \cltxt The Common LISP Character Registry Standard is fixed; an implementation may not extend the set of characters within any Common LISP Character Registry. \\ & An implementation may provide support for all or part of any Common LISP Character Registry and may provide new character registries which include characters having unique semantics (i.e. not defined in any other implementation-defined character registry or Common LISP Character Registry). Implementation registries must be uniquely named using only {\clkwd standard-p} characters. In addition, the repertoire names {\em base} and {\em standard} have reserved Common LISP usage. \\ & An implementation must document the registries it supports. For each registry supported, an implementation must define individual characters supported including at least the following: \begin{itemize} \item Character Labels, Glyphs, and Descriptions. \item $<$ Common LISP Character Registry name, character index $>$ pair if one exists, otherwise $<$ implementation-defined character registry name, character index $>$ pair. \item Reader Canonicalization. \item Position in total ordering. The partial ordering of the Standard alphanumeric characters must be preserved. \item Effect of character predicates.
In particular, \begin{itemize} \item {\clkwd alpha-char-p} \item {\clkwd lower-case-p} \item {\clkwd upper-case-p} \item {\clkwd both-case-p} \item {\clkwd graphic-char-p} \item {\clkwd standard-char-p} \item {\clkwd alphanumericp} \end{itemize} \item Interaction with File I/O. In particular, the coded character set standards \footnote{For example, "ISO8859/1-1987ccs".} and external encoding schemes \footnote{For example, {\em "Xerox System Integration Character Code Standard"}\cite{xerox87}.} which are supported must be specified. \end{itemize} \editend \subsection{Symbols} % 2.3. \edithead {\csdag 12 (p25)} \editstart \\ \bf replace & \cltxt A symbol may have uppercase letters, lowercase letters, or both in its print name. \\ \bf with & \cltxt A symbol may have characters from any supported character registry in its print name. It may have uppercase letters, lowercase letters, or both. \editend \setcounter{subsection}{4} \subsection{Arrays} \subsubsection{Vectors} \edithead {\csdag 6 (p29)} \editstart \\ \bf replace & \cltxt All implementations provide specialized arrays for the cases when the components are characters (or rather, a special subset of the characters); \\ \bf with & \cltxt All implementations provide specialized arrays for the cases when the components are characters (or optionally, special subsets of the characters); \editend \subsubsection{Strings} \edithead {\csdag 1 (p30)} \editstart \\ \bf replace & \cltxt A string is simply a vector of characters. More precisely, a string is a specialized vector whose elements are of type {\clkwd string-char}. \\ \bf with & \cltxt A string is simply a vector of characters. More precisely, a string is a specialized vector whose elements are of type {\clkwd character} or a subtype of character. \editend \setcounter{subsection}{14} \subsection{Overlap, Inclusion, and Disjointness of Types} % 2.15. 
\edithead {\csdag 14 (p34)} \editstart \\ \bf replace & \cltxt The type {\clkwd standard-char} is a subtype of {\clkwd string-char}; {\clkwd string-char} is a subtype of {\clkwd character}. \\ \bf with & \cltxt The type {\clkwd base-character} is a subtype of {\clkwd character}. \editend \\ \edithead {\csdag 15 (p34)} \editstart \\ \bf replace & \cltxt The type {\clkwd string} is a subtype of {\clkwd vector}, for {\clkwd string} means {\clkwd (vector string-char)}. \\ \bf with & \cltxt The type {\clkwd string} is a subtype of {\clkwd vector}, {\clkwd string} consists of vectors specialized by subtypes of {\clkwd character}. \editend \\ \edithead {\csdag 15 after (p34)} \editstart \\ \bf insert & \cltxt The type {\clkwd base-string} means {\clkwd (vector base-character)}. \editend \\ \edithead {\csdag 15 after (p34)} \editstart \\ \bf insert & \cltxt The type {\clkwd general-string} means {\clkwd (vector character)} and is a subtype of {\clkwd string}. \editend \\ \edithead {\csdag 20 (p34)} \editstart \\ \bf replace & \cltxt {\clkwd (simple-array string-char (*))}; \\ \bf with & \cltxt {\clkwd (simple-array character (*))}; \editend \\ \edithead {\csdag 20 after (p34)} \editstart \\ \bf insert & \cltxt The type {\clkwd simple-base-string} means {\clkwd (simple-array base-character (*))} and is the most efficient string which can hold the standard characters. {\clkwd simple-base-string} is a subtype of {\clkwd base-string}. \editend \\ \edithead {\csdag 20 after (p34)} \editstart \\ \bf insert & \cltxt The type {\clkwd simple-general-string} means {\clkwd (simple-array character (*))}. {\clkwd simple-general-string} is a subtype of {\clkwd general-string}. \editend %---------------------------------------------------------------------- \setcounter{section}{3} \section{Type Specifiers} % 4 %---------------------------------------------------------------------- \setcounter{subsection}{1} \subsection{Type Specifier Lists} % 4.2. 
\edithead {\csdag 8 Table 4-1 (alphabetic list) (p43)} \editstart \\ \bf remove & \\ & \cltxt {\clkwd standard-char} \\ & {\clkwd string-char} \editend \\ \edithead {\csdag 8 Table 4-1 (alphabetic list) (p43)} \editstart \\ \bf insert & \\ & \cltxt {\clkwd base-character} \\ & {\clkwd general-string} \\ & {\clkwd simple-base-string} \\ & {\clkwd simple-general-string} \editend \setcounter{subsection}{2} \subsection{Predicating Type Specifiers} % 4.3. \edithead {\csdag 2 (p43)} \editstart \\ \bf delete & \cltxt As an example, the entire ... \editend \\ \edithead {\csdag 3 delete example (p43)} \editstart \\ \bf delete & \cltxt {\clkwd (deftype string-char () } ... \editend \setcounter{subsection}{4} \subsection{Type Specifiers That Specialize} % 4.5. \edithead {\csdag 5 after (p46)} \editstart \\ \bf insert & \cltxt {\clkwd (character {\em registries})} \\ & This denotes a character type specialized to members of the specified registries. {\em registries} may be a single character registry name or a list of character registry names. \editend \setcounter{subsection}{5} \subsection{Type Specifiers That Abbreviate} % 4.6. \edithead {\csdag 20 (p49)} \editstart \\ \bf replace & \cltxt Means the same as {\clkwd (array string-char ({\em size}))}: the set of strings of the indicated size. \\ \bf with & \cltxt Means the union of the vector types specialized by subtypes of character and the indicated size. For the purpose of declaration, it is equivalent to {\clkwd (general-string ({\em size}))}. \editend \\ \edithead {\csdag 23 (p49)} \editstart \\ \bf replace & \cltxt Means the same as {\clkwd (simple-array string-char ({\em size}))}: the set of simple strings of the indicated size. \\ \bf with & \cltxt Means the union of the simple vector types specialized by subtypes of character and the indicated size. For the purpose of declaration, it is equivalent to {\clkwd (simple-general-string ({\em size}))}. 
\editend \\ \edithead {\csdag 23 after (p49)} \editstart \\ \bf insert & \cltxt {\clkwd (base-string {\em size})} \\ & Means the same as {\clkwd (array base-character ({\em size}))}: the set of base strings of the indicated size. \\ & {\clkwd (simple-base-string {\em size})} \\ & Means the same as {\clkwd (simple-array base-character ({\em size}))}: the set of simple base strings of the indicated size. \editend \\ \edithead {\csdag 23 after (p49)} \editstart \\ \bf insert & \cltxt {\clkwd (general-string {\em size})} \\ & Means the same as {\clkwd (array character ({\em size}))}: the set of general strings of the indicated size. \\ & {\clkwd (simple-general-string {\em size})} \\ & Means the same as {\clkwd (simple-array character ({\em size}))}: the set of simple general strings of the indicated size. \editend \setcounter{subsection}{7} \subsection{Type Conversion Function} % 4.8. \edithead {\csdag 6 (p51)} \editstart \\ \bf replace & \cltxt Some strings, symbols, and integers may be converted to characters. If {\em object} is a string of length 1, then the sole element of the print name is returned. If {\em object} is an integer {\em n}, then {\clkwd (int-char } {\em n}{\clkwd )} is returned. See {\clkwd character}. \\ \bf with & \cltxt Some strings and symbols may be converted to characters. If {\em object} is a string of length 1, then the sole element of the print name is returned. See {\clkwd character}. \editend \\ \edithead {\csdag 6 after (p52)} \editstart \\ \bf insert & \begin{itemize} \cltxt \item Any string subtype may be converted to any other string subtype, provided the new string can contain all actual elements of the old string. It is an error if it cannot.
\end{itemize} \editend %---------------------------------------------------------------------- \setcounter{section}{5} \section{Predicates} % 6 %---------------------------------------------------------------------- \edithead {\csdag 2 (p71)} \editstart \\ \bf replace & \cltxt but {\clkwd standard-char} begets {\clkwd standard-char-p} \\ \bf with & \cltxt but {\clkwd bit-vector} begets {\clkwd bit-vector-p} \editend \setcounter{subsection}{1} \subsection{Data Type Predicates} % 6.2. \setcounter{subsubsection}{1} \subsubsection{Specific Data Type Predicates} % 6.2.2. \edithead {\csdag 36 (p75)} \editstart \\ \bf replace & \cltxt {\clkwd characterp} {\em object} \\ \bf with & \cltxt {\clkwd characterp} {\em object} \&{\clkwd optional} {\em repertoire} \editend \\ \edithead {\csdag 37 (p75)} \editstart \\ \bf replace & \cltxt {\clkwd characterp} is true if its argument is a character, and otherwise is false. \\ \bf with & \cltxt If {\em repertoire} is omitted, {\clkwd characterp} is true if its argument is a character object, and otherwise is false. If a {\em repertoire} argument is specified, {\clkwd characterp} is true if its argument is a character object and a member of the specified repertoire, and otherwise is false. For example, {\clkwd (characterp \#$\backslash$A} {\clkwd "Latin")} is true since \#$\backslash$A is a member of the Common LISP Latin Character Registry. {\em repertoire} may be any supported character registry name or the reserved repertoire names "base" and "standard". {\clkwd (characterp x "base")} is true if its argument is a member of the base character repertoire and false otherwise. {\clkwd (characterp x "standard")} is true if its argument is a member of the standard character repertoire and false otherwise. 
\editend \\ \edithead {\csdag 38 (p75)} \editstart \\ \bf replace & \cltxt {\clkwd (characterp x) $\equiv$ (typep x \apostrophe character)} \\ \bf with & \cltxt {\clkwd (characterp x "standard") $\equiv$ (typep x \apostrophe (character "standard"))} \editend \\ \edithead {\csdag 72 (p76)} \editstart \\ \bf replace & \cltxt See also {\clkwd standard-char-p, string-char-p, streamp,} \\ \bf with & \cltxt See also {\clkwd standard-char-p, streamp,} \editend \setcounter{subsubsection}{2} \subsubsection{Equality Predicates} % 6.2.3. \edithead {\csdag 75 (p81)} \editstart \\ \bf replace & \cltxt which ignores alphabetic case and certain other attributes of characters; \\ \bf with & \cltxt which ignores alphabetic case of characters; \editend %---------------------------------------------------------------------- \setcounter{section}{6} \section{Control Structure} % 7 %---------------------------------------------------------------------- \setcounter{subsection}{1} \subsection{Generalized Variables} % 7.2. \edithead {\csdag 19 modify table (p95)} \editstart \\ \bf replace & \cltxt char string-char \\ & schar string-char \\ \bf with & \cltxt char character \\ & schar character \\ & sbchar base-character \editend \\ \edithead {\csdag 22 table entry (p96)} \editstart \\ \bf delete & \cltxt char-bit first set-char-bit \editend %----------------------------------------------------------------------  Received: from life.ai.mit.edu (TCP 20015020120) by AI.AI.MIT.EDU 12 Jan 89 23:55:34 EST Received: from ai.ai.mit.edu by life.ai.mit.edu; Thu, 12 Jan 89 23:42:45 EST Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Jan 89 19:54:57 PST Date: Thu, 12 Jan 89 13:33:25 PST From: Thom Linden To: Common Lisp mailing Message-Id: <890112.133325.baggins@almvma> Subject: cs proposal part 1 \documentstyle{report} % Specifies the document style.
\pagestyle{headings} \title{\bf Extensions to Common LISP to Support International Character Sets} \author{ Michael Beckerle\thanks{Gold Hill Computers} \and Paul Beiser\thanks{Hewlett-Packard} \and Robert Kerns\thanks{Independent consultant} \and Kevin Layer\thanks{Franz, Inc.} \and Thom Linden\thanks{IBM Research, Subcommittee Chair} \and Larry Masinter\thanks{Xerox Research} \and David Unietis\thanks{Lucid, Inc.} } \date{January 1, 1989} % Deleting this command produces today's date. \begin{document} \maketitle % Produces the title. \setcounter{secnumdepth}{4} \setcounter{tocdepth}{4} \tableofcontents %---------------------------------------------------------------------- %---------------------------------------------------------------------- \newfont{\cltxt}{cmr10} \newfont{\clkwd}{cmtt10} \newcommand{\apostrophe}{\clkwd '} \newcommand{\bq}{\clkwd\symbol{'22}} %---------------------------------------------------------------------- %---------------------------------------------------------------------- \chapter{Introduction} This is a proposal to the X3 J13 committee for both extending and modifying the Common LISP language definition to provide a standard basis for Common LISP support of the variety of characters used to represent the native languages of the international community. This proposal was created by the Character Subcommittee of X3 J13. We would like to acknowledge discussions with T. Yuasa and other members of the JIS Technical Working Group, comments from members of X3 J13, and the proposals \cite{ida87}, \cite{linden87}, \cite{kerns87}, and \cite{kurokawa88} for providing the motivation and direction for these extensions. As all these documents and discussions were created expressly for LISP standardization usage, we have borrowed freely from their ideas as well as the texts themselves. This document is separated into three parts. The first part explains the major language changes and their motivations. 
While intended as commentary to a general audience, and not explicitly as part of the standard document, the X3 J13 editor may include sections at her/his discretion. The second part, Appendix A, provides the page by page set of editorial changes to \cite{steele84}. The final part, Appendix B, contains language elements deleted from \cite{steele84} which we view as important from a compatibility viewpoint but consider deprecated Common LISP features. \section{Objectives} The major objectives of this proposal are: \begin{itemize} \item To provide a consistent, well-defined scheme allowing support of both very large character sets and multiple character sets. \footnote{The distinction between the terms {\em character repertoire} and {\em coded character set} is made later. The usage of the term {\em character set}, avoided after this introduction, encompasses both terms.} Many software applications are intended for international use, or have requirements for incorporation of language elements of multiple native languages within a single application. Also, many applications require specialized languages including, for example, scientific and typesetting symbols. In order to ensure some portability of these applications, data expressed in a mixture of these languages must be treated uniformly by the software language. All character and string manipulations should operate uniformly, regardless of the character set(s) of the character objects. This applies to array indexing, readtable definitions, read symbol construction and I/O operations. \item To ensure efficient performance of string and character operations. Many native languages, such as Japanese and Chinese, use character sets which contain more characters than the Latin alphabet. Supporting larger sized character sets frequently means employing larger data fields to uniquely encode each character. 
Common LISP implementations using larger sized character sets can incur performance penalties in terms of space, time, or both. The use of large and/or multiple character sets by an implementation implies the need for a more complex character type representation. Given a more complex character representation, the efficiency of language operations on characters (e.g. string operations) could be affected. \item To assure forward compatibility of the proposed model and definition with existing Common LISP implementations. Developers should not be required to re-write large amounts of either LISP code or data representations in order to apply the proposed changes to existing implementations. The proposed changes should provide an easy portability path for existing code to many possible implementations. \end{itemize} There are a number of issues, some under the general rubric of internationalization, which this proposal does {\em not} cover. Among these issues are: \begin{itemize} \item Time and date formats \item Monetary formats \item Numeric punctuation \item Fonts \item Lexicographic orderings \item Right-to-left and bidirectional languages \end{itemize} %---------------------------------------------------------------------- %---------------------------------------------------------------------- %---------------------------------------------------------------------- %---------------------------------------------------------------------- \chapter{Overview} We use several terms within this document which are new in the context of Common LISP. Definitions for the following prominent terms are provided for the reader's convenience. A {\em character repertoire} defines a collection of characters independent of their specific rendered image or font. This corresponds to the mathematical notion of a {\em set} \footnote{We avoid the term {\em character set} as it has been (over)used in the context of character repertoire as well as in the context of coded character set.}. 
Character repertoires are specified independent of coding and their characters are only identified with a unique label, a graphic symbol, and a character description. A {\em coded character set} is a character repertoire plus an {\em encoding} providing a unique mapping between each character and a number which serves as the character representation. There are numerous internationally standardized coded character sets; for example, \cite{iso8859/1} and \cite{iso646}. A character may be included in one or more character repertoires. Similarly, a character may be included in one or more coded character sets. For example, the Latin letter "A" is contained in the coded character set standards: ISO 8859/1, ISO 8859/2, ISO 6937/2, and others. Common LISP characters are partitioned into a unique collection of repertoires called {\em Character Registries}. That is, each character is included in one and only one Character Registry. The label identifying each character within a Character Registry is a unique numerical value referred to as the {\em character index}. In Common LISP a {\em character} data object is identified by its {\em character code}, a unique numerical code. Each character code is composed from a Character Registry, shared by all characters of that Registry, and a character index, a numerical value which is unique within the Character Registry. Character data objects which are classified as {\em graphic}, or displayable, are each associated with a {\em glyph}. The glyph is the visual representation of the character. The primary purpose of introducing these terms is to provide consistent names for Common LISP concepts which are related to those found in ISO standardization of coded character sets. \footnote{The bibliography includes several relevant ISO coded character set standards.} They also serve as a demarcation between these standardization activities.
For example, while Common LISP is free to define unique repertoires and facilities to manipulate them, it should not define coded character sets. A secondary purpose is to detach the language specification from underlying hardware representation. From a language specification viewpoint it is inconsequential whether characters occupy one or more (8-bit) bytes or whether a Common LISP implementation's internal representation for characters is distinct from or identical to any given external representation (for example, a text interchange representation \cite{iso6937/2}). We specifically do not propose any standard coded character sets. %---------------------------------------------------------------------- \section{Character Identity} Characters are uniquely distinguished by their codes, which are drawn from the set of non-negative integers. That is, within Common LISP a unique numerical code is assigned to each semantically different character. Character codes are composed from a Character Registry and a character index. The convention by which a character index and Character Registry compose a character code is implementation dependent. It is important to separate the notion of glyph from the notion of character data object when defining a scheme under which issues of identity can be rigorously decided by a computer language. Glyphs are the visual aspects of characters, writable on surfaces, and sometimes called 'graphics'. A language specification valid for more than a narrow range of systems can only make assumptions about the existence of {\em abstract} glyphs (for example, the Latin letter A) and not about glyph variants (for example, the italicized Latin letter {\em A}) or characteristics of display devices. Thus, a key element of this proposal is the removal of the {\em font} and {\em bits} attributes from the language specification. One ramification is that the distinction between {\clkwd string-char} and {\clkwd character} is eliminated. 
{\bf All} characters can be inserted into (type compatible) strings. In addition, all functions dealing with the {\em bits} and {\em font} attributes are either removed or modified by this proposal. A second ramification is the introduction of new functions to compose and decompose character objects. The {\clkwd characterp} predicate is extended to support testing membership of a character in a given Character Registry. \footnote{ For example, testing membership in the Japanese Katakana Character Registry. } Also, a global variable {\clkwd *all-registry-names*} is added to support application determination of supported Character Registries. A third ramification is that I/O functions must be modified to manage the interaction between the Common LISP treatment of characters and the external environment. The definition in \cite{steele84} of semi-standard characters has been eliminated. This is replaced by a more uniform approach with introduction of the Control Character Registry (see below).
%----------------------------------------------------------------------
\section{Character Repertoires and Registries}
A Common LISP program must be able to compose and decompose characters in a portable, uniform manner, independent of any underlying representation. One possible composition is by the pair $<$ coded character set standard, decimal representation $>$ \footnote{This syntax is for illustration only and is not being proposed.}. Thus, for example, one might compose the Latin 'A' with the pair $<$ "ISO8859/2-1987ccs", 65 $>$, $<$ "ISO8859/6-1987ccs", 65 $>$, or $<$ "ISO646-1983ccs", 65 $>$, etc. The difficulty here is two-fold.
First, there are several ways to compose the same character; second, there may be multiple answers to the question: {\em To what coded character set does character object x belong?}\footnote{Even worse, the answer might change yearly.} The same problems occur if the pair $<$ character repertoire standard, decimal representation $>$ is used. \footnote{Existing repertoires seem to be defined exclusively in the context of coded character sets and not as standards in their own right.} The concept of Character Registry is introduced by this proposal to resolve the problem of character composition and decomposition. Each character is universally defined by the pair $<$ Character Registry name, character index $>$. For this to be a portable definition, it must have a standard meaning. Thus this proposal relies on a {\em Character Registry Standard}. There is no existing Character Registry Standard. Until such an ANSI or ISO standard exists, Common LISP defines the {\em Common LISP Character Registry Standard}. \footnote{It is the intention of X3 J13 to promote and adopt an eventual ANSI or ISO Character Registry Standard. In particular, we acknowledge that X3 J13 is {\em not} the appropriate forum to define the standard. We believe it is a required component of all programming languages providing support for international characters.} Common LISP defines the following Character Registries: \footnote{In the interest of brevity, this document will define only a partial list of the Character Registry names. A subsequent document will define the complete Common LISP Character Registry Standard including the effect of the character predicates {\em alpha-char-p}, {\em lower-case-p}, etc.} \footnote{ Character Registry names are strings formed from the Common LISP {\clkwd standard-p} characters.
Within registry names, alphabetic case is ignored.} \begin{itemize} \item Arabic \item Armenian \item Bo-po-mo-fo \item Control \item Cyrillic \item Georgian \item Greek \item Hangul \item Hebrew \item Hiragana \item Japanese-Punctuation \item Kanji-JIS-Level-1 \item Kanji-JIS-Level-2 \item Kanji-Gaiji \item Katakana \item Latin \item Latin-Punctuation \item Mathematical \item Pattern \item Phonetic \item Technical \end{itemize} The Common LISP Character Registry Standard is fixed; an implementation may not extend the set of characters within any Common LISP Character Registry. An implementation may provide support for all or part of any Common LISP Character Registry and may provide new character registries which include characters having unique semantics (i.e. not defined in any other implementation-defined character registry or Common LISP Character Registry). Implementation registries must be uniquely named using only {\clkwd standard-p} characters. In addition, the repertoire names {\em base} and {\em standard} have reserved Common LISP usage. An implementation must document the registries it supports. For each registry supported, an implementation must define individual characters supported including at least the following: \begin{itemize} \item Character Labels, Glyphs, and Descriptions. \item $<$ Common LISP Character Registry name, character index $>$ pair if one exists otherwise $<$ implementation-defined character registry name, character index $>$ pair. \item Reader Canonicalization. \item Position in total ordering. The partial ordering of the Standard alphanumeric characters must be preserved. \item Effect of character predicates. In particular, \begin{itemize} \item {\clkwd alpha-char-p} \item {\clkwd lower-case-p} \item {\clkwd upper-case-p} \item {\clkwd both-case-p} \item {\clkwd graphic-char-p} \item {\clkwd standard-char-p} \item {\clkwd alphanumericp} \end{itemize} \item Interaction with File I/O. 
In particular, the coded character set standards \footnote{For example, "ISO8859/1-1987ccs".} and external encoding schemes \footnote{For example, {\em "Xerox System Integration Character Code Standard"}\cite{xerox87}.} which are supported must be specified. \end{itemize} The intent of the provision for multiple character registries is that native language glyphs (with associated digits and punctuation) \footnote{For example, the glyphs on the keycaps of a particular terminal, or any other glyph sets with a common use in graphics or symbolic communication. } should each be mapped by the I/O interface into registries inside Common LISP, all the members of which share a common registry name. Which glyph sets are supported by the overall computing system, the details of the mapping of glyphs to character codes, and any implementation unique character registry names used, are left unspecified by Common LISP. The diversity of glyph sets and coded character set conventions in use worldwide and the desirability of allowing Common LISP to manipulate symbolic elements from many languages, perhaps simultaneously, mandate such a flexible approach. %---------------------------------------------------------------------- \section{Hierarchy of Types} Providing support for extensive character repertoires may impact Common LISP implementation performance in terms of space, time, or both. \footnote{This does not apply to all implementations. Unique hardware support and user community requirements must be taken into consideration.} In particular, many existing implementations support variants of the ISO 8859/1 standard. Supporting large repertoires argues for a multi-byte internal representation for each character, even if an application primarily (or exclusively) uses the ISO 8859/1 characters. This proposal extends the definition of the character and string type hierarchy to include specialized subtypes of character and string. 
An implementation is free to associate compact internal representation tailored to each subtype. The {\clkwd string} type specifier, when used as a declaration (for example, in {\clkwd make-sequence}) is defined to mean the most general string subtype supported by the implementation. This definition emphasizes portability of existing Common LISP applications to international character environments over performance. Applications emphasizing efficiency of text processing in non-international environments will require some modification to utilize subtypes with compact internal representations. It has been suggested that either a single type is sufficient to support international characters, or that a hierarchy of types could be used, in a manner transparent to the user. A desire to provide flexibility which encourages implementations to support international characters without compromising application efficiency led us to accept the need for more than one type. We believe that these choices reflect a minimal modification of this aspect of the type system, and that exposing the types for string and character construction while requiring uniform treatment for characters otherwise is the most reasonable approach. \subsection{Character Type} The following type specifier is added as a subtype of {\clkwd character}. \begin{itemize} \item {\clkwd base-character} \end{itemize} An implementation may support additional subtypes of {\clkwd character} which may or may not be supertypes of {\clkwd base-character}. In addition, an implementation may define {\clkwd base-character} as equivalent to {\clkwd character}. Characters of type {\clkwd base-character} are referred to as {\em base characters}. Characters of type {\clkwd (and character (not base-character))} are referred to as {\em extended characters}. The base characters are distinguished in the following respects: \begin{itemize} \item The standard characters are a subrepertoire of the base characters. 
\item Only members of the base character repertoire can be elements of a base string. \item The base characters are, in general, the default characters for I/O operations. \end{itemize} No upper bound is specified for the number of glyphs in the base character repertoire--that is implementation dependent. The lower bound is 96, the number of standard characters defined for Common LISP. \footnote{Or, in contrast, the base repertoire may include all the Common LISP Character Registries.} The distinction of base characters is largely a pragmatic choice. It permits efficient handling of common situations, is in some sense privileged for host system I/O, and can serve as an intermediate basis for portability, less general than the standard characters, but possibly more useful across a narrower range of implementations. Many computers have some "base" character representation which is a function of hardware instructions for dealing with characters, as well as the organization of the file system. The base character representation is likely to be the smallest transaction unit permitted for text file and terminal I/O operations. On a system with a record based I/O paradigm, the base character representation is likely to be the smallest record quantum. On many computer systems, this representation is a byte. However, there are often multiple coded character sets supportable on a computer, through the use of special display and entry hardware, which are varying interpretations of the basic system character representation. For example, ISO 8859/1 and ISO 6937/2 are two different interpretations of the same 1-byte code representations. Many countries have their own glyph-to-code mappings for 1-byte character codes addressing the special requirements of national languages. Differentiating between these, without reference to display hardware, is a matter of convention, since they all use the same set of code representations. 
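As an illustration of how one code representation admits several interpretations, the following sketch records the character description that two coded character sets assign to the 1-byte code 161 (hexadecimal A1). It is illustrative only and is not part of the proposal: the variable {\clkwd *code-161-interpretations*} and the function {\clkwd describe-code-161} are hypothetical names, and the pair ISO 8859/1 and ISO 8859/2 is chosen here simply because the two standards differ at this code.

\begin{verbatim}
;; Illustration only; the names below are hypothetical.
;; Character descriptions assigned to the 1-byte code 161 (#xA1)
;; by two coded character sets.
(defparameter *code-161-interpretations*
  '(("ISO8859/1" . "INVERTED EXCLAMATION MARK")
    ("ISO8859/2" . "LATIN CAPITAL LETTER A WITH OGONEK")))

(defun describe-code-161 (coded-character-set)
  "Return the description of code 161 under CODED-CHARACTER-SET,
or NIL if that coded character set is not in the table."
  (cdr (assoc coded-character-set *code-161-interpretations*
              :test #'string-equal)))

(describe-code-161 "ISO8859/1") ; => "INVERTED EXCLAMATION MARK"
(describe-code-161 "ISO8859/2") ; => "LATIN CAPITAL LETTER A WITH OGONEK"
\end{verbatim}

Nothing in the byte itself selects between the two results; only the convention named by the caller does.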
When a single byte is not enough, two or more bytes are sometimes used for character encoding. This makes character handling even more difficult on machines where the natural representation size is a byte, since not only is the semantic value of a character code a matter of convention, which may vary within the same computing system, but so is the identification of a set of bits as a complete character code. It is the intention of this proposal that the base characters of Common LISP be the natural characters of the host system: its composition should be determined by the code capacity of the natural file system and I/O transaction representations, and its assumed display glyphs should be those of the terminals most commonly employed. There are several advantages to this scheme. Internal representation of strings of just base characters can be more compact than strings including extended characters. Source programs are likely to consist predominantly of base characters since the standard characters are a subset of the base character repertoire. Parsing of pure base character text can be more efficient than parsing of text including extended characters. I/O can be performed more simply with base characters. The standard characters are the 96 characters used in the Common LISP definition {\bf or their equivalents}. This was the Common LISP \cite{steele84} definition, but {\em equivalents} is a vague term. The standard characters are not defined by their glyphs, but by their roles within the language. There are two aspects to the roles of the standard characters: one is their role in reader and format control string syntax; the second is their role as components of the names of all Common LISP functions, macros, constants, and global variables. As long as an implementation chooses 96 glyphs and treats those 96 in a manner consistent with the language's specification for the standard characters (e.g. 
the naming of functions), it doesn't matter what glyphs the I/O hardware uses to represent those characters: they are the standard characters. Any program or data text written wholly in those characters is portable through simple code conversion. \footnote{For example, the currency glyph, \$ , might be replaced uniformly by the currency glyph available on a particular display.} Additional mechanisms, such as in \cite{linden87}, which support establishment of equivalency between otherwise distinct characters are not excluded by this proposal. \footnote{We believe this is an important issue but it requires additional implementation experience. We also encourage new proposals from JIS and ISO LISP Working Groups on this issue.} \subsection{String Type} The {\clkwd string} type is defined as a vector of characters. More precisely, a string is a specialized vector whose elements are of type {\clkwd character} or a subtype of character. The following string subtypes are distinguished with standardized names: {\clkwd base-string}, {\clkwd general-string}, {\clkwd simple-base-string}, and {\clkwd simple-general-string}. All strings which are not base strings are referred to as {\em extended strings}. A base string can only contain base characters. A {\clkwd general-string} can contain any implementation supported base or extended characters, in any mixture. \footnote{This type might be more appropriately named {\clkwd most-general-string}. {\clkwd general-string} was subjectively judged to be less offensive.} All Common LISP functions defined to operate on strings treat base and extended strings uniformly with the following caveat: for any function which inserts a character into a string, it is an error to insert an extended character into a base string. 
\footnote{An implementation may, optionally, provide automatic coercion to an extended string.} An implementation may support string subtypes more general than {\clkwd base-string} but more specialized than {\clkwd general-string}. For example, a hypothetical implementation supporting Arabic and Cyrillic Character Registries might provide: \begin{itemize} \item {\clkwd general-string} -- may contain Arabic, Cyrillic or base characters in any mixture. \item {\clkwd region-specialized-string} -- may contain an installation-selected repertoire (Arabic/Cyrillic) or base characters in any mixture. \item {\clkwd base-string} -- may contain only base characters. \end{itemize} Though, clearly, portability of applications using {\clkwd region-specialized-string} is limited, a performance advantage might argue for its use. \footnote{{\clkwd region-specialized-string} is used here for illustration only; it is not being proposed as a standardized string subtype.} Alternatively, an implementation supporting a large base character repertoire including, say, Japanese Character Registries may define {\clkwd base-character} as equivalent to {\clkwd character}. We expect that applications sensitive to the performance of character handling in some host environments will utilize the string subtypes to provide performance improvement. Applications with emphasis on international portability will likely utilize only {\clkwd general-string}s. The {\clkwd coerce} function is extended to allow for explicit coercion between base strings and extended strings. During reader construction of symbols, if all the characters in the symbol's name are of type {\clkwd base-character}, then the name of the symbol may be stored as a base string. Otherwise it will be stored as an extended string. The base string type allows for more compact representation of strings of base characters, which are likely to predominate in any system.
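The storage rule for symbol names can be sketched in present-day Common LISP. The sketch below is an illustration under stated assumptions, not proposed text: it uses the type name {\clkwd base-char}, the name under which ANSI Common LISP later adopted the type called {\clkwd base-character} here, and the function names {\clkwd all-base-characters-p} and {\clkwd compact-string} are hypothetical.

\begin{verbatim}
;; Illustration only; both function names are hypothetical.
(defun all-base-characters-p (string)
  "True if every character of STRING is a base character."
  (every (lambda (ch) (typep ch 'base-char)) string))

(defun compact-string (string)
  "Return a base string with the contents of STRING when every
character is a base character; otherwise return STRING unchanged."
  (if (all-base-characters-p string)
      (coerce string 'base-string)
      string))
\end{verbatim}

A call such as {\clkwd (compact-string "CAR")} yields a base string, because the standard characters are base characters. A string holding any extended character is returned as it was given, which mirrors the rule that such a name is stored as an extended string.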
Note that in any particular implementation the base characters need not be the most compactly representable, since others might have a smaller repertoire. However, in most implementations base strings are likely to be more space efficient than extended strings. %---------------------------------------------------------------------- \section{Streams and System I/O} A lot of the work of ensuring that a Common LISP implementation operates correctly in a multiple coded character set environment must be performed by the I/O interface. The system I/O interface, abstracted in Common LISP as streams, is responsible for ensuring that text input from outside LISP is properly mapped into character objects internally, and that the inverse mapping is performed on output. It is beyond the scope of a language definition to specify the details of this operation, but options are specified which allow runtime indication from the user as to what coded character sets a stream uses, and how the mappings should be done. It is expected that implementations will provide reasonable defaults and invocation options to accommodate desired use at an installation. One keyword argument is proposed as an addition to {\clkwd open}: \begin{itemize} \item {\clkwd :external-code-format} whose value would be: \begin{itemize} \item A name or list indicating an implementation recognized scheme for representing 1 or more coded character sets. \footnote{ For example, the so/si convention used by IBM on 370 machines could be selected by a list including the name {\em "ibm-shift-delimited"}. The run-encoding convention defined by XEROX could be selected by {\em "xerox-run-encoded"}. The convention based on ASCII which uses leading bit patterns to distinguish two-byte codes from one-byte codes could be selected by {\em "ascii-high-byte-delimited"}. } As many coded character set names must be provided as the implementation requires for that external coding convention. 
\footnote{ For example, if {\em "ibm-shift-delimited"} were the {\clkwd :external-code-format} argument, two coded character set specifiers would have to be provided. } \end{itemize} \end{itemize} These arguments are provided for input, output, and bidirectional streams. It is an error to try to write a character other than a member of the specified coded character sets to a stream. (This excludes the \#$\backslash${\clkwd Newline} character. Implementations must provide appropriate line division behavior for all character streams.) An implementation supporting multiple coded character sets must allow for the external representation of characters to be separately (and perhaps multiply) specified to {\clkwd open}, since there can be circumstances under which more than one external representation for characters is in use, or more than one coded character set is mixed together in an external representation convention. In addition to supporting conversion at the system interface, the language must allow user programs to determine how much space data objects will require when output in whichever external representations are available. The new function {\clkwd external-width} takes a character or string object as its required argument. It also takes an optional {\em output-stream}. It returns the number of host system character representation quantum units \footnote{ Same as the storage width of a base character, usually a byte. } required to externally store that object, using the representation convention associated with the stream. If the object cannot be represented in that convention, the function returns {\clkwd nil}. This function is necessary to determine if strings can be written to fixed length fields in databases or terminal screen templates. Note that this function does not address the problem of calculating screen width of strings printed in proportional fonts. 
\footnote{ The X3 J13 proposal STREAM-INFO: ONE-DIMENSIONAL-FUNCTIONS modified to include these semantics is an acceptable alternative to the {\clkwd external-width} function proposed here.} %---------------------------------------------------------------------- %----------------------------------------------------------------------  Received: from life.ai.mit.edu (TCP 20015020120) by AI.AI.MIT.EDU 12 Jan 89 23:51:23 EST Received: from ai.ai.mit.edu by life.ai.mit.edu; Thu, 12 Jan 89 23:42:14 EST Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Jan 89 19:59:33 PST Date: Thu, 12 Jan 89 16:53:24 PST From: Thom Linden To: Common Lisp mailing Message-Id: <890112.165324.baggins@almvma> Subject: cs proposal Hopefully the character proposal covers all the varied comments we received previously. Thanks again to everyone for the constructive criticism. In particular, I wish to express our thanks to Yuasa-san, Kurokawa-san and the JIS Lisp committee. Regards, Thom  Received: from life.ai.mit.edu (TCP 20015020120) by AI.AI.MIT.EDU 12 Jan 89 23:50:50 EST Received: from ai.ai.mit.edu by life.ai.mit.edu; Thu, 12 Jan 89 23:41:42 EST Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Jan 89 19:59:21 PST Date: Thu, 12 Jan 89 13:36:53 PST From: Thom Linden To: Common Lisp mailing Message-Id: <890112.133653.baggins@almvma> Subject: cs proposal I've just sent out two messages containing the latest character proposal (no DRAFT this time). We will only vote on this at Hawaii if the full J13 agrees otherwise (which I expect) a network ballot will be sent right after Hawaii. 
Aloha, Thom  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Jan 89 16:48:26 EST Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 12 Jan 89 13:31:37 PST Received: from GROUSE.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 520433; Thu 12-Jan-89 16:29:56 EST Date: Thu, 12 Jan 89 16:29 EST From: Robert A. Cassels Subject: Logical Operations on Numbers To: ELIOT@cs.umass.EDU, common-lisp@sail.stanford.EDU In-Reply-To: <8901122046.AA00579@crash.cs.umass.edu> Message-ID: <19890112212955.5.CASSELS@GROUSE.SCRC.Symbolics.COM> Date: Thu, 12 Jan 89 15:31 EST From: ELIOT@cs.umass.EDU Section 12.7 (pp 220-225) describes CL operations for manipulating finite sets using integers. Unfortunately there does not seem to be any predicate to determine if one set is a subset of another using this representation. 'logtest' serves as an intersection test, 'logbitp' serves as a member test but to determine subset relations seems to require computing the set difference (with logandc2) and comparing the result with zero. If the sets are moderately large (say several hundred elements) this involves expensive bignum operations that I would like to avoid. One can imagine a compiler noticing the pattern (LOGTEST .. (LOGNOT ..)) and compiling a call to a special routine which didn't do the explicit LOGNOT computation. I don't know of any compiler which does this, though. I have also thought of using bitvectors, but the operations on bitvectors (p 294) only operate on bitvectors of the same length. For vectors, it's not too hard to imagine that the shorter one should be treated as if it were extended with zeros (presumably at the higher index end). It's a little harder to decide what to do in the multidimensional case. Furthermore, the bitvector functions only include bitwise operations, but no subset test here either. Isn't SUBSET considered an important set manipulation primitive? 
Chris Eliot University of Massachusetts at Amherst
Symbolics Common Lisp defines:
SCL:BIT-VECTOR-SUBSET-P - Function (BIT-VECTOR-1 BIT-VECTOR-2 &key (:START1 0) :END1 (:START2 0) :END2)
;; BIT-VECTOR-1 is a subset of BIT-VECTOR-2
SCL:BIT-VECTOR-POSITION - Function (BIT BIT-VECTOR &key (:START 0) :END)
;; equivalent to (POSITION BIT BIT-VECTOR :START START :END END)
SCL:BIT-VECTOR-ZERO-P - Function (BIT-VECTOR &key (:START 0) :END)
SCL:BIT-VECTOR-EQUAL - Function (BIT-VECTOR-1 BIT-VECTOR-2 &key (:START1 0) :END1 (:START2 0) :END2)
;; equivalent to (EQUAL (SUBSEQ BIT-VECTOR-1 :START START1 :END END1)
;; (SUBSEQ BIT-VECTOR-2 :START START2 :END END2))
SCL:BIT-VECTOR-DISJOINT-P - Function (BIT-VECTOR-1 BIT-VECTOR-2 &key (:START1 0) :END1 (:START2 0) :END2)
SCL:BIT-VECTOR-CARDINALITY - Function (BIT-VECTOR &key (:START 0) :END)
;; counts the "1" bits
At the present time, -SUBSET-P, -EQUAL, and -DISJOINT-P all return NIL if the vectors have different lengths. A more CL-consistent way of doing cardinality is probably by analogy with the COUNT function:
BIT-VECTOR-COUNT - Function (BIT BIT-VECTOR &key (:START 0) :END)
;; equivalent to (COUNT BIT BIT-VECTOR :START START :END END)
 Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Jan 89 16:44:55 EST Received: from crash.cs.umass.edu ([128.119.40.235]) by SAIL.Stanford.EDU with TCP; 12 Jan 89 13:25:08 PST Received: from vax3.cs.umass.edu by crash.cs.umass.edu (5.59/Ultrix2.0-B) id AA00629; Thu, 12 Jan 89 16:25:58 est Message-Id: <8901122125.AA00629@crash.cs.umass.edu> Date: Thu, 12 Jan 89 16:19 EST From: MURRAY@cs.umass.EDU Subject: argument processing To: common-lisp@sail.stanford.EDU X-Vms-To: IN%"common-lisp@sail.stanford.EDU" Subj: Order of "processing" of arguments To: Common-Lisp@SAIL.Stanford.EDU > From: Bruce Krulwich > ...
> It seems to me that as long as actuals and formals are matched up correctly > there is no reason for the language specification to specify the order of the > "processing" of the arguments during lambda-binding. The order of processing of lambda-binding is important, because &optional or &key parameters can have code that is executed if their arguments are not supplied in a call. By specifying the left-right order of processing, it defines that any arguments bound "on the left" are accessible to code executed "on the right". Kelly Murray  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Jan 89 16:22:54 EST Received: from crash.cs.umass.edu ([128.119.40.235]) by SAIL.Stanford.EDU with TCP; 12 Jan 89 12:45:18 PST Received: from vax3.cs.umass.edu by crash.cs.umass.edu (5.59/Ultrix2.0-B) id AA00579; Thu, 12 Jan 89 15:46:00 est Message-Id: <8901122046.AA00579@crash.cs.umass.edu> Date: Thu, 12 Jan 89 15:31 EST From: ELIOT@cs.umass.EDU Subject: Logical Operations on Numbers To: common-lisp@sail.stanford.EDU X-Vms-To: IN%"common-lisp@sail.stanford.EDU" Section 12.7 (pp 220-225) describes CL operations for manipulating finite sets using integers. Unfortunately there does not seem to be any predicate to determine if one set is a subset of another using this representation. 'logtest' serves as an intersection test, 'logbitp' serves as a member test but to determine subset relations seems to require computing the set difference (with logandc2) and comparing the result with zero. If the sets are moderately large (say several hundred elements) this involves expensive bignum operations that I would like to avoid. I have also thought of using bitvectors, but the operations on bitvectors (p 294) only operate on bitvectors of the same length. Furthermore, the bitvector functions only include bitwise operations, but no subset test here either. Isn't SUBSET considered an important set manipulation primitive?
Chris Eliot
University of Massachusetts at Amherst

Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Jan 89 13:09:56 EST
Received: from ATHENA.CS.YALE.EDU by SAIL.Stanford.EDU with TCP; 12 Jan 89 09:49:45 PST
Received: by ATHENA.CS.YALE.EDU; Thu, 12 Jan 89 12:49:19 EST
Date: Thu, 12 Jan 89 12:49:19 EST
From: Bruce Krulwich
Full-Name: Bruce Krulwich
Message-Id: <8901121749.AA18587@ATHENA.CS.YALE.EDU>
Received: by yale-hp-crown (szechuan) via WIMP-MAIL (Version 1.3/1.5) ; Thu Jan 12 12:51:16
To: Common-Lisp@SAIL.Stanford.EDU
Subject: Order of "processing" of arguments
Newsgroups: arpa.common-lisp
In-Reply-To: <46940@yale-celray.yale.UUCP>
Organization: Computer Science, Yale University, New Haven, CT 06520-2158

Michael Greenwald said:
>Actually, CLtL pg 61 says that the arguments and parameters are
>processed in order, from left to right. I don't know if "processed"
>implies "evaluated", but I always assumed (perhaps incorrectly) it did.

Guy Steele replied:
>I interpret this as referring to how the (fully evaluated) arguments
>are processed during lambda-binding, not to the order in which argument
>forms in a function call are evaluated. After all, the arguments referred
>to on page 61 might have come from a list given to APPLY, rather than
>from EVAL on a function call.

This seems vacuous to me. Does this mean that an implementation in which a procedure entry point knows how many arguments it is receiving (through a link table, for instance, or simply by counting its arguments) and constructs a REST-arg list before doing the binding of the required args is in violation of CLtL because it processes the rightmost argument before the leftmost one?? I hope not.

It seems to me that as long as actuals and formals are matched up correctly there is no reason for the language specification to specify the order of the "processing" of the arguments during lambda-binding.
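
As a point of reference for this thread, here is a small sketch of one case where the order of parameter processing is observable to user code: an &optional default form that refers to parameters to its left. The function MAKE-RANGE and its defaults are made up for illustration and do not come from CLtL or from any message here.

```lisp
;; Hypothetical example. The default form for END reads START, and the
;; default form for STEP reads both START and END, so these defaults
;; only make sense if parameters are bound left to right.
(defun make-range (start &optional (end (+ start 10))
                                   (step (if (< start end) 1 -1)))
  (list start end step))

(make-range 0)      ; END defaults from START, STEP from both
(make-range 5 0)    ; STEP default sees the supplied END
```

Under left-to-right binding the two calls return (0 10 1) and (5 0 -1). Building a &rest list early, as in the link-table scheme described above, does not by itself change what such default forms can see.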
Bruce Krulwich krulwich@cs.yale.edu  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Jan 89 07:44:22 EST Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 12 Jan 89 04:27:20 PST Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 307893; 12 Jan 89 06:41:54 EST Date: Thu, 12 Jan 89 06:09 EST From: Robert W. Kerns Subject: Re: commonlisp types To: Robert W. Kerns , quiroz%cs.rochester.edu@RIVERSIDE.SCRC.SYMBOLICS.COM, common-lisp%sail.stanford.edu@RIVERSIDE.SCRC.SYMBOLICS.COM In-Reply-To: <19890110024213.3.RWK@F.ILA.Dialnet.Symbolics.COM> Message-ID: <19890112110920.0.RWK@F.ILA.Dialnet.Symbolics.COM> Date: Mon, 9 Jan 89 21:42 EST From: Robert W. Kerns BTW, our mailer didn't like the address Robert W. Kerns on the excuse that FUJI.ILA.Dialnet.Symbolics.COM is an unknown host. "It's not my PLANET, Monkey Boy!" -- John Wharten (villan from Buckaroo Bonzai) Sumimasen, ga... I think that's supposed to be "Wharfin" or something.  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 10 Jan 89 12:48:55 EST Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 10 Jan 89 09:30:06 PST Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 307284; 10 Jan 89 12:28:11 EST Date: Mon, 9 Jan 89 21:42 EST From: Robert W. Kerns Subject: Re: commonlisp types To: quiroz%cs.rochester.edu@RIVERSIDE.SCRC.SYMBOLICS.COM, common-lisp%sail.stanford.edu@RIVERSIDE.SCRC.SYMBOLICS.COM In-Reply-To: <8901070112.AA09737@lesath.cs.rochester.edu> Message-ID: <19890110024213.3.RWK@F.ILA.Dialnet.Symbolics.COM> Date: Fri, 06 Jan 89 20:12:09 -0500 From: quiroz@cs.rochester.edu : So I'm curious. Does any compiler actually get this right? KCL. See script at the end of this message. 
OK, next question: Does it open-code or otherwise optimize TYPEP, or just call TYPEP on the list? If you don't know, I'll check it next time I use KCL (which will be *after* X3J13). BTW, our mailer didn't like the address Robert W. Kerns on the excuse that FUJI.ILA.Dialnet.Symbolics.COM is an unknown host. "It's not my PLANET, Monkey Boy!" -- John Wharten (villan from Buckaroo Bonzai) As a workaround, you can use RWK%FUJI.ILA.Dialnet.Symbolics.Com@Riverside.SCRC.Symbolics.Com which is essentially what I have to do to send to you. Or you can use RWK@AI.AI.MIT.Edu, which forwards to the same place.  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 10 Jan 89 11:14:40 EST Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 10 Jan 89 07:56:38 PST Date: Mon, 09 Jan 89 19:50:46 PST From: Thom Linden To: Common Lisp mailing Message-ID: <890109.195046.baggins@IBM.com> Subject: Character proposal The revised proposal should be transmitted fairly soon. Due to this delay, I won't be asking for a vote unless J13 agrees it is ready. The content of the scheduled time for characters will be to review the substantial changes. I will bring copies to the meeting as well as send over the network. Regards, Thom  Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 7 Jan 89 04:07:10 EST Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 7 Jan 89 00:54:42 PST Received: from LUCID.COM by Riverside.SCRC.Symbolics.COM via INTERNET with SMTP id 306559; 7 Jan 89 03:53:00 EST Received: from bhopal ([192.9.200.13]) by heavens-gate id AA08351g; Sat, 7 Jan 89 00:50:24 PST Received: by bhopal id AA02943g; Sat, 7 Jan 89 00:52:38 PST Date: Sat, 7 Jan 89 00:52:38 PST From: Jon L White Message-Id: <8901070852.AA02943@bhopal> To: RWK@FUJI.ILA.Dialnet.Symbolics.COM Cc: jonl%lucid.com@Riverside.SCRC.Symbolics.Com, common-lisp%sail.stanford.edu@Riverside.SCRC.Symbolics.Com In-Reply-To: Robert W. 
Kerns's message of Fri, 6 Jan 89 15:16 EST <19890106201603.1.RWK@CALVARY.ILA.Dialnet.Symbolics.COM>
Subject: commonlisp types

re: [TYPE-SPECIFIER-P]
    I'd like to encourage you to make YOUR definition explicit for us, as a starting point.

Well, what I can tell you in reasonable terms won't be that helpful. We simply hook in to the part of SUBTYPEP that has to resolve these questions, and "catch" any signals about unrecognized types. For symbols, the question of a recognized type is fairly easy -- there's a list in CLtL of some basic types, and then there's more basic types coming from DEFSTRUCT, and finally there's "recursion" via DEFTYPE.

Can you think of an easier answer for this?

re: Anyone know of an implementation for which this fails?
    Yes, Symbolics. You must have missed my query about any implementations for which it succeeds! Any implementation which does source-rewriting to optimize TYPEP has to concern itself with this issue. (The issue is the same as for doing INLINEing, but Symbolics fails to use the same mechanism for optimizations as it does for inlining.)

Lucid succeeds (and one or two others that I tried). Oddly enough, Lucid also "fails" to use the same mechanism for compiler optimizers as it does for INLINEing -- and it gets the optimizations right, but gets certain cases of lexical inlining wrong.

-- JonL --

Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 7 Jan 89 00:24:48 EST
Received: from cayuga.cs.rochester.edu (CS.ROCHESTER.EDU) by SAIL.Stanford.EDU with TCP; 6 Jan 89 21:11:52 PST
Received: from lesath.cs.rochester.edu by cayuga.cs.rochester.edu (5.59/k) id AA09897; Fri, 6 Jan 89 20:12:20 EST
Received: from loopback by lesath.cs.rochester.edu (3.2/k) id AA09737; Fri, 6 Jan 89 20:12:14 EST
Message-Id: <8901070112.AA09737@lesath.cs.rochester.edu>
To: common-lisp@sail.stanford.edu
Subject: Re: commonlisp types
In-Reply-To: Your message of Fri, 06 Jan 89 15:33:00 -0500.
<19890106203322.2.RWK@CALVARY.ILA.Dialnet.Symbolics.COM>
Date: Fri, 06 Jan 89 20:12:09 -0500
From: quiroz@cs.rochester.edu

: So I'm curious. Does any compiler actually get this right?

KCL. See script at the end of this message.

BTW, our mailer didn't like the address Robert W. Kerns on the excuse that FUJI.ILA.Dialnet.Symbolics.COM is an unknown host.

Cesar

KCl (Kyoto Common Lisp) June 3, 1987
--- UofR version of September 9, 1988
Loading /u/quiroz/.kclrc
Loading /u/quiroz/work/kcl/defsys/defsys.o
Finished loading /u/quiroz/work/kcl/defsys/defsys.o
Finished loading /u/quiroz/.kclrc

> (defun bar (x) (symbolp x))
bar

> (defun foo (x) (flet ((bar (y) (integerp y))) (typep x '(satisfies bar))))
foo

> (foo 'x)
t

>(compile 'bar)
End of Pass 1.
End of Pass 2.
OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3
bar

>(compile 'foo)
End of Pass 1.
End of Pass 2.
OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3
foo

>(foo 'x)
t

>

Received: from SAIL.Stanford.EDU (TCP 4425400302) by AI.AI.MIT.EDU 6 Jan 89 17:04:48 EST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 6 Jan 89 13:46:18 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 306344; 6 Jan 89 15:56:48 EST
Received: from CALVARY.ILA.Dialnet.Symbolics.COM by F.ILA.Dialnet.Symbolics.COM via CHAOS with CHAOS-MAIL id 7601; Fri 6-Jan-89 15:15:53 EST
Date: Fri, 6 Jan 89 15:16 EST
From: Robert W. Kerns
Subject: commonlisp types
To: Jon L White
cc: common-lisp%sail.stanford.edu@Riverside.SCRC.Symbolics.Com
In-Reply-To: <8901040858.AA01403@bhopal>
Message-ID: <19890106201603.1.RWK@CALVARY.ILA.Dialnet.Symbolics.COM>

    Date: Wed, 4 Jan 89 00:58:57 PST
    From: Jon L White

    re: How do you define "valid type specifier"?

    Very syntactically.
    I think it's perfectly acceptable to have a set of combination rules for making "words" in the type-specifier syntax, even though some such "words" would be gibberish. The important thing is that base-level types -- those defined in CLtL -- along with DEFSTRUCT extensions be recognizable. They don't have the problems that SATISFIES generates, or that a broken user definition generates (such as your DEFTYPE FOO example).

I'm not saying there's a fundamental problem here, just that there's a choice to be made, and that writing precise and understandable definitions is non-trivial. I'd like to encourage you to make YOUR definition explicit for us, as a starting point.

    By the bye, on another note, I haven't seen any implementation that has the bug Kent wondered about earlier:

        (defun bar (x) (symbolp x))
        (defun foo (x)
          (flet ((bar (y) (integerp y)))
            (typep x '(satisfies bar))))
        (foo 'x)

        The correct answer is T, but I bet a lot of implementations return NIL in compiled code.

    Anyone know of an implementation for which this fails?

Yes, Symbolics. You must have missed my query about any implementations for which it succeeds! Any implementation which does source-rewriting to optimize TYPEP has to concern itself with this issue. (The issue is the same as for doing INLINEing, but Symbolics fails to use the same mechanism for optimizations as it does for inlining.)

Received: from SAIL.Stanford.EDU (TCP 4425400302) by AI.AI.MIT.EDU 6 Jan 89 17:06:50 EST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 6 Jan 89 13:46:18 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 306345; 6 Jan 89 15:57:47 EST
Received: from CALVARY.ILA.Dialnet.Symbolics.COM by F.ILA.Dialnet.Symbolics.COM via CHAOS with CHAOS-MAIL id 7603; Fri 6-Jan-89 15:33:07 EST
Date: Fri, 6 Jan 89 15:33 EST
From: Robert W.
Kerns
Subject: commonlisp types
To: gls%Think.COM@Riverside.Symbolics.COM, jwz%spice.cs.cmu.edu@Riverside.SCRC.Symbolics.COM, common-lisp%sail.stanford.edu@Riverside.SCRC.Symbolics.COM
In-Reply-To: <881222151736.1.KMP@BOBOLINK.SCRC.Symbolics.COM>
Supersedes: <19890103102924.8.RWK@F.ILA.Dialnet.Symbolics.COM>
Comments: Retransmission of failed mail.
Message-ID: <19890106203322.2.RWK@CALVARY.ILA.Dialnet.Symbolics.COM>

    Date: Thu, 22 Dec 88 15:17 EST
    From: Kent M Pitman

    Fyi, it turns out this rationale doesn't hold as much water as you'd think. Consider:

        (defun bar (x) (symbolp x))
        (defun foo (x)
          (flet ((bar (y) (integerp y)))
            (typep x '(satisfies bar))))
        (foo 'x)

    The correct answer is T, but I bet a lot of implementations return NIL in compiled code.

Like the Symbolics system, Boo, Hiss!

In terms of source transformations, this would have to compile the TYPEP as follows:

    (defun foo (x)
      (flet ((bar (y) (integerp y)))
        (let ((#:G0002 x))
          (macrolet ((bar (a) `(funcall (symbol-function 'bar) ,a)))
            (bar #:G0002)))))

Which is obviously going to require either a codewalker or a typewalker to identify either locally defined functions or functions used in the type expansion to shadow with MACROLET.

So I'm curious. Does any compiler actually get this right?

Really, this is a general problem with any form of source-code rewrites. The Symbolics compiler does get this right with inlined functions, but I'll bet it doesn't with some other internal in-lined things that work as source transformations.
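
For readers who want to try the example from this thread, here is a self-contained sketch. It assumes the reading discussed above, namely that the predicate named in a SATISFIES type specifier is looked up globally and is not affected by a local FLET. The name FOO-REWRITTEN, the variable VALUE (standing in for the gensym), and the IGNORABLE declarations are additions for illustration only; they are not in the original messages.

```lisp
(defun bar (x) (symbolp x))          ; the global BAR

(defun foo (x)
  (flet ((bar (y) (integerp y)))     ; local BAR; should NOT be consulted
    (declare (ignorable #'bar))      ; assumption: just quiets unused warnings
    (typep x '(satisfies bar))))

;; Hand-written version of the source transformation shown above.
;; The MACROLET shadows the local BAR so the call reaches the global one.
(defun foo-rewritten (x)
  (flet ((bar (y) (integerp y)))
    (declare (ignorable #'bar))
    (let ((value x))
      (macrolet ((bar (a) `(funcall (symbol-function 'bar) ,a)))
        (bar value)))))

(foo 'x)            ; global BAR is SYMBOLP, so this should be true
(foo-rewritten 'x)  ; same answer by construction
(foo 42)            ; 42 is not a symbol, so this should be NIL
```

An implementation that inlines the SATISFIES predicate as a plain (bar x) call inside FOO would instead pick up the local INTEGERP version and give the opposite answers, which is the failure described in this thread.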