Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 22 Feb 89 17:32:16 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Feb 89  13:30:31 PST
Date: Wed, 22 Feb 89 00:14:09 PST
From: Thom Linden <baggins@IBM.com>
To: Common Lisp mailing <common-lisp@sail.stanford.edu>
Message-ID: <890222.001409.baggins@almvma>
Subject: cs proposal part 2

%----------------------------------------------------------------------
%----------------------------------------------------------------------

\newcommand{\edithead}{\begin{tabular}{l p{3.95in}}
  \multicolumn{2}{l} }

\newcommand{\csdag}{\bf$\Rightarrow$\ddag}

\newcommand{\editstart}{}

\newcommand{\editend}{\\ & \end{tabular}}

%----------------------------------------------------------------------
%----------------------------------------------------------------------
\appendix
\chapter{Editorial Modifications to CLtL}

The following sections specify the editorial changes needed in
CLtL to support the proposal.  Section/subsection numbers and titles
match those found in \cite{steele84}.  The notation
{\csdag x (pn, function)} denotes a reference to paragraph x within the
subsection (we count each individual example or metastatement
as 1 paragraph of text).  Also, {\bf (pn, function)}, or simply
{\bf (pn)} is included as an additional
aid to the reader indicating the page number and function modified.
When an entire paragraph is deleted,
the first few words of the paragraph is noted.

If a section or paragraph of CLtL is {\em not} referenced,
no editorial changes are required to support this proposal.
\footnote{This may be an over optimistic statement since the changes
are fairly pervasive.  The editor should take the sense of
Chapter 1 into account in resolving any discrepancies.}

%----------------------------------------------------------------------
\setcounter{section}{1}
\section{Data Types}                        % 2
%----------------------------------------------------------------------


\edithead {\csdag 8 (p12)}
\editstart
\\ \bf replace &
\cltxt
   provides for a
   rich character set, including ways to represent characters of various
   type styles.
\\ \bf with &
\cltxt
   provides support for international language characters as well
   as characters used in specialized arenas, eg. mathematics.
\editend

\setcounter{subsection}{1}
\subsection{Characters}                     % 2.2.

\edithead {\csdag 1 (p20)}
\editstart
\\ \bf replace &
\cltxt
  Characters are represented as data objects of type {\clkwd character}.
  There are two subtypes of interest, called
  {\clkwd standard-char} and {\clkwd string-char}.
\\ \bf with &
\cltxt
  Characters are represented as data objects of type
  {\clkwd character}.
\editend
\\
\edithead {\csdag 2 (p20)}
\editstart
\\ \bf replace &
\cltxt
  This works well enough for printing characters. Non-printing
  characters
\\ \bf with &
\cltxt
  This works well enough for graphic characters.  Non-graphic
  characters
\editend

\subsubsection{Standard Characters}         % 2.2.1.

\edithead {\csdag 1 before (p20)}
\editstart
\\ \bf insert &
\cltxt
  A {\em character repertoire} defines a collection of characters
  independent of their specific rendered image or font.
  Character
  repertoires are specified independent of coding and their characters
  are only identified with a unique label, a graphic symbol, and
  a character description.
  A {\em coded character set} is a character repertoire plus
  an {\em encoding} providing a unique mapping between each character
  and a number which serves as the character representation.
\\ &
  Common LISP requires all implementations to support a {\em standard}
  character subrepertoire.  Typically, an implementation
  incorporates the standard
  characters as a subset of a larger repertoire corresponding
  to a frequently used set of characters, or base coded character
  set.
  The term {\em base character repertoire} refers to
  the collection of characters represented by
  the base coded character set.
\editend
\\
\edithead {\csdag 1 before (p20)}
\editstart
\\ \bf insert &
\cltxt
  The {\clkwd base-character} type is defined as a subtype of
  {\clkwd character}.  A {\clkwd base-character}
  object can contain any member of the base character repertoire.
  Objects of type
  {\clkwd (and character (not base-character))} are referred to
  as {\em extended characters}.
\editend
\\
\edithead {\csdag 1 (p20)}
\editstart
\\ \bf delete &
\cltxt
  Common LISP defines a "standard character set" ...
\editend
\\
\edithead {\csdag 1 (P20)}
\editstart
\\ \bf new &
\cltxt
  The Common LISP
  standard character subrepertoire consists of
  a newline \#$\backslash${\clkwd Newline}, the
  graphic space character \#$\backslash${\clkwd Space},
  and the following additional
  ninety-four graphic characters or their equivalents:
\editend
\\
\edithead {\csdag 2 (p21)}
\editstart
\\ \bf delete &
\cltxt
  ! " \# ...
\editend
\\
\edithead {\csdag 2 new (p21)}
\editstart
\\ &
  {\bf Common LISP Standard Character Subrepertoire}
\editend
\footnote{\cltxt \#$\backslash${\clkwd Space}
and \#$\backslash${\clkwd Newline} are omitted.
graphic labels and descriptions are from ISO 6937/2.
The first letter of the graphic label categorizes the
character as follows: L - Latin, N - Numeric, S - Special
.}
\\
{\small \begin{tabular}{||l|c|l||l|c|l||}    \hline
  Label  &    Glyph    &  Name or description
& Label  &    Glyph    &  Name or description
\\ \hline
  LA01  &  a  &  small a
& ND01  &  1  &  digit 1
\\ \hline
  LA02  &  A  &  capital A
& ND02  &  2  &  digit 2
\\ \hline
  LB01  &  b  &  small b
& ND03  &  3  &  digit 3
\\ \hline
  LB02  &  B  &  capital B
& ND04  &  4  &  digit 4
\\ \hline
  LC01  &  c  &  small c
& ND05  &  5  &  digit 5
\\ \hline
  LC02  &  C  &  capital C
& ND06  &  6  &  digit 6
\\ \hline
  LD01  &  d  &  small d
& ND07  &  7  &  digit 7
\\ \hline
  LD02  &  D  &  capital D
& ND08  &  8  &  digit 8
\\ \hline
  LE01  &  e  &  small e
& ND09  &  9  &  digit 9
\\ \hline
  LE02  &  E  &  capital E
& ND10  &  0  &  digit 0
\\ \hline
  LF01  &  f  &  small f
& SC03  &  \$    &  dollar sign
\\ \hline
  LF02  &  F  &  capital F
& SP02  &  !     &  exclamation mark
\\ \hline
  LG01  &  g  &  small g
& SP04  &  "     &  quotation mark
\\ \hline
  LG02  &  G  &  capital G
& SP05  &  \apostrophe     &  apostrophe
\\ \hline
  LH01  &  h  &  small h
& SP06  &  (     &  left parenthesis
\\ \hline
  LH02  &  H  &  capital H
& SP07  &  )     &  right parenthesis
\\ \hline
  LI01  &  i  &  small i
& SP08  &  ,     &  comma
\\ \hline
  LI02  &  I  &  capital I
& SP09  &  \_    &  low line
\\ \hline
  LJ01  &  j  &  small j
& SP10  &  -     &  hyphen or minus sign
\\ \hline
  LJ02  &  J  &  capital J
& SP11  &  .     &  full stop, period
\\ \hline
  LK01  &  k  &  small k
& SP12  &  /     &  solidus
\\ \hline
  LK02  &  K  &  capital K
& SP13  &  :     &  colon
\\ \hline
  LL01  &  l  &  small l
& SP14  &  ;     &  semicolon
\\ \hline
  LL02  &  L  &  capital L
& SP15  &  ?     &  question mark
\\ \hline
  LM01  &  m  &  small m
& SA01  &  +     &  plus sign
\\ \hline
  LM02  &  M  &  capital M
& SA03  &  $<$   &  less-than sign
\\ \hline
  LN01  &  n  &  small n
& SA04  &  =   &  equals sign
\\ \hline
  LN02  &  N  &  capital N
& SA05  &  $>$   &  greater-than sign
\\ \hline
  LO01  &  o  &  small o
& SM01  &  \#    &  number sign
\\ \hline
  LO02  &  O  &  capital O
& SM02  &  \%    &  percent sign
\\ \hline
  LP01  &  p  &  small p
& SM03  &  \&    &  ampersand
\\ \hline
  LP02  &  P  &  capital P
& SM04  &  *     &  asterisk
\\ \hline
  LQ01  &  q  &  small q
& SM05  &  @     &  commercial at
\\ \hline
  LQ02  &  Q  &  capital Q
& SM06  &  [     &  left square bracket
\\ \hline
  LR01  &  r  &  small r
& SM07  &  $\backslash$   &  reverse solidus
\\ \hline
  LR02  &  R  &  capital R
& SM08  &  ]     &  right square bracket
\\ \hline
  LS01  &  s  &  small s
& SM11  &  \{    &  left curly bracket
\\ \hline
  LS02  &  S  &  capital S
& SM13  &  $|$     &  vertical bar
\\ \hline
  LT01  &  t  &  small t
& SM14  &  \}    &  right curly bracket
\\ \hline
  LT02  &  T  &  capital T
& SD13  &  \bq   &  grave accent
\\ \hline
  LU01  &  u  &  small u
& SD15  &  $\hat{ }$  &  circumflex accent
\\ \hline
  LU02  &  U  &  capital U
& SD19  &  $\tilde{ }$ &  tilde
\\ \hline
  LV01  &  v  &  small v
& & &
\\ \hline
  LV02  &  V  &  capital V
& & &
\\ \hline
  LW01  &  w  &  small w
& & &
\\ \hline
  LW02  &  W  &  capital W
& & &
\\ \hline
  LX01  &  x  &  small x
& & &
\\ \hline
  LX02  &  X  &  capital X
& & &
\\ \hline
  LY01  &  y  &  small y
& & &
\\ \hline
  LY02  &  Y  &  capital Y
& & &
\\ \hline
  LZ01  &  z  &  small z
& & &
\\ \hline
  LZ02  &  Z  &  capital Z
& & &
\\
\hline
\end{tabular} }
\\
\edithead {\csdag 3 (p21)}
\editstart
\\ \bf delete &
\cltxt
  @ A B C...
\editend
\\
\edithead {\csdag 4 (p21)}
\editstart
\\ \bf delete &
\cltxt
  \bq a b c...
\editend
\\
\edithead {\csdag 5 (p21)}
\editstart
\\ \bf delete &
\cltxt
  The Common LISP Standard character set is apparently ...
\editend
\\
\edithead {\csdag 6 (p21)}
\editstart
\\ \bf replace &
\cltxt
  Of the ninety-four non-blank printing characters
\\ \bf with &
\cltxt
  Of the ninety-five graphic characters
\editend
\\
\edithead {\csdag 9 (p21)}
\editstart
\\ \bf delete &
\cltxt
  The following characters are called ...
\editend
\\
\edithead {\csdag 10 (p21)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd \#$\backslash$Backspace \#$\backslash$Tab } ...
\editend
\\
\edithead {\csdag 11 (p21)}
\editstart
\\ \bf delete &
\cltxt
  Not all implementations of Common ...
\editend

\subsubsection{Line Divisions}              % 2.2.2.

\edithead {\csdag 6 (p22)}
\editstart
\\ \bf replace &
\cltxt
  a two-character sequence, such as
  {\clkwd \#$\backslash$Return } and then
  {\clkwd \#$\backslash$Newline },
  is not acceptable,
\\ \bf with &
\cltxt
  a two-character sequence is not acceptable,
\editend
\\
\edithead {\csdag 8 (p22)}
\editstart
\\ \bf delete &
\cltxt
  Implementation note: If an implementation uses ...
\editend

\subsubsection{Non-standard Characters}     % 2.2.3.

\edithead {\csdag delete entire section (p23)}
\editstart
\editend

\subsubsection{Character Attributes}        % 2.2.4.

\edithead {\csdag 0 section heading (p23)}
\editstart
\\ \bf replace &
\cltxt
  Character Attributes
\\ \bf with &
\cltxt
  Character Identity
\editend
\\
\edithead {\csdag 1 through 8 (p23)}
\editstart
\\ \bf delete all paragraphs&
\cltxt
  Every object of type {\clkwd character} ...
\editend
\\
\edithead {\csdag 1 (p23)}
\editstart
\\ \bf new &
\cltxt
Characters are uniquely distinguished by their codes,
which are drawn from the set of
non-negative integers.  That is, within Common LISP
a unique numerical code
is assigned to each semantically different character.
\\ &
Common LISP
characters are partitioned into a unique collection of
repertoires called {\em
character registries}.  That is, each character is included
in one and only one character registry.
\\ &
Character codes are composed from a character registry and a
character label.  The convention by which a character registry and
character label compose a character code is implementation
dependent.
\editend

\subsubsection{String Characters}           % 2.2.5.

\edithead {\csdag delete entire section (p23)}
\editstart
\editend

\setcounter{subsection}{4}
\subsubsection{Character Registries}           % 2.2.5.

\edithead {\csdag new section (p23)}
\editstart
\\ \bf new &
\cltxt
An implementation must document the registries it supports.
Registries must be uniquely
named using only {\clkwd standard-p} characters.
For each registry supported,
an implementation must define the individual characters supported
including at least the following:
\begin{itemize}
\item Character Labels,
Glyphs, and Descriptions.
\item Reader Canonicalization.
\item Effect of character predicates.
\begin{itemize}
\item {\clkwd alpha-char-p}
\item {\clkwd lower-case-p}
\item {\clkwd upper-case-p}
\item {\clkwd both-case-p}
\item {\clkwd graphic-char-p}
\item {\clkwd alphanumericp}
\end{itemize}
\item Interaction with File I/O.  In particular, the
coded character set standards
\footnote{For example, ISO8859/1-1987.} and
external encoding schemes
which are supported must be specified.
\end{itemize}
\editend

\subsection{Symbols}                        % 2.3.

\edithead {\csdag 12 (p25)}
\editstart
\\ \bf replace &
\cltxt
  A symbol may have uppercase letters, lowercase letters, or both
  in its print name.
\\ \bf with &
\cltxt
  A symbol may have characters from any supported character registry
  in its print name.
  It may have uppercase letters, lowercase letters, or both.
\editend

\setcounter{subsection}{4}
\subsection{Arrays}
\subsubsection{Vectors}

\edithead {\csdag 6 (p29)}
\editstart
\\ \bf replace &
\cltxt
  All implementations provide specialized arrays for the cases when
  the components are characters (or rather, a special subset of the
  characters);
\\ \bf with &
\cltxt
  All implementations provide specialized arrays for the cases when
  the components are characters (or optionally, special subsets of
  the characters);
\editend

\subsubsection{Strings}

\edithead {\csdag 1 (p30)}
\editstart
\\ \bf replace &
\cltxt
  A string is simply a vector of characters.  More precisely, a string
  is a specialized vector whose elements are of type
  {\clkwd string-char}.
\\ \bf with &
\cltxt
  A string is simply a vector of characters.  More precisely, a string
  is a specialized vector whose elements are of type
  {\clkwd character} or a subtype
  of character.
\editend

\setcounter{subsection}{14}
\subsection{Overlap, Inclusion, and Disjointness of Types} % 2.15.


\edithead {\csdag 14 (p34)}
\editstart
\\ \bf replace &
\cltxt
  The type {\clkwd standard-char} is a subtype of {\clkwd string-char};
  {\clkwd string-char} is a subtype of {\clkwd character}.
\\ \bf with &
\cltxt
  The type {\clkwd base-character} is a subtype of
  {\clkwd character}.
  The type {\clkwd string-char} is implementation defined as either
  {\clkwd base-character} or {\clkwd character}.
\editend
\\
\edithead {\csdag 15 (p34)}
\editstart
\\ \bf replace &
\cltxt
  The type {\clkwd string} is a subtype of {\clkwd vector},
  for {\clkwd string} means {\clkwd (vector string-char)}.
\\ \bf with &
\cltxt
  The type {\clkwd string} is a subtype of {\clkwd vector},
  {\clkwd string} consists of vectors specialized by subtypes of
  {\clkwd character}.
\editend
\\
\edithead {\csdag 15 after (p34)}
\editstart
\\ \bf insert &
\cltxt
  The type {\clkwd base-string} means
  {\clkwd (vector base-character)}.
\editend
\\
\edithead {\csdag 15 after (p34)}
\editstart
\\ \bf insert &
\cltxt
  The type {\clkwd general-string} means
  {\clkwd (vector character)} and is a subtype of {\clkwd string}.
\editend
\\
\edithead {\csdag 20 (p34)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd (simple-array string-char (*))};
\\ \bf with &
\cltxt
  {\clkwd (and string simple-array)};
\editend
\\
\edithead {\csdag 20 after (p34)}
\editstart
\\ \bf insert &
\cltxt
  The type {\clkwd simple-base-string} means
  {\clkwd (simple-array base-character (*))} and
  is the most efficient string which can hold
  the standard characters. {\clkwd simple-base-string}
  is a subtype of {\clkwd base-string}.
\editend
\\
\edithead {\csdag 20 after (p34)}
\editstart
\\ \bf insert &
\cltxt
  The type {\clkwd simple-general-string} means
  {\clkwd (simple-array character (*))}.
  {\clkwd simple-general-string}
  is a subtype of {\clkwd general-string}.
\editend
\\
\edithead {\csdag 22 after (p34)}
\editstart
\\ \bf replace &
\cltxt
  The type {\clkwd simple-string} is a subtype of
  {\clkwd string}. (Note that although
  {\clkwd string}
  is a subtype of {\clkwd vector, simple-string} is not
  a subtype of {\clkwd simple-vector}.
\\ \bf with &
\cltxt
  The type {\clkwd simple-string} is a subtype of
  {\clkwd string}, {\clkwd simple-string} consists of
  simple vectors specialized by subtypes of
  {\clkwd character}. (Note that although
  {\clkwd string}
  is a subtype of {\clkwd vector, simple-string} is not
  a subtype of {\clkwd simple-vector}.
\editend


%----------------------------------------------------------------------
\setcounter{section}{3}
\section{Type Specifiers}                   % 4
%----------------------------------------------------------------------
\setcounter{subsection}{1}
\subsection{Type Specifier Lists} % 4.2.


\edithead {\csdag 8 Table 4-1 (alphabetic list) (p43)}
\editstart
\\ \bf remove &
\\ &
\cltxt
  {\clkwd standard-char}
\\ &
  {\clkwd string-char}
\editend
\\
\edithead {\csdag 8 Table 4-1 (alphabetic list) (p43)}
\editstart
\\ \bf insert &
\\ &
\cltxt
  {\clkwd base-character}
\\ &
  {\clkwd base-string}
\\ &
  {\clkwd general-string}
\\ &
  {\clkwd simple-base-string}
\\ &
  {\clkwd simple-general-string}
\editend

\setcounter{subsection}{2}
\subsection{Predicating Type Specifiers} % 4.3.

\edithead {\csdag 2 (p43)}
\editstart
\\ \bf delete &
\cltxt
  As an example, the entire ...
\editend
\\
\edithead {\csdag 3 delete example (p43)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (deftype string-char () } ...
\editend

\setcounter{subsection}{4}
\subsection{Type Specifiers That Specialize} % 4.5.

\edithead {\csdag 5 after (p46)}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd (character {\em repertoire})}
\\  &
  This denotes a character type specialized to members
  of the specified repertoire.  {\em Repertoire} may be
  {\clkwd :base} or {\clkwd :standard} or any supported
  character registry name or a list of names.
\editend

\setcounter{subsection}{5}
\subsection{Type Specifiers That Abbreviate} % 4.6.

\edithead {\csdag 20 (p49)}
\editstart
\\ \bf replace &
\cltxt
  Means the same as {\clkwd (array string-char ({\em size}))}: the set of
  strings of
  the indicated size.
\\ \bf with &
\cltxt
  Means the union of the vector types specialized by subtypes of
  character
  and the indicated size.
  For the purpose of object creation, it is equivalent to
  {\clkwd (general-string ({\em size}))}.
\editend
\\
\edithead {\csdag 23 (p49)}
\editstart
\\ \bf replace &
\cltxt
  Means the same as {\clkwd (simple-array string-char ({\em size}))}: the
  set of simple strings of the indicated size.
\\ \bf with &
\cltxt
  Means the union of the simple vector types specialized by subtypes of
  character and the indicated size.
  For the purpose of object creation, it is equivalent to
  {\clkwd (simple-general-string ({\em size}))}.
\editend
\\
\edithead {\csdag 23 after (p49)}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd (base-string {\em size})}
\\ &
  Means the same as {\clkwd (array base-character ({\em size}))}: the
  set of base strings of the indicated size.
\\ &
  {\clkwd (simple-base-string {\em size})}
\\ &
  Means the same as {\clkwd (simple-array base-character ({\em size}))}:
  the set of simple base strings of the indicated size.
\editend
\\
\edithead {\csdag 23 after (p49)}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd (general-string {\em size})}
\\ &
  Means the same as {\clkwd (array character ({\em size}))}: the
  set of base strings of the indicated size.
\\ &
  {\clkwd (simple-general-string {\em size})}
\\ &
  Means the same as
  {\clkwd (simple-array general-character ({\em size}))}:
  the set of simple general strings of the indicated size.
\editend

\setcounter{subsection}{7}
\subsection{Type Conversion Function} % 4.8.

\edithead {\csdag 6 (p51)}
\editstart
\\ \bf replace &
\cltxt
  Some strings, symbols, and integers may be converted to
  characters.  If {\em object} is a string of length 1,
  then the sole element of the print name is returned.
  If {\em object} is a symbol whose print name is of length
  1, then the sole element of the print name is returned.
  If {\em object} is an integer {\em n}, then {\clkwd (int-char }
  {\em n}{\clkwd )} is returned.  See {\clkwd character}.
\\ \bf with &
\cltxt
  Some strings amd symbols may be converted to
  characters.  If {\em object} is a string of length 1,
  then the sole element of the print name is returned.
  If {\em object} is a symbol whose print name is of length
  1, then the sole element of the print name is returned.
  See {\clkwd character}.
\editend
\\
\edithead {\csdag 6 after (p52)}
\editstart
\\ \bf insert &
\begin{itemize}
\cltxt
\item Any string subtype may be converted to any other string
subtype, provided the new string can contain all actual
elements of the old string.  It is an error if it cannot.
\end{itemize}
\editend


%----------------------------------------------------------------------
\setcounter{section}{5}
\section{Predicates}                        % 6
%----------------------------------------------------------------------
\edithead {\csdag 2 (p71)}
\editstart
\\ \bf replace &
\cltxt
  but {\clkwd standard-char} begets {\clkwd standard-char-p}
\\ \bf with &
\cltxt
  but {\clkwd bit-vector} begets {\clkwd bit-vector-p}
\editend

\setcounter{subsection}{1}
\subsection{Data Type Predicates} % 6.2.

\setcounter{subsubsection}{1}
\subsubsection{Specific Data Type Predicates} % 6.2.2.

\edithead {\csdag 36 (p75)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd characterp} {\em object}
\\ \bf with &
\cltxt
  {\clkwd characterp} {\em object} \&{\clkwd optional}
  {\em repertoire}
\editend
\\
\edithead {\csdag 37 (p75)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd characterp} is true if its argument is a character,
  and otherwise is false.
\\ \bf with &
\cltxt
  If {\em repertoire} is omitted, {\clkwd characterp}
  is true if its argument is a character object,
  and otherwise is false.
  If a {\em repertoire} argument is specified,
  {\clkwd characterp} is true if its argument
  is a character object and a member of the specified repertoire,
  and
  otherwise is false.
  For example, {\clkwd (characterp  \#$\backslash$A}
  {\clkwd :Latin)}
  is true since \#$\backslash$A is a member of the
  Latin character registry.  {\em repertoire} may be any supported
  character registry name or the names
  {\clkwd :base} or {\clkwd :standard}. {\clkwd (characterp x :base)} is
  true if its argument is a member of the base character
  repertoire and false
  otherwise.
  {\clkwd (characterp x :standard)} is
  true if its argument is a member of the standard character
  subrepertoire and false
  otherwise.
\editend
\\
\edithead {\csdag 38 (p75)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd (characterp x) $\equiv$ (typep x \apostrophe character)}
\\ \bf with &
\cltxt
  {\clkwd (characterp x :standard) $\equiv$ (typep x \apostrophe
  (character :standard)}
\editend
\\
\edithead {\csdag 72 (p76)}
\editstart
\\ \bf replace &
\cltxt
  See also {\clkwd standard-char-p, string-char-p, streamp,}
\\ \bf with &
\cltxt
  See also {\clkwd standard-char-p, streamp,}
\editend

\setcounter{subsubsection}{2}
\subsubsection{Equality Predicates} % 6.2.3.

\edithead {\csdag 75 (p81)}
\editstart
\\ \bf replace &
\cltxt
  which ignores alphabetic case and certain other attributes
  of characters;
\\ \bf with &
\cltxt
  which ignores alphabetic case
  of characters;
\editend

%----------------------------------------------------------------------
\setcounter{section}{6}
\section{Control Structure}                 % 7
%----------------------------------------------------------------------

\setcounter{subsection}{1}
\subsection{Generalized Variables} % 7.2.

\edithead {\csdag 19 modify table (p95)}
\editstart
\\ \bf replace &
\cltxt
  char               string-char
\\ &
  schar              string-char
\\ \bf with &
\cltxt
  char               character
\\ &
  schar              character
\editend
\\
\edithead {\csdag 22 table entry (p96)}
\editstart
\\ \bf delete &
\cltxt
  char-bit           first                  set-char-bit
\editend

%----------------------------------------------------------------------
\setcounter{section}{9}
\section{Symbols}                           % 10
%----------------------------------------------------------------------

\edithead {\csdag 3 (p163)}
\editstart
\\ \bf replace &
\cltxt
  It is ordinarily not permitted to alter a symbol's print name.
\\ \bf with &
\cltxt
  It is an error to alter a symbol's print name.
\editend

\setcounter{subsection}{1}
\subsection{The Print Name} % 10.2.

\edithead {\csdag 5 (p168)}
\editstart
\\ \bf replace &
\cltxt
  It is an extremely bad idea
\\ \bf with &
\cltxt
  It is an error and an extremely bad idea
\editend

%----------------------------------------------------------------------
\setcounter{section}{10}
\section{Packages}                           % 11
%----------------------------------------------------------------------

\setcounter{subsection}{6}
\subsection{Package System Functions and Variables} % 11.7.

\edithead {\csdag 31 (p184,intern)}
\editstart
\\ \bf append &
\cltxt
  All strings, base and extended, are acceptable {\em string}
  arguments.
\editend

%----------------------------------------------------------------------
\setcounter{section}{12}
\section{Characters}                        % 13
%----------------------------------------------------------------------


\edithead {\csdag 6 after (p233)}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd char-code-limit}   [{\clkwd Constant}]
\\ &
  The value of {\clkwd char-code-limit} is a non-negative integer
  that is the upper exclusive bound on values produced by the
  function {\clkwd char-code}, which returns the {\em code}
  of a given character; that is, the values returned by
  {\clkwd char-code} are non-negative and strictly less than
  the value of {\clkwd char-code-limit}.
  There may be unassigned codes between 0 and
  {\clkwd char-code-limit} which
  are not legal arguments to {\clkwd code-char}.
\\  &
\cltxt
  {\clkwd *all-character-registry-names*}   [{\clkwd Variable}]
\\ &
  The value of {\clkwd *all-character-registry-names*} is a list of
  all character registry names supported by the implementation.
\editend


\setcounter{subsection}{0}
\subsection{Character Attributes} % 13.1.

\edithead {\csdag replace entire section (p233)}
\editstart
\\ \bf with &
\cltxt
  Earlier versions of Common LISP incorporated {\em font} and
  {\em bits} as attributes of character objects.  These are
  considered implementation-defined attributes and
  if supported by an implementation
  effect the action of selected functions.  In particular,
  the following effects are noted:
\\ &
\begin{itemize}
\item Attributes, such as those
  dealing with how the character is displayed or its typography,
  are not part of the character code.
  For example, bold-face, color
  or size are not considered part of the character code.
\item If two characters differ in any attributes,
  then they are not {\clkwd char=}.
\item If two characters have identical
  attributes, then their ordering by
  {\clkwd char}$<$ is consistent with the numerical ordering by the
  predicate $<$ on
  their code attributes. (Similarly for {\clkwd char}$>$,
  {\clkwd char}$>=$ and {\clkwd char}$<=$.)
\item The effect, if any, on {\clkwd char-equal} of each
  attribute has to be specified as part of
  the definition of that attribute.
\item The effect of {\clkwd char-upcase} and {\clkwd char-downcase}
  is to preserve attributes.
\item The function {\clkwd char-int} is equivalent to {\clkwd char-code}
  if no attributes are associated with
  the character object.
\item The function {\clkwd int-char} is equivalent to {\clkwd code-char}
  if no attributes are associated with
  the character object.
\item It is implementation dependent whether characters within
  double quotes have attributes removed.
\item  It is implementation dependent whether
  attributes are removed from symbol names by {\clkwd read}.
\end{itemize}
\editend

\setcounter{subsection}{1}
\subsection{Predicates on Characters} % 13.2.


\edithead {\csdag 3 (p234)}
\editstart
\\ \bf replace &
\cltxt
  argument is a "standard character" that is, an object of type
  {\clkwd standard-char}.
   Note that any character with a non-zero {\em bits} or {\em font}
   attribute
   is non-standard.
\\ \bf with &
\cltxt
  argument is one of the Common LISP standard character subrepertoire.
\editend
\\
\edithead {\csdag 4 (p234)}
\editstart
\\ \bf delete &
\cltxt
  Note that any character with non-zero ...
\editend
\\
\edithead {\csdag 6 (p235)}
\editstart
\\ \bf replace &
\cltxt
  Of the standard characters all but \#$\backslash${\clkwd Newline}
  are graphic.
  The semi-standard characters \#$\backslash${\clkwd Backspace},
  \#$\backslash${\clkwd Tab},
  \#$\backslash${\clkwd Rubout},
  \#$\backslash${\clkwd Linefeed},
  \#$\backslash${\clkwd Return},
  and \#$\backslash${\clkwd Page} are not graphic.
\\ \bf with &
\cltxt
  Of the standard characters all but \#$\backslash${\clkwd Newline}
  are graphic.
\editend
\\
\edithead {\csdag 7 (p235)}
\editstart
\\ \bf delete &
\cltxt
  Programs may assume that graphic ...
\editend
\\
\edithead {\csdag 8 (p235)}
\editstart
\\ \bf delete &
\cltxt
  Any character with a non-zero bits...
\editend
\\
\edithead {\csdag 9 (p235)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd string-char-p} ...
\editend
\\
\edithead {\csdag 10 (p235)}
\editstart
\\ \bf delete &
\cltxt
  The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 13 (p235)}
\editstart
\\ \bf replace &
\cltxt
  If a character is alphabetic, then it is perforce graphic.  Therefore
  any character
  with a non-zero bits attribute cannot be alphabetic.  Whether a
  character is
  alphabetic is may depend on its font number.
\\ \bf with &
\cltxt
  If a character is alphabetic, then it is perforce graphic.
\editend
\\
\edithead {\csdag 22 (p236)}
\editstart
\\ \bf replace &
\cltxt
  If a character is either uppercase or lowercase, it is necessarily
  alphabetic (and
  therefore is graphic, and therefore has a zero bits attribute).
  However, it is permissible in theory for an alphabetic character
  to be neither
  uppercase nor lowercase (in a non-Roman font, for example).
\\ \bf with &
\cltxt
  If a character is either uppercase or lowercase, it is necessarily
  alphabetic (and
  therefore is graphic).
\editend
\\
\edithead {\csdag 25 (p236)}
\editstart
\\ \bf replace &
\cltxt
  The argument {\em char} must be a character object, and {\em radix}
  must be a non-negative
  integer. If {\em char} is not a digit of the radix specified
\\ \bf with &
\cltxt
  The argument {\em char} must be in the standard character
  subrepertoire and
  {\em radix} must be a non-negative integer.
  If {\em char} is not a standard character or is not a digit of the
  radix specified
\editend
\\
\edithead {\csdag 51 (p237)}
\editstart
\\ \bf delete &
\cltxt
  If two characters have the same bits ...
\editend
\\
\edithead {\csdag 52 (p237)}
\editstart
\\ \bf replace &
\cltxt
  If two characters differ in any attribute (code, bits, or font), then
  they are different.
\\ \bf with &
\cltxt
  If the codes of two characters differ, then
  they are different.
\editend
\\
\edithead {\csdag 94 (p239)}
\editstart
\\ \bf replace &
\cltxt
  The predicate {\clkwd char-equal} is like {\clkwd char=}, and
  similarly for the others, except
  according to a different ordering such that differences of bits
  attributes and case are ignored, and font information is taken into
  account in an implementation dependent manner.
\\ \bf with &
\cltxt
  The predicate {\clkwd char-equal} is like {\clkwd char=}, and
  similarly for the others, except
  according to a different ordering such that differences of case
  are ignored.
\editend
\\
\edithead {\csdag 97 example (p239)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (char-equal \#$\backslash$A \#$\backslash$Control-A) is true}
\editend
\\
\edithead {\csdag 98 (p239)}
\editstart
\\ \bf delete &
\cltxt
  The ordering may depend on the font ...
\editend

\setcounter{subsection}{2}
\subsection{Character Construction and Selection} % 13.3.

\edithead {\csdag 3 (p239)}
\editstart
\\ \bf replace &
\cltxt
  The argument {\em char} must be a character object.
  {\clkwd char-code} returns the {\em code} attribute of the
  character object;
  this will be a non-negative integer less than the (normal) value
\\ \bf with &
\cltxt
  The argument {\em char} must be a character object.
  {\clkwd char-code} returns the {\em code} of the
  character object;
  this will be a non-negative integer less than the value
\editend
\\
\edithead {\csdag 4 (p240)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd char-bits } ...
\editend
\\
\edithead {\csdag 5 (p240)}
\editstart
\\ \bf delete &
\cltxt
  The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 6 (p240)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd char-font } ...
\editend
\\
\edithead {\csdag 7 (p240)}
\editstart
\\ \bf delete &
\cltxt
  The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 8 (p240)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd code-char {\em code} \&optional {\em (bits 0) (font 0)}
  [{\em Function}]}
\\ \bf with &
\cltxt
  {\clkwd code-char {\em code}
  [{\em Function}]}
\editend
\\
\edithead {\csdag 9 (p240)}
\editstart
\\ \bf replace &
\cltxt
  All three arguments must be non-negative integers.  If it is possible
  in the
  implementation to construct a character object whose code attribute
  is {\em code},
  whose
  bits attribute is {\em bits}, and whose font attribute is {\em font},
  then such an object
  is returned;
\\ \bf with &
\cltxt
  The argument must be a non-negative integer.  If it is possible
  in the
  implementation to construct a character object identified by
  {\em code},
  then such an object is returned;
\editend
\\
\edithead {\csdag 10 (p240)}
\editstart
\\ \bf replace &
\cltxt
  For any integers, {\em c, b,} and {\em f}, if {\clkwd (code-char
  {\em c b f})} is
\\ \bf with &
\cltxt
  For any integer, {\em c}, if {\clkwd (code-char
  {\em c})} is
\editend
\\
\edithead {\csdag 12 (p240)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (char-bits (code-char } ...
\editend
\\
\edithead {\csdag 13 (p240)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (char-font (code-char } ...
\editend
\\
\edithead {\csdag 14 (p240)}
\editstart
\\ \bf delete &
\cltxt
  If the font and bits attributes ...
\editend
\\
\edithead {\csdag 15 (p240)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (char= (code-char (char-code ...}
\editend
\\
\edithead {\csdag 16 (p240)}
\editstart
\\ \bf delete &
\cltxt
  is true.
\editend
\\
\edithead {\csdag 17 (p240)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd make-char} ...
\editend
\\
\edithead {\csdag 18 (p240)}
\editstart
\\ \bf delete &
\cltxt
 The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 19 (p240)}
\editstart
\\ \bf delete &
\cltxt
 If {\em bits} or {\em font} are zero ...
\editend
\\
\edithead {\csdag 19 (p240)}
\editstart
\\ \bf append &
\cltxt
  {\clkwd find-char} {\em label registry}    [{\em Function}]
\\ &
  {\clkwd find-char} returns a character object.
  The arguments {\em label} and {\em registry} are names
  (objects coerceable to strings as if by the function {\clkwd string})
  of character registries and labels.
  {\em label}
  uniquely identifies a character within the character
  registry named {\em registry}.
  If the implementation does not support the specified
  character, {\clkwd nil} is returned.
\editend

\setcounter{subsection}{3}
\subsection{Character Conversions} % 13.4.

\edithead {\csdag 8 (p241)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd char-upcase} returns a character object with the same
  font and bits attributes as {\em char}, but with possibly a
  different code attribute.
\\ \bf with &
\cltxt
  {\clkwd char-upcase} returns a character object with possibly
  a different code.
\editend
\\
\edithead {\csdag 10 (p241)}
\editstart
\\ \bf replace &
\cltxt
  Similarly, {\clkwd char-downcase} returns a character object with the
  same font and bits attributes as {\em char}, but with possibly a
  different code attribute.
\\ \bf with &
\cltxt
  Similarly, {\clkwd char-downcase} returns a character object with
  possibly a different code.
\editend
\\
\edithead {\csdag 12 (p241)}
\editstart
\\ \bf delete &
\cltxt
  Note that the action of ...
\editend
\\
\edithead {\csdag 13 (p241)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd digit-char {\em weight} \&optional ({\em radix} 10)
  ({\em font} 0)      [{\em Function}]}
\\ \bf with &
\cltxt
  {\clkwd digit-char {\em weight} \&optional ({\em radix} 10)
       [{\em Function}]}
\editend
\\
\edithead {\csdag 14 (p241)}
\editstart
\\ \bf replace &
\cltxt
  All arguments must be integers.  {\clkwd digit-char} determines
  whether or not it is
  possible
  to construct a character object whose font attribute is {\em font},
  and whose {\em code}
\\ \bf with &
\cltxt
  All arguments must be integers.  {\clkwd digit-char} determines
  whether or not it is
  possible to construct a character object whose {\em code}
\editend
\\
\edithead {\csdag 15 (p242)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd digit-char} cannot return {\clkwd nil} if {\em font}
  is zero, {\em radix}
\\ \bf with &
\cltxt
  {\clkwd digit-char} cannot return {\clkwd nil}.
  {\em radix}
\editend
\\
\edithead {\csdag 22 (p242)}
\editstart
\\ \bf delete &
\cltxt
  Note that no argument is provided for ...
\editend
\\
\edithead {\csdag 23 through 30 (p242, char-int, int-char)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd char-int} {\em char}
\editend
\\
\edithead {\csdag 32 (p242)}
\editstart
\\ \bf replace &
\cltxt
  All characters that have zero font and bits attributes and that are
  non-graphic
\\ \bf with &
\cltxt
  All characters that are
  non-graphic
\editend
\\
\edithead {\csdag 33 (p243)}
\editstart
\\ \bf replace &
\cltxt
  The standard newline and space characters have the respective
  names {\clkwd Newline} and {\clkwd Space}.  The semi-standard
  characters have the names {\clkwd Tab, Page, Rubout, Linefeed,
  Return,} and {\clkwd Backspace}.
\\ \bf with &
\cltxt
  The standard newline and space characters have the respective
  names {\clkwd Newline} and {\clkwd Space}.
\editend
\\
\edithead {\csdag 35 (p243)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd char-name} will only locate "simple" ...
\editend
\\
\edithead {\csdag 36 (p243)}
\editstart
\\ \bf append &
\cltxt
  {\clkwd name-char} may accept other names for characters
  in addition to those returned by {\clkwd char-name}.
\editend
\\
\edithead {\csdag 36 (p243)}
\editstart
\\ \bf append &
\cltxt
  {\clkwd char-registry-name} {\em char}    [{\em Function}]
\\ &
  {\clkwd char-registry-name} returns a string representing
  the character registry to which {\em char} belongs.
\editend
\\
\edithead {\csdag 36 (p243)}
\editstart
\\ \bf append &
\cltxt
  {\clkwd char-label} {\em char}    [{\em Function}]
\\ &
  {\clkwd char-label} returns a string representing
  the character label of {\em char}.
\editend
\\
\edithead {\csdag 36 (p243)}
\editstart
\\ \bf append &
\cltxt
  {\clkwd char-ccs-value} {\em char name}    [{\em Function}]
\\ &
  {\clkwd char-ccs-value} returns the non-negative integer
  representing the encoding of the character {\em char} in
  The coded character set named by {\em name}.
  If the implementation does not support the specified
  coded character set, {\clkwd nil} is returned.  If the
  named coded character set does not contain the character,
  {\clkwd nil} is returned.
\editend

\setcounter{subsection}{4}
\subsection{Character Control-Bit Functions} % 13.5.

\edithead {\csdag delete entire section (p243)}
\editstart
\editend

%----------------------------------------------------------------------
\setcounter{section}{13}
\section{Sequences}                         % 14
%----------------------------------------------------------------------
\setcounter{subsection}{0}
\subsection{Simple Sequence Functions}         % 14.1

\edithead {\csdag 21 (p249,make-sequence)}
\editstart
\\ \bf append &
\cltxt
  If type {\clkwd string} is specified, the result is
  equivalent to {\clkwd make-string}.
\editend

%----------------------------------------------------------------------
\setcounter{section}{17}
\section{Strings}                           % 18
%----------------------------------------------------------------------

\edithead {\csdag 1 (p299)}
\editstart
\\ \bf replace &
\cltxt
  Specifically, the type {\clkwd string} is identical to the type
  {\clkwd (vector string-char),}
  which in turn is the same as {\clkwd (array string-char (*))}.
\\ \bf with &
\cltxt
  Specifically, the type {\clkwd string} is a subtype of
  {\clkwd vector}
  and consists of vectors specialized by subtypes of {\clkwd character}.
\editend

\setcounter{subsection}{0}
\subsection{String Access}  % 18.1.
\edithead {\csdag 4 (p300)}
\editstart
\\ \bf replace &
\cltxt
  character object.  (This character will necessarily satisfy the
  predicate
  {\clkwd string-char-p}).
\\ \bf with &
\cltxt
  character object.
\editend
\\
\edithead {\csdag 9 (p300)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd setf} may be used with {\clkwd char} to destructively
  replace a character within a string.
\\ \bf with &
\cltxt
  {\clkwd setf} may be used with {\clkwd char} to destructively
  replace a character within a string.
  The new character must be of a type which can be stored in the
  string; it is an error otherwise.
\editend

\setcounter{subsection}{2}
\subsection{String Construction and Manipulation}  % 18.3.

\edithead {\csdag 2 (p302)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd make-string {\em size} \&key :initial-element  [{\em Function}]}
\\ \bf with &
\cltxt
  {\clkwd make-string {\em size} \&key :initial-element  :element-type
  [{\em Function}]}
\editend
\\
\edithead {\csdag 3 (p302,make-string)}
\editstart
\\ \bf replace &
\cltxt
  This returns a string (in fact a simple string) of length {\em size},
  each of whose characters has been initialized to the
  {\clkwd :initial-element} argument.  If an {\clkwd :initial-element}
  argument is not specified, then the string will be initialized
  in an implementation-dependent way.
\\ \bf with &
\cltxt
  This returns a string of length {\em size},
  each of whose characters has been initialized to the
  {\clkwd :initial-element} argument.  If an {\clkwd :initial-element}
  argument is not specified, then the string will be initialized
  in an implementation-dependent way.
  The {\clkwd :element-type} argument names the type of the elements
  of the string; a string is constructed of the most specialized
  type that can accommodate elements of the given type.
  If {\clkwd :element-type} is omitted, the type
  {\clkwd character} is the default.
\editend
\\
\edithead {\csdag 5 (p302,make-string)}
\editstart
\\ \bf replace &
\cltxt
  A string is really just a one-dimensional array of "string
  characters" (that is,
  those characters that are members of type {\clkwd string-char}).
  More complex character arrays may be constructed using the function
  {\clkwd make-array}.
\\ \bf with &
\cltxt
  More complex character arrays may be constructed using the function
  {\clkwd make-array}.
\editend
\\
\edithead {\csdag 29 (p304,make-string)}
\editstart
\\ \bf replace &
\cltxt
  If {\em x} is a string character (a character of type
  {\clkwd string-char}), then
\\ \bf with &
\cltxt
  If {\em x} is a character, then
\editend

%----------------------------------------------------------------------
\setcounter{section}{21}
\section{Input/Output}                      % 22

\setcounter{subsection}{0}
\subsection{Printed Representation of LISP Objects}  % 22.1.

\setcounter{subsubsection}{0}
\subsubsection{What the Read Function Accepts}  % 22.1.1.

\edithead {\csdag Table 22-1: Standard Character Syntax Types (p336)}
\editstart
\\ \bf delete entry &
\cltxt
  {\clkwd <tab>} {\em whitespace}
\\ &
  {\clkwd <page>} {\em whitespace}
\\ &
  {\clkwd <backspace>} {\em constituent}
\\ &
  {\clkwd <return>} {\em whitespace}
\\ &
  {\clkwd <rubout>} {\em constituent}
\\ &
  {\clkwd <linefeed>} {\em whitespace}
\editend

\setcounter{subsubsection}{1}
\subsubsection{Parsing of Numbers and Symbols}  % 22.1.2.

\edithead {\csdag Table 22-3: Standard Constituent Character
Attributes (p340)}
\editstart
\\ \bf delete entry &
\cltxt
  {\clkwd <backspace>} {\em illegal}
\\  &
  {\clkwd <tab>} {\em illegal}
\\  &
  {\clkwd <linefeed>} {\em illegal}
\\  &
  {\clkwd <page>} {\em illegal}
\\  &
  {\clkwd <return>} {\em illegal}
\\  &
  {\clkwd <rubout>} {\em illegal}
\editend

\setcounter{subsubsection}{3}
\subsubsection{Standard Dispatching Macro Character Syntax}  % 22.1.4.

\edithead {\csdag Table 22-4: Standard \# Macro Character Syntax (p352)}
\editstart
\\ \bf delete entry &
\cltxt
  {\clkwd \#<backspace>} {\em signals error}
\\  &
  {\clkwd \#<tab>} {\em signals error}
\\  &
  {\clkwd \#<linefeed>} {\em signals error}
\\  &
  {\clkwd \#<page>} {\em signals error}
\\  &
  {\clkwd \#<return>} {\em signals error}
\\  &
  {\clkwd \#<rubout>} {\em undefined}
\editend
\\
\edithead {\csdag 8 (p353)}
\editstart
\\ \bf replace &
\cltxt
  The following names are standard across all implementations:
\\ \bf with &
\cltxt
  All non-graphic
  characters, including extended characters, are uniquely
  named in an implementation-dependent manner.
  In particular, an implementation may support names of the
  form {\em label:registry}.
  The following names are standard across all implementations:
\editend
\\
\edithead {\csdag 11 through 18 inclusive delete (p353)}
\editstart
\\ \bf delete &
\cltxt
  The following names are semi-standard; ...
\editend
\\
\edithead {\csdag 20 through 26 inclusive delete (p354)}
\editstart
\\ \bf delete &
\cltxt
  The following convention is used in implementations ...
\editend
\\
\edithead {\csdag 108 (p360)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd \#<space>, \#<tab>, \#<newline>, \#<page>, \#<return>}
\\ \bf with &
\cltxt
  {\clkwd \#<space>, \#<newline>}
\editend

\setcounter{subsubsection}{4}
\subsubsection{The Readtable}  % 22.1.5.

\edithead {\csdag 3 (p360)}
\editstart
\\ \bf replace &
\cltxt
  Even if an implementation supports characters with non-zero
  {\em bits} and {\em font}
  attributes, it need not (but may) allow for such characters to
  have syntax
  descriptions
  in the readtable.  However, every character of type
  {\clkwd string-char}
  must be represented in the readtable.
\\ \bf with &
\cltxt
  All base and extended characters
  are representable in the readtable.
\editend

\setcounter{subsubsection}{5}
\subsubsection{What the Print Function Produces}  % 22.1.6.

\edithead {\csdag 13 (p366)}
\editstart
\\ \bf replace &
\cltxt
  is used.  For example, the printed representation of the character
  \#$\backslash$A
  with control
  and meta bits on would be \#$\backslash${\clkwd CONTROL-META-A},
  and that of
  \#$\backslash$a with control and meta bits on would be
  \#$\backslash${\clkwd CONTROL-META-$\backslash$a}.
\\ \bf with &
\cltxt
  is used (see 22.1.4).
\editend

\setcounter{subsection}{2}
\subsection{Output Functions}  % 22.3.

\setcounter{subsubsection}{0}
\subsubsection{Output to Character Streams}  % 22.3.1.

\edithead {\csdag 26 (p384)}
\editstart
\\ \bf replace &
\cltxt
  ({\em not} the substring delimited by {\clkwd :start} and
  {\clkwd :end}).
\\ \bf with &
  ({\em not} the substring delimited by {\clkwd :start} and
  {\clkwd :end}).
  Only characters which are members of the coded character set(s)
  associated with the output stream or \#$\backslash${\clkwd Newline}
  are valid to be written;
  it is an error otherwise.  All character streams must provide
  appropriate line division behavior for
  \#$\backslash${\clkwd Newline}.
\editend
\\
\edithead {\csdag 27 after (p384)}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd external-coded-string-length} {\em object} \&{\clkwd optional}
  {\em output-stream}   [{\em Function}]
\\  &
  {\clkwd external-coded-string-length}
  returns the number of implementation defined
  units required for the object on the output-stream. If
  not applicable to the output stream, the function
  returns {\clkwd nil}.
  This number corresponds to the current state of the stream
  and may change if there has been intervening output.
  If the output stream is not specified {\clkwd *standard-output*}
  is the default.
\editend

\setcounter{subsubsection}{2}
\subsubsection{Formatted Output to Character Streams}  % 22.3.3.

\edithead {\csdag 23 delete example (p387)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (format nil "Type} $\tilde{ }$
  {\clkwd :C to $\tilde{ }$ :A."} . . .
\editend
\\
\edithead {\csdag 66 (p389)}
\editstart
\\ \bf replace &
\cltxt
  $\tilde{ }${\clkwd :C} spells out the names of the control bits and
  represents non-printing
  characters by their names: {\clkwd Control-Meta-F, Control-Return,
  Space}.
  This is a "pretty" format for printing characters.
\\ \bf with &
\cltxt
  $\tilde{ }${\clkwd :C}
  represents non-printing
  characters by their names: {\clkwd Newline,
  Space}.  This is a "pretty" format
  for printing characters.
\editend
%----------------------------------------------------------------------

%----------------------------------------------------------------------
\setcounter{section}{22}
\section{File System Interface}             % 23

\setcounter{subsection}{1}
\subsection{Opening and Closing Files}  % 23.2.

\edithead {\csdag 2 (p418)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd open {\em filename} \&key :direction :element-type}
  {\clkwd :if-exists :if-does-not-exist}
  [{\em Function}]
\\ \bf with &
\cltxt
  {\clkwd open {\em filename} \&key :direction :element-type}
  {\clkwd
  :external-coded-character-format}
  {\clkwd :if-exists :if-does-not-exist}
  [{\em Function}]
\editend
\\
\edithead {\csdag 11 (p419)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd string-char}
\\  &
  The unit of transaction is a string-character.  The functions
  {\clkwd read-char}
  and/or {\clkwd write-char} may be used on the stream.
\\ \bf with &
\cltxt
  The default value of {\clkwd :element-type} is
  implementation-defined as character or a subtype of character.
\\  &
  {\clkwd base-character}
\\  &
  The unit of transaction is a base character.  The functions
  {\clkwd read-char}
  and/or {\clkwd write-char} may be used on the stream.
\editend
\\
\edithead {\csdag 16 (p419)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd character}
\\  &
  The unit of transaction is any character, not just a string-character.
  The functions {\clkwd read-char} and/or {\clkwd write-char} may
  be used on the stream.
\\ \bf with &
\cltxt
  {\clkwd character}
\\  &
  The unit of transaction is any character.
  The functions {\clkwd read-char} and/or {\clkwd write-char} may
  be used on the stream.
\editend
\\
\edithead {\csdag 19 after (p420)}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd :external-coded-character-format}
\\  &
This argument specifies a name or list of
names(s) indicating an implementation recognized scheme for
representing 1 or more coded character sets with non-homogeneous codes.
\\  &
The default value is {\clkwd :default} and is
implementation defined but must include the
base characters.
\\  &
As many coded character set names must be provided as the
implementation requires for that external coding convention.
\\  &
References to standard ISO coded character set names must
include the full ISO reference number and approval year.
The following are valid ISO reference names:
:ISO8859/1-1987, :ISO6937/2-1983, :ISO646-1983, etc..
All implementation recognized schemes are formed from
{\clkwd standard-p} characters.
\editend
%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\begin{thebibliography}{wwwwwwww 99}


\bibitem[Ida87]{ida87} M. Ida, et al.,
{\em
JEIDA Common LISP Committee Proposal on Embedding Multi-Byte Characters
},
ANSI X3J13 document 87-022, (1987).

\bibitem[ISO 646]{iso646} ISO,
{\em
Information processing -- ISO 7-bit coded character set
for information interchange
},
ISO (1983).

\bibitem[ISO 4873]{iso4873} ISO,
{\em
Information processing -- ISO 8-bit code for information
interchange -- Structure and rules for implementation
},
ISO (1986).

\bibitem[ISO 6937/1]{iso6937/1} ISO,
{\em
Information processing -- Coded character sets for text
communication -- Part 1: General introduction
},
ISO (1983).

\bibitem[ISO 6937/2]{iso6937/2} ISO,
{\em
Information processing -- Coded character sets for text
communication -- Part 2: Latin alphabetic and non-alphabetic
graphic characters
},
ISO (1983).

\bibitem[ISO 8859/1]{iso8859/1} ISO,
{\em
Information processing -- 8-bit single-byte coded
graphic character sets -- Part 1: Latin alphabet No. 1
},
ISO (1987).

\bibitem[ISO 8859/2]{iso8859/2} ISO,
{\em
Information processing -- 8-bit single-byte coded
graphic character sets -- Part 2: Latin alphabet No. 2
},
ISO (1987).

\bibitem[ISO 8859/6]{iso8859/6} ISO,
{\em
Information processing -- 8-bit single-byte coded
graphic character sets -- Part 6: Latin/Arabic alphabet
},
ISO (1987).

\bibitem[ISO 8859/7]{iso8859/7} ISO,
{\em
Information processing -- 8-bit single-byte coded
graphic character sets -- Part 7: Latin/Greek alphabet
},
ISO (1987).

\bibitem[Kerns87]{kerns87} R. Kerns,
{\em
Extended Characters in Common LISP
},
X3J13 Character Subcommittee document, Symbolics Inc (1987).

\bibitem[Kurokawa88]{kurokawa88} T. Kurokawa, et al.,
{\em
Technical Issues on International Character Set Handling in Lisp
},
ISO/IEC SC22 WG16 document N33, (1988).

\bibitem[Linden87]{linden87} T. Linden,
{\em
Common LISP - Proposed Extensions for International Character Set
Handling
},
Version 01.11.87, IBM Corporation (1987).

\bibitem[Steele84]{steele84} G. Steele Jr.,
{\em
Common LISP: the Language
},
Digital Press (1984).

\bibitem[Xerox87]{xerox87} Xerox,
{\em
Character Code Standard, Xerox System Integration Standard
},
Xerox Corp. (1987).

\end{thebibliography}

\end{document}             % End of document.


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 22 Feb 89 17:06:06 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Feb 89  13:28:49 PST
Date: Wed, 22 Feb 89 00:13:28 PST
From: Thom Linden <baggins@IBM.com>
To: Common Lisp mailing <common-lisp@sail.stanford.edu>
Message-ID: <890222.001328.baggins@almvma>
Subject: cs proposal part1


\documentstyle{report}     % Specifies the document style.

\pagestyle{headings}

\title{\bf
Extensions to Common LISP to Support International
Character Sets}
\author{
Michael Beckerle\thanks{Gold Hill Computers} \and
Paul Beiser\thanks{Hewlett-Packard} \and
Jerry Duggan\thanks{Hewlett-Packard} \and
Robert Kerns\thanks{Independent consultant} \and
Kevin Layer\thanks{Franz, Inc.} \and
Thom Linden\thanks{IBM Research, Subcommittee Chair} \and
Larry Masinter\thanks{Xerox Research} \and
David Unietis\thanks{Lucid, Inc.}
}
\date{February 21, 1989} % Deleting this command produces today's date.

\begin{document}

\maketitle                 % Produces the title.

\setcounter{secnumdepth}{4}

\setcounter{tocdepth}{4}
\tableofcontents


%----------------------------------------------------------------------
%----------------------------------------------------------------------
\newfont{\cltxt}{cmr10}
\newfont{\clkwd}{cmtt10}

\newcommand{\apostrophe}{\clkwd '}
\newcommand{\bq}{\clkwd\symbol{'22}}


%----------------------------------------------------------------------
%----------------------------------------------------------------------
\chapter{Introduction}

This is a proposal to the X3 J13 committee
for both extending and modifying the Common LISP
language definition to provide a standard basis for Common LISP
support of the variety of characters used to represent the
native languages of the international community.

This proposal was created by the Character Subcommittee of X3 J13.
We would like to acknowledge discussions with T. Yuasa and other
members of the JIS Technical Working Group,
comments from members of X3 J13,
and the proposals \cite{ida87},
\cite{linden87}, \cite{kerns87}, and \cite{kurokawa88} for
providing the motivation and direction for these extensions.
As all these documents and discussions were created
expressly for LISP standardization usage,
we have borrowed freely from their ideas as well as the texts
themselves.

This document is separated into two parts. The first part explains the
major language changes and their motivations. While intended as
commentary to a general audience, and not explicitly as
part of the standard document, the X3 J13 editor may
include sections at her/his discretion.  The second part,
Appendix A, provides
the page by page set of editorial changes to \cite{steele84}.
\section{Objectives}

The major objectives of this proposal are:
\begin{itemize}
\item To provide a consistent, well-defined scheme allowing support
of both very large character sets and multiple character sets.
\footnote{The distinction between the terms {\em character repertoire}
and {\em coded character set} is made later.  The usage
of the term {\em character set},
avoided after this introduction, encompasses both terms.}

Many software applications are intended for international use, or
have requirements for incorporation of language elements of multiple
native languages within a single application.
Also, many applications require specialized languages including,
for example, scientific and typesetting symbols.
In order
to ensure some portability of these applications, data expressed in
a mixture of these
languages must be treated uniformly by the
software language.

All character and string manipulations should operate uniformly,
regardless of the character set(s) of the character objects.
This applies to array indexing, readtable definitions, read
symbol construction and I/O operations.


\item To ensure efficient performance of string and character
operations.

Many native
languages, such as Japanese and Chinese, use character
sets which contain more characters than the Latin alphabet.
Supporting larger sized character sets frequently means employing
larger data fields to uniquely encode each character.
Common LISP implementations using
larger sized character sets can
incur performance penalties in terms
of space, time, or both.

The use of large and/or multiple character sets by an
implementation
implies the need for a more complex character type representation.
Given a more complex character representation, the efficiency
of language operations on characters (e.g. string operations)
could be affected.

\item To assure forward compatibility of the proposed model
and definition with existing Common LISP implementations.

Developers should not be required to re-write large amounts of either
LISP code or data representations in order to apply the proposed
changes to existing implementations.
The proposed changes should provide an easy
portability path for existing code to many possible implementations.
\end{itemize}

There are a number of issues, some under the general rubric of
internationalization, which this proposal does {\em not} cover.
Among these issues are:
\begin{itemize}
\item Time and date formats
\item Monetary formats
\item Numeric punctuation
\item Fonts
\item Lexicographic orderings
\item Right-to-left and bidirectional languages
\end{itemize}

%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\chapter{Overview}

We use several terms within this document which
are new in the context of Common LISP.
Definitions for the following prominent
terms are provided for the reader's convenience.

A {\em character repertoire} defines a collection of characters
independent of their specific rendered image or font.  This
corresponds to the mathematical notion of a {\em set}
\footnote{We avoid the term {\em character set} as it has been
(over)used in the context of character repertoire as well
as in the context of coded character set.}.
Character
repertoires are specified independent of coding and their characters
are only identified with a unique {\em character label},
a graphic symbol, and
a character description.

A {\em coded character set} is a character repertoire plus
an {\em encoding} providing a unique mapping between each character
and a number which serves as the character representation.
There are numerous internationally standardized coded character
sets; for example, \cite{iso8859/1} and \cite{iso646}.

A character may be included in one or more character repertoires.
Similarly, a character may be included in one or more
coded character sets.  For example, the Latin letter "A" is contained
in the coded character set standards: ISO 8859/1, ISO 8859/2,
ISO 6937/2, and others.

To universally identify each character, we define a unique
collection of repertoires called {\em character
registries} as a partitioning of all characters.
That is, each character is included
in one and only one character registry.

In Common LISP a {\em character} data object is identified by its
{\em character code}, a unique numerical code.
Each character code is composed from
a character registry and a character label.

Character data objects which are classified as {\em graphic},
or displayable, are each associated with a {\em glyph}.  The
glyph is the visual representation of the character.

The primary purpose of introducing these terms is to provide a
consistent naming to Common LISP concepts which are related
to those found in ISO standardization of coded
character sets.
\footnote{The bibliography includes several relevant ISO
coded character set standards.}
They also serve as a demarcation between these
standardization activities.  For example, while Common LISP is free to
define unique manipulation facilities for characters, registries
and coded character sets, it should
not define standard coded character sets nor standard character
registries.

A secondary purpose is to detach the language specification from
underlying hardware representation.  From a language
specification viewpoint it is inconsequential whether
characters occupy one or more (8-bit) bytes or whether
a Common LISP implementation's
internal representation for characters is distinct from or identical
to any of the numerous
external representations (for example, the text interchange
representation \cite{iso6937/2}).
We specifically do not propose any standard coded character sets.

%----------------------------------------------------------------------
\section{Character Identity}


Characters are uniquely distinguished by their codes,
which are drawn from the set of
non-negative integers.  That is, within Common LISP
a unique numerical code
is assigned to each semantically different character.

It is important to separate the notion of glyph from the notion of
character data object when defining a scheme under which issues of
identity can be rigorously decided by a computer language.  Glyphs are
the visual aspects of characters, writable on surfaces, and sometimes
called 'graphics'.  A language specification valid for more than a
narrow range of systems can only make assumptions about the existence
of {\em abstract} glyphs (for example, the Latin letter A) and not about
glyph variants (for example, the italicized Latin letter {\em A})
or characteristics of display devices.  Thus, an important element of
this proposal is the removal of the {\em font} and {\em bits}
attributes from the language specification.
\footnote{These and other attributes may still be supported as
implementation-defined extensions.}
All functions
dealing with the {\em bits} and {\em font} attributes are either
removed or modified by this proposal.
The deleted functions and constants include:
{\em char-font-limit,
char-bits-limit,
int-char,
char-int,
char-bits,
char-font,
make-char,
char-control-bit,
char-meta-bit,
char-super-bit,
char-hyper-bit,
char-bit,
set-char-bit}.

The definition in \cite{steele84} of semi-standard characters has
been eliminated.  This is replaced by a more uniform approach
to character naming with the introduction of character registries
(see below).


%----------------------------------------------------------------------
\section{Character Naming}

A Common LISP program must be able to name, compose and decompose
characters in a uniform, portable manner, independent of any
underlying representation.  One possible composition is by
the pair $<$ coded character set standard, decimal representation $>$
\footnote{This syntax is for illustration only and is not being
proposed.}.
Thus, for example, one might compose the Latin 'A' with the pair
$<$ ISO8859/2-1987, 65 $>$,
$<$ ISO8859/6-1987, 65 $>$, or
$<$ ISO646-1983, 65 $>$, etc..  The difficulty here is two-fold.
First, there are several ways to compose the same character and
second, there may be multiple answers to
the question: {\em To what coded character set
does character object x belong?}.\footnote{Even
worse, the answer might change yearly.}
The identical problems occur if the pair
$<$ character repertoire standard, decimal representation $>$ is used.
\footnote{Existing ISO repertoires seem to be defined exclusively
in the context of coded character sets and not as standards
in their own right.}

The concept of character registry is introduced by this proposal
to resolve the problem of character naming, composition and
decomposition.
Each character is universally defined by the
pair $<$ character registry name, character label $>$. For this
to be a portable definition, it must have a standard meaning.
Thus we propose the formation of an ISO Working Group to
define an international
{\em Character Registry Standard}.
At this writing there is no existing Character Registry Standard nor
ISO Working Group organized to define such a standard.
\footnote{It is the intention of X3 J13 to promote and adopt
an eventual ANSI or ISO Character Registry Standard.  In particular, we
acknowledge that X3 J13 is {\em not} the appropriate forum to
define the standard.  We believe
it is a required component of all programming languages
providing support for international characters.}

Common LISP character codes are composed from a character registry and
a character label.  The convention by which a character label and
character registry compose a character code is implementation
dependent.

We introduce new functions {\clkwd find-char, char-registry-name,} and
{\clkwd char-label} to
compose and decompose character objects.  We also extend the
{\clkwd characterp} predicate to
support testing
membership of a character in a given character registry.
\footnote{
For example,
testing membership in the Japanese Katakana character registry.
}
A global variable {\clkwd *all-character-registry-names*}
is added to
support application determination of supported character registries.

The naming and content of the standard character registries
is left unspecified by this proposal.
\footnote{The only constraint is that character registries be
named using only {\clkwd standard-p} characters.}
Below are some candidate character registry names:
\begin{itemize}
\item Arabic
\item Armenian
\item Bo-po-mo-fo
\item Control   (meaning the collection of standard text communication
control codes)
\item Cyrillic
\item Georgian
\item Greek
\item Hangul
\item Hebrew
\item Hiragana
\item Japanese-Punctuation
\item Kanji
\item Katakana
\item Latin
\item Latin-Punctuation
\item Mathematical
\item Pattern
\item Phonetic
\item Technical
\end{itemize}
The list above is provided as a starting point for discussion
and is not intended to be representative
nor exhaustive.  The Common LISP language definition does not
depend on these names nor any specific content (for example:
Where should the plus sign appear?).  It is application
programs which require a reliable definition of the
registry names and their constituents.  The Common LISP language
definition imposes the framework for constructing and manipulating
character objects.

The proposed ISO Character Registry Standard is fixed;
an implementation may not extend a standard registry's
constituent set of characters beyond the
standard definition.

An implementation may provide support for all or part of any
character registry
and may provide new character registries which include characters
having unique semantics (i.e. not defined in any standard
character registry).
Implementation registries must be uniquely
named using only {\clkwd standard-p} characters.

An implementation must document the registries it supports.
For each registry supported the documentation must include
at least the following:
\begin{itemize}
\item Character Labels,
Glyphs, and Descriptions.
\item Reader Canonicalization.
\item Effect of character predicates.
In particular,
\begin{itemize}
\item {\clkwd alpha-char-p}
\item {\clkwd lower-case-p}
\item {\clkwd upper-case-p}
\item {\clkwd both-case-p}
\item {\clkwd graphic-char-p}
\item {\clkwd alphanumericp}
\end{itemize}
\item Interaction with File I/O.  In particular, the
coded character sets
\footnote{For example, ISO8859/1-1987.} and
external encoding schemes
\footnote{For example, {\em Xerox System Integration Character
Code Standard}\cite{xerox87}.}
supported are documented.
\end{itemize}

Which coded character sets and encoding schemes
are supported by the overall computing system, the
details of the mapping of glyphs to characters
to character codes are
left unspecified by Common LISP.

The diversity of glyph sets and coded character
set conventions in use worldwide and the desirability
of allowing Common LISP applications
to portabily manipulate symbolic elements from many
languages, perhaps simultaneously, mandate such a flexible approach.

%----------------------------------------------------------------------
\section{Hierarchy of Types}

Providing support for extensive character repertoires may
impact Common LISP implementation performance in terms
of space, time, or both.
\footnote{This does not apply to all implementations.
Unique hardware support and user community requirements must
be taken into consideration.}
In particular, many existing
implementations support variants of the ISO 8859/1 standard.
Supporting large
repertoires argues for a multi-byte internal representation
for each character, even if an application primarily (or exclusively)
uses the ISO 8859/1 characters.

This proposal extends the definition of the character and string
type hierarchy to include specialized subtypes
of character and string.  An implementation is free to associate
compact internal representation tailored to each subtype.
The {\clkwd string} type specifier, when used for object
creation, for example in {\clkwd make-sequence},
is defined to mean the most general string subtype supported
by the implementation (similarily for the {\clkwd simple-string}
type specifier).  This definition emphasizes portability
of existing Common LISP applications to international
character environments over performance.  Applications emphasizing
efficiency of text processing in non-international environments
will require some modification to utilize subtypes with
compact internal representations.

It has been suggested that either a single type is
sufficient to support international characters,
or that a hierarchy of types could be used, in a manner
transparent to the user.  A desire to provide flexibility which
encourages implementations to support international
characters without compromising application efficiency
led us to accept the need for more than one type.
We believe that these choices reflect a minimal
modification of this aspect of the type system, and that
exposing the types for string and character construction while
requiring uniform treatment for characters otherwise
is the most reasonable approach.

\subsection{Character Type}

The following type specifier is added as a subtype
of {\clkwd character}:
\begin{itemize}
\item {\clkwd base-character}
\end{itemize}

An implementation may support additional subtypes of {\clkwd character}
which may or may not be supertypes of {\clkwd base-character}.
In addition, an implementation may define {\clkwd base-character}
as equivalent to {\clkwd character}.

Characters of type {\clkwd base-character} are referred to as
{\em base characters}.  Characters of type {\clkwd
(and character (not base-character))}
are referred to as {\em extended characters}.
The base characters are
distinguished in the following respects:
\begin{itemize}
\item
The standard characters are a subrepertoire of the base characters.
The selection of base characters which are not standard characters
is implementation defined.
\item
Only members of the base character repertoire
can be elements of a base string.
\item
The base characters are, in general, the default characters for I/O
operations.
\end{itemize}
No upper bound is specified for the number of glyphs in the base
character repertoire--that
is implementation dependent.  The lower bound is 96, the
number of standard characters defined for Common LISP.
\footnote{Or, in contrast, the base repertoire may include all
implementation supported characters.}

The distinction of base characters is largely a pragmatic
choice.  It permits efficient handling of common situations, is
in some sense privileged for host system I/O, and can serve as an
intermediate basis for portability, less general than the standard
characters, but possibly more useful across a narrower range of
implementations.

Many computers have some "base" character representation which
is a function of hardware instructions for dealing with characters,
as well as the organization of the file system.  The base character
representation is likely to be the smallest transaction unit permitted
for text file and terminal I/O operations.  On a system with a record
based I/O paradigm, the base character representation is likely to
be the smallest record quantum.  On many computer systems,
this representation is a byte.

However, there are often multiple
coded character sets supportable on a
computer, through the use of special display and entry hardware, which
are varying interpretations of the basic system character
representation.  For example, ISO 8859/1 and ISO 6937/2 are two
different interpretations of the same 1-byte code representations.
Many countries have their own glyph-to-code mappings for 1-byte
character codes addressing the special requirements of national
languages.  Differentiating between these, without reference to
display hardware, is a matter of convention, since they all use the
same set of code representations.  When a single byte is not enough,
two or more bytes are sometimes used for character encoding.  This
makes character handling even more difficult on machines where the
natural representation size is a byte, since not only is the semantic
value of a character code a matter of convention, which may vary
within the same computing system, but so is the identification of a
set of bits as a complete character code.

It is the intention of this proposal that the composition of
base characters is typically
determined by the code capacity of the natural file system and I/O
transaction representations, and the assumed display glyphs should be
those of the terminals most commonly employed.
There are several advantages to this scheme.  Internal representation
of strings of just base characters can be more compact than
strings including extended characters.
Source programs are likely to consist predominantly of base characters
since the standard characters are a subset of the base character
repertoire. Parsing of pure base character text
can be more efficient than parsing of text including
extended characters.
I/O can be performed more simply
with base characters.

The standard characters are the 96 characters used in the Common LISP
definition {\bf or their equivalents}.

This was the Common LISP \cite{steele84} definition, but
{\em equivalents} is a vague term.

The standard characters are not defined by their glyphs, but by their
roles within the language.  There are two aspects to the roles of the
standard characters: one is their role in reader and format control
string syntax; the second is their role as components of the names of
all Common LISP
functions, macros, constants, and global variables.  As
long as an implementation chooses 96 glyphs
and treats those 96 in a manner consistent with
the language's specification for the standard characters (e.g.
the naming of functions), it doesn't matter what glyphs the I/O
hardware uses to represent those characters: they are the standard
characters.  Any program or
data text written wholly in those characters
is portable through simple code conversion.
\footnote{For example, the currency glyph, \$ , might be replaced
uniformly by the currency glyph available on a particular display.}

Additional
mechanisms, such as in \cite{linden87}, which support establishment of
equivalency between otherwise distinct characters are not excluded by
this proposal.
\footnote{We believe this is an important issue but it requires
additional implementation experience.  We also encourage
new proposals from JIS and ISO LISP Working Groups on this issue.}

\subsection{String Type}

The {\clkwd string} type
is defined as
a vector of characters.  More precisely, a string
is a specialized vector whose elements are of type
{\clkwd character} or a subtype of character.  Similarly, a simple
string is a specialized simple vector whose elements are of type
{\clkwd character} or a subtype of character.  The following string
subtypes are
distinguished with standardized names: {\clkwd base-string},
{\clkwd general-string}, {\clkwd simple-base-string}, and
{\clkwd simple-general-string}.
All strings which are not base strings
are referred to as {\em extended strings}.

A base string can only contain base characters.
{\clkwd general-string} is equivalent to {\clkwd (vector character)}
and can contain any implementation supported base or extended characters,
in any mixture.

All Common LISP functions defined to operate on strings treat
base and extended strings uniformly with the following
caveat: for any function which inserts a character into a string, it
is an error to insert an extended character
into a base string.
\footnote{An implementation may, optionally, provide automatic
coersion to an extended string.}

An implementation may support string subtypes in addition
to {\clkwd base-string} and
{\clkwd general-string}.
For example, a hypothetical
implementation supporting Arabic and Cyrillic character registries
might provide as extended characters:
\begin{itemize}
\item {\clkwd general-string} -- may contain Arabic, Cyrillic or
base characters in any mixture.
\item {\clkwd region-specialized-string} -- may contain installation
selected repertoire (Arabic/Cyrillic) or base characters in any
mixture.
\item {\clkwd base-string} -- may contain base characters
\end{itemize}
Though, clearly, portability of applications using
{\clkwd region-specialized-string} is limited, a performance
advantage might argue for its use.
\footnote{{\clkwd region-specialized-string} is used here for
illustration only; it is not being proposed as a standardized
string subtype.}

Alternatively,
an implementation
supporting a large base character repertoire
including, say, Japanese Kanji may define
{\clkwd base-character}
as equivalent to {\clkwd character}.

We expect that applications sensitive to the performance
of character handling in some host environments will
utilize the string subtypes to provide performance
improvement.  Applications with emphasis on international
portability will likely utilize only {\clkwd general-string}s.

The {\clkwd coerce} function is extended to
allow for explicit coercion between base strings and extended strings.
It is an error to coerce an extended character to a base character.

During reader
construction of symbols, if all the characters
in the symbol's name are of type {\clkwd base-character},
then the name of the symbol may be stored as a base string.
Otherwise it will be stored as an extended string.

The base string type allows for more compact representation of strings
of base characters, which are likely to predominate in any system.
Note that in any particular implementation the base characters
need not be the
most compactly representable, since others might have
a smaller repertoire.
However, in most implementations base strings are
likely to be more space efficient than extended strings.


%----------------------------------------------------------------------
\section{Streams and System I/O}

A lot of the work of ensuring that a
Common LISP implementation operates correctly in a
multiple coded character set environment must be performed by
the I/O interface.
The system I/O interface, abstracted in
Common LISP as streams, is responsible
for ensuring that text input from outside LISP is properly mapped
into character objects internally, and that the inverse mapping
is performed on output.  It is beyond the scope of a language
definition to specify the details of this operation, but options
are specified which allow runtime indication from the user as to
what coded character sets a stream uses, and how the mappings
should be done.  It is expected that implementations will provide
reasonable defaults and invocation options to accommodate desired use
at an installation.

One keyword argument is proposed as an addition to {\clkwd open}:
\begin{itemize}
\item {\clkwd :external-coded-character-format}
whose value would be:
\begin{itemize}
\item
A name or list of names indicating an implementation recognized
scheme for representing 1 or more coded character sets.
\footnote{
For example, the so/si convention used by IBM on 370
machines could be selected by a list including
the name {\clkwd :ibm-shift-delimited}.
The run-encoding convention defined by XEROX could be
selected by {\clkwd :xerox-run-encoded}.
The convention based on
ASCII which uses leading bit patterns to distinguish two-byte codes
from one-byte codes could be selected by
{\clkwd :ascii-high-byte-delimited}.
}
As many coded character set names must be provided as the
implementation requires for that external coding convention.
\footnote{
For example, if {\clkwd :ibm-shift-delimited} were the
argument, two
coded character set specifiers would have to be provided.
}
\end{itemize}
\end{itemize}

These arguments are provided for input, output, and
bidirectional streams.
It is an error to try to write a character other than a
member of the specified coded character sets
to a stream.  (This excludes the
\#$\backslash${\clkwd Newline} character.
Implementations must provide appropriate line division behavior
for all character streams.)

An implementation supporting multiple coded character sets
must allow for the external
representation of characters to be separately (and perhaps
multiply) specified to {\clkwd open},
since there can be circumstances under
which more than one external representation for characters
is in use, or more than one coded character set
is mixed together in an
external representation convention.

In addition to supporting conversion at the system interface, the
language must allow user programs to determine how much space data
objects will require when output in whichever external representations
are available.

The new function {\clkwd external-coded-string-length}
takes a character
or string object as its required argument.  It also takes an optional
{\em output-stream}.
It returns the number of implementation-defined
representation units
\footnote{
Often the same as the storage width of a base character, usually a byte.
}
required to externally store that object, using the
representation convention associated with the stream.
If the object cannot be represented in
that convention, the function returns {\clkwd nil}.
This function is necessary
to determine if strings can be written to fixed length
fields in databases or terminal screen templates.  Note that this
function does not
address the problem of calculating
screen width of strings printed in proportional fonts.

Related to the I/O interface,
we also introduce the function {\clkwd char-ccs-value}
which takes a character object and a coded character set name
(eg. {\clkwd :ISO8859/1-1987}) and returns the encoding of
the character within the coded character set.

%----------------------------------------------------------------------
%----------------------------------------------------------------------


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 22 Feb 89 17:00:34 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Feb 89  13:32:43 PST
Date: Wed, 22 Feb 89 02:09:18 PST
From: Thom Linden <baggins@IBM.com>
To: Common Lisp mailing <common-lisp@sail.stanford.edu>
Message-ID: <890222.020918.baggins@almvma>
Subject: Jan 1 cs proposal comments

>>   From: "David A. Moon" <Moon@SCRC-STONY-BROOK.ARPA>
>>   Subject: Comments on the Character proposal dated January 1, 1989
>>
>>   Page 6 -- *all-registry-names* should be renamed to
>>   *all-character-registry-names*; the word "registry" by itself
>>   is too general.

I made this change to the latest version of the proposal.

>>
>>   Page 9 -- the fourth bullet requires a defined total ordering of all
>>   characters.  This seems unnecessary, and is impossible to implement in any
>>   system (such as Symbolics Genera) that allows dynamic addition of character
>>   registries by third-party software vendors and by users; in such a system
>>   character codes have to be allocated dynamically and therefore their order
>>   cannot be fixed ahead of time.

You are quite right.  This bullet is removed.

>>
>>   Page 9 -- This says an implementation must define the result of
>>   standard-char-p on the characters it supports.  I think that is incorrect.
>>   Common Lisp fully defines the result of standard-char-p, which is NIL
>>   for all characters added by an implementation.

Right.  This bullet is removed.

>>
>>   Page 14 -- This EXTERNAL-WIDTH function probably should be part of a
>>   database facility or a terminal screen template facility; I'm not sure it
>>   is useful by itself.  Also note that its result is only meaningful with
>>   respect to a specific state of the stream.  To give two examples, with the
>>   SO/SI encoding the answer can vary by 1 depending on whether the stream is
>>   already shifted into the correct state for the first character; with the
>>   universal encoding Symbolics uses, the answer can vary by a lot depending on
>>   whether the character repertoires appearing in the string have been used
>>   earlier on the same stream (and hence have been assigned encoding numbers).
>>   Because of this dependence on the state of the stream, I cannot think of
>>   any correct use of EXTERNAL-WIDTH that does not involve immediately
>>   outputting the string to the stream.  Therefore I believe the same effect
>>   can be achieved without adding any new functions, by calling FILE-POSITION,
>>   outputting to the stream, calling FILE-POSITION again, and subtracting.  If
>>   you still want to propose this feature, you should change the name: use
>>   "length" instead of "width", since that's the word Common Lisp always uses,
>>   and use a name that relates to the :EXTERNAL-CODE-FORMAT option to OPEN;
>>   for example, STRING-LENGTH-IN-EXTERNAL-CODE-FORMAT or
>>   EXTERNAL-CODED-STRING-LENGTH.

I changed the name to EXTERNAL-CODED-STRING-LENGTH.  The description
already contained a comment regarding current state.  Actually, I
favored the STREAM-INFO proposal which was voted down.  This is
much less ambitious but I still feel more useful than actually
forcing I/O, backing up and rewriting.  It's also not clear
that your alternative has the same effect since it seems that
some unwanted side-effects would occur such as premature appearance
on a display screen.

>>
>>   Page 24 -- I can't figure out what you intend the meaning of SIMPLE-STRING
>>   to be.  Your report mostly does not mention it, but it doesn't say to
>>   remove it either.  If I have correctly correlated page 24 back to CLtL, you
>>   are defining SIMPLE-STRING to be synonymous with SIMPLE-GENERAL-STRING.
>>   Maybe what you really meant, though, was what you said in November you
>>   would do, which was to make SIMPLE-STRING mean (AND STRING SIMPLE-ARRAY),
>>   in other words a union of several subtypes.  This is particular confusing
>>   because Common Lisp uses the name SIMPLE-VECTOR to mean what you might call
>>   a simple general vector, that is, (SIMPLE-ARRAY T 1) rather than
>>   (SIMPLE-ARRAY * 1).  Here are my suggestions for what to do with the
>>   various names for string subtypes:
>>
>>     STRING                  As a union of all strings, this is fine.
>>     GENERAL-STRING          I think (VECTOR CHARACTER) is just as good.
>>     BASE-STRING             I think (VECTOR BASE-CHARACTER) is just as good.
>>     SIMPLE-STRING           Should mean (SIMPLE-ARRAY CHARACTER 1).
>>     SIMPLE-BASE-STRING      This is fine.
>>     SIMPLE-GENERAL-STRING   This name is horrible, use SIMPLE-STRING.
>>
>>   My rationale for these suggestions largely comes from thinking about
>>   which of these names would ever be used in type declarations and about
>>   how these names relate to the other names already in Common Lisp.  To
>>   repeat older comments:
>>
>>     Pages 19 and 20 introduce a new type named simple-base-string, in addition
>>     to simple-string.  If you think about how simple-string would be used for
>>     compiler optimization, it makes sense for simple-string to be the name for
>>     the single simplest representation, rather than a name for a whole family
>>     of representations that would have to be discriminated at run time.  Thus
>>     what you call simple-base-string should be called simple-string, and what
>>     you call simple-string should just be called (simple-array character (*)).
>>     This would not be an incompatible change in the meaning of simple-string.
>>     Simple-string would be analogous to simple-vector.
>>
>>   I changed my mind slightly on that and now claim that while SIMPLE-STRING
>>   should still be a single representation, not a union, it should be the
>>   representation that can hold all characters.  This is both because of the
>>   principle that correct programs should be easier to write than
>>   extra-efficient programs, and because of the powerful analogy with the name
>>   SIMPLE-VECTOR.  Then the name SIMPLE-BASE-STRING is also needed for
>>   convenient type declarations of the more efficient but less functional
>>   string representation.  That name is good, by analogy to BASE-CHARACTER.
>>
>>   Adopting the above suggestions helps you decide what to do about the
>>   SCHAR, SBCHAR, and SGCHAR mess.  First of all, you only need two functions,
>>   not three, because there are only two specified specialized representations.
>>   SCHAR should be for what I've called SIMPLE-STRING, SBCHAR should be
>>   for SIMPLE-BASE-STRING, and SGCHAR is not needed.  (In fact I would prefer
>>   to remove all of the specialized versions of AREF from the language, in
>>   favor of THE or type declarations, but I know that would only pass over
>>   some peoples' dead bodies so I won't push it.)
>>
>>   In case you are wondering, I have no quarrel with the name BASE-CHARACTER
>>   and would not want to see it removed.  I guess I differ from Larry here,
>>   unless I erred when I wrote down his comments during the meeting.

The statement on p24 making SIMPLE-STRING == (SIMPLE-ARRAY CHARACTER (*))
was in error.  P25 had it right.  Since we changed SCHAR to accept
all simple strings there is no reason for SGCHAR and SBCHAR and
these are eliminated.

  String and simple-string are (more clearly I hope) defined as union
types.  I've changed the terminology from 'for the purpose of
declaration' to 'for object creation'.   Perhaps there is a better
term but the effect seems to be identical to what you suggest. That is,
correct, portable programs are easier to write, one simply uses
string and simple-string.  More efficient, less portable programs
need to specify the specialized subtype(s) explicitly.
  Having both string and simple-string defined as union types seems
desirable on the basis of uniformity.
  Of the type abbreviations I think BASE-CHARACTER is the most
useful and GENERAL-STRING, SIMPLE-BASE-STRING and SIMPLE-GENERAL-STRING
less so.  I don't believe that any of these really complicate the
language.

>>
>>   Page 25 -- The discussion of STRING and SIMPLE-STRING thinks that there
>>   is a distinction between declaration and discrimination, but Common Lisp
>>   no longer has such a distinction.  Even when Common Lisp did have such
>>   a distinction, the meanings for declaration stated here were incorrect.

I changed this to 'object creation'.  Perhaps there is a better term.

>>
>>   Page 29 -- *all-character-registry-names* has to be a variable, not a
>>   constant, to accomodate systems (such as Symbolics Genera) that allows
>>   dynamic addition of character registries by third-party software vendors
>>   and by users.

Right, I made this change.

>>
>>   Page 35 -- CHAR-REGISTRY should be renamed to CHAR-REGISTRY-NAME, so that
>>   if at some later time character registry objects are added, there is no
>>   possibility of confusion about whether this function returns a name or
>>   an object.

Right, I made this change.

>>
>>   Page 40 -- the default :ELEMENT-TYPE for OPEN cannot be BASE-CHARACTER.  I
>>   think this was discussed at the X3J13 meeting.  The report suffers from a
>>   confusion between two meanings of BASE-CHARACTER: the character type
>>   implemented most efficiently by the Lisp, and the character type most
>>   natural to the file system.  These are not always the same.  Furthermore,
>>   in a network-based system that supports multiple file systems equally
>>   (Symbolics Genera is an example), each file system might have a different
>>   natural character type.  BASE-CHARACTER should just mean the character type
>>   implemented most efficiently by the Lisp.  The default for :ELEMENT-TYPE
>>   has two viable choices that I can see, and maybe you should just propose
>>   both and let people vote:
>>
>>     (1) CHARACTER.  This matches the behavior of MAKE-STRING and friends,
>>     adheres to the principle that writing correct programs should be easier
>>     than writing extra-efficient programs (since making a program correct
>>     requires making every part of it correct, while making a program
>>     efficient only requires improving the bottlenecks), and doesn't cost
>>     anything in implementations that don't have extended characters.
>>
>>     (2) The most natural type for the particular pathname being opened.
>>     In some systems this would be a constant, and in a subset of those
>>     systems this would be BASE-CHARACTER, however in general this might
>>     depend on the host, device, or even type fields of the pathname,
>>     and might also depend on information stored in the file system.
>>     In general this would always be an (improper) supertype of
>>     BASE-CHARACTER, but it's probably a bad idea to make that a requirement,
>>     as some file systems might not be able to implement it conveniently.
>>     Again this doesn't cost anything in implementations that don't have
>>     extended characters.

The discussion on p16 about the base coded character set efficiency
has been removed.  The default element-type now states that it is
implementation defined as character or a subtype of character.

>>
>>   The relationship of option 2 to :ELEMENT-TYPE :DEFAULT (a feature that
>>   already exists in Common Lisp) needs to be clarified.  Perhaps they
>>   are the same.

The same?  I don't understand.  For example, I can imagine the
element-type default as base-character and the external format
defaulted to either an ASCII or EBCDIC encoding.

>>
>>   Also the following promise from 14 November did not show up in the report:
>>
>>     >>     There should be a name for the "natural" encoding and there should be a
>>     >>     specification of the properties of the natural encoding that a programmer
>>     >>     can rely on.  Suggestions for the name include :BASE, :NATURAL, and
>>     >>     :INTERCHANGE.  The definition probably involves the concept of data
>>     >>     interchange with non-Lisp programs on the same system.
>>
>>     This will be added to the revision.

I lied.  No one came up with the 'properties' of such an encoding.
Do you have some text to suggest?

>>
>>   Appendix B -- I disagree with the way you've used deprecation.  I'll
>>   comment on each individual point:
>>    - I see no justification for deprecating STANDARD-CHAR.
>>    - I agree that STRING-CHAR should be deprecated, not deleted nor kept.
>>    - I think fonts and bits should be removed outright, not deprecated,
>>      because no portable program could possibly be using them.
>>    - I think the CHAR-INT function needs to be kept, although the INT-CHAR
>>      function should go away.  This is for hashing.  See comments below
>>      on character attributes.

I've removed Appendix B and mention of deprecation.  STANDARD-CHAR
is simply (characterp :standard).  String-char is back in as
implementation-defined either character or base-character (and
maybe should be voted as a deprecated type).

>>
>>   No particular page -- the use of strings for naming registries, labelling
>>   characters, and naming external code formats is objectionable.  Nothing
>>   else in Common Lisp is named by strings.  Use of strings might lead to
>>   efficiency problems.  We feel that keyword symbols are the appropriate
>>   objects to use for these three kinds of names.

I changed these back to symbols.

>>
>>   No particular page -- We agree with the deprecation or deletion of the two
>>   particular character attributes defined by CLtL, but not with the
>>   deprecation of the whole concept of character attributes.  In fact on page
>>   20 you say "characters are uniquely distinguished by their codes," which
>>   makes it impossible to have character attributes at all.  The language must
>>   define how conforming programs should be written so that they will work
>>   both in implementations with character attributes and in implementations
>>   without them.  For example, the value of (eql x (code-char (char-code x)))
>>   is unspecified.  Another thing that needs to be said is that the exact
>>   character operations (char=, string=, etc.) respect all character
>>   attributes, while the inexact character operations (char-equal,
>>   string-equal, etc.) respect or ignore each character attribute in an
>>   implementation-defined but consistent fashion.  Some of what you say on
>>   page 44 about attributes in general needs to be part of the spec, not
>>   deprecated.  I would retain everything on that page except for INT-CHAR and
>>   the last bullet (referring to bits and fonts), and I would add a remark
>>   that FIND-SYMBOL and INTERN respect character attributes.  If you want,
>>   perhaps I or someone else at Symbolics can provide exact text for what
>>   to say about character attributes that you could insert into your report.

I moved the attribute list previously in Appendix B back into the
description of characters.  Let me know what text you would like
to see for FIND-SYMBOL and INTERN and I'll add it to the list.

>>   No particular page -- On the subject of defining character registries in a
>>   separate document, and relating them to ISO standards for character
>>   encoding: I think that's fine.  I don't see anything wrong with introducing
>>   the concept of character registry and the requirement that each character
>>   object relates to exactly one registry.  However, I think the somewhat
>>   random list of character registries on pages 7-8 and again on page 21 does
>>   not belong in the language specification.  Even the names of the

Right.  They are not part of the Common LISP standard.  The revised
document is considerably clearer in this regards.

>>   standardized character registries belong in the character registry
>>   standard, not in the Common Lisp language standard.  I'm confused about the
>>   meaning of BASE, STANDARD, and CONTROL as character registry names; these
>>   are mentioned in your report but not explained very well.  If these are
>>   character registries that are required to exist in all Common Lisp
>>   implementations, then unlike the others they do belong in the Common Lisp
>>   language standard, not in the character registry standard.

By CONTROL, I meant a registry which contains the various control
codes mentioned in the various ISO coded character set standards.
BASE and STANDARD are no longer mentioned here.  They are allowed
as Common LISP repertiore names in characterp and the character
type specifier.

>>
>>   At the meeting there was some discussion about the issue of enumerating all
>>   characters in a character registry.  People claimed incorrectly that it was
>>   impossible.  In fact it's possible to do this, with questionable
>>   efficiency, by the following program:
>>
>>     (dotimes (code char-code-limit)
>>       (let ((char (code-char code)))
>>         (when char
>>           (when (eq (char-registry-name char) desired-registry-name)
>>             ... process this char ...))))
>>
>>   Of course you have to change the EQ to EQUALP if you continue to use
>>   strings to name character registries.  For more efficiency, you could add
>>   a way to iterate over all the codes in one character registry, but I think
>>   that is unnecessary.
>>
>>
>>   TYPOS:

Right. I've made these corrections.

>>
>>   25 -- base-string is missing from the Table 4-1 amendment.
>>
>>   26 -- general-string is not an array of BASE characters, also the first
>>   two paragraphs under A.4.8 are garbled (the two separate sentences for
>>   strings for symbols got smushed together).
>>
>>   37 -- This says the default for the :ELEMENT-TYPE option to MAKE-STRING
>>   is SIMPLE-STRING.  Actually it's CHARACTER.
>>


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 22 Feb 89 16:57:27 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Feb 89  13:33:35 PST
Date: Wed, 22 Feb 89 03:48:56 PST
From: Thom Linden <baggins@IBM.com>
To: Common Lisp mailing <common-lisp@sail.stanford.edu>
Message-ID: <890222.034856.baggins@almvma>
Subject: cs proposal comments

>>   From: sandra%defun@cs.utah.edu (Sandra J Loosemore)
>>   Subject: comments on character proposal
>>
>>   Getting rid of bits and fonts (section 2.1) seems like a very good
>>   idea to me.  I would argue for deleting these "features" completely
>>   instead of merely deprecating them, because there now seems to be
>>   general agreement that the whole idea was brain-damaged in the first
>>   place, plus it's just about impossible to use them portably anyway
>>   (since implementations are free not to support them).  Deprecating the
>>   features would simply perpetuate the current sad state of affairs in
>>   to the ANSI standard.

I deleted Appendix B from the proposal.  The attribute check list is
incorporated into the character chapter as implementation dependent.

>>
>>   I am not at all sure why we need to standardize the idea of character
>>   registries at all, much less state that a character can only belong to
>>   one registry, or define a standard set of registries.  What does having
>>   registries buy the user, other than perhaps a way to test whether a
>>   character belongs to one or not?  Why isn't it sufficient just to say
>>   that implementations can support extended characters, and leave it at
>>   that?

The registries are introduced to allow an application a portable
way to name, compose and decompose characters.  Currently, there is
no way to do this in any programming language.  There are other
possiblities.  For example, simply labeling all characters
uniquely; another to define a universal coded character set and use
these numeric codes to 'name' characters.  I don't think using
numbers for naming characters is useful since I'll always forget
what character 34539 actually is!  Registries seem to provide a
framework for useful categorization of characters.  It also
avoids the current mess that the coded character set standards
are in.


>>
>>   I'm confused about how you propose to handle characters that appear in
>>   more than one character repetoire, and whether characters with accent
>>   marks are considered distinct from characters without accents.  For
>>   example, is the French "C" with a cedilla distinct from a normal
>>   French "C", and is that distinct from the standard-char "C"?

We handle characters that appear in more than one repertoire by
using registries.  No character appears in more than one registry.
The constituents of the registries are not defined by Common LISP.
I believe that in most environments today, it is recognized that
characters with accents are distinct from their vanilla cousins.
As we have proposed registries, they contain semantically
distinct characters.

>>
>>   The way the document describes things now, it seems like the Common
>>   Lisp standard would have to include a statement of exactly what
>>   characters belong in each of the standard registries listed in section
>>   2.2.  Otherwise, implementors might go off and define their own
>>   character registries that happen to include some characters that ought
>>   to belong in one of these standard registries.  For instance, the machine
>>   I happen to be sitting in front of right now supports an 8-bit native
>>   character set, and it seems perfectly reasonable for a Lisp runnning on
>>   this machine to include all 256 characters in its base character set,
>>   but some of those might actually be supposed to live off in some other
>>   registry.

The registries are independent of any coded character sets.
In particular, coded character sets are not registries.  Your base
repertoire (set of 256 characters) are possibly drawn from
several registries.

You are correct that lacking an international standard (or ANSI one),
for character registries an implementation could define the
a single registry containing all supported characters.  It could
also define NO registries and use only the conventional naming
of characters.  I expect an implementation taking the no-cost way
would choose the second approach.  On the other hand, an
implementation supporting text processing across international
boundaries is more likely to define some reasonable registries
eg. Latin, Greek, etc..


>>
>>   Also in section 2.2, why is it necessary for there to be a total
>>   ordering, or even a partial ordering, of all characters?  It seems
>>   like CHAR< and friends are not very useful except when comparing base
>>   characters anyway.  It seems like it would difficult to get things
>>   like the Spanish N-with-twiddle character to collate correctly anyway,
>>   given the constraints you have put on how character codes are derived
>>   and the requirement that CHAR< be just like < on the char-codes.

Right.  This is now removed.

>>
>>   It doesn't seem like STANDARD-CHAR-P belongs in the list of character
>>   predicates on p. 9, since no extended characters can possibly be
>>   STANDARD-CHAR-P anyway.

Right.  This is now removed.

>>
>>   The stuff in section 2.3 seems mostly reasonable to me.  It's not really
>>   clear why you need GENERAL-STRING (as distinct from STRING) and
>>   SIMPLE-GENERAL-STRING (as distinct from SIMPLE-STRING).  Again, some
>>   rationale would be helpful.

GENERAL-STRING means (VECTOR CHARACTER).  This is not the meaning of
STRING (a union type).  I agree that GENERAL-STRING is not much
of an abbreviation over (VECTOR CHARACTER).  It still seems somewhat
more mnemonic.

>>
>>   In section 2.4, the general idea of specifying an external character
>>   encoding to OPEN seems reasonable.  However, I'm confused by the
>>   business about having more than one coded character set mixed
>>   together.  If a character appears in more than one coded character
>>   set, which encoding takes precedence?  It seems like this has not been
>>   well thought-out.  Also, seeing as though we have just voted down a
>>   proposal to add an EXTERNAL-WIDTH function, it seems like a very bad
>>   idea to lump it in here.

Some encoding schemes allow disjoint coded characters sets to
coexist.  That is, a given character would appear on one but not
the other.  For example, a ISO8859/1 coded character set could
coexist with a coded character set for Chinese.

As for External-width, it was part of our subcommittee discussions
long before the recent stream proposal.  It will be a separate
item in the list of character votes.

>>
>>   Now for the general comments.
>>
>>   One thing that is not clear to me from reading this document is how
>>   much of it has already been standardized by ISO.  I share Larry's
>>   concern that we might standardize one thing, and then have ISO go off
>>   and standardize something completely different.  I think it's a
>>   mistake to try to second-guess what ISO might do.

The revision might make this clearer.  I think this is a
red herring anyhow.  As a programming language committee
we need to specify what is useful in the context of LISP.  We
can't expect a coded character set committee to figure it out.

On the other hand, we can influence what gets standardized
by defining our framework.  The ISO Prolog std committee is
interested in what we define.

>>
>>   I am also concerned about trying standardize things that have not yet
>>   been implemented.  I think it's a mistake to try to do language design
>>   in a standards committee.
>>
>>   Finally, I have some problems with the presentation of your proposal.
>>   One problem, as I mentioned at the meeting, is that you've made it an
>>   all-or-nothing package, and I can't vote for the whole thing because
>>   there are some parts of it that do not seem appropriate, even though I
>>   would support some of the other changes individually.  The other
>>   problem is that Appendix A is virtually unreadable.  Some of the
>>   conceptual changes involve wording changes to several passages, and I
>>   know that there are some other changes in the appendix that are not
>>   mentioned in the introductory blurb at all.  Is it totally impossible
>>   to recast the changes in standard cleanup format proposals?  The
>>   advantage of that format is that it presents more context, including a
>>   clear statement of why the existing CLtL behavior is "broken" and a
>>   rationale for the proposed change.

There will be several votes regarding this proposal.  I don't
intend to rewrite the document in a cleanup format.


>>
>>   I know that we adopted things like the CLOS document that were
>>   presented as single mega-proposals, but those were primarily additions
>>   to the language and what you are proposing is essentially a large
>>   number of incompatible changes.  I'm having a hard time identifying
>>   what all of those changes are.
>>

Actually, I don't think it's as large a number of changes as you
imply.  In any case, the vote split should help this out.



Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 22 Feb 89 16:51:11 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Feb 89  13:34:09 PST
Date: Wed, 22 Feb 89 04:51:15 PST
From: Thom Linden <baggins@IBM.com>
To: Common Lisp mailing <common-lisp@sail.stanford.edu>
cc: David Gray <GRAY%DSG.CSC.TI.COM@RELAY.CS.NET>
Message-ID: <890222.045115.baggins@almvma>
Subject: cs proposal comments

>>   From: David N Gray <Gray@DSG.csc.ti.com>
>>   Subject: characters proposal
>>
>>   I have read the documented titled "Extensions to Common LISP to Support
>>   International Character Sets" dated January 1, 1989, and feel that it is
>>   not much of an improvement over what we saw in October.  Following are
>>   some random comments about things I happened to notice; this is not
>>   intended to be a comprehensive analysis.
>>
>>   First, documents such as this ought to be labelled with an X3J13
>>   document number so that they can be referred to conveniently and
>>   unambiguously.
>>
>>   "Appendix A" and "Appendix B" really should be chapters 3 and 4 since
>>   they are an essential part of the proposal, rather than being an
>>   appendage to it.

Appendix B is now eliminated.  Appendix A is really quite unlike
chapters 1 and 2 in structure.

>>
>>   Page 7 says that the definition of semi-standard-characters "is replaced
>>   by a more uniform approach with introduction of the Control Character
>>   Registry".  Do you really mean that it _will_be_ replaced when the
>>   Control Character Registry is defined in some subsequent document?  I
>>   certainly don't see anything in this document that could be considered a
>>   replacement.

Yes.  The revision is clearer on this.  This document does not define
names for character registries nor their constituents.

>>
>>   This whole concept of registries seems rather strange.  Is the intent
>>   that the alphabetic characters of the standard characters are to be in
>>   the "Latin" registry while characters such as period and comma are in
>>   "Latin-Punctuation"?   Is #\NEWLINE in the "Control" registry?  Where do
>>   the digits go -- "Mathematical"?.  Is #\- a "Latin-Punctuation" or a
>>   "Mathematical"?  Which registry is #\SPACE in?  Now tell me what to do
>>   with the extra non-Latin alphabetic characters used in Sweedish?  Does
>>   that require a separate registry for just those additional characters?
>>   Now we have simple text in a single language using characters from at
>>   least four different registries.  Do you really think it possible to
>>   agree on a "fixed", non-extensible, set of "Mathematical" or "Pattern"
>>   characters?

  Actually, I believe the simplicity of the registry framework will make
agreement easy.  Currently, members of the coded character set
committees spend vast amounts of time lobbying for inclusion of their
favorite character(s) in the 'popular' coded character set standard.
The effect of not being included means fewer installations will
support their native language properly.

  I think a new group, hopefully formed within
programming languages, should define the registries rather than
the existing coded character set committees.  There is no competition
between registries, ie. no advantage of one over another.  What this
committee has to agree upon is 1) a useful set of registry names and
2) definition of the constituents of each registry.  The only argument
I would anticipate is "are the semantics of my alpha the same
or different from your alpha" type debates.
  By the way,
the registries are fixed only in that a Common LISP implementation
cannot modify the standard definitions.  This guarantees an application
program can portably rely on the composition and decomposition
functions to establish the availability of any given character.

>>
>>   Page 9 says that an implementation needs to specify the total ordering
>>   of characters within each registry, but what about the ordering of
>>   characters in different registries?  Is that completely undefined?

There is no ordering of characters within registries.  As mentioned
in Hawaii, the character index (a number) was changed to character
label (a symbol) throughout the proposal.

>>
>>   Page 25 section A.4.5 doesn't specify the syntax of a registry name; did
>>   you intend it to be a string?

These have been changed to be symbols.

>>
>>   Page 27 has an example using  (typep x '(character "standard"))  but
>>   page 25 said that had to be a registry name; "standard" is not a
>>   registry name.

The revision is clearer on this.  character and characterp can take
registry names, :base or :standard.  The meaning of :base and :standard
is defined by Common LISP as the base character repertoire and
standard character repertoire respectively.

>>
>>   Page 29 - *ALL-REGISTER-NAMES* -- a list of strings?

Now a list of symbols.

>>
>>   Page 33 -- FIND-CHAR -- does the index value within a registry have any
>>   portable meaning?  Is that intended to be specified for the standard
>>   registries?  Is "base" supposed to be accepted here?  If not, how can
>>   you access the base codes?  If I were going to construct a character
>>   from its index value, it would be more meaningful to use an index
>>   relative to some coded character set rather than these registries.

FIND-CHAR takes a character label and registry.  These are specified
by the registry standard.  Base is not a registry name.  We have
introduced a new function CHAR-CCS-VALUE which takes a character
object and a coded character set name (a symbol) and returns the
encoding of the character in the coded character set.

>>
>>   Page 36, the last sentence doesn't make sense.  The default for
>>   :ELEMENT-TYPE would have to be either CHARACTER or BASE-CHARACTER.

Right. I've made this change.

>>
>>   Page 37, section A.22.1.1 -- the part being deleted specifies the
>>   meaning of including tab and form-feed characters in a Common Lisp
>>   source file; do you really intend that to not have any standard meaning?
>>   If my editor uses tabs for indenting, does that mean that the resulting
>>   source file is not a standard-conforming program?

That really depends on the definition of a conforming program. Is
this defined yet?

>>
>>   Page 38, the first reference to p360 of CLtL should be p353; the
>>   deletion here says that there shall not be any standard name for the
>>   commonly used control characters such as tab and form-feed.  That still
>>   seems wrong to me.
>>
>>   Page 41, what's the point of appending "ccs" to the name of the
>>   standard?  Presumably that stands for "coded character set", but isn't
>>   that adequately implied by the fact that this string will follow the
>>   keyword :EXTERNAL-CODE-FORMAT ?   The use of "default" seems odd since
>>   :DEFAULT is used everywhere else.

This was to distinguish from someone referring to the set of characters
(repertoire) represented in a given coded character set. Ie. to
distinguish ISO8859/6-1987 coded character set from the ISO8850/6-1987
repertoire.  In fact, the ISO coded character set standards never
refer to repertoires in isolation (ie. without the codes), so I've
dropped the 'ccs'.  Also, "default" is now :DEFAULT as elsewhere.


>>
>>   I agree with Moon that the excising of bits and fonts has not been done
>>   carefully enough for them to be compatible extensions.
>>

I think the new revision takes care of this by incorporating the
attribute list as part of the language proper (ie. not deprecated).



Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 22 Feb 89 16:47:02 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Feb 89  13:32:18 PST
Date: Wed, 22 Feb 89 00:36:12 PST
From: Thom Linden <baggins@IBM.com>
To: Common Lisp mailing <common-lisp@sail.stanford.edu>
Message-ID: <890222.003612.baggins@almvma>
Subject: cs proposal revisions

I've sent out a revised cs document for your review.  It reflects
a number of your comments from the Hawaii meeting and over the
net.  The larger changes were:

  --  The 'depreciated' appendix is eliminated.  I re-introduced
      the list of implementation-dependent attribute support
      items into the document proper.  The other items in
      appendix B were simply eliminated.

  --  The functions sbchar and sgchar are eliminated.  In general,
      the comments indicate that case discrimination by schar
      does not introduce a substantial performance penalty.

  --  Character registry names and constituents are NOT defined by
      Common LISP.  The proposal defines only the framework for
      composition and decomposition of characters.  The naming
      of registries and definition of their constituents are
      left completely as an ISO standard activity.

  --  Character registry names and constituents are NOT defined by
      Common LISP.  The proposal defines only the framework for
      composition and decomposition of characters.  The naming
      of registries and definition of their constituents are
      left completely as an ISO standard activity.


  Please send comments to the X3J13 mailing list.  If time allows
  and it seems needed, I will send out another revision in time to
  allow for an actual vote at the March meeting.  A straw vote list
  will follow shortly.

Regards,
  Thom


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 16 Feb 89 17:08:35 EST
Received: from NMFECC.ARPA by SAIL.Stanford.EDU with TCP; 16 Feb 89  14:02:13 PST
Received: from tuva.sainet.mfenet by ccc.mfenet with Tell via MfeNet ;
	Thu, 16 Feb 89 13:59:35 PST
Date:	  Thu, 16 Feb 89 13:59:35 PST
From:     POTHIERS%TUVA.SAINET.MFENET@NMFECC.ARPA
Message-Id: <890216135935.20800216@NMFECC.ARPA>
To:       common-lisp@sail.stanford.edu

Subject: WANTED: Code Profiler
Date:    Thu, 16-FEB-1989 14:57 MST
X-VMS-Mail-To: ARPA%"common-lisp%sail.stanford.edu@nmfecc.arpa"

Does any have (or know where I can get) a Common Lisp code profiler?
I'm interested in something that will give be number of invocations &/or
caller &/or timing information for all the user written functions
in my system. I would really like to profile some of our stuff
that uses PCL too. I don't mind having to hack at the code some to
make it suit my puposes.

Please direct any advice to me directly at:
pothiers%tuva.sainet@nmfecc.arpa

Thanks,
Steve Pothier
Science Applications International Corporation
Tucson


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Feb 89 17:00:27 EST
Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 12 Feb 89  13:53:21 PST
Received: from BOBOLINK.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 537730; Sun 12-Feb-89 16:51:03 EST
Date: Sun, 12 Feb 89 16:50 EST
From: Kent M Pitman <KMP@STONY-BROOK.SCRC.Symbolics.COM>
Subject: File I/O
To: dg1v+@andrew.cmu.edu
cc: Common-Lisp@SAIL.Stanford.EDU
In-Reply-To: <MXxTEhy00jbbQ6jVFp@andrew.cmu.edu>
Message-ID: <890212165052.3.KMP@BOBOLINK.SCRC.Symbolics.COM>

There's not a separate function. Most reader functions (eg, READ and READ-LINE)
take an eof-p argument that says whether to signal an error if you read past
the end of a file. The default is T, but if you specify NIL then you can specify
a value to be returned when you have read past the end of file. Here are some
examples:

 (DEFUN SHOW-FILE (FILE)
   (WITH-OPEN-FILE (STREAM FILE)
     (DO ((LINE (READ-LINE STREAM NIL NIL) (READ-LINE STREAM NIL NIL)))
         ((NOT LINE))
       (WRITE-LINE LINE))))

 (DEFUN GET-LISP-FORMS-FROM-FILE (FILE)
   (WITH-OPEN-FILE (STREAM FILE)
     (LET ((UNIQUE (LIST NIL)))
       (DO ((FORM (READ STREAM NIL UNIQUE) (READ STREAM NIL UNIQUE))
            (RESULT '() (CONS FORM RESULT)))
           ((EQ FORM UNIQUE) (NREVERSE RESULT))))))

By the way, the Common-Lisp list is -very- large (probably many hundreds of
recipients) and probably overkill for this kind of simple `how to' question.
Contacting your vendor or individually contacting just about any one of the
people you see contributing to this list would probably have gotten you the
same answer at lower cost to the community.


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Feb 89 16:35:36 EST
Received: from po2.andrew.cmu.edu by SAIL.Stanford.EDU with TCP; 12 Feb 89  13:26:36 PST
Received: by po2.andrew.cmu.edu (5.54/3.15) id <AA03940> for common-lisp@sail.stanford.edu; Sun, 12 Feb 89 16:21:55 EST
Received: via switchmail; Sun, 12 Feb 89 16:21:35 -0500 (EST)
Received: from kennettsq.andrew.cmu.edu via qmail
          ID </afs/andrew.cmu.edu/service/mailqs/q005/QF.EXxTFqy00jbbE0JEMo>;
          Sun, 12 Feb 89 16:17:52 -0500 (EST)
Received: from kennettsq.andrew.cmu.edu via qmail
          ID </afs/andrew.cmu.edu/usr13/dg1v/.Outgoing/QF.0XxTEhy00jbbI6jVMz>;
          Sun, 12 Feb 89 16:16:30 -0500 (EST)
Received: from Version.6.25.N.CUILIB.3.45.SNAP.NOT.LINKED.kennettsq.andrew.cmu.edu.rt.r3
          via MS.5.6.kennettsq.andrew.cmu.edu.rt_r3;
          Sun, 12 Feb 89 16:16:29 -0500 (EST)
Message-Id: <MXxTEhy00jbbQ6jVFp@andrew.cmu.edu>
Date: Sun, 12 Feb 89 16:16:29 -0500 (EST)
From: David Greene <dg1v+@andrew.cmu.edu>
X-Andrew-Message-Size: 402+0
To: +dist+/afs/andrew.cmu.edu/usr0/postman/DistLists/Andrew-Hints.dl@andrew.cmu.edu,
        bb+andrew.programming.lisp@andrew.cmu.edu,
        common-lisp@sail.stanford.edu,
        Outbound News <outnews+ext.nn.comp.lang.lisp@andrew.cmu.edu>
Subject: File I/O

I am trying to read various types of ascii data files into a standard common
LISP program (Ibuki Common Lisp).  There are a number of ways to create streams
and such, but how can I test for an End Of File so that my read won't return an
error?

I have gone through Steele, but apparently the appropriate function has eluded
me.  Thanks for any help.


-David

dg1v@andrew.cmu.edu
dpg@isl1.ri.cmu.edu


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU  7 Feb 89 21:10:27 EST
Received: from vaxa.isi.edu by SAIL.Stanford.EDU with TCP; 7 Feb 89  17:58:33 PST
Posted-Date: Tue, 07 Feb 89 17:55:58 PST
Message-Id: <8902080156.AA04251@vaxa.isi.edu>
Received: from LOCALHOST by vaxa.isi.edu (5.59/5.51)
	id AA04251; Tue, 7 Feb 89 17:56:01 PST
To: common-lisp@sail.stanford.edu
From: goldman@vaxa.isi.edu
Subject: &environment extent
Date: Tue, 07 Feb 89 17:55:58 PST
Sender: goldman@vaxa.isi.edu

Can someone tell me whether the ENVIRONMENT object passed as the second
parameter to a macro-expander function is specified to have DYNAMIC or
INDEFINITE extent?

Thanks,
Neil


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 24 Jan 89 16:30:54 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 24 Jan 89  13:16:41 PST
Date: Tue, 24 Jan 89 11:16:13 PST
From: Thom Linden <baggins@IBM.com>
To: Common Lisp mailing <common-lisp@sail.stanford.edu>
Message-ID: <890124.111613.baggins@almvma>
Subject: character proposal

Below are the minimum changes going into the character proposal.
This list was presented on a foil at the Hawaii meeting.


-- some minor corrections (bugs)

-- the registry document will:
     -- be an appendix to the standard, not required
     -- reference appropriate ISO standards (only)

-- character 'index' will be changed to character 'label' throughout
     (labels are strings, not numeric values)

-- add the function char-ccs-value which takes a character object
     and coded character set name and returns the value of
     the character within that encoding.

-- add the function sgchar which is similar to sbchar but takes
     a general-string object.

-- modify char-name, name-char, and #\name  to accept character
     names of the form 'registry:label'


As decided at the Hawaii meeting, the proposal will be voted on
at the March meeting (rather than by mail).  In particular, there
were requests to partition the vote.  If you have any specific
partition you would favor (eg. vote on external-width separately),
please let us know.  (Note, the ballot is being split, not the
document).  I'll probably send out a few informal ballots to get
a feeling for the partitioning as well identifing the controversial
items.


I will be revising the document and encourage any comments to
be sent immediately.  I hope to send out a revision at the end of
this week.  If there are additional comments (on the revision)
I will repeat this process if necessary to obtain a 'clean' version
for the March vote.



Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 23 Jan 89 12:57:49 EST
Received: from Think.COM by SAIL.Stanford.EDU with TCP; 23 Jan 89  09:45:51 PST
Received: from fafnir.think.com by Think.COM; Mon, 23 Jan 89 12:22:12 EST
Return-Path: <gls@Think.COM>
Received: from verdi.think.com by fafnir.think.com; Mon, 23 Jan 89 12:42:29 EST
Received: by verdi.think.com; Mon, 23 Jan 89 12:41:17 EST
Date: Mon, 23 Jan 89 12:41:17 EST
From: Guy Steele <gls@Think.COM>
Message-Id: <8901231741.AA12978@verdi.think.com>
To: krulwich-bruce@yale.arpa
Cc: Common-Lisp@sail.stanford.edu
In-Reply-To: Bruce Krulwich's message of Thu, 12 Jan 89 12:49:19 EST <8901121749.AA18587@ATHENA.CS.YALE.EDU>
Subject: Order of "processing" of arguments

   Date: Thu, 12 Jan 89 12:49:19 EST
   From: Bruce Krulwich <krulwich-bruce@yale.arpa>

   Michael Greenwald said:
   >Actually, CLtL pg 61 says that the arguments and parameters are
   >processed in order, from left to right.  I don't know if "processed"
   >implies "evaluated", but I always assumed (perhaps incorrectly) it did.

   Guy Steele replied:
   >I interpret this as referring to how the (fully evaluated) arguments
   >are processed during lambda-binding, not to the order in which argument
   >forms in a function call are evaluated.  After all, the arguments referred
   >to on page 61 might have come from a list given to APPLY, rather then
   >from EVAL on a function call.

   This seems vacuous to me.  Does this mean that an implementation in which a
   procedure entry point knows how many arguments its receiving (through a link
   table, for instance, or simply by counting its arguments) and constructs a
   REST-arg list before doing the binding of the required args is in violation of
   CLtL because it processes the rightmost argument before the leftmost one??  I
   hope not.

   It seems to me that as long as actuals and formals are matched up correctly
   there is no reason for the language specification to specify the order of the
   "processing" of the arguments during lambda-binding.


   Bruce Krulwich
   krulwich@cs.yale.edu

The implementation need only behave "as if" it
processed them in that way.

It is always permissible to dye one's whiskers green
and then to use so large a fan that they cannot be seen.
--Guy


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 23 Jan 89 10:49:06 EST
Received: from crash.cs.umass.edu ([128.119.40.235]) by SAIL.Stanford.EDU with TCP; 23 Jan 89  07:40:00 PST
Received: from vax3.cs.umass.edu by crash.cs.umass.edu (5.59/Ultrix2.0-B)
	id AA01738; Mon, 23 Jan 89 10:40:36 est
Message-Id: <8901231540.AA01738@crash.cs.umass.edu>
Date: Sun, 22 Jan 89 12:18 EST
From: ELIOT@cs.umass.EDU
Subject: Logical Operations on Numbers
To: Common-Lisp@sail.stanford.EDU
X-Vms-To: IN%"Common-Lisp@sail.stanford.edu"

   From:	IN%"seb1525@draper.COM" 20-JAN-1989 12:12
   Subj:	LOGICAL OPERATIONS ON NUMBERS

   From: SEB1525@mvs.draper.COM
   To: common-lisp@SAIL.STANFORD.EDU


   Isn't SUBSETP of A and B, where A and B are integers, implementable by
    (eql B (logior A B))?

Yes.  It is also (zerop (logandc2 A B)).  However, these expressions
are not efficient.  Suppose that the sets are large, hundreds or thousands
of elements.  In this case A and B are going to be 'bignums', certainly
not FIXNUMS.  Assuming that bignums are implemented so they can be 
operated on as a series of chunks we have:


	A = a1'a2'a3'...'an
	B = b1'b2'b3'...'bn
 

SUBSET implemented directly is:
	(AND[i=1..n] (%subset ai bi))

Where %subset operates on a single chunk.  AND[i=1..n] is a short circuit
logical 'AND' operation.  This requires n operations, and allocates NO new
memory.

SUBSET implemented as (eql B (logior A B)) requires n operations to compute
the logior, perhaps some overhead to normalize the new bignum,
plus n more operations to compute EQL, plus it allocates
memory to store max(A, B).

SUBSET implemented as (zerop (logandc2 A B)) requires n operations
to compute the logandc2, perhaps some overhead to normalize the new bignum,
and 1 operation to compute zero, plust it allocates memory to store
the intermediate result.  This is slightly more efficient, because
ZEROP is microscopically more efficient that EQL.  (ZEROP is FALSE for
all bignums.  EQL has to look at them.)  Furthermore the intermediate
result may be smaller than the intermediate result in the logior
construct.

I draw three conclusions from this.

(1) A naive computation of subset in Common Lisp requires approximately
twice the number of operations than it should, due to missing primitives.

(2) An optimizing compiler should try to recognize the SUBSET operation
and compile it efficiently.  This may be difficult, because there are
at least two (and probably many) ways to encode this operation using the
existing Common Lisp primitives.

(3) For logical completeness, clarity and consistency of source programs
and efficient implementation of some algorithms Common Lisp should be
extended to include a logical subset operation for integers.  The name
subsetp is already used (CLtL P.279) so I propose LOGSUBSETP with
semantics equivalent to:

(defun logsubsetp (a b)
  (zerop (logandc2 a b)))


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 20 Jan 89 15:53:11 EST
Received: from decwrl.dec.com by SAIL.Stanford.EDU with TCP; 20 Jan 89  12:34:07 PST
Received: from decvax.dec.com by decwrl.dec.com (5.54.5/4.7.34)
	for common-lisp@sail.stanford.edu; id AA14762; Fri, 20 Jan 89 12:31:42 PST
Received: from thor.prime.com by cvbnet.prime.com (3.2/SMI-3.2)
	id AA04473; Fri, 20 Jan 89 15:28:47 EST
Received: from giants.uucp by thor.prime.com (3.2/3.14)
	id AA08730; Fri, 20 Jan 89 15:22:47 EST
Return-Path: <tbardasz@giants>
Received: by giants.uucp (3.2/SMI-3.0DEV3)
	id AA04231; Fri, 20 Jan 89 15:23:00 EST
Date: Fri, 20 Jan 89 15:23:00 EST
From: decvax!cvbnet!giants.prime.com!tbardasz@decwrl.dec.com (Ted Bardasz)
Message-Id: <8901202023.AA04231@giants.uucp>
To: cvbnet!decvax!decwrl!SAIL.STANFORD.EDU!common-lisp@decwrl.dec.com
Subject: New Mail Address


	Please change my mail address to:

	decvax!tbardasz@cvbnet.prime.com

	Thanks,

		Ted Bardasz


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 20 Jan 89 11:52:45 EST
Received: from RELAY.CS.NET by SAIL.Stanford.EDU with TCP; 20 Jan 89  08:34:43 PST
Received: from relay2.cs.net by RELAY.CS.NET id aj08866; 20 Jan 89 8:51 EST
Received: from draper.com by RELAY.CS.NET id aa25245; 20 Jan 89 8:46 EST
Return-path: seb1525@mvs.draper.com
Received: from MVS.DRAPER.COM by DRAPER.COM via TCP; Fri Jan 20 08:16 EST
Received: by MVS.DRAPER.COM with NETMAIL; FRI, 20 JAN 89 08:16 EST
Date: FRI, 20 JAN 89 08:13 EST
From: SEB1525@mvs.draper.com
Subject: LOGICAL OPERATIONS ON NUMBERS
To: common-lisp@SAIL.STANFORD.EDU
Reply-to: seb1525@draper.com
X-MVS-to:  common-lisp@sail.stanford.edu
Message-Id: <NETMAILR09012008133SEB1525@MVS.DRAPER.COM>


Isn't SUBSETP of A and B, where A and B are integers, implementable by
 (eql B (logior A B))
?


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 19 Jan 89 13:12:10 EST
Received: from crash.cs.umass.edu ([128.119.40.235]) by SAIL.Stanford.EDU with TCP; 19 Jan 89  09:52:58 PST
Received: from vax3.cs.umass.edu by crash.cs.umass.edu (5.59/Ultrix2.0-B)
	id AA06071; Thu, 19 Jan 89 12:53:38 est
Message-Id: <8901191753.AA06071@crash.cs.umass.edu>
Date: Thu, 19 Jan 89 12:53 EST
From: ELIOT@cs.umass.EDU
Subject: Logical Operations on Numbers
To: Common-Lisp@sail.stanford.EDU
X-Vms-To: IN%"Common-Lisp@sail.stanford.edu"

Rather than duplicating the subset operations on both numbers and bitvectors
why not make the generic arithmetic routines accept bitvectors as non-negative
integers?  The generic arithmetic routines already handle so many types
that one more can't make a big difference.  Many numeric routines make
sense and extend the functionality if they could be applied to bitvector
For example, ZEROP (null set), =, /=, logXXX, boole,lognot, logtest,
logcount, integer-length.

However, bitvectors have never been very useful to me because of the
restriction that the bit-XXX operations can only work on arrays
of the same DIMENSIONS.  If this were relaxed and the smaller array was
treated as being extended with zeros I think they would be much more useful.

Chris Eliot


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 16 Jan 89 20:08:04 EST
Received: from ALDERAAN.SCRC.Symbolics.COM ([128.81.41.109]) by SAIL.Stanford.EDU with TCP; 16 Jan 89  16:50:50 PST
Received: from GANG-GANG.SCRC.Symbolics.COM by ALDERAAN.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 260510; Mon 16-Jan-89 19:48:35 EST
Date: Mon, 16 Jan 89 19:48 EST
From: Glenn S. Burke <gsb@ALDERAAN.SCRC.Symbolics.COM>
Subject: Logical Operations on Numbers
To: jonl@lucid.com, ELIOT@cs.umass.EDU
cc: common-lisp@sail.stanford.EDU
In-Reply-To: <8901150357.AA10940@bhopal>
Message-ID: <19890117004819.7.GSB@ANNISQUAM.SCRC.Symbolics.COM>

    Date: Sat, 14 Jan 89 19:57:36 PST
    From: Jon L White <jonl@lucid.com>

    For what it's worth, Johan DeKleer at Xerox PARC asked for just such
    functionality back in 1984.  I don't remember what the public response
    was then -- I seem to remember everyone trying to write clever, short
    code sequences that would "do the trick".  But the gaping hole still
    stands.  If just one more person seems to thinkg it is a good idea,
    then that should carry much force with the X3J13 committee.

    -- JonL --

Logical subsetp is in the critical path of a peephole optimizer i just
wrote.  For efficiency reasons, though, the code was reorganized so that
in any given instantiation the size was fixed, and some complicated
macrology ends up turning things into manipulation of lists of fixnums.
(here's an application for the fixnum type which can enhance
portability...)

I could see having this kind of predicate for both integers and
bitvectors, and could imagine a sufficiently powerful compiler handling
it (and other bit and logical operations) efficiently.


Received: from MCC.COM (TCP 1200600076) by AI.AI.MIT.EDU 15 Jan 89 14:50:53 EST
Received: from AMMON.ACA.MCC.COM by MCC.COM with TCP/SMTP; Fri 13 Jan 89 12:40:11-CST
Date: Fri, 13 Jan 89 12:59 CST
From: Clive B. Dawson <ai.clive@MCC.COM>
Subject: Test message
Message-ID: <19890113185930.2.CLIVE@AMMON.ACA.MCC.COM>
bcc: CLisp-Dis@MCC.COM

This message is just a test of a future common lisp mail distribution point from
MCC.COM.  Please disregard this message.

Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 14 Jan 89 23:21:14 EST
Received: from lucid.com by SAIL.Stanford.EDU with TCP; 14 Jan 89  19:59:57 PST
Received: from bhopal ([192.9.200.13]) by heavens-gate.lucid.com id AA03650g; Sat, 14 Jan 89 19:55:17 PST
Received: by bhopal id AA10940g; Sat, 14 Jan 89 19:57:36 PST
Date: Sat, 14 Jan 89 19:57:36 PST
From: Jon L White <jonl@lucid.com>
Message-Id: <8901150357.AA10940@bhopal>
To: ELIOT@cs.umass.EDU
Cc: common-lisp@sail.stanford.EDU
In-Reply-To: ELIOT@cs.umass.EDU's message of Thu, 12 Jan 89 15:31 EST <8901122046.AA00579@crash.cs.umass.edu>
Subject: Logical Operations on Numbers

For what it's worth, Johan DeKleer at Xerox PARC asked for just such
functionality back in 1984.  I don't remember what the public response
was then -- I seem to remember everyone trying to write clever, short
code sequences that would "do the trick".  But the gaping hole still
stands.  If just one more person seems to thinkg it is a good idea,
then that should carry much force with the X3J13 committee.

-- JonL --


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 13 Jan 89 20:24:21 EST
Received: from fs3.cs.rpi.edu by SAIL.Stanford.EDU with TCP; 13 Jan 89  17:09:41 PST
Received: by fs3.cs.rpi.edu (5.54/1.2-RPI-CS-Dept)
	id AA11907; Fri, 13 Jan 89 20:05:15 EST
Date: Fri, 13 Jan 89 17:30:43 EST
From: harrisr@turing.cs.rpi.edu (Richard Harris)
Received: by turing.cs.rpi.edu (4.0/1.2-RPI-CS-Dept)
	id AA05864; Fri, 13 Jan 89 17:30:43 EST
Message-Id: <8901132230.AA05864@turing.cs.rpi.edu>
To: RWK%FUJI.ILA.Dialnet.Symbolics.Com@riverside.scrc.symbolics.com,
        common-lisp@sail.stanford.edu
Subject: Re: commonlisp types

  Date: Mon, 9 Jan 89 21:42 EST
  From: Robert W. Kerns <RWK@F.ILA.Dialnet.Symbolics.COM>


  OK, next question:  Does it open-code or otherwise optimize TYPEP, or
  just call TYPEP on the list?
KCL just calls TYPEP on the list.

One of the patches that I have made to KCL is a version of TYPEP
that open-codes when the type is a constant, but my patch has the bug.

Richard Harris


Received: from life.ai.mit.edu (TCP 20015020120) by AI.AI.MIT.EDU 13 Jan 89 03:35:52 EST
Received: from ai.ai.mit.edu by life.ai.mit.edu; Fri, 13 Jan 89 03:25:00 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Jan 89  23:41:27 PST
Date: Thu, 12 Jan 89 22:17:09 PST
From: Thom Linden <baggins@ibm.com>
To: Common Lisp mailing <common-lisp@sail.stanford.edu>
Message-Id: <890112.221709.baggins@almvma>
Subject: cs proposal part 3 of 3


%----------------------------------------------------------------------
\setcounter{section}{9}
\section{Symbols}                           % 10
%----------------------------------------------------------------------

\edithead {\csdag 3 (p163)}
\editstart
\\ \bf replace &
\cltxt
  It is ordinarily not permitted to alter a symbol's print name.
\\ \bf with &
\cltxt
  It is an error to alter a symbol's print name.
\editend

\setcounter{subsection}{1}
\subsection{The Print Name} % 10.2.

\edithead {\csdag 5 (p168)}
\editstart
\\ \bf replace &
\cltxt
  It is an extremely bad idea
\\ \bf with &
\cltxt
  It is an error and an extremely bad idea
\editend

%----------------------------------------------------------------------
\setcounter{section}{10}
\section{Packages}                           % 11
%----------------------------------------------------------------------

\setcounter{subsection}{6}
\subsection{Package System Functions and Variables} % 11.7.

\edithead {\csdag 31 (p184,intern)}
\editstart
\\ \bf append &
\cltxt
  All strings, base and extended, are acceptable {\em string}
  arguments.
\editend

%----------------------------------------------------------------------
\setcounter{section}{12}
\section{Characters}                        % 13
%----------------------------------------------------------------------

\edithead {\csdag 6 after (p233)}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd char-code-limit}   [{\clkwd Constant}]
\\ &
  The value of {\clkwd char-code-limit} is a non-negative integer
  that is the upper exclusive bound on values produced by the
  function {\clkwd char-code}, which returns the {\em code}
  of a given character; that is, the values returned by
  {\clkwd char-code} are non-negative and strictly less than
  the value of {\clkwd char-code-limit}.
  There may be unassigned codes between 0 and
  {\clkwd char-code-limit} which
  are not legal arguments to {\clkwd code-char}.
\\  &
\cltxt
  {\clkwd char-index-limit {\em registry}}   [{\clkwd Function}]
\\ &
  This function returns a non-negative integer
  that is the upper exclusive bound on values produced by the
  function {\clkwd char-index} for the specified {\em registry}.
  There may be unsupported index values between 0 and
  {\clkwd char-index-limit}, i.e.
  {\clkwd (find-char {\em registry index})} may return {\clkwd nil}.
\\  &
\cltxt
  {\clkwd *all-registry-names*}   [{\clkwd Constant}]
\\ &
  The value of {\clkwd *all-registry-names*} is a list of
  all character registry names supported by the implementation.
  Only Common LISP Character Registry names or implementation
  defined character registries may be included in this list.
  In particular, "base" and "standard" are not character registry
  names and must not be included.
\editend

\setcounter{subsection}{0}
\subsection{Character Attributes} % 13.1.

\edithead {\csdag delete entire section (p233)}
\editstart
\editend

\setcounter{subsection}{1}
\subsection{Predicates on Characters} % 13.2.


\edithead {\csdag 3 (p234)}
\editstart
\\ \bf replace &
\cltxt
  argument is a "standard character" that is, an object of type
  {\clkwd standard-char}.
   Note that any character with a non-zero {\em bits} or {\em font}
   attribute
   is non-standard.
\\ \bf with &
\cltxt
  argument is one of the Common LISP standard character subrepertoire.
\editend
\\
\edithead {\csdag 4 (p234)}
\editstart
\\ \bf delete &
\cltxt
  Note that any character with non-zero ...
\editend
\\
\edithead {\csdag 6 (p235)}
\editstart
\\ \bf replace &
\cltxt
  Of the standard characters all but \#$\backslash${\clkwd Newline}
  are graphic.
  The semi-standard characters \#$\backslash${\clkwd Backspace},
  \#$\backslash${\clkwd Tab},
  \#$\backslash${\clkwd Rubout},
  \#$\backslash${\clkwd Linefeed},
  \#$\backslash${\clkwd Return},
  and \#$\backslash${\clkwd Page} are not graphic.
\\ \bf with &
\cltxt
  Of the standard characters all but \#$\backslash${\clkwd Newline}
  are graphic.
\editend
\\
\edithead {\csdag 7 (p235)}
\editstart
\\ \bf delete &
\cltxt
  Programs may assume that graphic ...
\editend
\\
\edithead {\csdag 8 (p235)}
\editstart
\\ \bf delete &
\cltxt
  Any character with a non-zero bits...
\editend
\\
\edithead {\csdag 9 (p235)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd string-char-p} ...
\editend
\\
\edithead {\csdag 10 (p235)}
\editstart
\\ \bf delete &
\cltxt
  The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 13 (p235)}
\editstart
\\ \bf replace &
\cltxt
  If a character is alphabetic, then it is perforce graphic.  Therefore
  any character
  with a non-zero bits attribute cannot be alphabetic.  Whether a
  character is
  alphabetic is may depend on its font number.
\\ \bf with &
\cltxt
  If a character is alphabetic, then it is perforce graphic.
\editend
\\
\edithead {\csdag 22 (p236)}
\editstart
\\ \bf replace &
\cltxt
  If a character is either uppercase or lowercase, it is necessarily
  alphabetic (and
  therefore is graphic, and therefore has a zero bits attribute).
  However, it is permissible in theory for an alphabetic character
  to be neither
  uppercase nor lowercase (in a non-Roman font, for example).
\\ \bf with &
\cltxt
  If a character is either uppercase or lowercase, it is necessarily
  alphabetic (and
  therefore is graphic).
\editend
\\
\edithead {\csdag 25 (p236)}
\editstart
\\ \bf replace &
\cltxt
  The argument {\em char} must be a character object, and {\em radix}
  must be a non-negative
  integer. If {\em char} is not a digit of the radix specified
\\ \bf with &
\cltxt
  The argument {\em char} must be in the standard character
  subrepertoire and
  {\em radix} must be a non-negative integer.
  If {\em char} is not a standard character or is not a digit of the
  radix specified
\editend
\\
\edithead {\csdag 51 (p237)}
\editstart
\\ \bf delete &
\cltxt
  If two characters have the same bits ...
\editend
\\
\edithead {\csdag 52 (p237)}
\editstart
\\ \bf replace &
\cltxt
  If two characters differ in any attribute (code, bits, or font), then
  they are different.
\\ \bf with &
\cltxt
  If the codes of two characters differ, then
  they are different.
\editend
\\
\edithead {\csdag 94 (p239)}
\editstart
\\ \bf replace &
\cltxt
  The predicate {\clkwd char-equal} is like {\clkwd char=}, and
  similarly for the others, except
  according to a different ordering such that differences of bits
  attributes and case are ignored, and font information is taken into
  account in an implementation dependent manner.
\\ \bf with &
\cltxt
  The predicate {\clkwd char-equal} is like {\clkwd char=}, and
  similarly for the others, except
  according to a different ordering such that differences of case
  are ignored.
\editend
\\
\edithead {\csdag 97 example (p239)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (char-equal \#$\backslash$A \#$\backslash$Control-A) is true}
\editend
\\
\edithead {\csdag 98 (p239)}
\editstart
\\ \bf delete &
\cltxt
  The ordering may depend on the font ...
\editend

\setcounter{subsection}{2}
\subsection{Character Construction and Selection} % 13.3.

\edithead {\csdag 3 (p239)}
\editstart
\\ \bf replace &
\cltxt
  The argument {\em char} must be a character object.
  {\clkwd char-code} returns the {\em code} attribute of the
  character object;
  this will be a non-negative integer less than the (normal) value
\\ \bf with &
\cltxt
  The argument {\em char} must be a character object.
  {\clkwd char-code} returns the {\em code} of the
  character object;
  this will be a non-negative integer less than the value
\editend
\\
\edithead {\csdag 4 (p240)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd char-bits } ...
\editend
\\
\edithead {\csdag 5 (p240)}
\editstart
\\ \bf delete &
\cltxt
  The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 6 (p240)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd char-font } ...
\editend
\\
\edithead {\csdag 7 (p240)}
\editstart
\\ \bf delete &
\cltxt
  The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 8 (p240)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd code-char {\em code} \&optional {\em (bits 0) (font 0)}
  [{\em Function}]}
\\ \bf with &
\cltxt
  {\clkwd code-char {\em code}
  [{\em Function}]}
\editend
\\
\edithead {\csdag 9 (p240)}
\editstart
\\ \bf replace &
\cltxt
  All three arguments must be non-negative integers.  If it is possible
  in the
  implementation to construct a character object whose code attribute
  is {\em code},
  whose
  bits attribute is {\em bits}, and whose font attribute is {\em font},
  then such an object
  is returned;
\\ \bf with &
\cltxt
  The argument must be a non-negative integer.  If it is possible
  in the
  implementation to construct a character object identified by
  {\em code},
  then such an object is returned;
\editend
\\
\edithead {\csdag 10 (p240)}
\editstart
\\ \bf replace &
\cltxt
  For any integers, {\em c, b,} and {\em f}, if {\clkwd (code-char
  {\em c b f})} is
\\ \bf with &
\cltxt
  For any integer, {\em c}, if {\clkwd (code-char
  {\em c})} is
\editend
\\
\edithead {\csdag 12 (p240)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (char-bits (code-char } ...
\editend
\\
\edithead {\csdag 13 (p240)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (char-font (code-char } ...
\editend
\\
\edithead {\csdag 14 (p240)}
\editstart
\\ \bf delete &
\cltxt
  If the font and bits attributes ...
\editend
\\
\edithead {\csdag 15 (p240)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (char= (code-char (char-code ...}
\editend
\\
\edithead {\csdag 16 (p240)}
\editstart
\\ \bf delete &
\cltxt
  is true.
\editend
\\
\edithead {\csdag 17 (p240)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd make-char} ...
\editend
\\
\edithead {\csdag 18 (p240)}
\editstart
\\ \bf delete &
\cltxt
 The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 19 (p240)}
\editstart
\\ \bf delete &
\cltxt
 If {\em bits} or {\em font} are zero ...
\editend
\\
\edithead {\csdag 19 (p240)}
\editstart
\\ \bf append &
\cltxt
  {\clkwd find-char} {\em index registry}    [{\em Function}]
\\ &
  {\clkwd find-char} returns a character object.
  {\em index} is an integer
  value uniquely identifying a character within the character
  registry name {\em registry}.
  If the implementation does not support the specified
  character, {\clkwd nil} is returned.
\editend

\setcounter{subsection}{3}
\subsection{Character Conversions} % 13.4.

\edithead {\csdag 8 (p241)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd char-upcase} returns a character object with the same
  font and bits attributes as {\em char}, but with possibly a
  different code attribute.
\\ \bf with &
\cltxt
  {\clkwd char-upcase} returns a character object with possibly
  a different code.
\editend
\\
\edithead {\csdag 10 (p241)}
\editstart
\\ \bf replace &
\cltxt
  Similarly, {\clkwd char-downcase} returns a character object with the
  same font and bits attributes as {\em char}, but with possibly a
  different code attribute.
\\ \bf with &
\cltxt
  Similarly, {\clkwd char-downcase} returns a character object with
  possibly a different code.
\editend
\\
\edithead {\csdag 12 (p241)}
\editstart
\\ \bf delete &
\cltxt
  Note that the action of ...
\editend
\\
\edithead {\csdag 13 (p241)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd digit-char {\em weight} \&optional ({\em radix} 10)
  ({\em font} 0)      [{\em Function}]}
\\ \bf with &
\cltxt
  {\clkwd digit-char {\em weight} \&optional ({\em radix} 10)
       [{\em Function}]}
\editend
\\
\edithead {\csdag 14 (p241)}
\editstart
\\ \bf replace &
\cltxt
  All arguments must be integers.  {\clkwd digit-char} determines
  whether or not it is
  possible
  to construct a character object whose font attribute is {\em font},
  and whose {\em code}
\\ \bf with &
\cltxt
  All arguments must be integers.  {\clkwd digit-char} determines
  whether or not it is
  possible to construct a character object whose {\em code}
\editend
\\
\edithead {\csdag 15 (p242)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd digit-char} cannot return {\clkwd nil} if {\em font}
  is zero, {\em radix}
\\ \bf with &
\cltxt
  {\clkwd digit-char} cannot return {\clkwd nil}.
  {\em radix}
\editend
\\
\edithead {\csdag 22 (p242)}
\editstart
\\ \bf delete &
\cltxt
  Note that no argument is provided for ...
\editend
\\
\edithead {\csdag 23 through 30 (p242, char-int, int-char)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd char-int} {\em char}
\editend
\\
\edithead {\csdag 32 (p242)}
\editstart
\\ \bf replace &
\cltxt
  All characters that have zero font and bits attributes and that are
  non-graphic
\\ \bf with &
\cltxt
  All characters that are
  non-graphic
\editend
\\
\edithead {\csdag 33 (p243)}
\editstart
\\ \bf replace &
\cltxt
  The standard newline and space characters have the respective
  names {\clkwd Newline} and {\clkwd Space}.  The semi-standard
  characters have the names {\clkwd Tab, Page, Rubout, Linefeed,
  Return,} and {\clkwd Backspace}.
\\ \bf with &
\cltxt
  The standard newline and space characters have the respective
  names {\clkwd Newline} and {\clkwd Space}.
\editend
\\
\edithead {\csdag 35 (p243)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd char-name} will only locate "simple" ...
\editend
\\
\edithead {\csdag 36 (p243)}
\editstart
\\ \bf append &
\cltxt
  {\clkwd name-char} may accept other names for characters
  in addition to those returned by {\clkwd char-name}.
\editend
\\
\edithead {\csdag 36 (p243)}
\editstart
\\ \bf append &
\cltxt
  {\clkwd char-registry} {\em char}    [{\em Function}]
\\ &
  {\clkwd char-registry} returns a string value representing
  the character registry to which {\em char} belongs.
\editend
\\
\edithead {\csdag 36 (p243)}
\editstart
\\ \bf append &
\cltxt
  {\clkwd char-index} {\em char}    [{\em Function}]
\\ &
  {\clkwd char-index} returns an integer value representing
  the character (registry) index of {\em char}.
\editend

\setcounter{subsection}{4}
\subsection{Character Control-Bit Functions} % 13.5.

\edithead {\csdag delete entire section (p243)}
\editstart
\editend

%----------------------------------------------------------------------
\setcounter{section}{13}
\section{Sequences}                         % 14
%----------------------------------------------------------------------
\setcounter{subsection}{0}
\subsection{Simple Sequence Functions}         % 14.1

\edithead {\csdag 21 (p249,make-sequence)}
\editstart
\\ \bf append &
\cltxt
  If type {\clkwd string} is specified, the result is
  equivalent to {\clkwd make-string}.
\editend

%----------------------------------------------------------------------
\setcounter{section}{17}
\section{Strings}                           % 18
%----------------------------------------------------------------------

\edithead {\csdag 1 (p299)}
\editstart
\\ \bf replace &
\cltxt
  Specifically, the type {\clkwd string} is identical to the type
  {\clkwd (vector string-char),}
  which in turn is the same as {\clkwd (array string-char (*))}.
\\ \bf with &
\cltxt
  Specifically, the type {\clkwd string} is a subtype of
  {\clkwd vector}
  and consists of vectors specialized by subtypes of {\clkwd character}.
\editend

\setcounter{subsection}{0}
\subsection{String Access}  % 18.1.

\edithead {\csdag 3 (p300)}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd sbchar} {\em simple-base-string index}    [{\em Function}]
\editend
\\
\edithead {\csdag 4 (p300)}
\editstart
\\ \bf replace &
\cltxt
  character object.  (This character will necessarily satisfy the
  predicate
  {\clkwd string-char-p}).
\\ \bf with &
\cltxt
  character object.
\editend
\\
\edithead {\csdag 9 (p300)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd setf} may be used with {\clkwd char} to destructively
  replace a character within a string.
\\ \bf with &
\cltxt
  {\clkwd setf} may be used with {\clkwd char} to destructively
  replace a character within a string.
  The new character must be of a type which can be stored in the
  string; it is an error otherwise.
\editend
\\
\edithead {\csdag 10 (p300)}
\editstart
\\ \bf insert &
\cltxt
  For {\clkwd sbchar}, the string must be a simple base string.
  The new character must be of a type which can be stored in the
  string; it is an error otherwise.
\editend

\setcounter{subsection}{2}
\subsection{String Construction and Manipulation}  % 18.3.

\edithead {\csdag 2 (p302)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd make-string {\em size} \&key :initial-element  [{\em Function}]}
\\ \bf with &
\cltxt
  {\clkwd make-string {\em size} \&key :initial-element  :element-type
  [{\em Function}]}
\editend
\\
\edithead {\csdag 3 (p302,make-string)}
\editstart
\\ \bf replace &
\cltxt
  This returns a string (in fact a simple string) of length {\em size},
  each of whose characters has been initialized to the
  {\clkwd :initial-element} argument.  If an {\clkwd :initial-element}
  argument is not specified, then the string will be initialized
  in an implementation-dependent way.
\\ \bf with &
\cltxt
  This returns a string of length {\em size},
  each of whose characters has been initialized to the
  {\clkwd :initial-element} argument.  If an {\clkwd :initial-element}
  argument is not specified, then the string will be initialized
  in an implementation-dependent way.
  The {\clkwd :element-type} argument names the type of the elements
  of the string; a string is constructed of the most specialized
  type that can accommodate elements of the given type.
  If {\clkwd :element-type} is omitted, the type
  {\clkwd simple-string} is the default.
\editend
\\
\edithead {\csdag 5 (p302,make-string)}
\editstart
\\ \bf replace &
\cltxt
  A string is really just a one-dimensional array of "string
  characters" (that is,
  those characters that are members of type {\clkwd string-char}).
  More complex character arrays may be constructed using the function
  {\clkwd make-array}.
\\ \bf with &
\cltxt
  More complex character arrays may be constructed using the function
  {\clkwd make-array}.
\editend
\\
\edithead {\csdag 29 (p304,make-string)}
\editstart
\\ \bf replace &
\cltxt
  If {\em x} is a string character (a character of type
  {\clkwd string-char}), then
\\ \bf with &
\cltxt
  If {\em x} is a character, then
\editend

%----------------------------------------------------------------------
\setcounter{section}{21}
\section{Input/Output}                      % 22

\setcounter{subsection}{0}
\subsection{Printed Representation of LISP Objects}  % 22.1.

\setcounter{subsubsection}{0}
\subsubsection{What the Read Function Accepts}  % 22.1.1.

\edithead {\csdag Table 22-1: Standard Character Syntax Types (p336)}
\editstart
\\ \bf delete entry &
\cltxt
  {\clkwd <tab>} {\em whitespace}
\\ &
  {\clkwd <page>} {\em whitespace}
\\ &
  {\clkwd <backspace>} {\em constituent}
\\ &
  {\clkwd <return>} {\em whitespace}
\\ &
  {\clkwd <rubout>} {\em constituent}
\\ &
  {\clkwd <linefeed>} {\em whitespace}
\editend

\setcounter{subsubsection}{1}
\subsubsection{Parsing of Numbers and Symbols}  % 22.1.2.

\edithead {\csdag Table 22-3: Standard Constituent Character
Attributes (p340)}
\editstart
\\ \bf delete entry &
\cltxt
  {\clkwd <backspace>} {\em illegal}
\\  &
  {\clkwd <tab>} {\em illegal}
\\  &
  {\clkwd <linefeed>} {\em illegal}
\\  &
  {\clkwd <page>} {\em illegal}
\\  &
  {\clkwd <return>} {\em illegal}
\\  &
  {\clkwd <rubout>} {\em illegal}
\editend

\setcounter{subsubsection}{3}
\subsubsection{Standard Dispatching Macro Character Syntax}  % 22.1.4.

\edithead {\csdag Table 22-4: Standard \# Macro Character Syntax (p352)}
\editstart
\\ \bf delete entry &
\cltxt
  {\clkwd \#<backspace>} {\em signals error}
\\  &
  {\clkwd \#<tab>} {\em signals error}
\\  &
  {\clkwd \#<linefeed>} {\em signals error}
\\  &
  {\clkwd \#<page>} {\em signals error}
\\  &
  {\clkwd \#<return>} {\em signals error}
\\  &
  {\clkwd \#<rubout>} {\em undefined}
\editend
\\
\edithead {\csdag 8 (p353)}
\editstart
\\ \bf replace &
\cltxt
  The following names are standard across all implementations:
\\ \bf with &
\cltxt
  All non-graphic
  characters, including extended characters, are uniquely
  named in an implementation-dependent manner.
  The following names are standard across all implementations:
\editend
\\
\edithead {\csdag 11 through 18 inclusive delete (p353)}
\editstart
\\ \bf delete &
\cltxt
  The following names are semi-standard; ...
\editend
\\
\edithead {\csdag 20 through 26 inclusive delete (p354)}
\editstart
\\ \bf delete &
\cltxt
  The following convention is used in implementations ...
\editend
\\
\edithead {\csdag 108 (p360)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd \#<space>, \#<tab>, \#<newline>, \#<page>, \#<return>}
\\ \bf with &
\cltxt
  {\clkwd \#<space>, \#<newline>}
\editend

\setcounter{subsubsection}{4}
\subsubsection{The Readtable}  % 22.1.5.

\edithead {\csdag 3 (p360)}
\editstart
\\ \bf replace &
\cltxt
  Even if an implementation supports characters with non-zero
  {\em bits} and {\em font}
  attributes, it need not (but may) allow for such characters to
  have syntax
  descriptions
  in the readtable.  However, every character of type
  {\clkwd string-char}
  must be represented in the readtable.
\\ \bf with &
\cltxt
  All base and extended characters
  are representable in the readtable.
\editend

\setcounter{subsubsection}{5}
\subsubsection{What the Print Function Produces}  % 22.1.6.

\edithead {\csdag 13 (p366)}
\editstart
\\ \bf replace &
\cltxt
  is used.  For example, the printed representation of the character
  \#$\backslash$A
  with control
  and meta bits on would be \#$\backslash${\clkwd CONTROL-META-A},
  and that of
  \#$\backslash$a with control and meta bits on would be
  \#$\backslash${\clkwd CONTROL-META-$\backslash$a}.
\\ \bf with &
\cltxt
  is used (see 22.1.4).
\editend

\setcounter{subsection}{2}
\subsection{Output Functions}  % 22.3.

\setcounter{subsubsection}{0}
\subsubsection{Output to Character Streams}  % 22.3.1.

\edithead {\csdag 26 (p384)}
\editstart
\\ \bf replace &
\cltxt
  ({\em not} the substring delimited by {\clkwd :start} and
  {\clkwd :end}).
\\ \bf with &
  ({\em not} the substring delimited by {\clkwd :start} and
  {\clkwd :end}).
  Only characters which are members of the coded character set(s)
  associated with the output stream or \#$\backslash${\clkwd Newline}
  are valid to be written;
  it is an error otherwise.  All character streams must provide
  appropriate line division behavior for
  \#$\backslash${\clkwd Newline}.
\editend
\\
\edithead {\csdag 27 after (p384)}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd external-width} {\em object} \&{\clkwd optional}
  {\em output-stream}   [{\em Function}]
\\  &
  {\clkwd external-width} returns the number of host system base
  character units required for the object on the output-stream. If
  not applicable to the output stream, the function
  returns {\clkwd nil}.
  This number corresponds to the current state of the stream
  and may change if there has been intervening output.
  If the output stream is not specified {\clkwd *standard-output*}
  is the default.
\editend

\footnote{
The X3 J13 proposal STREAM-INFO: ONE-DIMENSIONAL-FUNCTIONS
modified to include these semantics is an
acceptable alternative to the {\clkwd external-width} function
proposed here.}

\setcounter{subsubsection}{2}
\subsubsection{Formatted Output to Character Streams}  % 22.3.3.

\edithead {\csdag 23 delete example (p387)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (format nil "Type} $\tilde{ }$
  {\clkwd :C to $\tilde{ }$ :A."} . . .
\editend
\\
\edithead {\csdag 66 (p389)}
\editstart
\\ \bf replace &
\cltxt
  $\tilde{ }${\clkwd :C} spells out the names of the control bits and
  represents non-printing
  characters by their names: {\clkwd Control-Meta-F, Control-Return,
  Space}.
  This is a "pretty" format for printing characters.
\\ \bf with &
\cltxt
  $\tilde{ }${\clkwd :C}
  represents non-printing
  characters by their names: {\clkwd Newline,
  Space}.  This is a "pretty" format
  for printing characters.
\editend
%----------------------------------------------------------------------

%----------------------------------------------------------------------
\setcounter{section}{22}
\section{File System Interface}             % 23

\setcounter{subsection}{1}
\subsection{Opening and Closing Files}  % 23.2.

\edithead {\csdag 2 (p418)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd open {\em filename} \&key :direction :element-type}
  {\clkwd :if-exists :if-does-not-exist}
  [{\em Function}]
\\ \bf with &
\cltxt
  {\clkwd open {\em filename} \&key :direction :element-type}
  {\clkwd
  :external-code-format}
  {\clkwd :if-exists :if-does-not-exist}
  [{\em Function}]
\editend
\\
\edithead {\csdag 11 (p419)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd string-char}
\\  &
  The unit of transaction is a string-character.  The functions
  {\clkwd read-char}
  and/or {\clkwd write-char} may be used on the stream.
\\ \bf with &
\cltxt
  The default value of {\clkwd :element-type} is an
  implementation-defined subtype of character.
\\  &
  {\clkwd base-character}
\\  &
  The unit of transaction is a base character.  The functions
  {\clkwd read-char}
  and/or {\clkwd write-char} may be used on the stream.  This is
  the default.
\editend
\\
\edithead {\csdag 16 (p419)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd character}
\\  &
  The unit of transaction is any character, not just a string-character.
  The functions {\clkwd read-char} and/or {\clkwd write-char} may
  be used on the stream.
\\ \bf with &
\cltxt
  {\clkwd character}
\\  &
  The unit of transaction is any character.
  The functions {\clkwd read-char} and/or {\clkwd write-char} may
  be used on the stream.
\editend
\\
\edithead {\csdag 19 after (p420)}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd :external-code-format}
\\  &
This argument specifies a string or list of
string(s) indicating an implementation recognized scheme for
representing 1 or more coded character sets with non-homogeneous codes.
\\  &
The default value is "default" and is
implementation defined but must include the
base characters.
\\  &
As many coded character set names must be provided as the
implementation requires for that external coding convention.
\\  &
References to standard ISO coded character set names must
include the full ISO reference number and approval year followed
by "ccs".  The following are valid ISO reference names:
"ISO8859/1-1987ccs", "ISO6937/2-1983ccs", "iso646-1983ccs", etc..
All implementation recognized schemes are formed from
{\clkwd standard-p} characters.  Within scheme names,
alphabetic case is ignored.
\editend
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\chapter{Deprecated Language Features}

The X3 J13 Character subcommittee proposal
will cause certain areas of \cite{steele84} to
become obsolete.  We have included in this appendix, potential
additions to the standard document for areas we feel are important
in the interest of compatibility.  The character subcommittee
recommends that the X3 J13 committee as a whole adopt a
policy regarding obsolescence.  This policy
may be to keep the obsolete function in the interest of
compatibility for existing applications, or
to drop the obsolete function completely.  One compromise
is to document these functions in an appendix to the Common LISP
Standard.  The appendix would be for informational use only
and not a part of the standard definition.


%----------------------------------------------------------------------
\setcounter{section}{1}
\section{Data Types}                        % 2
%----------------------------------------------------------------------

\setcounter{subsection}{14}
\subsection{Overlap, Inclusion, and Disjointness of Types} % 2.15.

\edithead {\csdag 14 (p34)}
\editstart
\\ \bf deprecated &
\cltxt
  The type {\clkwd standard-char} is a subtype of
  {\clkwd base-character};
  The type {\clkwd string-char} is implementation defined as either
  {\clkwd base-character} or {\clkwd character}.
\editend

%----------------------------------------------------------------------
\setcounter{section}{12}
\section{Characters}                        % 13
%----------------------------------------------------------------------

\edithead {\csdag throughout}
\editstart
\\ \bf deprecated &
\cltxt
  Earlier versions of Common LISP incorporated {\em font} and
  {\em bits} as attributes of character objects.
  There are several functions which were removed
  from the language or modified by this proposal.
  The deleted functions and constants include:
\begin{itemize}
\item char-font-limit
\item char-bits-limit
\item int-char
\item char-int
\item char-bits
\item char-font
\item make-char
\item char-control-bit
\item char-meta-bit
\item char-super-bit
\item char-hyper-bit
\item char-bit
\item set-char-bit
\end{itemize}
\editend
\\
\edithead {\csdag (p233)}
\editstart
\\ \bf deprecated &
\cltxt
  If supported by an implementation these attributes may
  effect the action of selected functions.  In particular,
  the following effects noted:
\\ &
\begin{itemize}
\item Attributes, such as those
  dealing with how the character is displayed or its typography,
  are not part of the character code.
  For example, bold-face, color
  or size are not considered part of the character code.
\item If two characters differ in any attributes,
  then they are not {\clkwd char=}.
\item If two characters have identical
  attributes, then their ordering by
  {\clkwd char}$<$ is consistent with the numerical ordering by the
  predicate $<$ on
  their code attributes. (Similarly for {\clkwd char}$>$,
  {\clkwd char}$>=$ and {\clkwd char}$<=$.)
\item The effect, if any, on {\clkwd char-equal} of each
  attribute has to be specified as part of
  the definition of that attribute.
\item The effect of {\clkwd char-upcase} and {\clkwd char-downcase}
  is to preserve attributes.
\item The function {\clkwd char-int} is equivalent to {\clkwd char-code}
  if no attributes are associated with
  the character object.
\item The function {\clkwd int-char} is equivalent to {\clkwd code-char}
  if no attributes are associated with
  the character object.
\item It is implementation dependent whether characters within
  double quotes have attributes removed.
\item  It is implementation dependent whether
  attributes are removed from symbol names by {\clkwd read}.
\item  Even if an implementation supports characters with non-zero
  {\em bits} and {\em font}
  attributes, it need not (but may) allow for such characters to
  have syntax descriptions
  in the readtable.
\end{itemize}
\editend


%----------------------------------------------------------------------
\begin{thebibliography}{wwwwwwww 99}


\bibitem[Ida87]{ida87} M. Ida, et al.,
{\em
JEIDA Common LISP Committee Proposal on Embedding Multi-Byte Characters
},
ANSI X3J13 document 87-022, (1987).

\bibitem[ISO 646]{iso646} ISO,
{\em
Information processing -- ISO 7-bit coded character set
for information interchange
},
ISO (1983).

\bibitem[ISO 4873]{iso4873} ISO,
{\em
Information processing -- ISO 8-bit code for information
interchange -- Structure and rules for implementation
},
ISO (1986).

\bibitem[ISO 6937/1]{iso6937/1} ISO,
{\em
Information processing -- Coded character sets for text
communication -- Part 1: General introduction
},
ISO (1983).

\bibitem[ISO 6937/2]{iso6937/2} ISO,
{\em
Information processing -- Coded character sets for text
communication -- Part 2: Latin alphabetic and non-alphabetic
graphic characters
},
ISO (1983).

\bibitem[ISO 8859/1]{iso8859/1} ISO,
{\em
Information processing -- 8-bit single-byte coded
graphic character sets -- Part 1: Latin alphabet No. 1
},
ISO (1987).

\bibitem[ISO 8859/2]{iso8859/2} ISO,
{\em
Information processing -- 8-bit single-byte coded
graphic character sets -- Part 2: Latin alphabet No. 2
},
ISO (1987).

\bibitem[ISO 8859/6]{iso8859/6} ISO,
{\em
Information processing -- 8-bit single-byte coded
graphic character sets -- Part 6: Latin/Arabic alphabet
},
ISO (1987).

\bibitem[ISO 8859/7]{iso8859/7} ISO,
{\em
Information processing -- 8-bit single-byte coded
graphic character sets -- Part 7: Latin/Greek alphabet
},
ISO (1987).

\bibitem[Kerns87]{kerns87} R. Kerns,
{\em
Extended Characters in Common LISP
},
X3J13 Character Subcommittee document, Symbolics Inc (1987).

\bibitem[Kurokawa88]{kurokawa88} T. Kurokawa, et al.,
{\em
Technical Issues on International Character Set Handling in Lisp
},
ISO/IEC SC22 WG16 document N33, (1988).

\bibitem[Linden87]{linden87} T. Linden,
{\em
Common LISP - Proposed Extensions for International Character Set
Handling
},
Version 01.11.87, IBM Corporation (1987).

\bibitem[Steele84]{steele84} G. Steele Jr.,
{\em
Common LISP: the Language
},
Digital Press (1984).

\bibitem[Xerox87]{xerox87} Xerox,
{\em
Character Code Standard, Xerox System Integration Standard
},
Xerox Corp. (1987).

\end{thebibliography}

\end{document}             % End of document.


Received: from life.ai.mit.edu (TCP 20015020120) by AI.AI.MIT.EDU 13 Jan 89 03:33:43 EST
Received: from ai.ai.mit.edu by life.ai.mit.edu; Fri, 13 Jan 89 03:22:44 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Jan 89  23:40:00 PST
Date: Thu, 12 Jan 89 22:16:40 PST
From: Thom Linden <baggins@ibm.com>
To: Common Lisp mailing <common-lisp@sail.stanford.edu>
Message-Id: <890112.221640.baggins@almvma>
Subject: cs proposal part 2 of 3

%----------------------------------------------------------------------
% split into three parts this time as mailer had problems
%----------------------------------------------------------------------
%----------------------------------------------------------------------

\newcommand{\edithead}{\begin{tabular}{l p{3.95in}}
  \multicolumn{2}{l} }

\newcommand{\csdag}{\bf$\Rightarrow$\ddag}

\newcommand{\editstart}{}

\newcommand{\editend}{\\ & \end{tabular}}

%----------------------------------------------------------------------
%----------------------------------------------------------------------
\appendix
\chapter{Editorial Modifications to CLtL}

The following sections specify the editorial changes needed in
CLtL to support the proposal.  Section/subsection numbers and titles
match those found in \cite{steele84}.  The notation
{\csdag x (pn, function)} denotes a reference to paragraph x within the
subsection (we count each individual example or metastatement
as 1 paragraph of text).  Also, {\bf (pn, function)}, or simply
{\bf (pn)} is included as an additional
aid to the reader indicating the page number and function modified.
When an entire paragraph is deleted,
the first few words of the paragraph is noted.

If a section or paragraph of CLtL is {\em not} referenced,
no editorial changes are required to support this proposal.
\footnote{This may be an over optimistic statement since the changes
are fairly pervasive.  The editor should take the sense of
Chapter 1 into account in resolving any discrepancies.}

%----------------------------------------------------------------------
\setcounter{section}{1}
\section{Data Types}                        % 2
%----------------------------------------------------------------------


\edithead {\csdag 8 (p12)}
\editstart
\\ \bf replace &
\cltxt
   provides for a
   rich character set, including ways to represent characters of various
   type styles.
\\ \bf with &
\cltxt
   provides support for international language characters as well
   as characters used in specialized arenas, eg. mathematics.
\editend

\setcounter{subsection}{1}
\subsection{Characters}                     % 2.2.

\edithead {\csdag 1 (p20)}
\editstart
\\ \bf replace &
\cltxt
  Characters are represented as data objects of type {\clkwd character}.
  There are two subtypes of interest, called
  {\clkwd standard-char} and {\clkwd string-char}.
\\ \bf with &
\cltxt
  Characters are represented as data objects of type
  {\clkwd character}.
\editend
\\
\edithead {\csdag 2 (p20)}
\editstart
\\ \bf replace &
\cltxt
  This works well enough for printing characters. Non-printing
  characters
\\ \bf with &
\cltxt
  This works well enough for graphic characters.  Non-graphic
  characters
\editend

\subsubsection{Standard Characters}         % 2.2.1.

\edithead {\csdag 0 section heading (p20)}
\editstart
\\ \bf replace &
\cltxt
  Standard Characters
\\ \bf with &
\cltxt
  Base Characters
\editend
\\
\edithead {\csdag 1 before (p20)}
\editstart
\\ \bf insert &
\cltxt
  A {\em character repertoire} defines a collection of characters
  independent of their specific rendered image or font.
  Character
  repertoires are specified independent of coding and their characters
  are only identified with a unique label, a graphic symbol, and
  a character description.
  A {\em coded character set} is a character repertoire plus
  an {\em encoding} providing a unique mapping between each character
  and a number which serves as the character representation.
\\ &
  Many computers have some "base" coded character set
  (often a variant of ISO646-1983)
  which is a function
  of hardware instructions for dealing with characters, as well as
  the organization of
  the file system.  This base character representation is likely
  to be the smallest
  transaction unit permitted for text stream I/O operations.
\\ &
  The {\em base character repertoire} is used to refer to
  the collection of characters represented by
  the base coded character set.  Common LISP does
  not define the base
  character encoding
  but does require all implementations to support a "standard"
  {\em subrepertoire} of the base character
  repertoire.
\editend
\\
\edithead {\csdag 1 before (p20)}
\editstart
\\ \bf insert &
\cltxt
  The {\clkwd base-character} type is defined as a subtype of
  {\clkwd character}.  A {\clkwd base-character}
  object can contain any member of the base character repertoire.
  Objects of type
  {\clkwd (and character (not base-character))} are referred to
  as {\em extended characters}.
\editend
\\
\edithead {\csdag 1 (p20)}
\editstart
\\ \bf delete &
\cltxt
  Common LISP defines a "standard character set" ...
\editend
\\
\edithead {\csdag 1 (P20)}
\editstart
\\ \bf new &
\cltxt
  As a subset of the base character repertoire,
  Common LISP defines a standard character
  subrepertoire for two purposes.
  Common LISP programs that are written in the
  standard character subrepertoire
  can be read by any Common LISP implementation; and Common LISP
  programs
  that use only standard characters as data objects are most likely
  to be portable.
  The standard characters are not defined by their glyphs, but by their
  roles within
  the language.  There are two aspects to the roles of the
  standard characters:
  one is their role in reader and format control
  string syntax; the second is their role as
  components of the names of all Common LISP
  functions, macros, constants, and global
  variables.  As long as an implementation chooses 96 glyphs
  and treats those 96 in a manner
  consistent with the language's specification for the standard characters
  (for example,
  the naming of functions),
  it doesn't matter what glyphs the I/O
  hardware uses to
  represent those characters: they are
  the standard characters.  Any program or
  data text written wholly
  in those characters
  is portable through simple code conversion.
  The Common LISP
  standard character subrepertoire consists of
  a newline \#$\backslash${\clkwd Newline}, the
  graphic space character \#$\backslash${\clkwd Space},
  and the following additional
  ninety-four graphic characters or their equivalents:
\editend
\\
\edithead {\csdag 2 (p21)}
\editstart
\\ \bf delete &
\cltxt
  ! " \# ...
\editend
\\
\edithead {\csdag 2 new (p21)}
\editstart
\\ &
  {\bf Common LISP Standard Character Subrepertoire}
\editend
\footnote{\cltxt \#$\backslash${\clkwd Space}
and \#$\backslash${\clkwd Newline} are omitted.
graphic labels and descriptions are from ISO 6937/2.
The first letter of the graphic label categorizes the
character as follows: L - Latin, N - Numeric, S - Special
.}
\\
{\small \begin{tabular}{||l|c|l||l|c|l||}    \hline
  ID     &    Glyph    &  Name or description
& ID     &    Glyph    &  Name or description
\\ \hline
  LA01  &  a  &  small a
& ND01  &  1  &  digit 1
\\ \hline
  LA02  &  A  &  capital A
& ND02  &  2  &  digit 2
\\ \hline
  LB01  &  b  &  small b
& ND03  &  3  &  digit 3
\\ \hline
  LB02  &  B  &  capital B
& ND04  &  4  &  digit 4
\\ \hline
  LC01  &  c  &  small c
& ND05  &  5  &  digit 5
\\ \hline
  LC02  &  C  &  capital C
& ND06  &  6  &  digit 6
\\ \hline
  LD01  &  d  &  small d
& ND07  &  7  &  digit 7
\\ \hline
  LD02  &  D  &  capital D
& ND08  &  8  &  digit 8
\\ \hline
  LE01  &  e  &  small e
& ND09  &  9  &  digit 9
\\ \hline
  LE02  &  E  &  capital E
& ND10  &  0  &  digit 0
\\ \hline
  LF01  &  f  &  small f
& SC03  &  \$    &  dollar sign
\\ \hline
  LF02  &  F  &  capital F
& SP02  &  !     &  exclamation mark
\\ \hline
  LG01  &  g  &  small g
& SP04  &  "     &  quotation mark
\\ \hline
  LG02  &  G  &  capital G
& SP05  &  \apostrophe     &  apostrophe
\\ \hline
  LH01  &  h  &  small h
& SP06  &  (     &  left parenthesis
\\ \hline
  LH02  &  H  &  capital H
& SP07  &  )     &  right parenthesis
\\ \hline
  LI01  &  i  &  small i
& SP08  &  ,     &  comma
\\ \hline
  LI02  &  I  &  capital I
& SP09  &  \_    &  low line
\\ \hline
  LJ01  &  j  &  small j
& SP10  &  -     &  hyphen or minus sign
\\ \hline
  LJ02  &  J  &  capital J
& SP11  &  .     &  full stop, period
\\ \hline
  LK01  &  k  &  small k
& SP12  &  /     &  solidus
\\ \hline
  LK02  &  K  &  capital K
& SP13  &  :     &  colon
\\ \hline
  LL01  &  l  &  small l
& SP14  &  ;     &  semicolon
\\ \hline
  LL02  &  L  &  capital L
& SP15  &  ?     &  question mark
\\ \hline
  LM01  &  m  &  small m
& SA01  &  +     &  plus sign
\\ \hline
  LM02  &  M  &  capital M
& SA03  &  $<$   &  less-than sign
\\ \hline
  LN01  &  n  &  small n
& SA04  &  =   &  equals sign
\\ \hline
  LN02  &  N  &  capital N
& SA05  &  $>$   &  greater-than sign
\\ \hline
  LO01  &  o  &  small o
& SM01  &  \#    &  number sign
\\ \hline
  LO02  &  O  &  capital O
& SM02  &  \%    &  percent sign
\\ \hline
  LP01  &  p  &  small p
& SM03  &  \&    &  ampersand
\\ \hline
  LP02  &  P  &  capital P
& SM04  &  *     &  asterisk
\\ \hline
  LQ01  &  q  &  small q
& SM05  &  @     &  commercial at
\\ \hline
  LQ02  &  Q  &  capital Q
& SM06  &  [     &  left square bracket
\\ \hline
  LR01  &  r  &  small r
& SM07  &  $\backslash$   &  reverse solidus
\\ \hline
  LR02  &  R  &  capital R
& SM08  &  ]     &  right square bracket
\\ \hline
  LS01  &  s  &  small s
& SM11  &  \{    &  left curly bracket
\\ \hline
  LS02  &  S  &  capital S
& SM13  &  $|$     &  vertical bar
\\ \hline
  LT01  &  t  &  small t
& SM14  &  \}    &  right curly bracket
\\ \hline
  LT02  &  T  &  capital T
& SD13  &  \bq   &  grave accent
\\ \hline
  LU01  &  u  &  small u
& SD15  &  $\hat{ }$  &  circumflex accent
\\ \hline
  LU02  &  U  &  capital U
& SD19  &  $\tilde{ }$ &  tilde
\\ \hline
  LV01  &  v  &  small v
& & &
\\ \hline
  LV02  &  V  &  capital V
& & &
\\ \hline
  LW01  &  w  &  small w
& & &
\\ \hline
  LW02  &  W  &  capital W
& & &
\\ \hline
  LX01  &  x  &  small x
& & &
\\ \hline
  LX02  &  X  &  capital X
& & &
\\ \hline
  LY01  &  y  &  small y
& & &
\\ \hline
  LY02  &  Y  &  capital Y
& & &
\\ \hline
  LZ01  &  z  &  small z
& & &
\\ \hline
  LZ02  &  Z  &  capital Z
& & &
\\
\hline
\end{tabular} }
\\
\edithead {\csdag 3 (p21)}
\editstart
\\ \bf delete &
\cltxt
  @ A B C...
\editend
\\
\edithead {\csdag 4 (p21)}
\editstart
\\ \bf delete &
\cltxt
  \bq a b c...
\editend
\\
\edithead {\csdag 5 (p21)}
\editstart
\\ \bf delete &
\cltxt
  The Common LISP Standard character set is apparently ...
\editend
\\
\edithead {\csdag 6 (p21)}
\editstart
\\ \bf replace &
\cltxt
  Of the ninety-four non-blank printing characters
\\ \bf with &
\cltxt
  Of the ninety-five graphic characters
\editend
\\
\edithead {\csdag 9 (p21)}
\editstart
\\ \bf delete &
\cltxt
  The following characters are called ...
\editend
\\
\edithead {\csdag 10 (p21)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd \#$\backslash$Backspace \#$\backslash$Tab } ...
\editend
\\
\edithead {\csdag 11 (p21)}
\editstart
\\ \bf delete &
\cltxt
  Not all implementations of Common ...
\editend

\subsubsection{Line Divisions}              % 2.2.2.

\edithead {\csdag 6 (p22)}
\editstart
\\ \bf replace &
\cltxt
  a two-character sequence, such as
  {\clkwd \#$\backslash$Return } and then
  {\clkwd \#$\backslash$Newline },
  is not acceptable,
\\ \bf with &
\cltxt
  a two-character sequence is not acceptable,
\editend
\\
\edithead {\csdag 8 (p22)}
\editstart
\\ \bf delete &
\cltxt
  Implementation note: If an implementation uses ...
\editend

\subsubsection{Non-standard Characters}     % 2.2.3.

\edithead {\csdag delete entire section (p23)}
\editstart
\editend

\subsubsection{Character Attributes}        % 2.2.4.

\edithead {\csdag 0 section heading (p23)}
\editstart
\\ \bf replace &
\cltxt
  Character Attributes
\\ \bf with &
\cltxt
  Character Identity
\editend
\\
\edithead {\csdag 1 through 8 (p23)}
\editstart
\\ \bf delete all paragraphs&
\cltxt
  Every object of type {\clkwd character} ...
\editend
\\
\edithead {\csdag 1 (p23)}
\editstart
\\ \bf new &
\cltxt
Common LISP
characters are partitioned into a unique collection of
repertoires called {\em
Character Registries}.  That is, each character is included
in one and only one Character Registry.  The label identifying
each character within a Character Registry is a unique numerical value
referred to as the {\em character index}.
\\ &
Characters are uniquely distinguished by their codes,
which are drawn from the set of
non-negative integers.  That is, within Common LISP
a unique numerical code
is assigned to each semantically different character.
Character codes are composed from a Character Registry and a
character index.  The convention by which a character index and
Character Registry compose a character code is implementation
dependent.
\editend

\subsubsection{String Characters}           % 2.2.5.

\edithead {\csdag delete entire section (p23)}
\editstart
\editend

\setcounter{subsection}{4}
\subsubsection{Character Registries}           % 2.2.5.

\edithead {\csdag new section (p23)}
\editstart
\\ \bf new &
\cltxt
Character registries provide portable specifications of
character objects.  Every character object is uniquely
identified by a registry name and index.
Character Registry names are strings formed from the Common LISP
{\clkwd standard-p} characters. Within registry names, alphabetic
case is ignored.
\\ &
Common LISP defines the following Character Registries:
\footnote{This document
defines a partial list of
the Character Registry names.  A subsequent
document will define the complete Common LISP Character Registry
Standard including the effect of the character predicates
{\em alpha-char-p},
{\em lower-case-p}, etc..}
\begin{itemize}
\item Arabic
\item Armenian
\item Bo-po-mo-fo
\item Control
\item Cyrillic
\item Georgian
\item Greek
\item Hangul
\item Hebrew
\item Hiragana
\item Japanese-Punctuation
\item Kanji-JIS-Level-1
\item Kanji-JIS-Level-2
\item Kanji-Gaiji
\item Katakana
\item Latin
\item Latin-Punctuation
\item Mathematical
\item Pattern
\item Phonetic
\item Technical
\end{itemize}
\editend
\\
\edithead {\csdag new section (p23)}
\editstart
\\ \bf new &
\cltxt
The Common LISP Character Registry Standard is fixed;
an implementation
may not extend the set of characters within any Common LISP
Character Registry.
\\ &
An implementation may provide support for all or part of any Common LISP
Character Registry
and may provide new character registries which include characters
having unique semantics (i.e. not defined in any other
implementation-defined character registry or Common LISP Character
Registry).  Implementation registries must be uniquely
named using only {\clkwd standard-p} characters.  In addition,
the repertoire names {\em base} and {\em standard} have
reserved Common LISP usage.
\\ &
An implementation must document the registries it supports.
For each registry supported,
an implementation must define individual characters supported
including at least the following:
\begin{itemize}
\item Character Labels,
Glyphs, and Descriptions.
\item $<$ Common LISP
Character Registry name, character index $>$ pair if one exists
otherwise $<$ implementation-defined
character registry name, character index $>$ pair.
\item Reader Canonicalization.
\item Position in total ordering.
The partial ordering of the Standard alphanumeric
characters must be preserved.
\item Effect of character predicates.
In particular,
\begin{itemize}
\item {\clkwd alpha-char-p}
\item {\clkwd lower-case-p}
\item {\clkwd upper-case-p}
\item {\clkwd both-case-p}
\item {\clkwd graphic-char-p}
\item {\clkwd standard-char-p}
\item {\clkwd alphanumericp}
\end{itemize}
\item Interaction with File I/O.  In particular, the
coded character set standards
\footnote{For example, "ISO8859/1-1987ccs".} and
external encoding schemes
\footnote{For example, {\em "Xerox System Integration Character
Code Standard"}\cite{xerox87}.}
which are supported must be specified.
\end{itemize}
\editend

\subsection{Symbols}                        % 2.3.

\edithead {\csdag 12 (p25)}
\editstart
\\ \bf replace &
\cltxt
  A symbol may have uppercase letters, lowercase letters, or both
  in its print name.
\\ \bf with &
\cltxt
  A symbol may have characters from any supported character registry
  in its print name.
  It may have uppercase letters, lowercase letters, or both.
\editend

\setcounter{subsection}{4}
\subsection{Arrays}
\subsubsection{Vectors}

\edithead {\csdag 6 (p29)}
\editstart
\\ \bf replace &
\cltxt
  All implementations provide specialized arrays for the cases when
  the components are characters (or rather, a special subset of the
  characters);
\\ \bf with &
\cltxt
  All implementations provide specialized arrays for the cases when
  the components are characters (or optionally, special subsets of
  the characters);
\editend

\subsubsection{Strings}

\edithead {\csdag 1 (p30)}
\editstart
\\ \bf replace &
\cltxt
  A string is simply a vector of characters.  More precisely, a string
  is a specialized vector whose elements are of type
  {\clkwd string-char}.
\\ \bf with &
\cltxt
  A string is simply a vector of characters.  More precisely, a string
  is a specialized vector whose elements are of type
  {\clkwd character} or a subtype
  of character.
\editend

\setcounter{subsection}{14}
\subsection{Overlap, Inclusion, and Disjointness of Types} % 2.15.

\edithead {\csdag 14 (p34)}
\editstart
\\ \bf replace &
\cltxt
  The type {\clkwd standard-char} is a subtype of {\clkwd string-char};
  {\clkwd string-char} is a subtype of {\clkwd character}.
\\ \bf with &
\cltxt
  The type {\clkwd base-character} is a subtype of
  {\clkwd character}.
\editend
\\
\edithead {\csdag 15 (p34)}
\editstart
\\ \bf replace &
\cltxt
  The type {\clkwd string} is a subtype of {\clkwd vector},
  for {\clkwd string} means {\clkwd (vector string-char)}.
\\ \bf with &
\cltxt
  The type {\clkwd string} is a subtype of {\clkwd vector},
  {\clkwd string} consists of vectors specialized by subtypes of
  {\clkwd character}.
\editend
\\
\edithead {\csdag 15 after (p34)}
\editstart
\\ \bf insert &
\cltxt
  The type {\clkwd base-string} means
  {\clkwd (vector base-character)}.
\editend
\\
\edithead {\csdag 15 after (p34)}
\editstart
\\ \bf insert &
\cltxt
  The type {\clkwd general-string} means
  {\clkwd (vector character)} and is a subtype of {\clkwd string}.
\editend
\\
\edithead {\csdag 20 (p34)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd (simple-array string-char (*))};
\\ \bf with &
\cltxt
  {\clkwd (simple-array character (*))};
\editend
\\
\edithead {\csdag 20 after (p34)}
\editstart
\\ \bf insert &
\cltxt
  The type {\clkwd simple-base-string} means
  {\clkwd (simple-array base-character (*))} and
  is the most efficient string which can hold
  the standard characters. {\clkwd simple-base-string}
  is a subtype of {\clkwd base-string}.
\editend
\\
\edithead {\csdag 20 after (p34)}
\editstart
\\ \bf insert &
\cltxt
  The type {\clkwd simple-general-string} means
  {\clkwd (simple-array character (*))}.
  {\clkwd simple-general-string}
  is a subtype of {\clkwd general-string}.
\editend


%----------------------------------------------------------------------
\setcounter{section}{3}
\section{Type Specifiers}                   % 4
%----------------------------------------------------------------------
\setcounter{subsection}{1}
\subsection{Type Specifier Lists} % 4.2.


\edithead {\csdag 8 Table 4-1 (alphabetic list) (p43)}
\editstart
\\ \bf remove &
\\ &
\cltxt
  {\clkwd standard-char}
\\ &
  {\clkwd string-char}
\editend
\\
\edithead {\csdag 8 Table 4-1 (alphabetic list) (p43)}
\editstart
\\ \bf insert &
\\ &
\cltxt
  {\clkwd base-character}
\\ &
  {\clkwd general-string}
\\ &
  {\clkwd simple-base-string}
\\ &
  {\clkwd simple-general-string}
\editend

\setcounter{subsection}{2}
\subsection{Predicating Type Specifiers} % 4.3.

\edithead {\csdag 2 (p43)}
\editstart
\\ \bf delete &
\cltxt
  As an example, the entire ...
\editend
\\
\edithead {\csdag 3 delete example (p43)}
\editstart
\\ \bf delete &
\cltxt
  {\clkwd (deftype string-char () } ...
\editend

\setcounter{subsection}{4}
\subsection{Type Specifiers That Specialize} % 4.5.

\edithead {\csdag 5 after (p46)}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd (character {\em registries})}
\\  &
  This denotes a character type specialized to members
  of the specified registries.  {\em registries} may be a
  single character registry name or a list of
  character registry names.
\editend

\setcounter{subsection}{5}
\subsection{Type Specifiers That Abbreviate} % 4.6.

\edithead {\csdag 20 (p49)}
\editstart
\\ \bf replace &
\cltxt
  Means the same as {\clkwd (array string-char ({\em size}))}: the set of
  strings of
  the indicated size.
\\ \bf with &
\cltxt
  Means the union of the vector types specialized by subtypes of
  character
  and the indicated size.
  For the purpose of declaration, it is equivalent to
  {\clkwd (general-string ({\em size}))}.
\editend
\\
\edithead {\csdag 23 (p49)}
\editstart
\\ \bf replace &
\cltxt
  Means the same as {\clkwd (simple-array string-char ({\em size}))}: the
  set of simple strings of the indicated size.
\\ \bf with &
\cltxt
  Means the union of the simple vector types specialized by subtypes of
  character and the indicated size.
  For the purpose of declaration, it is equivalent to
  {\clkwd (simple-general-string ({\em size}))}.
\editend
\\
\edithead {\csdag 23 after (p49)}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd (base-string {\em size})}
\\ &
  Means the same as {\clkwd (array base-character ({\em size}))}: the
  set of base strings of the indicated size.
\\ &
  {\clkwd (simple-base-string {\em size})}
\\ &
  Means the same as {\clkwd (simple-array base-character ({\em size}))}:
  the set of simple base strings of the indicated size.
\editend
\\
\edithead {\csdag 23 after (p49)}
\editstart
\\ \bf insert &
\cltxt
  {\clkwd (general-string {\em size})}
\\ &
  Means the same as {\clkwd (array base-character ({\em size}))}: the
  set of base strings of the indicated size.
\\ &
  {\clkwd (simple-general-string {\em size})}
\\ &
  Means the same as
  {\clkwd (simple-array general-character ({\em size}))}:
  the set of simple general strings of the indicated size.
\editend

\setcounter{subsection}{7}
\subsection{Type Conversion Function} % 4.8.

\edithead {\csdag 6 (p51)}
\editstart
\\ \bf replace &
\cltxt
  Some strings, symbols, and integers may be converted to
  characters.  If {\em object} is a string of length 1,
  then the sole element of the print name is returned.
  If {\em object} is an integer {\em n}, then {\clkwd (int-char }
  {\em n}{\clkwd )} is returned.  See {\clkwd character}.
\\ \bf with &
\cltxt
  Some strings amd symbols may be converted to
  characters.  If {\em object} is a string of length 1,
  then the sole element of the print name is returned.
  See {\clkwd character}.
\editend
\\
\edithead {\csdag 6 after (p52)}
\editstart
\\ \bf insert &
\begin{itemize}
\cltxt
\item Any string subtype may be converted to any other string
subtype, provided the new string can contain all actual
elements of the old string.  It is an error if it cannot.
\end{itemize}
\editend


%----------------------------------------------------------------------
\setcounter{section}{5}
\section{Predicates}                        % 6
%----------------------------------------------------------------------
\edithead {\csdag 2 (p71)}
\editstart
\\ \bf replace &
\cltxt
  but {\clkwd standard-char} begets {\clkwd standard-char-p}
\\ \bf with &
\cltxt
  but {\clkwd bit-vector} begets {\clkwd bit-vector-p}
\editend

\setcounter{subsection}{1}
\subsection{Data Type Predicates} % 6.2.

\setcounter{subsubsection}{1}
\subsubsection{Specific Data Type Predicates} % 6.2.2.

\edithead {\csdag 36 (p75)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd characterp} {\em object}
\\ \bf with &
\cltxt
  {\clkwd characterp} {\em object} \&{\clkwd optional}
  {\em repertoire}
\editend
\\
\edithead {\csdag 37 (p75)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd characterp} is true if its argument is a character,
  and otherwise is false.
\\ \bf with &
\cltxt
  If {\em repertoire} is omitted, {\clkwd characterp}
  is true if its argument is a character object,
  and otherwise is false.
  If a {\em repertoire} argument is specified,
  {\clkwd characterp} is true if its argument
  is a character object and a member of the specified repertoire,
  and
  otherwise is false.
  For example, {\clkwd (characterp  \#$\backslash$A}
  {\clkwd "Latin")}
  is true since \#$\backslash$A is a member of the Common LISP
  Latin Character Registry.  {\em repertoire} may be any supported
  character registry name or the reserved repertoire names
  "base" and "standard". {\clkwd (characterp x "base")} is
  true if its argument is a member of the base character
  repertoire and false
  otherwise.
  {\clkwd (characterp x "standard")} is
  true if its argument is a member of the standard character
  repertoire and false
  otherwise.
\editend
\\
\edithead {\csdag 38 (p75)}
\editstart
\\ \bf replace &
\cltxt
  {\clkwd (characterp x) $\equiv$ (typep x \apostrophe character)}
\\ \bf with &
\cltxt
  {\clkwd (characterp x "standard") $\equiv$ (typep x \apostrophe
  (character "standard")}
\editend
\\
\edithead {\csdag 72 (p76)}
\editstart
\\ \bf replace &
\cltxt
  See also {\clkwd standard-char-p, string-char-p, streamp,}
\\ \bf with &
\cltxt
  See also {\clkwd standard-char-p, streamp,}
\editend

\setcounter{subsubsection}{2}
\subsubsection{Equality Predicates} % 6.2.3.

\edithead {\csdag 75 (p81)}
\editstart
\\ \bf replace &
\cltxt
  which ignores alphabetic case and certain other attributes
  of characters;
\\ \bf with &
\cltxt
  which ignores alphabetic case
  of characters;
\editend

%----------------------------------------------------------------------
\setcounter{section}{6}
\section{Control Structure}                 % 7
%----------------------------------------------------------------------

\setcounter{subsection}{1}
\subsection{Generalized Variables} % 7.2.

\edithead {\csdag 19 modify table (p95)}
\editstart
\\ \bf replace &
\cltxt
  char               string-char
\\ &
  schar              string-char
\\ \bf with &
\cltxt
  char               character
\\ &
  schar              character
\\ &
  sbchar             base-character
\editend
\\
\edithead {\csdag 22 table entry (p96)}
\editstart
\\ \bf delete &
\cltxt
  char-bit           first                  set-char-bit
\editend

%----------------------------------------------------------------------


Received: from life.ai.mit.edu (TCP 20015020120) by AI.AI.MIT.EDU 12 Jan 89 23:55:34 EST
Received: from ai.ai.mit.edu by life.ai.mit.edu; Thu, 12 Jan 89 23:42:45 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Jan 89  19:54:57 PST
Date: Thu, 12 Jan 89 13:33:25 PST
From: Thom Linden <baggins@ibm.com>
To: Common Lisp mailing <common-lisp@sail.stanford.edu>
Message-Id: <890112.133325.baggins@almvma>
Subject: cs proposal part 1

\documentstyle{report}     % Specifies the document style.

\pagestyle{headings}

\title{\bf
Extensions to Common LISP to Support International
Character Sets}
\author{
Michael Beckerle\thanks{Gold Hill Computers} \and
Paul Beiser\thanks{Hewlett-Packard} \and
Robert Kerns\thanks{Independent consultant} \and
Kevin Layer\thanks{Franz, Inc.} \and
Thom Linden\thanks{IBM Research, Subcommittee Chair} \and
Larry Masinter\thanks{Xerox Research} \and
David Unietis\thanks{Lucid, Inc.}
}
\date{January 1, 1989}   % Deleting this command produces today's date.

\begin{document}

\maketitle                 % Produces the title.

\setcounter{secnumdepth}{4}

\setcounter{tocdepth}{4}
\tableofcontents


%----------------------------------------------------------------------
%----------------------------------------------------------------------
\newfont{\cltxt}{cmr10}
\newfont{\clkwd}{cmtt10}

\newcommand{\apostrophe}{\clkwd '}
\newcommand{\bq}{\clkwd\symbol{'22}}


%----------------------------------------------------------------------
%----------------------------------------------------------------------
\chapter{Introduction}

This is a proposal to the X3 J13 committee
for both extending and modifying the Common LISP
language definition to provide a standard basis for Common LISP
support of the variety of characters used to represent the
native languages of the international community.

This proposal was created by the Character Subcommittee of X3 J13.
We would like to acknowledge discussions with T. Yuasa and other
members of the JIS Technical Working Group,
comments from members of X3 J13,
and the proposals \cite{ida87},
\cite{linden87}, \cite{kerns87}, and \cite{kurokawa88} for
providing the motivation and direction for these extensions.
As all these documents and discussions were created
expressly for LISP standardization usage,
we have borrowed freely from their ideas as well as the texts
themselves.

This document is separated into three parts. The first part explains the
major language changes and their motivations. While intended as
commentary to a general audience, and not explicitly as
part of the standard document, the X3 J13 editor may
include sections at her/his discretion.  The second part,
Appendix A, provides
the page by page set of editorial changes to \cite{steele84}.
The final part, Appendix B, contains language elements deleted
from \cite{steele84} which we view as important from a compatibility
viewpoint but consider deprecated Common LISP features.
\section{Objectives}

The major objectives of this proposal are:
\begin{itemize}
\item To provide a consistent, well-defined scheme allowing support
of both very large character sets and multiple character sets.
\footnote{The distinction between the terms {\em character repertoire}
and {\em coded character set} is made later.  The usage
of the term {\em character set},
avoided after this introduction, encompasses both terms.}

Many software applications are intended for international use, or
have requirements for incorporation of language elements of multiple
native languages within a single application.
Also, many applications require specialized languages including,
for example, scientific and typesetting symbols.
In order
to ensure some portability of these applications, data expressed in
a mixture of these
languages must be treated uniformly by the
software language.

All character and string manipulations should operate uniformly,
regardless of the character set(s) of the character objects.
This applies to array indexing, readtable definitions, read
symbol construction and I/O operations.


\item To ensure efficient performance of string and character
operations.

Many native
languages, such as Japanese and Chinese, use character
sets which contain more characters than the Latin alphabet.
Supporting larger sized character sets frequently means employing
larger data fields to uniquely encode each character.
Common LISP implementations using
larger sized character sets can
incur performance penalties in terms
of space, time, or both.

The use of large and/or multiple character sets by an
implementation
implies the need for a more complex character type representation.
Given a more complex character representation, the efficiency
of language operations on characters (e.g. string operations)
could be affected.

\item To assure forward compatibility of the proposed model
and definition with existing Common LISP implementations.

Developers should not be required to re-write large amounts of either
LISP code or data representations in order to apply the proposed
changes to existing implementations.
The proposed changes should provide an easy
portability path for existing code to many possible implementations.
\end{itemize}

There are a number of issues, some under the general rubric of
internationalization, which this proposal does {\em not} cover.
Among these issues are:
\begin{itemize}
\item Time and date formats
\item Monetary formats
\item Numeric punctuation
\item Fonts
\item Lexicographic orderings
\item Right-to-left and bidirectional languages
\end{itemize}

%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\chapter{Overview}

We use several terms within this document which
are new in the context of Common LISP.
Definitions for the following prominent
terms are provided for the reader's convenience.

A {\em character repertoire} defines a collection of characters
independent of their specific rendered image or font.  This
corresponds to the mathematical notion of a {\em set}
\footnote{We avoid the term {\em character set} as it has been
(over)used in the context of character repertoire as well
as in the context of coded character set.}.
Character
repertoires are specified independent of coding and their characters
are only identified with a unique label, a graphic symbol, and
a character description.
A {\em coded character set} is a character repertoire plus
an {\em encoding} providing a unique mapping between each character
and a number which serves as the character representation.
There are numerous internationally standardized coded character
sets; for example, \cite{iso8859/1} and \cite{iso646}.

A character may be included in one or more character repertoires.
Similarly, a character may be included in one or more
coded character sets.  For example, the Latin letter "A" is contained
in the coded character set standards: ISO 8859/1, ISO 8859/2,
ISO 6937/2, and others.

Common LISP
characters are partitioned into a unique collection of
repertoires called {\em
Character Registries}.  That is, each character is included
in one and only one Character Registry.  The label identifying
each character within a Character Registry is a unique numerical value
referred to as the {\em character index}.

In Common LISP a {\em character} data object is identified by its
{\em character code}, a unique numerical code.
Each character code is composed from
a Character Registry
shared by all characters of a particular Registry,
and a character index, a numerical value which
is unique within the Character Registry.

Character data objects which are classified as {\em graphic},
or displayable, are each associated with a {\em glyph}.  The
glyph is the visual representation of the character.

The primary purpose of introducing these terms is to provide a
consistent naming to Common LISP concepts which are related
to those found in ISO standardization of coded
character sets.
\footnote{The bibliography includes several relevant ISO
coded character set standards.}
They also serve as a demarkation between these
standardization activities.  For example, while Common LISP is free to
define unique repertoires and facilities to manipulate them, it should
not define coded character sets.

A secondary purpose is to detach the language specification from
underlying hardware representation.  From a language
specification viewpoint it is inconsequential whether
characters occupy one or more (8-bit) bytes or whether
a Common LISP implementation's
internal representation for characters is distinct from or identical
to any given external representation (for example, a text interchange
representation \cite{iso6937/2}).
We specifically do not propose any standard coded character sets.

%----------------------------------------------------------------------
\section{Character Identity}


Characters are uniquely distinguished by their codes,
which are drawn from the set of
non-negative integers.  That is, within Common LISP
a unique numerical code
is assigned to each semantically different character.
Character codes are composed from a Character Registry and a
character index.  The convention by which a character index and
Character Registry compose a character code is implementation
dependent.

It is important to separate the notion of glyph from the notion of
character data object when defining a scheme under which issues of
identity can be rigorously decided by a computer language.  Glyphs are
the visual aspects of characters, writable on surfaces, and sometimes
called 'graphics'.  A language specification valid for more than a
narrow range of systems can only make assumptions about the existence
of {\em abstract} glyphs (for example, the Latin letter A) and not about
glyph variants (for example, the italicized Latin letter {\em A})
or characteristics of display devices.  Thus, a key element of this
proposal is the removal of the {\em font} and {\em bits}
attributes from the language specification.

One ramification is that the distinction between {\clkwd string-char}
and {\clkwd character} is eliminated.  {\bf All} characters can be
inserted into (type compatible) strings.
In addition, all functions
dealing with the {\em bits} and {\em font} attributes are either
removed or modified by this proposal.

A second ramification is the introduction of new functions to
compose and decompose character objects.
The {\clkwd characterp} predicate is extended to
support testing
membership of a character in a given Character Registry.
\footnote{
For example,
testing membership in the Japanese Katakana Character Registry.
}
Also, a global variable {\clkwd *all-registry-names*} is added to
support application determination of supported Character Registries.

A third ramification is that I/O functions must be modified to manage
the interaction between the Common LISP treatment of characters and
the external environment.

The definition in \cite{steele84} of semi-standard characters has
been eliminated.  This is replaced by a more uniform approach
with introduction of the Control Character
Registry (see below).


%----------------------------------------------------------------------
\section{Character Repertoires and Registries}


A Common LISP program must be able to compose and decompose
characters in a portable uniform manner, independent of any
underlying representation.  One possible composition is by
the pair $<$ coded character set standard, decimal representation $>$
\footnote{This syntax is for illustration only and is not being
proposed.}.
Thus, for example, one might compose the Latin 'A' with the pair
$<$ "ISO8859/2-1987ccs", 65 $>$,
$<$ "ISO8859/6-1987ccs", 65 $>$, or
$<$ "ISO646-1983ccs", 65 $>$, etc..  The difficulty here is two-fold.
First, there are several ways to compose the same character and
second, there may be multiple answers to
the question: {\em To what coded character set
does character object x belong?}.\footnote{Even
worse, the answer might change yearly.}
The identical problems occur if the pair
$<$ character repertoire standard, decimal representation $>$ is used.
\footnote{Existing repertoires seem to be defined exclusively
in the context of coded character sets and not as standards
in their own right.}

The concept of Character Registry is introduced by this proposal
to resolve the problem of character composition and decomposition.
Each character is universally defined by the
pair $<$ Character Registry name, character index $>$. For this
to be a portable definition, it must have a standard meaning.
Thus this
proposal relies on a {\em Character Registry Standard}.
There is no existing Character Registry Standard.
Until such an ANSI or ISO standard exists, Common LISP
defines the {\em Common LISP Character Registry Standard}.
\footnote{It is the intention of X3 J13 to promote and adopt
an eventual ANSI or ISO Character Registry Standard.  In particular, we
acknowledge that X3 J13 is {\em not} the appropriate forum to
define the standard.  We believe
it is a required component of all programming languages
providing support for international characters.}


Common LISP defines the following Character Registries:
\footnote{In the interest of brevity, this document will
define only a partial list of
the Character Registry names.  A subsequent
document will define the complete Common LISP Character Registry
Standard including the effect of the character predicates
{\em alpha-char-p},
{\em lower-case-p}, etc..}
\footnote{
Character Registry names are strings formed from the Common LISP
{\clkwd standard-p} characters. Within registry names, alphabetic
case is ignored.}
\begin{itemize}
\item Arabic
\item Armenian
\item Bo-po-mo-fo
\item Control
\item Cyrillic
\item Georgian
\item Greek
\item Hangul
\item Hebrew
\item Hiragana
\item Japanese-Punctuation
\item Kanji-JIS-Level-1
\item Kanji-JIS-Level-2
\item Kanji-Gaiji
\item Katakana
\item Latin
\item Latin-Punctuation
\item Mathematical
\item Pattern
\item Phonetic
\item Technical
\end{itemize}
The Common LISP Character Registry Standard is fixed;
an implementation
may not extend the set of characters within any Common LISP
Character Registry.

An implementation may provide support for all or part of any Common LISP
Character Registry
and may provide new character registries which include characters
having unique semantics (i.e. not defined in any other
implementation-defined character registry or Common LISP Character
Registry).  Implementation registries must be uniquely
named using only {\clkwd standard-p} characters.  In addition,
the repertoire names {\em base} and {\em standard} have
reserved Common LISP usage.


An implementation must document the registries it supports.
For each registry supported,
an implementation must define individual characters supported
including at least the following:
\begin{itemize}
\item Character Labels,
Glyphs, and Descriptions.
\item $<$ Common LISP
Character Registry name, character index $>$ pair if one exists
otherwise $<$ implementation-defined
character registry name, character index $>$ pair.
\item Reader Canonicalization.
\item Position in total ordering.
The partial ordering of the Standard alphanumeric
characters must be preserved.
\item Effect of character predicates.
In particular,
\begin{itemize}
\item {\clkwd alpha-char-p}
\item {\clkwd lower-case-p}
\item {\clkwd upper-case-p}
\item {\clkwd both-case-p}
\item {\clkwd graphic-char-p}
\item {\clkwd standard-char-p}
\item {\clkwd alphanumericp}
\end{itemize}
\item Interaction with File I/O.  In particular, the
coded character set standards
\footnote{For example, "ISO8859/1-1987ccs".} and
external encoding schemes
\footnote{For example, {\em "Xerox System Integration Character
Code Standard"}\cite{xerox87}.}
which are supported must be specified.
\end{itemize}

The
intent of the provision for multiple character registries
is that native language glyphs (with associated digits and
punctuation)
\footnote{For example, the glyphs on the keycaps of a particular
terminal, or any other glyph sets with a common use in graphics or
symbolic communication.
}
should each be mapped by the I/O interface
into registries inside
Common LISP, all the members of which
share a common registry name.
Which glyph sets are supported by the overall computing system, the
details of the mapping of
glyphs to character codes, and any implementation unique character
registry names used, are left unspecified by Common LISP.

The diversity of glyph sets and coded character
set conventions in use worldwide and the desirability
of allowing Common LISP to manipulate symbolic elements from many
languages, perhaps simultaneously, mandate such a flexible approach.

%----------------------------------------------------------------------
\section{Hierarchy of Types}

Providing support for extensive character repertoires may
impact Common LISP implementation performance in terms
of space, time, or both.
\footnote{This does not apply to all implementations.
Unique hardware support and user community requirements must
be taken into consideration.}
In particular, many existing
implementations support variants of the ISO 8859/1 standard.
Supporting large
repertoires argues for a multi-byte internal representation
for each character, even if an application primarily (or exclusively)
uses the ISO 8859/1 characters.

This proposal extends the definition of the character and string
type hierarchy to include specialized subtypes
of character and string.  An implementation is free to associate
compact internal representation tailored to each subtype.
The {\clkwd string} type specifier, when used as a
declaration (for example, in {\clkwd make-sequence})
is defined to mean the most general string subtype supported
by the implementation.  This definition emphasizes portability
of existing Common LISP applications to international
character environments over performance.  Applications emphasizing
efficiency of text processing in non-international environments
will require some modification to utilize subtypes with
compact internal representations.

It has been suggested that either a single type is
sufficient to support international characters,
or that a hierarchy of types could be used, in a manner
transparent to the user.  A desire to provide flexibility which
encourages implementations to support international
characters without compromising application efficiency
led us to accept the need for more than one type.
We believe that these choices reflect a minimal
modification of this aspect of the type system, and that
exposing the types for string and character construction while
requiring uniform treatment for characters otherwise
is the most reasonable approach.

\subsection{Character Type}

The following type specifier is added as a subtype
of {\clkwd character}.
\begin{itemize}
\item {\clkwd base-character}
\end{itemize}

An implementation may support additional subtypes of {\clkwd character}
which may or may not be supertypes of {\clkwd base-character}.
In addition, an implementation may define {\clkwd base-character}
as equivalent to {\clkwd character}.

Characters of type {\clkwd base-character} are referred to as
{\em base characters}.  Characters of type {\clkwd
(and character (not base-character))}
are referred to as {\em extended characters}.
The base characters are
distinguished in the following respects:
\begin{itemize}
\item
The standard characters are a subrepertoire of the base characters.
\item
Only members of the base character repertoire
can be elements of a base string.
\item
The base characters are, in general, the default characters for I/O
operations.
\end{itemize}
No upper bound is specified for the number of glyphs in the base
character repertoire--that
is implementation dependent.  The lower bound is 96, the
number of standard characters defined for Common LISP.
\footnote{Or, in contrast, the base repertoire may include all
the Common LISP Character Registries.}


The distinction of base characters is largely a pragmatic
choice.  It permits efficient handling of common situations, is
in some sense privileged for host system I/O, and can serve as an
intermediate basis for portability, less general than the standard
characters, but possibly more useful across a narrower range of
implementations.

Many computers have some "base" character representation which
is a function of hardware instructions for dealing with characters,
as well as the organization of the file system.  The base character
representation is likely to be the smallest transaction unit permitted
for text file and terminal I/O operations.  On a system with a record
based I/O paradigm, the base character representation is likely to
be the smallest record quantum.  On many computer systems,
this representation is a byte.

However, there are often multiple
coded character sets supportable on a
computer, through the use of special display and entry hardware, which
are varying interpretations of the basic system character
representation.  For example, ISO 8859/1 and ISO 6937/2 are two
different interpretations of the same 1-byte code representations.
Many countries have their own glyph-to-code mappings for 1-byte
character codes addressing the special requirements of national
languages.  Differentiating between these, without reference to
display hardware, is a matter of convention, since they all use the
same set of code representations.  When a single byte is not enough,
two or more bytes are sometimes used for character encoding.  This
makes character handling even more difficult on machines where the
natural representation size is a byte, since not only is the semantic
value of a character code a matter of convention, which may vary
within the same computing system, but so is the identification of a
set of bits as a complete character code.

It is the intention of this proposal that the base characters of
Common LISP
be the natural characters of the host system: its composition
should be
determined by the code capacity of the natural file system and I/O
transaction representations, and its assumed display glyphs should be
those of the terminals most commonly employed.
There are several advantages to this scheme.  Internal representation
of strings of just base characters can be more compact than
strings including extended characters.
Source programs are likely to consist predominantly of base characters
since the standard characters are a subset of the base character
repertoire. Parsing of pure base character text
can be more efficient than parsing of text including
extended characters.
I/O can be performed more simply
with base characters.

The standard characters are the 96 characters used in the Common LISP
definition {\bf or their equivalents}.

This was the Common LISP \cite{steele84} definition, but
{\em equivalents} is a vague term.

The standard characters are not defined by their glyphs, but by their
roles within the language.  There are two aspects to the roles of the
standard characters: one is their role in reader and format control
string syntax; the second is their role as components of the names of
all Common LISP
functions, macros, constants, and global variables.  As
long as an implementation chooses 96 glyphs
and treats those 96 in a manner consistent with
the language's specification for the standard characters (e.g.
the naming of functions), it doesn't matter what glyphs the I/O
hardware uses to represent those characters: they are the standard
characters.  Any program or
data text written wholly in those characters
is portable through simple code conversion.
\footnote{For example, the currency glyph, \$ , might be replaced
uniformly by the currency glyph available on a particular display.}

Additional
mechanisms, such as in \cite{linden87}, which support establishment of
equivalency between otherwise distinct characters are not excluded by
this proposal.
\footnote{We believe this is an important issue but it requires
additional implementation experience.  We also encourage
new proposals from JIS and ISO LISP Working Groups on this issue.}

\subsection{String Type}

The {\clkwd string} type
is defined as
a vector of characters.  More precisely, a string
is a specialized vector whose elements are of type
{\clkwd character} or a subtype of character.  The following string
subtypes are
distinguished with standardized names: {\clkwd base-string},
{\clkwd general-string}, {\clkwd simple-base-string}, and
{\clkwd simple-general-string}.
All strings which are not base strings
are referred to as {\em extended strings}.

A base string can only contain base characters.  A
{\clkwd general-string}
can contain any implementation supported base or extended characters,
in any mixture.
\footnote{This type might be more appropriately named
{\clkwd most-general-string}.  {\clkwd general-string} was
subjectively judged to be less offensive.}

All Common LISP functions defined to operate on strings treat
base and extended strings uniformly with the following
caveat: for any function which inserts a character into a string, it
is an error to insert an extended character
into a base string.
\footnote{An implementation may, optionally, provide automatic
coersion to an extended string.}

An implementation may support string subtypes more general
than {\clkwd base-string} but more specialized than
{\clkwd general-string}.
For example, a hypothetical
implementation supporting Arabic and Cyrillic Character Registries
might provide:
\begin{itemize}
\item {\clkwd general-string} -- may contain Arabic, Cyrillic or
base characters in any mixture.
\item {\clkwd region-specialized-string} -- may contain installation
selected repertoire (Arabic/Cyrillic) or base characters in any
mixture.
\item {\clkwd base-string} -- may contain base characters
\end{itemize}
Though, clearly, portability of applications using
{\clkwd region-specialized-string} is limited, a performance
advantage might argue for its use.
\footnote{{\clkwd region-specialized-string} is used here for
illustration only; it is not being proposed as a standardized
string subtype.}

Alternatively,
an implementation
supporting a large base character repertoire
including, say, Japanese Character Registries may define
{\clkwd base-character}
as equivalent to {\clkwd character}.

We expect that applications sensitive to the performance
of character handling in some host environments will
utilize the string subtypes to provide performance
improvement.  Applications with emphasis on international
portability will likely utilize only {\clkwd general-string}s.

The {\clkwd coerce} function is extended to
allow for explicit coercion between base strings and extended strings.

During reader
construction of symbols, if all the characters
in the symbol's name are of type {\clkwd base-character},
then the name of the symbol may be stored as a base string.
Otherwise it will be stored as an extended string.

The base string type allows for more compact representation of strings
of base characters, which are likely to predominate in any system.
Note that in any particular implementation the base characters
need not be the
most compactly representable, since others might have
a smaller repertoire.
However, in most implementations base strings are
likely to be more space efficient than extended strings.


%----------------------------------------------------------------------
\section{Streams and System I/O}

A lot of the work of ensuring that a
Common LISP implementation operates correctly in a
multiple coded character set environment must be performed by
the I/O interface.
The system I/O interface, abstracted in
Common LISP as streams, is responsible
for ensuring that text input from outside LISP is properly mapped
into character objects internally, and that the inverse mapping
is performed on output.  It is beyond the scope of a language
definition to specify the details of this operation, but options
are specified which allow runtime indication from the user as to
what coded character sets a stream uses, and how the mappings
should be done.  It is expected that implementations will provide
reasonable defaults and invocation options to accommodate desired use
at an installation.

One keyword argument is proposed as an addition to {\clkwd open}:
\begin{itemize}
\item {\clkwd :external-code-format}
whose value would be:
\begin{itemize}
\item
A name or list indicating an implementation recognized scheme for
representing 1 or more coded character sets.
\footnote{
For example, the so/si convention used by IBM on 370
machines could be selected by a list including
the name {\em "ibm-shift-delimited"}.
The run-encoding convention defined by XEROX could be
selected by {\em "xerox-run-encoded"}.
The convention based on
ASCII which uses leading bit patterns to distinguish two-byte codes
from one-byte codes could be selected by
{\em "ascii-high-byte-delimited"}.
}
As many coded character set names must be provided as the
implementation requires for that external coding convention.
\footnote{
For example, if {\em "ibm-shift-delimited"} were the
{\clkwd :external-code-format} argument, two
coded character set specifiers would have to be provided.
}
\end{itemize}
\end{itemize}

These arguments are provided for input, output, and
bidirectional streams.
It is an error to try to write a character other than a
member of the specified coded character sets
to a stream.  (This excludes the
\#$\backslash${\clkwd Newline} character.
Implementations must provide appropriate line division behavior
for all character streams.)

An implementation supporting multiple coded character sets
must allow for the external
representation of characters to be separately (and perhaps
multiply) specified to {\clkwd open},
since there can be circumstances under
which more than one external representation for characters
is in use, or more than one coded character set
is mixed together in an
external representation convention.

In addition to supporting conversion at the system interface, the
language must allow user programs to determine how much space data
objects will require when output in whichever external representations
are available.

The new function {\clkwd external-width}
takes a character
or string object as its required argument.  It also takes an optional
{\em output-stream}.
It returns the number of host system character
representation quantum units
\footnote{
Same as the storage width of a base character, usually a byte.
}
required to externally store that object, using the
representation convention associated with the stream.
If the object cannot be represented in
that convention, the function returns {\clkwd nil}.
This function is necessary
to determine if strings can be written to fixed length
fields in databases or terminal screen templates.  Note that this
function does not
address the problem of calculating
screen width of strings printed in proportional fonts.
\footnote{
The X3 J13 proposal STREAM-INFO: ONE-DIMENSIONAL-FUNCTIONS
modified to include these semantics is an
acceptable alternative to the {\clkwd external-width} function
proposed here.}

%----------------------------------------------------------------------
%----------------------------------------------------------------------


Received: from life.ai.mit.edu (TCP 20015020120) by AI.AI.MIT.EDU 12 Jan 89 23:51:23 EST
Received: from ai.ai.mit.edu by life.ai.mit.edu; Thu, 12 Jan 89 23:42:14 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Jan 89  19:59:33 PST
Date: Thu, 12 Jan 89 16:53:24 PST
From: Thom Linden <baggins@ibm.com>
To: Common Lisp mailing <common-lisp@sail.stanford.edu>
Message-Id: <890112.165324.baggins@almvma>
Subject: cs proposal

  Hopefully the character proposal covers all the varied comments
we received previously.  Thanks again to everyone for the constructive
criticism.  In particular, I wish to express our thanks to
Yuasa-san, Kurokawa-san and the JIS Lisp committee.

Regards,
  Thom


Received: from life.ai.mit.edu (TCP 20015020120) by AI.AI.MIT.EDU 12 Jan 89 23:50:50 EST
Received: from ai.ai.mit.edu by life.ai.mit.edu; Thu, 12 Jan 89 23:41:42 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 12 Jan 89  19:59:21 PST
Date: Thu, 12 Jan 89 13:36:53 PST
From: Thom Linden <baggins@ibm.com>
To: Common Lisp mailing <common-lisp@sail.stanford.edu>
Message-Id: <890112.133653.baggins@almvma>
Subject: cs proposal

I've just sent out two messages containing the latest character
proposal (no DRAFT this time).  We will only vote on this at
Hawaii if the full J13 agrees otherwise (which I expect)
a network ballot will be sent right after Hawaii.

Aloha,
  Thom


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Jan 89 16:48:26 EST
Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 12 Jan 89  13:31:37 PST
Received: from GROUSE.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 520433; Thu 12-Jan-89 16:29:56 EST
Date: Thu, 12 Jan 89 16:29 EST
From: Robert A. Cassels <Cassels@STONY-BROOK.SCRC.Symbolics.COM>
Subject: Logical Operations on Numbers
To: ELIOT@cs.umass.EDU, common-lisp@sail.stanford.EDU
In-Reply-To: <8901122046.AA00579@crash.cs.umass.edu>
Message-ID: <19890112212955.5.CASSELS@GROUSE.SCRC.Symbolics.COM>

    Date: Thu, 12 Jan 89 15:31 EST
    From: ELIOT@cs.umass.EDU

    Section 12.7 (pp 220-225) describes CL operations for manipulating
    finite sets using integers.  Unfortunately there does not seem to
    be any predicate to determine if one set is a subset of another
    using this representation.  'logtest' serves as an intersection test,
    'logbitp' serves as a member test but to determine subset relations
    seems to require computing the set difference (with logandc2) and
    comparing the result with zero.  If the sets are moderately large
    (say several hundred elements) this involves expensive bignum operations
    that I would like to avoid.

One can imagine a compiler noticing the pattern (LOGTEST .. (LOGNOT ..))
and compiling a call to a special routine which didn't do the explicit
LOGNOT computation.  I don't know of any compiler which does this,
though.

    I have also thought of using bitvectors, but the operations on bitvectors
    (p 294) only operate on bitvectors of the same length.

For vectors, it's not too hard to imagine that the shorter one should be
treated as if it were extended with zeros (presumably at the higher
index end).  It's a little harder to decide what to do in the
multidimensional case.

							    Furthermore,
    the bitvector functions only include bitwise operations, but no subset
    test here either.

    Isn't SUBSET considered an important set manipulation primitive?

    Chris Eliot
    University of Massashusetts at Amherst

Symbolics Common Lisp defines:

  SCL:BIT-VECTOR-SUBSET-P - Function (BIT-VECTOR-1 BIT-VECTOR-2 &key (:START1 0) :END1 (:START2 0) :END2)
   ;; BIT-VECTOR-1 is a subset of BIT-VECTOR-2
  SCL:BIT-VECTOR-POSITION - Function (BIT BIT-VECTOR &key (:START 0) :END)
   ;; equivalent to (POSITION BIT BIT-VECTOR :START START :END END)
  SCL:BIT-VECTOR-ZERO-P - Function (BIT-VECTOR &key (:START 0) :END)
  SCL:BIT-VECTOR-EQUAL - Function (BIT-VECTOR-1 BIT-VECTOR-2 &key (:START1 0) :END1 (:START2 0) :END2)
   ;; equivalent to (EQUAL (SUBSEQ BIT-VECTOR-1 :START START1 :END END1)
   ;;                      (SUBSEQ BIT-VECTOR-2 :START START2 :END END2))
  SCL:BIT-VECTOR-DISJOINT-P - Function (BIT-VECTOR-1 BIT-VECTOR-2 &key (:START1 0) :END1 (:START2 0) :END2)
  SCL:BIT-VECTOR-CARDINALITY - Function (BIT-VECTOR &key (:START 0) :END)
   ;; counts the "1" bits

At the present time, -SUBSET-P, -EQUAL, and -DISJOINT-P all return NIL
if the vectors have different lengths.

A more CL-consistent way of doing cardinality is probably by analogy
with the COUNT function:
  BIT-VECTOR-COUNT - Function (BIT BIT-VECTOR &key (:START 0) :END)
   ;; equivalent to (COUNT BIT BIT-VECTOR :START START :END END)


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Jan 89 16:44:55 EST
Received: from crash.cs.umass.edu ([128.119.40.235]) by SAIL.Stanford.EDU with TCP; 12 Jan 89  13:25:08 PST
Received: from vax3.cs.umass.edu by crash.cs.umass.edu (5.59/Ultrix2.0-B)
	id AA00629; Thu, 12 Jan 89 16:25:58 est
Message-Id: <8901122125.AA00629@crash.cs.umass.edu>
Date: Thu, 12 Jan 89 16:19 EST
From: MURRAY@cs.umass.EDU
Subject: argument processing
To: common-lisp@sail.stanford.EDU
X-Vms-To: IN%"common-lisp@sail.stanford.EDU"

Subj:	Order of "processing" of arguments
To: Common-Lisp@SAIL.Stanford.EDU

> From: Bruce Krulwich <krulwich-bruce@YALE.ARPA>
> ...
> It seems to me that as long as actuals and formals are matched up correctly
> there is no reason for the language specification to specify the order of the
> "processing" of the arguments during lambda-binding.

The order of processing of lambda-binding is important, because
&optional or &key parameters can have code that is executed if their arguments
are not supplied in a call.  By specifying the left-right order of processing,
it defines that any arguments bound "on the left" are accessable to code
executed "on the right".

Kelly Murray



Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Jan 89 16:22:54 EST
Received: from crash.cs.umass.edu ([128.119.40.235]) by SAIL.Stanford.EDU with TCP; 12 Jan 89  12:45:18 PST
Received: from vax3.cs.umass.edu by crash.cs.umass.edu (5.59/Ultrix2.0-B)
	id AA00579; Thu, 12 Jan 89 15:46:00 est
Message-Id: <8901122046.AA00579@crash.cs.umass.edu>
Date: Thu, 12 Jan 89 15:31 EST
From: ELIOT@cs.umass.EDU
Subject: Logical Operations on Numbers
To: common-lisp@sail.stanford.EDU
X-Vms-To: IN%"common-lisp@sail.stanford.EDU"

Section 12.7 (pp 220-225) describes CL operations for manipulating
finite sets using integers.  Unfortunately there does not seem to
be any predicate to determine if one set is a subset of another
using this representation.  'logtest' serves as an intersection test,
'logbitp' serves as a member test but to determine subset relations
seems to require computing the set difference (with logandc2) and
comparing the result with zero.  If the sets are moderately large
(say several hundred elements) this involves expensive bignum operations
that I would like to avoid.

I have also thought of using bitvectors, but the operations on bitvectors
(p 294) only operate on bitvectors of the same length.  Furthermore,
the bitvector functions only include bitwise operations, but no subset
test here either.

Isn't SUBSET considered an important set manipulation primitive?

Chris Eliot
University of Massashusetts at Amherst


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Jan 89 13:09:56 EST
Received: from ATHENA.CS.YALE.EDU by SAIL.Stanford.EDU with TCP; 12 Jan 89  09:49:45 PST
Received: by ATHENA.CS.YALE.EDU; Thu, 12 Jan 89 12:49:19 EST
Date: Thu, 12 Jan 89 12:49:19 EST
From: Bruce Krulwich <krulwich-bruce@YALE.ARPA>
Full-Name: Bruce Krulwich
Message-Id: <8901121749.AA18587@ATHENA.CS.YALE.EDU>
Received: by yale-hp-crown (szechuan) 
          via WIMP-MAIL (Version 1.3/1.5) ; Thu Jan 12 12:51:16
To: Common-Lisp@SAIL.Stanford.EDU
Subject: Order of "processing" of arguments
Newsgroups: arpa.common-lisp
In-Reply-To: <46940@yale-celray.yale.UUCP>
Organization: Computer Science, Yale University, New Haven, CT 06520-2158

Michael Greenwald said:
>Actually, CLtL pg 61 says that the arguments and parameters are
>processed in order, from left to right.  I don't know if "processed"
>implies "evaluated", but I always assumed (perhaps incorrectly) it did.

Guy Steele replied:
>I interpret this as referring to how the (fully evaluated) arguments
>are processed during lambda-binding, not to the order in which argument
>forms in a function call are evaluated.  After all, the arguments referred
>to on page 61 might have come from a list given to APPLY, rather then
>from EVAL on a function call.

This seems vacuous to me.  Does this mean that an implementation in which a
procedure entry point knows how many arguments its receiving (through a link
table, for instance, or simply by counting its arguments) and constructs a
REST-arg list before doing the binding of the required args is in violation of
CLtL because it processes the rightmost argument before the leftmost one??  I
hope not.

It seems to me that as long as actuals and formals are matched up correctly
there is no reason for the language specification to specify the order of the
"processing" of the arguments during lambda-binding.


Bruce Krulwich
krulwich@cs.yale.edu



Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 12 Jan 89 07:44:22 EST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 12 Jan 89  04:27:20 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 307893; 12 Jan 89 06:41:54 EST
Date: Thu, 12 Jan 89 06:09 EST
From: Robert W. Kerns <RWK@F.ILA.Dialnet.Symbolics.COM>
Subject: Re: commonlisp types 
To: Robert W. Kerns <RWK@F.ILA.Dialnet.Symbolics.COM>, quiroz%cs.rochester.edu@RIVERSIDE.SCRC.SYMBOLICS.COM,
    common-lisp%sail.stanford.edu@RIVERSIDE.SCRC.SYMBOLICS.COM
In-Reply-To: <19890110024213.3.RWK@F.ILA.Dialnet.Symbolics.COM>
Message-ID: <19890112110920.0.RWK@F.ILA.Dialnet.Symbolics.COM>

    Date: Mon, 9 Jan 89 21:42 EST
    From: Robert W. Kerns <RWK@F.ILA.Dialnet.Symbolics.COM>
	BTW, our mailer didn't like the address
	    Robert W. Kerns <RWK@FUJI.ILA.Dialnet.Symbolics.COM>
	on the excuse that FUJI.ILA.Dialnet.Symbolics.COM is an unknown host.

    "It's not my PLANET, Monkey Boy!"
      -- John Wharten (villan from Buckaroo Bonzai)

Sumimasen, ga... I think that's supposed to be "Wharfin" or something.


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 10 Jan 89 12:48:55 EST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 10 Jan 89  09:30:06 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 307284; 10 Jan 89 12:28:11 EST
Date: Mon, 9 Jan 89 21:42 EST
From: Robert W. Kerns <RWK@F.ILA.Dialnet.Symbolics.COM>
Subject: Re: commonlisp types 
To: quiroz%cs.rochester.edu@RIVERSIDE.SCRC.SYMBOLICS.COM, common-lisp%sail.stanford.edu@RIVERSIDE.SCRC.SYMBOLICS.COM
In-Reply-To: <8901070112.AA09737@lesath.cs.rochester.edu>
Message-ID: <19890110024213.3.RWK@F.ILA.Dialnet.Symbolics.COM>

    Date: Fri, 06 Jan 89 20:12:09 -0500
    From: quiroz@cs.rochester.edu


    : So I'm curious.  Does any compiler actually get this right?

    KCL.  See script at the end of this message.

OK, next question:  Does it open-code or otherwise optimize TYPEP, or
just call TYPEP on the list?

If you don't know, I'll check it next time I use KCL (which will be
*after* X3J13).

    BTW, our mailer didn't like the address
	Robert W. Kerns <RWK@FUJI.ILA.Dialnet.Symbolics.COM>
    on the excuse that FUJI.ILA.Dialnet.Symbolics.COM is an unknown host.

"It's not my PLANET, Monkey Boy!"
  -- John Wharten (villan from Buckaroo Bonzai)

As a workaround, you can use

RWK%FUJI.ILA.Dialnet.Symbolics.Com@Riverside.SCRC.Symbolics.Com

which is essentially what I have to do to send to you.

Or you can use RWK@AI.AI.MIT.Edu, which forwards to the same place.


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU 10 Jan 89 11:14:40 EST
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 10 Jan 89  07:56:38 PST
Date: Mon, 09 Jan 89 19:50:46 PST
From: Thom Linden <baggins@ibm.com>
To: Common Lisp mailing <common-lisp@sail.stanford.edu>
Message-ID: <890109.195046.baggins@IBM.com>
Subject: Character proposal

The revised proposal should be transmitted fairly soon.  Due to this
delay, I won't be asking for a vote unless J13 agrees it is ready.
The content of the scheduled time for characters will be to
review the substantial changes.

I will bring copies to the meeting as well as send over the network.

Regards,
  Thom


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU  7 Jan 89 04:07:10 EST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 7 Jan 89  00:54:42 PST
Received: from LUCID.COM by Riverside.SCRC.Symbolics.COM via INTERNET with SMTP id 306559; 7 Jan 89 03:53:00 EST
Received: from bhopal ([192.9.200.13]) by heavens-gate id AA08351g; Sat, 7 Jan 89 00:50:24 PST
Received: by bhopal id AA02943g; Sat, 7 Jan 89 00:52:38 PST
Date: Sat, 7 Jan 89 00:52:38 PST
From: Jon L White <jonl@lucid.com>
Message-Id: <8901070852.AA02943@bhopal>
To: RWK@FUJI.ILA.Dialnet.Symbolics.COM
Cc: jonl%lucid.com@Riverside.SCRC.Symbolics.Com,
        common-lisp%sail.stanford.edu@Riverside.SCRC.Symbolics.Com
In-Reply-To: Robert W. Kerns's message of Fri, 6 Jan 89 15:16 EST <19890106201603.1.RWK@CALVARY.ILA.Dialnet.Symbolics.COM>
Subject: commonlisp types

re: [TYPE-SPECIFIER-P] I'd like to encourage you to make YOUR definition 
    explicit for us, as a starting point.

Well, what I can tell you in reasonable terms won't be that helpful. We
simpy hook in to the part of SUBTYPEP that has to resolve these questions,
and "catch" any signals about unrecognized types.  For symbols, the
question of a recognized type is fairly easy -- there's a list in CLtL
of some basic types, and then there's more basic types coming from
DEFSTRUCT, and finally there's "recursion" via DEFTYPE.  Can you think
of an easier answer for this?


re:     Anyone know of an implementation for which this fails?
    Yes, Symbolics.  You must have missed my query about any implementations
    for which it succeeds!  Any implementation which does source-rewriting
    to optimize TYPEP has to concern itself with this issue.  (The issue is the
    same as for doing INLINEing, but Symbolics fails to use the same mechanism
    for optimizations as it does for inlining.)

Lucid succeeds (and one or two others that I tried).  Oddly enough, Lucid
also "fails" to use the same mechanism for compiler optimizers as it does
for INLINEing -- and it gets the optimizations right, but certain cases
of lexical inlining screws wrong.


-- JonL --


Received: from SAIL.Stanford.EDU (TCP 1200000013) by AI.AI.MIT.EDU  7 Jan 89 00:24:48 EST
Received: from cayuga.cs.rochester.edu (CS.ROCHESTER.EDU) by SAIL.Stanford.EDU with TCP; 6 Jan 89  21:11:52 PST
Received: from lesath.cs.rochester.edu by cayuga.cs.rochester.edu (5.59/k) id AA09897; Fri, 6 Jan 89 20:12:20 EST
Received: from loopback by lesath.cs.rochester.edu (3.2/k) id AA09737; Fri, 6 Jan 89 20:12:14 EST
Message-Id: <8901070112.AA09737@lesath.cs.rochester.edu>
To: common-lisp@sail.stanford.edu
Subject: Re: commonlisp types 
In-Reply-To: Your message of Fri, 06 Jan 89 15:33:00 -0500.
             <19890106203322.2.RWK@CALVARY.ILA.Dialnet.Symbolics.COM> 
Date: Fri, 06 Jan 89 20:12:09 -0500
From: quiroz@cs.rochester.edu


: So I'm curious.  Does any compiler actually get this right?

KCL.  See script at the end of this message.

BTW, our mailer didn't like the address
    Robert W. Kerns <RWK@FUJI.ILA.Dialnet.Symbolics.COM>
on the excuse that FUJI.ILA.Dialnet.Symbolics.COM is an unknown host.

Cesar

KCl (Kyoto Common Lisp)  June 3, 1987
--- UofR version of September 9, 1988
Loading /u/quiroz/.kclrc
Loading /u/quiroz/work/kcl/defsys/defsys.o
Finished loading /u/quiroz/work/kcl/defsys/defsys.o
Finished loading /u/quiroz/.kclrc

>     (defun bar (x) (symbolp x))
bar

>     (defun foo (x)
       (flet ((bar (y) (integerp y)))
	 (typep x '(satisfies bar))))
foo

>     (foo 'x)
t

>(compile 'bar)
End of Pass 1.  
End of Pass 2.  
OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3
bar

>(compile 'foo)
End of Pass 1.  
End of Pass 2.  
OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3
foo

>(foo 'x)
t

>


Received: from SAIL.Stanford.EDU (TCP 4425400302) by AI.AI.MIT.EDU  6 Jan 89 17:04:48 EST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 6 Jan 89  13:46:18 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 306344; 6 Jan 89 15:56:48 EST
Received: from CALVARY.ILA.Dialnet.Symbolics.COM by F.ILA.Dialnet.Symbolics.COM via CHAOS with CHAOS-MAIL id 7601; Fri 6-Jan-89 15:15:53 EST
Date: Fri, 6 Jan 89 15:16 EST
From: Robert W. Kerns <RWK@FUJI.ILA.Dialnet.Symbolics.COM>
Subject: commonlisp types
To: Jon L White <jonl%lucid.com@Riverside.SCRC.Symbolics.Com>
cc: common-lisp%sail.stanford.edu@Riverside.SCRC.Symbolics.Com
In-Reply-To: <8901040858.AA01403@bhopal>
Message-ID: <19890106201603.1.RWK@CALVARY.ILA.Dialnet.Symbolics.COM>

    Date: Wed, 4 Jan 89 00:58:57 PST
    From: Jon L White <jonl@lucid.com>

    re: How do you define "valid type specifier"?

    Very syntactically.  I think its perfectly acceptable to have a set
    of combination rules for making "words" in the type-specifier syntax,
    even though some such "words" would be gibberish.

    The important thing is that base-level types -- those defined in 
    CLtL -- along with DEFSTRUCT extensions be recognizable.  They don't
    have the problems that SATISFIES generates, or that a broken user
    definition generates (such as your DEFTYPE FOO example).

I'm not saying there's a fundamental problem here, just that there's a choice
to be made, and that writing precise and understandable definitions is
non-trivial.  I'd like to encourage you to make YOUR definition explicit for
us, as a starting point.

    By the bye, on another note, I haven't seen any implementation that
    has the bug Kent wondered about earlier:
	 (defun bar (x) (symbolp x))
	 (defun foo (x)
	   (flet ((bar (y) (integerp y)))
	     (typep x '(satisfies bar))))
	 (foo 'x)
	The correct answer is T, but I bet a lot of implementations return NIL
	in compiled code.
    Anyone know of an implementation for which this fails?

Yes, Symbolics.  You must have missed my query about any implementations
for which it succeeds!  Any implementation which does source-rewriting
to optimize TYPEP has to concern itself with this issue.  (The issue is the
same as for doing INLINEing, but Symbolics fails to use the same mechanism for
optimizations as it does for inlining.)


Received: from SAIL.Stanford.EDU (TCP 4425400302) by AI.AI.MIT.EDU  6 Jan 89 17:06:50 EST
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 6 Jan 89  13:46:18 PST
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 306345; 6 Jan 89 15:57:47 EST
Received: from CALVARY.ILA.Dialnet.Symbolics.COM by F.ILA.Dialnet.Symbolics.COM via CHAOS with CHAOS-MAIL id 7603; Fri 6-Jan-89 15:33:07 EST
Date: Fri, 6 Jan 89 15:33 EST
From: Robert W. Kerns <RWK@FUJI.ILA.Dialnet.Symbolics.COM>
Subject: commonlisp types
To: gls%Think.COM@Riverside.Symbolics.COM, jwz%spice.cs.cmu.edu@Riverside.SCRC.Symbolics.COM,
    common-lisp%sail.stanford.edu@Riverside.SCRC.Symbolics.COM
In-Reply-To: <881222151736.1.KMP@BOBOLINK.SCRC.Symbolics.COM>
Supersedes: <19890103102924.8.RWK@F.ILA.Dialnet.Symbolics.COM>
Comments: Retransmission of failed mail.
Message-ID: <19890106203322.2.RWK@CALVARY.ILA.Dialnet.Symbolics.COM>

    Date: Thu, 22 Dec 88 15:17 EST
    From: Kent M Pitman <KMP@STONY-BROOK.SCRC.Symbolics.COM>
    Fyi, it turns out this rationale doesn't hold as much water as you'd think.
    Consider:

     (defun bar (x) (symbolp x))

     (defun foo (x)
       (flet ((bar (y) (integerp y)))
	 (typep x '(satisfies bar))))

     (foo 'x)

    The correct answer is T, but I bet a lot of implementations return NIL
    in compiled code.

Like the Symbolics system, Boo, Hiss!

In terms of source transformations, this would have to compile the TYPEP
as follows:

(defun foo (x)
  (flet ((bar (y) (integerp y)))
    (let ((#:G0002 x))
      (macrolet ((bar (a) `(funcall (symbol-function 'bar) ,a)))
        (bar #:G0002)))))

Which is obviously going to require either a codewalker or a typewalker
to identify either locally defined functions or functions used in the
type expansion to shadow with MACROLET.

So I'm curious.  Does any compiler actually get this right?  Really,
this is a general problem with any form of source-code rewrites.  The
Symbolics compiler does get this right with inlined functions, but I'll
bet it doesn't with some other internal in-lined things that work as
source transformations.