Date: Thursday, 16 September 1982 23:27-EDT From: Scott E. Fahlman To: common-lisp at SU-AI Subject: Revised array proposal (long) Here is a revision of my array proposal, fixed up in response to some of the feedback I've received. See if you like it any better than the original. In particular, I have explictly indicated that certain redundant forms such as MAKE-VECTOR should be retained, and I have removed the :PRINT keyword, since I now believe that it causes more trouble than it is worth. A revised printing proposal appears at the end of the document. ********************************************************************** Arrays can be 1-D or multi-D. All arrays can be created by MAKE-ARRAY and can be accessed with AREF. Storage is done via SETF of an AREF. The term VECTOR refers to any array of exactly one dimension. Vectors are special, in that they are also sequences, and can be referenced by ELT. Also, only vectors can have fill pointers. Vectors can be specialized along several distinct axes. The first is by the type of the elements, as specified by the :ELEMENT-TYPE keyword to MAKE-ARRAY. A vector whose element-type is STRING-CHAR is referred to as a STRING. Strings, when they print, use the "..." syntax; they also are the legal inputs to a family of string-functions, as defined in the manual. A vector whose element-type is BIT (alias (MOD 2)), is a BIT-VECTOR. These are special because they form the set of legal inputs to the boolean bit-vector functions. (We might also want to print them in a strange way -- see below.) Some implementations may provide a special, highly efficient representation for simple vectors. A simple vector is (of course) 1-D, cannot have a fill pointer, cannot be displaced, and cannot be altered in size after its creation. To get a simple vector, you use the :SIMPLE keyword to MAKE-ARRAY with a non-null value. If there are any conflicting options specified, an error is signalled. If an implementation does not support simple vectors, this keyword/value is ignored except that the error is still signalled on inconsistent cases. We need a new set of type specifiers for simple things: SIMPLE-VECTOR, SIMPLE-STRING, and SIMPLE-BIT-VECTOR, with the corresponding type-predicate functions. Simple vectors are referenced by AREF in the usual way, but the user may use THE or DECLARE to indicate at compile-time that the argument is simple, with a corresponding increase in efficiency. Implementations that do not support simple vectors ignore the "simple" part of these declarations. Strings (simple or non-simple) self-eval; all other arrays cause an error when passed to EVAL. EQUAL descends into strings, but not into any other arrays. EQUALP descends into arrays of all kinds, comparing the corresponding elements with EQUALP. EQUALP is false if the array dimensions are not the same, but it is not sensitive to the element-type of the array, whether it is simple, etc. In comparing the dimensions of vectors, EQUALP uses the length from 0 to the fill pointer; it does not look at any elements beyond the fill pointer. The set of type-specifiers required for all of this is ARRAY, VECTOR, STRING, BIT-VECTOR, SIMPLE-VECTOR, SIMPLE-STRING, SIMPLE-BIT-VECTOR. Each of these has a corresponding type-P predicate, and each can be specified in list from, along with the element-type and dimension(s). MAKE-ARRAY takes the following keywords: :ELEMENT-TYPE, :INITIAL-VALUE, :INITIAL-CONTENTS, :FILL-POINTER, and :SIMPLE. There is still some discussion as to whether we should retain array displacement, which requires :DISPLACED-TO and :DISPLACED-INDEX-OFFSET. The following functions are redundant, but should be retained for clarity and emphasis in code: MAKE-VECTOR, MAKE-STRING, MAKE-BIT-VECTOR. MAKE-VECTOR takes the same keywords as MAKE-ARRAY, but can only take a single integer as the dimension argument. MAKE-STRING and MAKE-BIT-VECTOR are like MAKE-VECTOR, but do not take the :ELEMENT-TYPE keyword, since the element-type is implicit. Similarly, we should retain the forms VREF, CHAR, and BIT, which are identical in operation to AREF, but which declare their aray argument to be VECTOR, STRING, or BIT-VECTOR, respectively. If the :SIMPLE keyword is not specified to MAKE-ARRAY or related forms, the default is NIL. However, vectors produced by random forms such as CONCATENATE are simple, and vectors created when the reader sees #(...) or "..." are also simple. As a general rule, arrays are printed in a simple format that, upon being read back in, produces a form that is EQUALP to the original. However, some information may be lost in the printing process: element-type restrictions, whether a vector is simple, whether it has a fill pointer, whether it is displaced, and the identity of any element that lies beyond the fill pointer. This choice was made to favor ease of interactive use; if the user really wants to preserve in printed form some complex data structure containing non-simple arrays, he will have to develop his own printer. A switch, SUPPRESS-ARRAY-PRINTING, is provided for users who have lots of large arrays around and don't want to see them trying to print. If non-null, this switch causes all arrays except strings to print in a short, non-readable form that does not include the elements: #. In addition, the printing of arrays and vectors (but not of strings) is subject to PRINLEVEL and PRINLENGTH. Strings, simple or otherwise, print using the "..." syntax. Upon read-in, the "..." syntax creates a simple string. Bit-vectors, simple or otherwise, print using the #"101010..." syntax. Upon read-in, this format produces a simple bit-vector. Bit vectors do observe SUPPRESS-ARRAY-PRINTING. All other vectors print out using the #(...) syntax, observing PRINLEVEL, PRINLENGTH, and SUPPRESS-ARRAY-PRINTING. This format reads in as a simple vector of element-type T. All other arrays print out using the syntax #nA(...), where n is the number of dimensions and the list is a nest of sublists n levels deep, with the array elements at the deepest level. This form observes PRINLEVEL, PRINLENGTH, and SUPPRESS-ARRAY-PRINTING. This format reads in as an array of element-type T. Query: I am still a bit uneasy about the funny string-like syntax for bit vectors. Clearly we need some way to read these in that does not turn into a type-T vector. An alternative might be to allow #(...) to be a vector of element-type T, as it is now, but to take the #n(...) syntax to mean a vector of element-type (MOD n). A bit-vector would then be #2(1 0 1 0...) and we would have a parallel notation available for byte vectors, 32-bit word vectors, etc. The use of the #n(...) syntax to indicate the length of the vector always struck me as a bit useless anyway. One flaw in this scheme is that it does not extend to multi-D arrays. Before someone suggests it, let me say that I don't like #nAm(...), where n is the rank and m is the element-type -- it would be too hard to remember which number was which. But even with this flaw, the #n(...) syntax might be useful.