Quad-NREAD Read data from a native file

	`⎕NREAD` Read data from a native file
The `⎕NREAD` function allows you to read data from anywhere in the file, specifying an optional start byte, count and conversion mode. The file must have been opened in a mode which permits reading. The full syntax is (`{}` means optional):

    R ← ⎕NREAD TIENO {,CONV {,COUNT {,STARTBYTE}}}

- TIENO is the tie number associated with the file to read.
- CONV specifies any conversion to apply to the data, for example read as raw data, translated characters, 4-byte integers, booleans, etc. The default is to read the file as raw character data. See below for details.
- COUNT specifies the number of elements to read (except when CONV = 11, see below). The number of bytes read from the file will depend on the conversion mode used (see below). A value of `¯1` (default) specifies read to end-of-file.
- STARTBYTE may be used to specify the offset in bytes from the beginning of the file at which to start reading data. A value of `¯1` (default) specifies the current file position, that is, the position at which the last successful read or write operation completed.

The conversion mode parameter CONV can be used to read data very easily from a file with a known structure. Data can be read as raw bytes, translated characters, booleans, integers or floating point numbers. In addition, byte-swapping facilities allow data to be read from a file created on a host with different local byte-ordering conventions. The full list of supported values for the conversion mode is as follows:

Normal modes:

- `0` read data as a stream of raw bytes
- `1` read data as booleans, 1 bit per element
- `2` read data as 32-bit integers
- `3` read data as 64-bit IEEE double-precision floating point numbers
- `4` read character data and translate from external representation to APLX's own internal format
- `5` read Unicode UTF-16 characters (two bytes per element), and convert to APLX internal representation as characters. Any Unicode values which cannot be mapped to APLX characters are converted to the value set by `⎕MC` (by default, question mark).
- `6` read data as 32-bit IEEE single-precision floating point numbers
- `8` read Unicode UTF-8 characters (variable bytes per element), and convert to APLX internal representation as characters. Any Unicode values which cannot be mapped to APLX characters are converted to the value set by `⎕MC` (by default, question mark).

Byte-swapped modes:

- `¯2` read data as 32-bit byte-swapped integers
- `¯3` read data as 64-bit byte-swapped floats
- `¯5` read data as byte-swapped Unicode characters
- `¯6` read data as 32-bit byte-swapped floats

For compatibility with some other APL interpreters the following conversion specifiers are also supported:

- `11` read data as booleans (same as mode=1)
- `82` read data as raw characters (same as mode=0)
- `163` read data as unsigned 16-bit integers. Values are converted to 32-bit integers before being returned.
- `323` read data as 32-bit integers (same as mode=2)
- `325` read data as 32-bit floating point numbers (same as mode=6)
- `645` read data as 64-bit floating point numbers (same as mode=3)
- `¯163` read data as unsigned 16-bit integers with byte-swapping
- `¯323` read data as 32-bit integers with byte-swapping (same as mode=¯2)
- `¯325` read data as 32-bit floats with byte-swapping (same as mode=¯6)
- `¯645` read data as 64-bit floats with byte-swapping (same as mode=¯3)

Under APLX64, the following additional conversion types are available:

- `7` read data as 64-bit integers
- `¯7` read data as 64-bit integers with byte-swapping
- `643` read data as 64-bit integers (same as mode=7)
- `¯643` read data as 64-bit integers with byte-swapping (same as mode=¯7)

The optional COUNT specified in the `⎕NREAD` argument relates to the number of elements to read, not necessarily the number of bytes. For example, when reading 32-bit integers, the number of bytes read will be four times the value of COUNT.

Note that when reading boolean data with CONV = 11, the value of COUNT specifies the number of bytes to read rather than the number of bits.
This is done for compatibility with some other APL interpreters.

Note that all file I/O operations start on a byte boundary. In particular, following a boolean read operation that returns a non-integral number of bytes, the current file position will be aligned on the next byte boundary.

Two possible errors may occur when specifying an inappropriate value of COUNT or CONV. If the number of bytes remaining in the file is insufficient to satisfy the request, a FILE I/O ERROR occurs and the current file position is unchanged. For example, it is an error to try to read 10000 bytes from a file with only 9000 bytes remaining:

          ⎕NREAD 100 0 10000
    Insufficient data available
    FILE I/O ERROR
          ⎕NREAD 100 0 10000
          ^

To read all data up to the end of file, omit the count parameter or specify it as `¯1`.

A second type of error can arise when trying to read integers or floats. If the count is not explicitly specified and the number of bytes remaining in the file is not an exact multiple of the element size, a FILE I/O ERROR occurs and the current file position is again unchanged. For example, if there are 23 bytes remaining in the file, an attempt to read them as 4-byte integers will fail, since there is too much data for five integers and not enough for six:

          ⎕NREAD 100 2
    Wrong number of bytes remain for data type requested
    FILE I/O ERROR
          ⎕NREAD 100 2
          ^

*Examples:*

Read all bytes from current file position to end of file as raw data:

          ⎕NREAD 100

Read from current file position to end of file as integers:

          ⎕NREAD 100 2

Read next ten integers from file:

          ⎕NREAD 100 2 10

Read ten floats starting at offset 20 bytes from start of file:

          ⎕NREAD 100 3 10 20

Reading Unicode UTF-16 text files

By convention, Unicode UTF-16 plain-text files start with a 'byte-order' mark. This is the special hex value FEFF, represented as a two-byte value in the byte-ordering used to create the file.
Thus, on 'big-endian' systems such as the Macintosh, the first two bytes of the file will normally be hex FE and FF (decimal 254 and 255). On a 'little-endian' system such as Windows or x86 Linux, the least significant byte of a number is stored first, so the first two bytes will normally be FF and FE.

You can use this information to determine whether to use the conversion type `5` or `¯5` when reading the contents of a UTF-16 text file, by reading the first element of the file as a 16-bit integer (conversion code `163` for `⎕NREAD`). If you get the value 65279 (hex FEFF), the Unicode file was written using the same byte-ordering as the machine you are running on, so no byte reversal is required and you can use conversion code `5` to read the Unicode characters from the remainder of the file. If you get the value 65534 (hex FFFE), the Unicode file was written using the opposite byte-ordering convention to that of the machine you are using, so you need to use conversion code `¯5`. For example:

          'c:\temp\uni.txt' ⎕NTIE 1    ⍝ Open a UTF-16 text file
          ⎕NREAD 1 163 1 0             ⍝ Read first two bytes as 16-bit integer
    65279                              ⍝ This is the correct value for hex FEFF
          ⎕AF 4 ⎕DR 65279
    0 0 254 255
          TEXT←⎕NREAD 1 5 ¯1           ⍝ Read the remainder of the file as Unicode
          ⎕NUNTIE 1

If you want to read UTF-16 files without converting them to APLX characters, use conversion type `163` or `¯163` to read them as 2-byte (unsigned) integers, with byte-swapping if necessary. This allows you to process Unicode values which cannot be represented in the APLX character set. If you later need to convert the returned integer values to APLX text, use `⎕UCS`.

See also `⎕MC`, which contains the character used to replace Unicode characters which cannot be represented in APLX.
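
The byte-order-mark check described in this section can be wrapped in a small function. The following is only a minimal sketch and not part of APLX: the function name `READUTF16`, its local names and the choice of tie number 1 are assumptions made for illustration, and it assumes the file really does begin with a byte-order mark.

```apl
∇ TEXT←READUTF16 FILE;TIE;BOM;CONV
  ⍝ Hypothetical helper: read a UTF-16 text file that starts with a byte-order mark
  TIE←1                          ⍝ Assumption: tie number 1 is not already in use
  FILE ⎕NTIE TIE                 ⍝ Open the file
  BOM←⎕NREAD TIE,163 1 0         ⍝ First two bytes as an unsigned 16-bit integer
  ⍝ 65279 (hex FEFF) selects mode 5; 65534 (hex FFFE) selects mode ¯5.
  ⍝ Any other value causes an INDEX ERROR on the next line, leaving the file tied.
  CONV←(5 ¯5)[65279 65534⍳BOM]
  TEXT←⎕NREAD TIE,CONV,¯1        ⍝ Remainder of the file, from the current position
  ⎕NUNTIE TIE                    ⍝ Close the file
∇
```

It might be called as `TEXT←READUTF16 'c:\temp\uni.txt'`. The lookup with dyadic `⍳` followed by indexing is used here because it behaves the same under either index origin; a file without a byte-order mark would need separate handling that this sketch does not attempt.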