Topic: APLX Help : Help on APL language : System Functions & Variables : ⎕UCS Convert text to/from Unicode

www.microapl.co.uk

⎕UCS Convert text to/from Unicode


Unicode is a worldwide standard which encodes characters as integers or 'code points'. It includes representations of all the international and special characters used in modern computer applications. APL characters, including all those used in APLX, are defined in the Unicode standard, although there are some ambiguities about a few of them. This makes it possible to exchange character data between APLX and other applications (including other APL interpreters which support Unicode), without encountering problems with character translation and 'code tables', provided that the text being transferred can be represented in the APLX character set.

The system function ⎕UCS translates Unicode code points to the equivalent characters in the APLX character set (where an equivalent exists), and vice versa. It takes a single right argument, which must be a simple character or integer array.

If the argument is a character array, ⎕UCS returns an integer array of the same shape, containing the Unicode code point of each character. Each element will be in the range 0 to 65535, because all of the characters supported in APLX fall within Unicode's Basic Multilingual Plane (U+0000 to U+FFFF).
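
As a small illustration of the shape rule (a sketch rather than an example from the original help text; the matrix contents are arbitrary), a 2-by-3 character matrix yields a 2-by-3 integer matrix:

            ⎕UCS 2 3⍴'ABCDEF'
      65 66 67
      68 69 70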

If the argument is an integer array, ⎕UCS returns a character array of the same shape, containing the APLX character corresponding to each Unicode value provided. Unicode values which have no equivalent in the APLX character set are converted to the current value of ⎕MC (Missing Character). By default, this is a question mark.

For example:

            ⎕UCS 'X←⍳10'
      88 8592 9075 49 48
            ⎕UCS 88 8592 9075 49 48
      X←⍳10
            ⎕UCS 937 8364 223
      ?€ß

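In the last example, the Unicode value 937 (hex 03A9, the Greek capital letter omega) was translated to the 'missing character' (a question mark) because it has no equivalent in the APLX character set. The values 8364 (hex 20AC, the euro sign) and 223 (hex 00DF, the German sharp s) do have equivalents and were translated normally.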
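
The first two examples are inverses of each other. As a hedged sketch (not part of the original help text), a round trip through ⎕UCS should reproduce the original text whenever every character belongs to the APLX character set; here ≡ (match) returns 1 to confirm that the result is identical to the starting vector:

            'X←⍳10' ≡ ⎕UCS ⎕UCS 'X←⍳10'
      1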

A different 'missing character' can be set using ⎕MC:

            ⎕MC←'$'
            ⎕UCS 937 8364 223
      $€ß

See the section on the APLX Character Set for details of the mapping between APLX characters and Unicode.

