APLX Help : Help on APL language : System Functions & Variables : ⎕XML Convert to/from XML
Extensible Markup Language (XML) is a widely used standard for storing data in a text format that many different programs can access. It combines the actual data with 'mark-up' which indicates how the data should be interpreted.
An Example of XML format
A full description of XML is beyond the scope of this document. However, the following simple but complete XML example demonstrates some of the main features:
<?xml version="1.0" encoding="utf-8"?>
<sales>
<!-- Sales by month -->
<month>January
<item>
<name>Ice Cream</name>
<amount currency="dollars">25.10</amount>
</item>
<item>
<name>Fizzy Drinks</name>
<amount currency="dollars">360.92</amount>
</item>
</month>
<month>February
<item>
<name>Ice Cream</name>
<amount currency="dollars">5.02</amount>
</item>
<item>
<name>Fizzy Drinks</name>
<amount currency="dollars">403.16</amount>
</item>
</month>
</sales>
The first line specifies the XML version used, and the third line ("Sales by month") is a comment. The remainder of the document consists of elements which contain the data. Each element begins with a start tag and ends with a matching end tag, for example:
<name>...</name>
Element tag names are case-sensitive. An element may contain data, or other elements nested within it, or both. In addition, the start tag may include one or more attributes specifying how the data is to be interpreted. Each attribute is a pair of the form name="value", for example:
<amount currency="dollars">25.10</amount>
An empty element which contains no data and no other elements nested within it can be written as:
<name/>
Within an XML document there is usually no significance in the amount of white space used, for example the number of spaces used to indent an element or the positions of line breaks. The following is valid in XML:
<item><name>Ice Cream</name><amount currency="dollars">25.10</amount></item>
Converting XML Data to an APL Array
Syntax:
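The right argument is a character vector (with embedded carriage returns and/or line feeds) containing the XML text to be converted. The optional left argument gives some control over the conversion process and is discussed below. The result is an N-row, 5-column matrix containing a flattened representation of the XML data. Each element in the XML data will produce one row in the result. The columns are as follows:
| Column 1: | An integer indicating the depth of nesting of the element, with 0 for the outer-most level. |
| Column 2: | The element tag name (empty for a row which holds only data). |
| Column 3: | The element data as a character vector (empty if there is none). |
| Column 4: | A 2-column nested matrix of attribute name/value pairs, with one row per attribute (0 rows if the element has no attributes). |
| Column 5: | An integer type code describing the row (see below). |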
For example, when presented with the XML sample listed above the array produced is as follows:
⎕XML xml_data
0 sales 3
1 month 7
2 January 4
2 item 3
3 name Ice Cream 5
3 amount 25.10 currency dollars 5
2 item 3
3 name Fizzy Drinks 5
3 amount 360.92 currency dollars 5
1 month 7
2 February 4
2 item 3
3 name Ice Cream 5
3 amount 5.02 currency dollars 5
2 item 3
3 name Fizzy Drinks 5
3 amount 403.16 currency dollars 5
⎕DISPLAY ⎕XML xml_data
┌→─────────────────────────────────────────────────────┐
↓ ┌→────┐ ┌⊖┐ ┌→────────┐ │
│ 0 │sales│ │ │ ⌽ ┌⊖┐ ┌⊖┐ │ 3 │
│ └─────┘ └─┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌→────┐ ┌⊖┐ ┌→────────┐ │
│ 1 │month│ │ │ ⌽ ┌⊖┐ ┌⊖┐ │ 7 │
│ └─────┘ └─┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌⊖┐ ┌→──────┐ ┌→────────┐ │
│ 2 │ │ │January│ ⌽ ┌⊖┐ ┌⊖┐ │ 4 │
│ └─┘ └───────┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌→───┐ ┌⊖┐ ┌→────────┐ │
│ 2 │item│ │ │ ⌽ ┌⊖┐ ┌⊖┐ │ 3 │
│ └────┘ └─┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌→───┐ ┌→────────┐ ┌→────────┐ │
│ 3 │name│ │Ice Cream│ ⌽ ┌⊖┐ ┌⊖┐ │ 5 │
│ └────┘ └─────────┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌→─────┐ ┌→────┐ ┌→─────────────────────┐ │
│ 3 │amount│ │25.10│ ↓ ┌→───────┐ ┌→──────┐ │ 5 │
│ └──────┘ └─────┘ │ │currency│ │dollars│ │ │
│ │ └────────┘ └───────┘ │ │
│ └∊─────────────────────┘ │
│ ┌→───┐ ┌⊖┐ ┌→────────┐ │
│ 2 │item│ │ │ ⌽ ┌⊖┐ ┌⊖┐ │ 3 │
│ └────┘ └─┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌→───┐ ┌→───────────┐ ┌→────────┐ │
│ 3 │name│ │Fizzy Drinks│ ⌽ ┌⊖┐ ┌⊖┐ │ 5 │
│ └────┘ └────────────┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌→─────┐ ┌→─────┐ ┌→─────────────────────┐ │
│ 3 │amount│ │360.92│ ↓ ┌→───────┐ ┌→──────┐ │ 5 │
│ └──────┘ └──────┘ │ │currency│ │dollars│ │ │
│ │ └────────┘ └───────┘ │ │
│ └∊─────────────────────┘ │
│ ┌→────┐ ┌⊖┐ ┌→────────┐ │
│ 1 │month│ │ │ ⌽ ┌⊖┐ ┌⊖┐ │ 7 │
│ └─────┘ └─┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌⊖┐ ┌→───────┐ ┌→────────┐ │
│ 2 │ │ │February│ ⌽ ┌⊖┐ ┌⊖┐ │ 4 │
│ └─┘ └────────┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌→───┐ ┌⊖┐ ┌→────────┐ │
│ 2 │item│ │ │ ⌽ ┌⊖┐ ┌⊖┐ │ 3 │
│ └────┘ └─┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌→───┐ ┌→────────┐ ┌→────────┐ │
│ 3 │name│ │Ice Cream│ ⌽ ┌⊖┐ ┌⊖┐ │ 5 │
│ └────┘ └─────────┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌→─────┐ ┌→───┐ ┌→─────────────────────┐ │
│ 3 │amount│ │5.02│ ↓ ┌→───────┐ ┌→──────┐ │ 5 │
│ └──────┘ └────┘ │ │currency│ │dollars│ │ │
│ │ └────────┘ └───────┘ │ │
│ └∊─────────────────────┘ │
│ ┌→───┐ ┌⊖┐ ┌→────────┐ │
│ 2 │item│ │ │ ⌽ ┌⊖┐ ┌⊖┐ │ 3 │
│ └────┘ └─┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌→───┐ ┌→───────────┐ ┌→────────┐ │
│ 3 │name│ │Fizzy Drinks│ ⌽ ┌⊖┐ ┌⊖┐ │ 5 │
│ └────┘ └────────────┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌→─────┐ ┌→─────┐ ┌→─────────────────────┐ │
│ 3 │amount│ │403.16│ ↓ ┌→───────┐ ┌→──────┐ │ 5 │
│ └──────┘ └──────┘ │ │currency│ │dollars│ │ │
│ │ └────────┘ └───────┘ │ │
│ └∊─────────────────────┘ │
└∊─────────────────────────────────────────────────────┘
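Because the result is an ordinary nested matrix, the usual APL selection primitives can be used to pick data out of it. The following is only a minimal sketch: it assumes that xml_data holds the sample document shown earlier, and the names R, mask and amounts are chosen here purely for illustration.

```apl
R←⎕XML xml_data            ⍝ 5-column matrix, as displayed above
mask←R[;2]∊⊂'amount'       ⍝ 1 for each row whose tag name is 'amount'
amounts←⍎¨mask/R[;3]       ⍝ execute each character vector to obtain a number
+/amounts                  ⍝ total of all the amounts
```

Note that the data in column 3 is always character, so it has to be converted before any arithmetic is done. Execute (⍎) is used here for brevity and is only appropriate when the XML comes from a trusted source.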
Options for converting XML to an APL array
The conversion from XML to an APL array described above can be controlled by an optional left argument which consists of one or more keyword/value pairs, for example:
R←('markup' 'preserve') ('whitespace' 'preserve') ⎕XML xml_data
The supported keywords are:
Type code returned by ⎕XML
(a) The following type codes are used for XML elements which have children. The codes which apply are added together:
| 1 | Element has a tag (in column 2) (Always true) |
| 2 | Element contains nested child element |
| 4 | Element contains data as well as nested items |
| 8 | Element contains nested XML markup |
| 16 | Element contains nested XML comment |
| 32 | Element contains nested XML Processing Instruction |
For example, the element <Weight> in the following example has a type code of 21 (1 + 16 + 4) when markup and comments are preserved:
<Weight>
<!-- All weights approximate-->
100
</Weight>
Notice that an XML element with children always has a tag name in column 2. It never has any data in column 3: all the data is returned in subsequent rows.
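Since the type code of an element with children is a sum of distinct powers of two, it can be split back into its components with Encode (⊤). The lines below are an illustrative sketch using the value 21 from the <Weight> example; the names code and bits are hypothetical.

```apl
code←21
bits←⌽(6⍴2)⊤code           ⍝ 1 0 1 0 1 0  (least significant bit first)
bits/1 2 4 8 16 32          ⍝ 1 4 16
```

The result 1 4 16 corresponds to "has a tag", "contains data" and "contains a nested comment".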
(b) The following type codes are used for XML elements which don't have any children:
| 1 | Element is an empty XML tag, e.g. <empty/>. The tag name is returned in column 2, and column 3 is blank. |
| 4 | Row is data for parent (See below). The data is returned in column 3, and column 2 is blank. |
| 5 | Element has an XML tag and data, e.g. <Tag>Data</Tag> The tag name is returned in column 2 and the data in column 3. |
| 8 | Element is unprocessed XML markup, e.g. <!ELEMENT name (#PCDATA)>. The markup is returned in column 2, and column 3 is blank. |
| 16 | Element is XML comment, e.g. <!--Comment-->. The comment is returned in column 2, and column 3 is blank. |
| 32 | Element is XML Processing Instruction, e.g. <?xml version="1.0" encoding="utf-8"?>. The processing instruction is returned in column 2, and column 3 is blank. |
The following example illustrates how the codes are used:
<Tag1>Text <Tag2> <Tag3>Text</Tag3> </Tag2> More Text </Tag1>
When converted by ⎕XML this will produce the following array:
⎕DISPLAY ⎕XML xml_data
┌→───────────────────────────────────┐
↓ ┌→───┐ ┌⊖┐ ┌→────────┐ │
│ 0 │Tag1│ │ │ ⌽ ┌⊖┐ ┌⊖┐ │ 7 │
│ └────┘ └─┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌⊖┐ ┌→───┐ ┌→────────┐ │
│ 1 │ │ │Text│ ⌽ ┌⊖┐ ┌⊖┐ │ 4 │
│ └─┘ └────┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌→───┐ ┌⊖┐ ┌→────────┐ │
│ 1 │Tag2│ │ │ ⌽ ┌⊖┐ ┌⊖┐ │ 3 │
│ └────┘ └─┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌→───┐ ┌→───┐ ┌→────────┐ │
│ 2 │Tag3│ │Text│ ⌽ ┌⊖┐ ┌⊖┐ │ 5 │
│ └────┘ └────┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
│ ┌⊖┐ ┌→────────┐ ┌→────────┐ │
│ 1 │ │ │More Text│ ⌽ ┌⊖┐ ┌⊖┐ │ 4 │
│ └─┘ └─────────┘ │ │ │ │ │ │ │
│ │ └─┘ └─┘ │ │
│ └∊────────┘ │
└∊───────────────────────────────────┘
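The matrix does not record explicitly which element a data row (type code 4) belongs to, but this can be derived from the depth column: the parent of a row is the nearest preceding row with a smaller depth. The following sketch shows one possible way to compute it. It assumes index origin 1 (⎕IO is 1) and that xml_data holds the <Tag1> example above; the names d, n, lt and parent are illustrative only.

```apl
R←⎕XML xml_data            ⍝ the 5-row matrix displayed above
d←R[;1]                    ⍝ nesting depths: 0 1 1 2 1
n←⍴d
lt←(d∘.>d)∧(⍳n)∘.>⍳n       ⍝ lt[i;j] is 1 if row j precedes row i and is shallower
parent←⌈/lt×(n,n)⍴⍳n       ⍝ last such row for each row: 0 1 1 3 1
R[parent[(R[;5]=4)/⍳n];2]  ⍝ tag owning each data row: Tag1 Tag1
```

A parent of 0 marks the outer-most element. The outer product makes this quadratic in the number of rows, so it is intended for small documents rather than as a general-purpose technique.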
Converting an APL Array to XML
Syntax:
R←[options] ⎕XML NSTMAT
When presented with an array of APL data, ⎕XML will convert it to an XML representation. The result is a character vector with embedded line-feed characters.
The right argument must be a nested matrix with one row for each XML element, and between 3 and 5 columns as follows:
| Column 1: | An integer indicating the depth of nesting of the element. A value of 0 is used for the outer-most nesting level, with deeper nesting being indicated by higher numbers. |
| Column 2: | The element name to use for the start tag. |
| Column 3: | The element data (see below) |
| Column 4: | (Optional) An M-row, 2-column nested matrix containing any attribute name/value pairs. Each item in the matrix is a character vector. If the element has no attributes you can specify a 0-row matrix, or a pair of empty character vectors. If none of the elements have any attributes you can omit column 4 completely. |
| Column 5: | (Optional) An integer type code (ignored). This column is only used to facilitate round-trip conversions from XML to APL and back again. |
The data specified in Column 3 will usually be a character vector or scalar. However, as a convenience ⎕XML also allows you to specify numeric values. These are formatted as character data before copying to the XML result. Numeric values are also allowed for attribute values (but not names).
Example:
array←1 4⍴0 '?xml version="1.0" encoding="utf-8"?' '' ('' '')
array←array⍪0 'Person' '' ('' '')
array←array⍪1 'Name' '' ('order' 'western')
array←array⍪2 'FirstName' 'Fred' ('' '')
array←array⍪2 'LastName' 'Smith' ('' '')
array←array⍪1 'DateOfBirth' '' ('' '')
array←array⍪2 'Year' 1943 ('' '')
array←array⍪2 'Month' 12 ('' '')
array←array⍪2 'Day' 17 ('' '')
XML←⎕XML array
⎕SS XML ⎕L ⎕R ⍝ Convert line feeds to carriage return for display
<?xml version="1.0" encoding="utf-8"?>
<Person>
<Name order="western">
<FirstName>Fred</FirstName>
<LastName>Smith</LastName>
</Name>
<DateOfBirth>
<Year>1943</Year>
<Month>12</Month>
<Day>17</Day>
</DateOfBirth>
</Person>
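Because ⎕XML converts in both directions, a document can be read into a matrix, changed, and written out again. The following is a sketch of such a round trip, assuming the variable XML from the example above and index origin 1; the names M and XML2 and the replacement value 'Jones' are hypothetical.

```apl
M←⎕XML XML                              ⍝ parse the generated text back into a matrix
M[(M[;2]∊⊂'LastName')/⍳↑⍴M;3]←⊂'Jones'  ⍝ replace the data of the LastName row
XML2←⎕XML M                             ⍝ regenerate the XML text
```

The 5-column matrix returned by the first conversion can be passed straight back, since column 5 is ignored when generating XML. Depending on the options used when parsing, the prologue may not be present in M and would then have to be added again.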
The conversion process can be controlled by an optional left argument, for example:
R←('whitespace' 'preserve') ⎕XML apl_data
The only supported option is 'whitespace':
By default, ⎕XML strips all leading and trailing white space from element data, and compresses runs of white space within the data into a single space. The XML text produced then has spaces and line-feed characters added to format it for readability. For example, elements are indented to reflect their degree of nesting.
If 'whitespace' is set to 'preserve', ⎕XML generates the XML with spaces preserved.
An XML file should normally start with a line containing an XML prologue, e.g.
<?xml version="1.0" encoding="utf-8"?>
Note that ⎕XML does not add the prologue automatically. To make sure that the generated XML includes one you must do one of two things:
(a) Make sure that the first row of the array used to generate the XML contains a valid prologue, as in the example above, or
(b) Prepend the prologue after the XML has been generated:
XML←⎕XML 1 3⍴0 'Name' 'Fred Smith'
XML←'<?xml version="1.0" encoding="utf-8"?>',⎕L,XML
If you create an XML file using ⎕EXPORT, APLX will automatically add the prologue if it is missing from the array.
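When it is not known in advance whether the generated text already begins with a prologue, approach (b) can be made conditional. The lines below are a minimal sketch, assuming that XML holds the text produced by ⎕XML; the names prologue and missing are illustrative, and the test simply checks for the leading characters <?xml.

```apl
prologue←'<?xml version="1.0" encoding="utf-8"?>'
missing←~'<?xml'≡5↑XML              ⍝ 1 if XML does not already start with a prologue
XML←(missing/prologue,⎕L),XML       ⍝ prepend prologue and line feed only when missing
```

Replicate with a scalar 0 on the left yields an empty vector, so the text is left unchanged when a prologue is already present.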
This work is based on the original design concepts and implementation by Mark E. Johns, and has been designed in cooperation with Dyalog Ltd.
Copyright © 1996-2010 MicroAPL Ltd