Topic: APLX Help : Interfacing to other languages : Interfacing to R

www.microapl.co.uk

Interfacing to the R statistical language


What is R?

R is an open-source language and set of packages aimed principally at statistical analysis. It includes a huge library of pre-written statistical and mathematical routines, which can be accessed immediately and very conveniently from APLX. It also includes mathematically-oriented graphing facilities.

R is available from http://www.r-project.org, which describes R as follows:

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

Installing R

R can be downloaded either in source code form or as a pre-compiled binary for most popular platforms from a number of websites (see http://www.r-project.org). In each case you need the R shared library (called libR.so under Linux, R.dll under Windows, and libR.dylib under MacOS); this is usually included in the pre-compiled binaries. If you install from source, be sure to specify the option --enable-R-shlib when running the configure script.

Installing under Windows

This is most easily done using the installer provided with the pre-built binaries. The only additional step which you might need to take is to add the R binary directory to your search path, so that APLX can find the DLL R.dll.

Installing under Linux and MacOS

Follow the instructions provided with the R download. You also need to set up the environment variables that R requires (for example R_HOME); this is usually done by the R startup script supplied with the installation.

Calling R from APLX

Most of the interface between APLX and R is done using a single external class, named 'r', which represents the R session that you are running. (Note that this is different from most of the other external class interfaces, where objects of many different classes can be created separately from APLX). You create a single instance of this class using ⎕NEW. R functions (either built-in or loaded from packages) then appear as methods of this object, and R variables as properties of the object.

For example:

      ⍝ Open the R interface and try a few simple things
      r←'r' ⎕new 'r'
      r.sqrt 2
1.414213562
      r.sqrt (⊂⍳5)
1 1.414213562 1.732050808 2 2.236067977
      r.sqrt ¯1
[r:NAN]                  ⍝ Returns a special R object NAN
      r.mean (⊂⍳10)
5.5

When calling R functions, the APLX right argument is treated as a vector in which each element supplies one argument of the R function (a scalar counts as a one-element vector). The calls to the sqrt and mean functions above illustrate this: to pass a whole array as a single argument, you need to enclose it.
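The rule can be made concrete with a small Python model. This is not APLX code and not how the interface is implemented. A Python list stands in for an APL vector, a non-list value stands in for a scalar, and a list nested inside the outer list stands in for an enclosed array. The r_sqrt function is a hypothetical one-argument stand-in for R's sqrt. Under this model a call such as r.sqrt ⍳5 supplies five arguments. R itself would presumably reject the corresponding call sqrt(1,2,3,4,5), since sqrt takes a single argument.

```python
# Python model of the APLX calling rule (not APLX code). A list stands for
# an APL vector, a nested list for an enclosed array, anything else for a
# scalar.
import math


def r_arguments(right_arg):
    """Split an APLX-style right argument into separate R arguments."""
    if isinstance(right_arg, list):
        return right_arg          # each element becomes one argument
    return [right_arg]            # a scalar counts as a one-element vector


def r_sqrt(x):
    """Hypothetical one-argument stand-in for R's vectorised sqrt."""
    if isinstance(x, list):
        return [math.sqrt(v) for v in x]
    return math.sqrt(x)


def call(fn, right_arg):
    return fn(*r_arguments(right_arg))


print(call(r_sqrt, 2))            # like r.sqrt 2
print(call(r_sqrt, [[1, 4, 9]]))  # like r.sqrt (⊂1 4 9): one argument
try:
    call(r_sqrt, [1, 4, 9])       # like r.sqrt 1 4 9: three arguments
except TypeError as err:
    print("rejected:", err)
```

The enclosed form reaches r_sqrt as a single list. The unenclosed form is spread into three positional arguments and fails.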

Creating variables in the R environment

Assigning to a symbol as though it were a property of the R session class creates a variable in the R world:

      r.x←2 3⍴⍳6         ⍝ x is an R variable
      r.x
1 2 3
4 5 6
      r.x.⎕ref
[r:matrix]
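R stores a matrix as a single column-major data vector plus a dim attribute, whereas APL ravels an array row by row. An interface therefore has to reorder the elements on the way across if the layout is to be preserved. The x[2,] example in the next section shows that it is preserved. The Python sketch below only illustrates that reordering. The function names are hypothetical and this is not APLX's implementation.

```python
# Illustration of row-major (APL) versus column-major (R) storage.
# Not APLX's actual conversion code.

def apl_to_r_matrix(rows):
    """Return (data, dim) for a row-major nested list, R-style."""
    nrow, ncol = len(rows), len(rows[0])
    data = [rows[i][j] for j in range(ncol) for i in range(nrow)]
    return data, (nrow, ncol)


def r_matrix_row(data, dim, i):
    """Equivalent of R's x[i, ] with a 1-based row index."""
    nrow, ncol = dim
    return [data[(i - 1) + j * nrow] for j in range(ncol)]


data, dim = apl_to_r_matrix([[1, 2, 3], [4, 5, 6]])   # APL: 2 3⍴⍳6
print(data)                        # [1, 4, 2, 5, 3, 6]
print(dim)                         # (2, 3)
print(r_matrix_row(data, dim, 2))  # [4, 5, 6], like x[2,] in R
```

In R itself, matrix(c(1,4,2,5,3,6), nrow=2) produces the same 2-by-3 matrix, because matrix fills by column.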

Evaluating R expressions

Because R is an interpreted language, you can use the system function ⎕EVAL to run lines of R code, for example to set up variables in the R environment or to define R functions:

       'r' ⎕eval '4:9'
4 5 6 7 8 9

However, a more convenient syntax is provided (for the 'r' class only) in which ⎕EVAL is a monadic system method.

The right argument is a text vector containing any expression which is a valid line of R code. The result is the explicit result (if any) of evaluating the expression in the external environment. For example:

      r←'r' ⎕new 'r'
      r.x←2 3⍴⍳6         ⍝ x is an R variable
      r.x
1 2 3
4 5 6

      r.⎕eval 'x[2,]'
4 5 6
      r.⎕eval 'mean(x[2,])'
5

Note that the last line could be executed using the alternative syntax where ⎕EVAL is a system function:

      'r' ⎕eval 'mean(x[2,])'
5

Example: 3-D plot

In this short but complete example (based on an article by Skomorokhov and Kutinsky from Quote Quad 123 No 4), we create some data in the R environment, define an R function, and use R's outer function to generate a grid of test data. We then call the R persp function to create a 3-D plot:

      r←'r' ⎕new 'r'
      x←r.⎕eval 'seq(-10,10,length=50)'
      y←x
	  
      ⍝ Define an R function and return a reference to it:
      fn←r.⎕eval 'foo<-function(x,y){r<-sqrt(x^2+y^2);10*sin(r)/r}'
      fn
[r:function]
      r.z←r.outer(x y fn)
      r.x←x
      r.y←y
      ⊣r.⎕eval 'persp(x,y,z,theta=30,phi=30,expand=0.5,xlab="X",ylab="Y",zlab="Z")'

This causes R to open a window and display a 3-D perspective chart of z.
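The data behind the chart can be sketched in Python for readers who do not have R to hand. This models what the R code computes and is not part of the APLX interface. Here seq mimics seq(-10,10,length=50) (the exact floating-point values may differ slightly from R's), foo is the function defined above, and outer builds z with z[i][j] = foo(x[i], y[j]), which is the layout R's outer produces. Because the 50 points are spaced evenly from -10 to 10, none of them is exactly zero, so foo never divides by zero on this grid.

```python
# Python model of the R computation in the 3-D plot example.
import math


def seq(start, stop, length):
    """Evenly spaced values from start to stop, like seq(..., length=n)."""
    step = (stop - start) / (length - 1)
    return [start + i * step for i in range(length)]


def foo(x, y):
    """foo <- function(x,y){r <- sqrt(x^2+y^2); 10*sin(r)/r}"""
    r = math.sqrt(x ** 2 + y ** 2)
    return 10 * math.sin(r) / r


def outer(xs, ys, fn):
    """z[i][j] = fn(xs[i], ys[j]), the layout of R's outer()."""
    return [[fn(x, y) for y in ys] for x in xs]


x = seq(-10, 10, 50)
y = x
z = outer(x, y, foo)
print(len(z), len(z[0]))   # 50 50
```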

Listing R variables and functions

The ⎕NL system method can be used to get the names of R variables and/or functions. The function list includes built-in functions and functions from all the loaded R packages, so it may be several thousand items long:

      ⍝ List R variables:
      vars←r.⎕nl 2
      ⍴vars
129 21
      ⍝ List R functions:
      fns←r.⎕nl 3
      ⍴fns
2058 34                   ⍝ There are lots of them!

⎕DESC can be used to get the full R function list together with details of the parameters (Caution: the result is very large):

      fns2←r.⎕desc 3
      fns2[1445+⍳5;]
pwilcox (q, m, n, lower.tail = TRUE, log.p = FALSE)
q (save = "default", status = 0, runLast = TRUE)
qbeta (p, shape1, shape2, ncp = 0, lower.tail = TRUE, log.p = FALSE)
qbinom (p, size, prob, lower.tail = TRUE, log.p = FALSE)
qbirthday (prob = 0.5, classes = 365, coincident = 2)

R naming conventions

R function names can contain characters such as < and - which are not legal in APLX symbol names. To call such functions in APLX as direct method calls, you need to escape each illegal character with a $ character. (This is of course not necessary when using ⎕EVAL, where the string is passed as-is to R.)

For example, to call attr<- from APLX, you would call r.attr$<$-.
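The escaping rule can be sketched as a short Python function. This is a hypothetical model and not part of APLX. It assumes that letters, digits, '_' and '.' pass through unchanged, since the examples in this section call r.data.frame without escaping. It also assumes that every other character gets a '$' prefix. Under those assumptions it reproduces the escaped names used elsewhere in this section.

```python
# Hypothetical model of the $-escaping rule. The pass-through character
# set is an assumption inferred from the examples, not a documented list.
import string

PASS_THROUGH = set(string.ascii_letters + string.digits + "_.")


def escape_r_name(name):
    """Prefix every character APLX cannot use in a name with '$'."""
    return "".join(ch if ch in PASS_THROUGH else "$" + ch for ch in name)


for r_name in ["attr<-", "names<-", "[[", "$", "$<-", "data.frame"]:
    print(r_name, "->", escape_r_name(r_name))
```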

Conversion of R data types to APL data

Simple numeric arrays and arrays of strings passed from APLX to R are converted directly to the equivalent R arrays. They are converted back automatically ('unboxed') when referenced or returned from an R function call, unless you use ⎕REF to force an object reference to be returned:

      r.y←2.2 3.3 4.4

      r.y
2.2 3.3 4.4
      r.y.⎕ref
[r:numeric]
      (r.y.⎕ref).⎕ds    ⍝ Use R to format the R array
[1] 2.2 3.3 4.4
      r.⎕eval 'mean(y)'
3.3

Complex, NA and NAN data types

The APLX R interface defines three special object classes for NA ('Not Available'), NaN ('Not A Number') and complex-number data, which R routines may return, or which you may want to pass as arguments into R functions.

For example, the following R expression returns a complex number:

      c←r.⎕eval '3+4i'
      c
[r:complex]
      c.format
3+4i

Instances of these object classes can be created by using ⎕NEW:

      NA←'r' ⎕new 'NA'
      NA
[r:NA]
      NAN←'r' ⎕new 'NAN'
      r.z←55.6 77.4 NAN 81 NA
      r.z
55.6 77.4 [r:NAN] 81 [r:NA]
      r.sqrt (⊂r.z)
7.456540753 8.797726979 [r:NAN] 9 [r:NA]

The complex class allows you to create either a single complex number, by passing the real and imaginary parts as two numbers to the constructor:

      c←'r' ⎕new 'complex' 2 3
      c
[r:complex]
      c.format
2+3i

or to build an R complex array by passing an array of length-2 vectors of the real and imaginary parts of each complex number:

      m←'r' ⎕new 'complex' (3 2⍴(1 2) (3 4) (5 6) (7 8) (9 10) (11 12))
      m
[r:matrix]
      m.format
 1+ 2i  3+ 4i
 5+ 6i  7+ 8i
 9+10i 11+12i

You can access or specify the real and imaginary parts directly using the pseudo-properties real and imag of the complex object:

      m.real
1  3
5  7
9 11
      m.imag←3 2⍴.1×⍳6
      m.format
 1+0.1i  3+0.2i
 5+0.3i  7+0.4i
 9+0.5i 11+0.6i
      m.imag
0.1 0.2
0.3 0.4
0.5 0.6
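The same construction can be modelled with Python's built-in complex type. This is a sketch for illustration only, not the APLX complex class. Each (real, imaginary) pair becomes one complex number, and the real parts are read back element by element. Python complex numbers are immutable, so replacing the imaginary parts means rebuilding each element.

```python
# Python model of building a complex matrix from (real, imag) pairs, as in
# 'r' ⎕new 'complex' (3 2⍴(1 2) (3 4) ...). Not the APLX complex class.

pairs = [[(1, 2), (3, 4)],
         [(5, 6), (7, 8)],
         [(9, 10), (11, 12)]]

m = [[complex(re, im) for re, im in row] for row in pairs]

real = [[z.real for z in row] for row in m]
print(real)     # [[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]]

# Equivalent of m.imag←3 2⍴.1×⍳6: rebuild each element with a new
# imaginary part, since Python complex numbers cannot be modified.
new_imag = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
m = [[complex(z.real, im) for z, im in zip(row, imag_row)]
     for row, imag_row in zip(m, new_imag)]
print(m[2][1])  # (11+0.6j)
```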

NAs and NaNs are also supported in complex arrays:

      v←'r' ⎕new 'complex' ((3.2 3.4) NA (1.1 8.2))
      v.format
3.2+3.4i        NA 1.1+8.2i
      
      v.real
3.2 [r:NA] 1.1
      v.imag
3.4 [r:NA] 8.2
      (r.sqrt v).format
1.983563+0.857043i                  NA 2.164885+1.893865i
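NA propagation works element by element. An NA input simply produces an NA output, while the other elements are computed normally. The following is a minimal Python sketch that uses None as a stand-in for R's NA, since Python has no direct equivalent. With cmath.sqrt it reproduces the square roots shown above. The output formatting only roughly imitates R's display.

```python
# Python model only: None stands in for R's NA.
import cmath


def sqrt_with_na(values):
    """Element-wise complex square root that passes NA (None) through."""
    return [None if v is None else cmath.sqrt(v) for v in values]


def format_r_complex(z):
    """Rough imitation of R's a+bi display, with six decimal places."""
    return "NA" if z is None else f"{z.real:.6f}{z.imag:+.6f}i"


v = [complex(3.2, 3.4), None, complex(1.1, 8.2)]
print(" ".join(format_r_complex(z) for z in sqrt_with_na(v)))
```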

Advanced R data types

Other R types, such as factors and lists, are left 'boxed up' as references to the underlying R object (unless you use ⎕VAL to force unboxing, where this is possible):

      lst←r.⎕eval 'list(name="Fred",age=99)'
      lst
[r:list]
      lst.⎕val
 Fred  99
      ⎕display lst.⎕val

An object which is still boxed up can be passed as an argument to an R function:

      r.length lst
2
      r.names lst
 name age

As a convenience you can also write this last example as:

      lst.length
2
      lst.names
 name age

This works because APLX treats the expression

      obj.function arg1,arg2,...

...as equivalent to:

      r.function   obj,arg1,arg2,...
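The rewrite can be modelled in a few lines of Python. This is a hypothetical sketch, not APLX internals. A dict stands in for the R list, and r_length and r_names are stand-ins for R's length and names. The point is only that the object is prepended to the argument list before the named function is called.

```python
# Hypothetical model of the obj.function rewrite (not APLX internals).

def r_length(x):
    """Stand-in for R's length()."""
    return len(x)


def r_names(x):
    """Stand-in for R's names() on a named list."""
    return list(x.keys())


R_FUNCTIONS = {"length": r_length, "names": r_names}


def r_call(function, *args):
    """r.function args: look the function up and call it."""
    return R_FUNCTIONS[function](*args)


def method_call(obj, function, *args):
    """obj.function args is treated as r.function obj,args."""
    return r_call(function, obj, *args)


lst = {"name": "Fred", "age": 99}   # stands in for list(name="Fred",age=99)
print(method_call(lst, "length"))   # same as r_call("length", lst)
print(method_call(lst, "names"))
```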

Examining an object with ⎕DS

The system method ⎕DS can be used to examine an R object. It is equivalent to printing the object in an interactive R session.

      lst←r.⎕eval 'list(name="Fred",age=99)'
      lst
[r:list]
      lst.⎕ds
$name
[1] "Fred"

$age
[1] 99

Functions on the left side of an R assignment

In R, a function call can sometimes appear on the left side of an assignment, as the fourth line of the following R session shows:

      > lst<-list(name="Fred",age=99)
      > names(lst)
      [1] "name" "age" 
      > names(lst)<-c("firstname", "age")
      > names(lst)
      [1] "firstname" "age"      

What actually happens 'under the hood' is that R treats an assignment like:

      function(obj) <- value

...as being a call to a function called "function<-" with the function result assigned to the object, i.e.

      obj  <-  "function<-" (obj, value)

If you wanted to call this function in APLX you could do so, using the $ character to escape the function name:

      lst←lst.names$<$- (⊂'firstname' 'age')
      lst.names
 firstname age

However, APLX also supports a much more convenient syntax:

      lst.names←'firstname' 'age'
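R's replacement-function convention can be modelled in Python. The model assumes a dict as a stand-in for a named R list and uses a hypothetical names_assign function in place of "names<-". Conceptually the replacement function returns a modified copy, and the assignment rebinds the variable to it. R may avoid the copy internally, but the visible result is the same.

```python
# Python model of R's replacement-function rule (illustration only):
#   names(obj) <- value   behaves like   obj <- `names<-`(obj, value)
# A dict stands in for a named R list.

def names_assign(obj, value):
    """Hypothetical stand-in for `names<-`: same values, new names."""
    return dict(zip(value, obj.values()))


lst = {"name": "Fred", "age": 99}
before = lst
lst = names_assign(lst, ["firstname", "age"])   # names(lst) <- c(...)
print(list(lst))      # the new names
print(list(before))   # the original object still has the old names
```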

Indexing lists by name

In the R language a list can be indexed either by number or by name, e.g.

      > lst[[2]]
      $age
      [1] 99

      > lst$age
      [1] 99

This is achieved by special R indexing functions called [[ and $ which can also be called from APLX (once again using a $ to escape the function name):

      lst.$[$[ 2
99
      lst.$$ 'age'
99

It is also possible to change the value of a list item, which you would do in R by writing "lst$age<-95". Under the hood, R is using a function called $<- which we can call from APLX:

      lst←lst.$$$<$- 'age' 95
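The three list operations can be sketched in Python with a small hypothetical NamedList class. This is an illustration, not an R or APLX API. The [[ operation looks an element up by 1-based position and $ looks it up by name. The `$<-` stand-in returns a modified copy, so the caller reassigns the variable, just as the APLX line above does. As in R, assigning to a name that does not exist yet appends a new element.

```python
# Hypothetical model of an R named list and its [[, $ and `$<-` functions.

class NamedList:
    def __init__(self, pairs):
        self.names = [n for n, _ in pairs]
        self.values = [v for _, v in pairs]

    def double_bracket(self, i):
        """lst[[i]] with a 1-based index."""
        return self.values[i - 1]

    def dollar(self, name):
        """lst$name"""
        return self.values[self.names.index(name)]

    def dollar_assign(self, name, value):
        """`$<-`(lst, name, value): a modified copy, original untouched."""
        pairs = list(zip(self.names, self.values))
        if name in self.names:
            pairs[self.names.index(name)] = (name, value)
        else:
            pairs.append((name, value))
        return NamedList(pairs)


lst = NamedList([("name", "Fred"), ("age", 99)])
print(lst.double_bracket(2))    # 99
print(lst.dollar("age"))        # 99
old = lst
lst = lst.dollar_assign("age", 95)
print(lst.dollar("age"))        # 95
print(old.dollar("age"))        # 99: the earlier object is unchanged
```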

Attributes

R objects can have attributes attached to them. By convention, a reference to obj.∆XXX is interpreted as an implicit call to attr(obj, "XXX"):

      ⍝ Get a copy of the R 'iris' variable, a sample 'data.frame'
      iris←r.iris
      iris
[r:frame]
      (iris.attributes).names
 names row.names class
      iris.∆names
 Sepal.Length Sepal.Width Petal.Length Petal.Width Species

You can also change the value of attributes or add your own. An assignment to obj.∆XXX is interpreted as an implicit call to attr<-(obj, "XXX", value):

      iris.∆mycustomattr ← 'Some attribute'
      iris.∆mycustomattr
Some attribute
      ⍝ Longer-winded way of doing the same thing, but creating a new object:
      f2←r.attr$<$- iris 'mycustomattr' 'Some other attribute'
      r.attr f2 'mycustomattr'
Some other attribute
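The ∆ convention is essentially a name translation. The Python sketch below uses RObject, attr, attr_assign and delta_get as illustrative stand-ins, not real APLX or R APIs. Reading obj.∆XXX becomes attr(obj, "XXX"). The `attr<-` stand-in returns a new object, mirroring the longer-winded call above.

```python
# Hypothetical model of the ∆ convention (illustrative names only).

class RObject:
    """A value plus a dict of R-style attributes."""

    def __init__(self, data, attributes=None):
        self.data = data
        self.attributes = dict(attributes or {})


def attr(obj, name):
    """Stand-in for attr(obj, name); None plays the part of R's NULL."""
    return obj.attributes.get(name)


def attr_assign(obj, name, value):
    """Stand-in for `attr<-`(obj, name, value): returns a new object."""
    new = RObject(obj.data, obj.attributes)
    new.attributes[name] = value
    return new


def delta_get(obj, member):
    """Translate a read of obj.∆XXX into attr(obj, "XXX")."""
    if not member.startswith("∆"):
        raise ValueError("not a ∆ reference: " + member)
    return attr(obj, member[1:])


iris = RObject(data=None, attributes={
    "names": ["Sepal.Length", "Sepal.Width", "Petal.Length",
              "Petal.Width", "Species"],
    "class": "data.frame"})
print(delta_get(iris, "∆names"))
f2 = attr_assign(iris, "mycustomattr", "Some other attribute")
print(attr(f2, "mycustomattr"))
print(attr(iris, "mycustomattr"))   # None: iris itself is unchanged
```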

Here is an example of creating an R data.frame object from some APL data:

      data←?3 5⍴100      ⍝ Random APL data for demo                 
      data
95  6  77 78 83
13  2  69 87 63
74 73 100 89 24

      frame←r.data.frame (⊂data)
      frame.attributes.⎕ds
$names
[1] "X1" "X2" "X3" "X4" "X5"

$row.names
[1] 1 2 3

$class
[1] "data.frame"


      frame.∆names←'Fish' 'Chips' 'Ham' 'Eggs' 'Tea'

      frame.⎕ds
  Fish Chips Ham Eggs Tea
1   95     6  77   78  83
2   13     2  69   87  63
3   74    73 100   89  24

      frame.summary.⎕ds
      Fish           Chips           Ham             Eggs            Tea
 Min.   :13.00   Min.   : 2.0   Min.   : 69.0   Min.   :78.00   Min.   :24.00
 1st Qu.:43.50   1st Qu.: 4.0   1st Qu.: 73.0   1st Qu.:82.50   1st Qu.:43.50
 Median :74.00   Median : 6.0   Median : 77.0   Median :87.00   Median :63.00
 Mean   :60.67   Mean   :27.0   Mean   : 82.0   Mean   :84.67   Mean   :56.67
 3rd Qu.:84.50   3rd Qu.:39.5   3rd Qu.: 88.5   3rd Qu.:88.00   3rd Qu.:73.00
 Max.   :95.00   Max.   :73.0   Max.   :100.0   Max.   :89.00   Max.   :83.00
 
      frame.plot
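The summary figures can be checked independently. R's summary uses R's default quantile method (type 7), which matches Python's statistics.quantiles with method="inclusive". The Python sketch below recomputes the quartiles and means from the data printed above. It is a cross-check only: the data came from ?, so another run would give different numbers.

```python
# Cross-check of the summary() output above using the printed data.
import statistics

data = [[95, 6, 77, 78, 83],
        [13, 2, 69, 87, 63],
        [74, 73, 100, 89, 24]]
names = ["Fish", "Chips", "Ham", "Eggs", "Tea"]


def summarise(column):
    """Return (min, 1st quartile, median, mean, 3rd quartile, max)."""
    q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
    return (min(column), q1, median, statistics.mean(column), q3,
            max(column))


for j, name in enumerate(names):
    column = [row[j] for row in data]
    lo, q1, med, mean, q3, hi = summarise(column)
    print(f"{name:6} {lo:6} {q1:6.2f} {med:6.2f} {mean:6.2f} "
          f"{q3:6.2f} {hi:6}")
```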

Using the R interface from multiple APL tasks

Because it is not safe to call the R interpreter from multiple threads, you cannot use the R interface from more than one APL task at a time. If you try to do so, you will get an error message and a FILE LOCKED error:

      r←'r' ⎕new 'r'
This interface cannot be used by more than one APL task at a time
FILE LOCKED
      r←'r' ⎕new 'r'
         ^

The lock will be cleared when the APL task which has been accessing R executes a )CLEAR, )LOAD, or )OFF.
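The locking behaviour can be pictured as a single-owner lock. The Python sketch below is hypothetical: SingleTaskInterface and FileLocked are illustrative names, and APLX's actual mechanism is not documented here. The first task to open the interface becomes its owner and other tasks are refused. The owner's release corresponds to )CLEAR, )LOAD or )OFF.

```python
# Hypothetical sketch of a single-owner lock; not APLX's implementation.
import threading


class FileLocked(Exception):
    """Stand-in for the FILE LOCKED error."""


class SingleTaskInterface:
    def __init__(self):
        self._guard = threading.Lock()   # protects _owner
        self._owner = None

    def open(self, task_id):
        """Claim the interface, or fail if another task owns it."""
        with self._guard:
            if self._owner not in (None, task_id):
                raise FileLocked("This interface cannot be used by more "
                                 "than one APL task at a time")
            self._owner = task_id

    def release(self, task_id):
        """Give up ownership (as )CLEAR, )LOAD or )OFF would)."""
        with self._guard:
            if self._owner == task_id:
                self._owner = None


r_interface = SingleTaskInterface()
r_interface.open("task 1")
try:
    r_interface.open("task 2")
except FileLocked as err:
    print("FILE LOCKED:", err)
r_interface.release("task 1")    # e.g. task 1 executes )CLEAR
r_interface.open("task 2")       # now succeeds
```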

