Saturday, 29 October 2011

Using 'R' for Statistical Computing

I recently tried out the 'R' toolkit for performing some statistical operations and was impressed with its power and extensibility in manipulating terabyte-sized data and its ability to produce publication-ready graphs. It is open source, released under the GNU GPL, and can be downloaded from a CRAN site and set up to run on Windows, UNIX or Mac OS.
The toolkit provides a command-line environment for loading data into in-memory arrays and manipulating it, after which the data can be subjected to further statistical analysis. While the majority of users see 'R' as a statistical package, it can also be used as a modelling tool for linear and non-linear models. Apart from providing high-level functions for statistical computing, 'R' also has well-developed constructs for looping and conditional execution. It is extensible as well, so users can write their own functions. The user interface is easy to get used to if you are coming from a UNIX background, though it might take some getting used to for Windows users.
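
To give a feel for these features, here is a minimal sketch of what a short session might look like. The numbers are made up purely for illustration, and the names `heights`, `weights` and `coef_var` are my own, not anything built into 'R'. It loads data into in-memory vectors, defines a small user function, fits a linear model, and uses a loop with a conditional.

```r
# Load a small made-up sample into in-memory vectors (illustrative values only).
heights <- c(150, 160, 170, 180, 190)
weights <- c(52, 58, 69, 77, 88)

# A user-written function (hypothetical name): the coefficient of variation.
coef_var <- function(x) {
  sd(x) / mean(x)
}

# Fit a simple linear model of weight against height.
fit <- lm(weights ~ heights)
print(coef(fit))

# Looping and conditional execution.
for (h in heights) {
  if (h > 170) {
    cat(h, "is above 170\n")
  } else {
    cat(h, "is at most 170\n")
  }
}

# Call the user-written function on one of the vectors.
print(coef_var(weights))
```

The same pattern scales to data read from files; `lm` is only one of the modelling functions that ship with the base installation.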

The 'Help' module is well designed and easy to use. Documentation on any function can be loaded from the command line by preceding the function name with a '?'. Overall, 'R' is a powerful toolkit for statistical computing and an excellent choice for manipulating large data sets.
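
As a quick illustration of the help system, the lines below look up the built-in `mean` function. I picked `mean` only as an example; any function name works the same way.

```r
# Open the documentation page for the built-in mean() function.
?mean

# Equivalent long form, which reads more clearly inside a script.
help("mean")

# Run the examples that accompany the help page.
example(mean)
```

How the help page is displayed (in the console, a pager, or a browser) depends on the platform and on how 'R' is set up.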
