John's Techno Phile: August 2013

Saturday 17 August 2013

MongoDB - A database for Big Data

The need to manage large datasets that grow very rapidly has always been there. With social networking sites generating millions of lines of data every minute, the ability to store, process and analyse large datasets is not just desirable but critical for staying relevant. This scenario has given an impetus to the rise of NoSQL databases. NoSQL, or Not Only SQL, databases can be divided into 4 main categories:

Key-Value databases : Voldemort
Graph databases : InfiniteGraph, Neo4J
Document databases : CouchDB, MongoDB
Column Family stores : Cassandra, HBase, Hyperbase

MongoDB is a non-relational JSON document store. It doesn't support the relational algebra that is most often expressed as SQL.
Documents are expressed as JSON and are stored within the MongoDB database in BSON (Binary JSON, bsonspec.org) format.
BSON supports all the data types available in JSON and a few more; supported types include strings, floating-point numbers, arrays, objects and timestamps.
MongoDB allows documents in the same collection to have different schemas. This is referred to as supporting a dynamic schema, or being schemaless.
There is no SQL, no transaction management and no JOINs.
The absence of transaction management across multiple documents and of JOINs makes MongoDB better suited to scaling out and to high performance.
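To make the dynamic-schema idea concrete, here is a minimal Python sketch. It does not talk to MongoDB: `find` and `users` are hypothetical stand-ins for a collection and its query method, and only exact equality on top-level fields is modelled (real MongoDB queries do much more, such as matching elements inside arrays).

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def find(collection: List[Document], query: Document) -> List[Document]:
    """Return the documents whose top-level fields equal every key/value pair
    in `query`. A document that lacks a queried field simply does not match;
    no schema is consulted anywhere."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


# Three documents in the same (hypothetical, in-memory) collection,
# each with a different set of fields.
users: List[Document] = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    {"_id": 2, "name": "Bob", "twitter": "@bob", "tags": ["admin", "beta"]},
    {"_id": 3, "name": "Carol", "address": {"city": "Sydney"}},
]

print([doc["name"] for doc in find(users, {"twitter": "@bob"})])
print([doc["_id"] for doc in find(users, {"address": {"city": "Sydney"}})])
```

Nothing had to be declared up front: adding a "twitter" field to one user did not require altering a table or touching the other documents.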

Problems with traditional RDBMSs and the need for MongoDB-type NoSQL databases:

Traditional RDBMSs are based on concepts that promote strong referential integrity, data normalization and transaction management. As a result, a typical data model is spread across several tables, and satisfying a query may require a JOIN. JOINs are not your best friend when you are looking for speed and performance.

Transaction management is another key area that provides reliability and data consistency. However, it comes at a cost in performance.

MongoDB does not use JOINs or provide transaction management. This results in a performance boost, and data access can be much faster than with traditional RDBMSs.
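The difference in access patterns can be sketched in Python, using the standard library's `sqlite3` module for the relational side and a plain dict for the document side. The `orders`/`order_items` tables and the embedded order shape are hypothetical; the point is how many pieces each model has to stitch together, not a benchmark.

```python
import sqlite3

# Normalised relational model: an order and its line items live in two
# tables, so reading one complete order needs a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (order_id INTEGER REFERENCES orders(id),
                              sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'Alice');
    INSERT INTO order_items VALUES (1, 'book', 2), (1, 'pen', 10);
""")
rows = conn.execute("""
    SELECT o.customer, i.sku, i.qty
    FROM orders o JOIN order_items i ON i.order_id = o.id
    WHERE o.id = ?
    ORDER BY i.sku
""", (1,)).fetchall()
conn.close()

# Document model (hypothetical shape): the same order is one document with
# its items embedded, so a single lookup by _id returns everything.
orders_by_id = {
    1: {"_id": 1, "customer": "Alice",
        "items": [{"sku": "book", "qty": 2}, {"sku": "pen", "qty": 10}]},
}
order = orders_by_id[1]

print(rows)
print(order["items"])
```

Note that the JOIN also repeats the customer name on every result row, while the embedded document stores it once.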

While transaction management is not supported within MongoDB, it does guarantee atomic operations on a single document. An atomic operation here is an update that affects one document only, although that document may contain embedded sub-documents.
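The single-document guarantee can be modelled with a small Python toy. In MongoDB itself, bumping a counter inside an embedded sub-document would be one server-side update, for example pymongo's `update_one` with `{"$inc": {"stats.views": 1}}`. The `ToyDocumentStore` class below is purely hypothetical: it illustrates the semantics (all of one document's changes land together, concurrent writers never lose updates), not how MongoDB implements them.

```python
import copy
import threading
from typing import Any, Callable, Dict

Document = Dict[str, Any]


class ToyDocumentStore:
    """Hypothetical in-memory store that models MongoDB's single-document
    atomicity. It illustrates the guarantee, not how MongoDB implements it."""

    def __init__(self) -> None:
        self._docs: Dict[Any, Document] = {}
        self._locks: Dict[Any, threading.Lock] = {}

    def insert(self, doc: Document) -> None:
        self._docs[doc["_id"]] = copy.deepcopy(doc)
        self._locks[doc["_id"]] = threading.Lock()

    def update_one(self, _id: Any, mutate: Callable[[Document], None]) -> None:
        # Apply the change to a private copy, then swap it in while holding
        # this document's lock: concurrent writers are serialised, and readers
        # see either the whole change or none of it.
        with self._locks[_id]:
            working = copy.deepcopy(self._docs[_id])
            mutate(working)
            self._docs[_id] = working

    def find_one(self, _id: Any) -> Document:
        return copy.deepcopy(self._docs[_id])


def record_view(doc: Document) -> None:
    # Touches two fields of the same embedded sub-document in one update.
    doc["stats"]["views"] += 1
    doc["stats"]["last_viewer"] = threading.current_thread().name


store = ToyDocumentStore()
store.insert({"_id": "post-1", "stats": {"views": 0, "last_viewer": None}})

workers = [
    threading.Thread(target=store.update_one, args=("post-1", record_view))
    for _ in range(50)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(store.find_one("post-1")["stats"])
```

What the toy cannot do, by design, is update two different documents as one unit; that is exactly the multi-document transaction support MongoDB leaves out.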

MongoDB uses binary-encoded JSON (BSON), which stores different data types efficiently. Because no storage is allocated for fields that do not exist in a particular document, documents that do not have a complete set of entries can achieve significant storage savings in comparison to RDBMSs, where space is typically reserved in every row for every field, whether populated or null.
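To see where the savings come from, here is a minimal Python encoder for a small subset of the BSON format described at bsonspec.org (null, boolean, int32/int64, double, string and embedded document only). It is a sketch for illustration, not a replacement for a real BSON library, and the sample records are made up.

```python
import struct
from typing import Any, Dict


def _cstring(text: str) -> bytes:
    # BSON "cstring": UTF-8 bytes followed by a NUL terminator.
    return text.encode("utf-8") + b"\x00"


def _element(name: str, value: Any) -> bytes:
    """Encode one field (type byte, field name, value) for a subset of BSON types."""
    if value is None:
        return b"\x0a" + _cstring(name)                 # null: no value bytes
    if isinstance(value, bool):                         # test before int: bool subclasses int
        return b"\x08" + _cstring(name) + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + _cstring(name) + struct.pack("<i", value)   # int32
        return b"\x12" + _cstring(name) + struct.pack("<q", value)       # int64
    if isinstance(value, float):
        return b"\x01" + _cstring(name) + struct.pack("<d", value)       # double
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + _cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + _cstring(name) + encode(value)  # embedded document
    raise TypeError(f"type not handled by this sketch: {type(value).__name__}")


def encode(doc: Dict[str, Any]) -> bytes:
    """Encode a dict as a BSON document: int32 total length, elements, NUL."""
    body = b"".join(_element(key, value) for key, value in doc.items())
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


# The same record with the unknown fields stored as null vs. simply omitted.
with_nulls = {"name": "Alice", "email": "alice@example.com", "phone": None, "fax": None}
omitted = {"name": "Alice", "email": "alice@example.com"}
print(len(encode(with_nulls)), len(encode(omitted)))
```

Even a null field costs a type byte plus its name, while an omitted field costs nothing. Bear in mind that BSON also stores every field name inside every document, so the real-world savings depend on the shape of the data.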

Large document sets can also be split (or sharded) over multiple servers and automatically redistributed when additional servers are added, for additional scalability.
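The "automatically redistributed" part can be pictured with a toy balancer in Python. The assumptions are loud here: a sharded collection is modelled as a list of chunk ids per shard, and the balancer simply moves chunks from the fullest shard to the emptiest until their counts differ by at most one. MongoDB's real balancer uses its own migration thresholds; `balance` and the shard/chunk names are hypothetical.

```python
from typing import Dict, List, Tuple

Placement = Dict[str, List[str]]          # shard name -> ids of chunks it holds
Move = Tuple[str, str, str]               # (chunk id, from shard, to shard)


def balance(placement: Placement, max_spread: int = 1) -> List[Move]:
    """Migrate chunks from the fullest shard to the emptiest one until their
    chunk counts differ by at most `max_spread`. Mutates `placement` in place
    and returns the migrations performed."""
    if max_spread < 1:
        raise ValueError("max_spread must be at least 1, or balancing may never settle")
    moves: List[Move] = []
    while True:
        fullest = max(placement, key=lambda shard: len(placement[shard]))
        emptiest = min(placement, key=lambda shard: len(placement[shard]))
        if len(placement[fullest]) - len(placement[emptiest]) <= max_spread:
            return moves
        chunk = placement[fullest].pop()
        placement[emptiest].append(chunk)
        moves.append((chunk, fullest, emptiest))


# Twelve chunks of one collection spread over two shards...
placement: Placement = {
    "shard-a": [f"chunk-{i}" for i in range(0, 6)],
    "shard-b": [f"chunk-{i}" for i in range(6, 12)],
}
# ...then a third server joins and the balancer evens things out.
placement["shard-c"] = []
moves = balance(placement)
print({shard: len(chunks) for shard, chunks in placement.items()})
print(moves)
```

Moving whole chunks rather than individual documents keeps the number of migrations small: here only four moves are needed to go from 6/6/0 to 4/4/4.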

Real-world use cases: SAP, SourceForge, MTV and Twitter (http://www.mongodb.org/about/production-deployments/)

Thus, MongoDB is all about performance, scalability and speed. However, there are still some scenarios where MongoDB might not be the best fit:

1. Not suited to applications that require transaction management.

2. Designed to run behind firewalls, so it offers less security than RDBMSs.

3. Documents in MongoDB are limited to 16MB. The limit applies to each individual document, not to a collection: larger data has to be broken up across multiple documents (for example, using GridFS for large files), whereas large collections are spread across shards.
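The workaround for point 3 can be sketched in Python in the spirit of GridFS, which stores a large file as a series of small chunk documents (each carrying roughly a `files_id`, a sequence number `n` and a slice of `data`) alongside a metadata document. The helper names and the 255 KB chunk size below are assumptions for this sketch, not the GridFS API.

```python
from typing import Any, Dict, List

MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024   # MongoDB's per-document limit (16MB)
CHUNK_SIZE = 255 * 1024                     # assumed chunk size for this sketch


def split_into_chunks(files_id: Any, data: bytes,
                      chunk_size: int = CHUNK_SIZE) -> List[Dict[str, Any]]:
    """Hypothetical GridFS-style split: one small document per chunk, tagged
    with the parent file's id and its sequence number `n`."""
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks: List[Dict[str, Any]]) -> bytes:
    # Sort by sequence number before concatenating, as a reader would.
    return b"".join(chunk["data"] for chunk in sorted(chunks, key=lambda c: c["n"]))


payload = bytes(20 * 1024 * 1024)           # 20MB of zeros: too large for one document
print(len(payload) > MAX_BSON_DOCUMENT_SIZE)
chunks = split_into_chunks("video-42", payload)
print(len(chunks), reassemble(chunks) == payload)
```

Each chunk document stays far below the 16MB ceiling, and because every chunk records its position, the pieces can be stored and fetched in any order.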

Friday 2 August 2013

Securing Fedora Commons 3.6.2 with XACML Policies.

The Flexible Extensible Digital Object Repository Architecture, also known as the Fedora Repository, uses XACML-based policies for authorization. XACML is an XML-based policy language that is used to define access control rules and secure applications using a standard, policy-based approach.
The latest release, Fedora 3.6.2, promotes the JAAS-based FeSL as the default security layer. FeSL, which was introduced in 2011, is designed to improve upon the legacy XACML-based security scheme that has been Fedora's backbone since its release. While FeSL simplifies security, XACML remains relevant for managing and setting up access to the API-A and API-M interfaces. The API-A interface provides read-only access to the repository's properties and its objects. API-M, on the other hand, enables management of the repository and allows edit access to the contained objects.

While Fedora ships with a basic set of XACML policies that provide a basis for securing access to the two interfaces, there will be scenarios where authoring a custom policy is required. When writing such a policy, the policy writing guide is a good place to start. XACML policies are rule-based and generally enforce either a PERMIT or a DENY result on a specific resource. Apart from these two results, there are also the Not Applicable and Indeterminate results.
When trying to understand how the policies are applied and what final results they generate, there are two key points to take away. First, a DENY rule supersedes a PERMIT rule. For example, if an administrator is PERMITTED to access the API-A interface but there is a DENY rule on a specific API-A operation that applies to all users, including the administrator, then the administrator will not have access to that specific operation.
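The first point, that DENY beats PERMIT, is the behaviour of XACML's deny-overrides combining algorithm. Below is a simplified Python model of it, not Fedora's policy engine: rules are plain functions, and an Indeterminate result is handled conservatively by ranking it just below DENY (the full XACML algorithm distinguishes errors in Deny rules from errors in Permit rules). The rule functions and the operation name `listDatastreams` are hypothetical choices for illustration.

```python
from enum import Enum
from typing import Callable, Dict, List


class Decision(Enum):
    PERMIT = "Permit"
    DENY = "Deny"
    NOT_APPLICABLE = "NotApplicable"
    INDETERMINATE = "Indeterminate"


Request = Dict[str, str]
Rule = Callable[[Request], Decision]


def deny_overrides(rules: List[Rule], request: Request) -> Decision:
    """Simplified deny-overrides: any DENY wins outright; an INDETERMINATE
    (evaluation error) is treated conservatively and ranks next; only then
    does a PERMIT count. If no rule applies, the result is NOT_APPLICABLE."""
    results = [rule(request) for rule in rules]
    for decision in (Decision.DENY, Decision.INDETERMINATE, Decision.PERMIT):
        if decision in results:
            return decision
    return Decision.NOT_APPLICABLE


# Hypothetical rules mirroring the administrator example; "listDatastreams"
# is used only as an illustrative API-A operation name.
def permit_admins_on_api_a(request: Request) -> Decision:
    if request["api"] == "API-A" and request["role"] == "administrator":
        return Decision.PERMIT
    return Decision.NOT_APPLICABLE


def deny_everyone_list_datastreams(request: Request) -> Decision:
    if request["api"] == "API-A" and request["operation"] == "listDatastreams":
        return Decision.DENY
    return Decision.NOT_APPLICABLE


rules = [permit_admins_on_api_a, deny_everyone_list_datastreams]
admin = {"role": "administrator", "api": "API-A"}
print(deny_overrides(rules, {**admin, "operation": "getObjectProfile"}))
print(deny_overrides(rules, {**admin, "operation": "listDatastreams"}))
```

The administrator's blanket PERMIT on API-A still holds for other operations; it only loses where a DENY rule also matches.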

Secondly, access to a resource has to be explicitly granted. For example, if there is a DENY rule that blocks all non-admin users from the API-A interface, it does not follow that administrators will be able to access the API-A interface. There has to be a PERMIT rule that gives them that access. Thus, while designing policies, do remember to check out section 3.3 in the policy enforcement guide. It might save you hours when you are trying to figure out why a certain user cannot access an interface!
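The second point can be sketched by adding an enforcement step on top of the combined decision. Again this is a hypothetical Python model rather than Fedora code; the assumption, following the description above, is that anything other than an explicit PERMIT (including Not Applicable) is refused.

```python
from enum import Enum
from typing import Callable, Dict, List


class Decision(Enum):
    PERMIT = "Permit"
    DENY = "Deny"
    NOT_APPLICABLE = "NotApplicable"
    INDETERMINATE = "Indeterminate"


Request = Dict[str, str]
Rule = Callable[[Request], Decision]


def combine(rules: List[Rule], request: Request) -> Decision:
    # Simplified deny-overrides: DENY beats INDETERMINATE, which beats PERMIT.
    results = [rule(request) for rule in rules]
    for decision in (Decision.DENY, Decision.INDETERMINATE, Decision.PERMIT):
        if decision in results:
            return decision
    return Decision.NOT_APPLICABLE


def is_access_granted(rules: List[Rule], request: Request) -> bool:
    """Enforcement step: only an explicit PERMIT opens the door; NOT_APPLICABLE
    and INDETERMINATE are refused just like DENY."""
    return combine(rules, request) is Decision.PERMIT


def deny_non_admins_api_a(request: Request) -> Decision:
    if request["api"] == "API-A" and request["role"] != "administrator":
        return Decision.DENY
    return Decision.NOT_APPLICABLE


def permit_admins_api_a(request: Request) -> Decision:
    if request["api"] == "API-A" and request["role"] == "administrator":
        return Decision.PERMIT
    return Decision.NOT_APPLICABLE


admin_request = {"role": "administrator", "api": "API-A"}
print(is_access_granted([deny_non_admins_api_a], admin_request))
print(is_access_granted([deny_non_admins_api_a, permit_admins_api_a], admin_request))
```

With only the DENY rule in place, the administrator's request matches nothing, comes back Not Applicable and is therefore refused; adding the explicit PERMIT rule is what actually grants access.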