Despite the phenomenal rise of Python for data science, the programming language R continues to attract developers building analytics applications. R’s open source community, the richness of its algorithms, its appeal in academia, and its convenience for consulting businesses are driving growth.
R, designed by two statisticians and released in 1995, was created for statistical modelling. The objective was to make it quick to test statistical concepts, apply those concepts to data wrangling and analysis, and support decision-making. “Whenever there is a need to quickly explore data or do reports with limited interventions, R becomes one of the preferred choices,” says Yogesh Parte, data science practice lead at CitiusTech.
Data comes in three formats – binary (numbers), text or voice, and pictures. R processes large volumes of binary data quickly, but it is slower with text and images. That’s where Python scores. “On the computational front, R has lost to Python over the last five years. Python is the new champion, not on the richness of its algorithms, but on the huge volume of unstructured data – voice, text or pictures – that it can process,” says Allen Roy, head of analytics at Mashreq Global Services.
“On the computational front, R has lost to Python over the last five years. But R has very high-level statistical and mathematical maturity.”
Allen Roy, head of analytics, Mashreq Global Services
R also provides more optimisation algorithms, which are used to find the optimal solution to a problem. For one algorithm in Python or Java, there could be at least 30 variations of the same optimisation algorithm in R. “That is because people who developed it have done it for a specific purpose, and it involves a lot of granularities that define the specifications,” says Roy. Users therefore have to do a little research to pick the right R package. “Those who are using R should be more literate in statistics, and they need to understand why they are using that optimisation algorithm,” says Roy.
“R is very simple to understand and use. If you are not doing very complicated analysis or not building an engineering product, then R will suffice.”
Mayank Kumar, co-founder, upGrad
R is often used more as a learning package than as an implementation package. That’s because it lacks Python’s sophistication for live production scenarios, such as real-time analytics and real-time implementation. So Python scores over R in machine learning applications.
“For clinical trials and associated workflows, R is the most preferred language because many of the algorithms are readily available in R.”
Yogesh Parte, data science practice lead, CitiusTech
But Parte believes R can still be used effectively as part of the data engineering backend, especially to build data pipelines. “R is good to build ETL (extract, transform and load) pipelines, though the current industry trend favours Python. ETL can be called the backend of the data engineering infrastructure. The top layer is always either Python or Java. The core model can remain R, which can be integrated with Python,” says Parte.
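For readers unfamiliar with the term, the three ETL stages Parte mentions can be sketched in a few lines of Python. This is only an illustrative outline using the standard library, not a pipeline described by anyone quoted here: the sample data, the `visits` table and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """patient_id,age,visit_cost
p1,34,120.50
p2,,80.00
p3,51,not_available
p4,47,200.25
"""


def extract(text):
    """Extract: read raw rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and drop rows with missing or malformed values."""
    clean = []
    for row in rows:
        try:
            clean.append(
                (row["patient_id"], int(row["age"]), float(row["visit_cost"]))
            )
        except ValueError:
            # Skip rows such as p2 (empty age) and p3 (non-numeric cost).
            continue
    return clean


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits "
        "(patient_id TEXT, age INTEGER, visit_cost REAL)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract, transform and load, then return (row count, average cost)."""
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT COUNT(*), AVG(visit_cost) FROM visits"
    ).fetchone()
```

Calling `run_pipeline(RAW_CSV, sqlite3.connect(":memory:"))` loads only the two well-formed rows. In the arrangement Parte describes, a statistical model written in R would sit behind a layer like this one rather than replace it.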