Data Analysis: Python? or R? – A Debate

Written by Suzie Shin

Given the wide variety of data analytics platforms available, each with its own strengths and weaknesses, choosing the right programming language when starting a new project is daunting. Currently, Python and R are two of the most popular languages for analyzing data. For new data analysts who are looking to break into the field, it can be overwhelming to decide which language to learn first. Here, we provide an evaluation of each platform and our recommendations. The thoughts here are based on our own experiences, as well as on information and feedback we summarized from various internet discussions. Enjoy, and please weigh in with your own thoughts in the comments!

 

Usability:

Python is a general-purpose programming language known for being easy to learn and use. It was originally designed for software development and engineering, so it is not surprising that people with a programming background prefer it. Python takes a more general approach to data science and places a significant focus on production and deployment. As a result, it can be used to achieve a wider range of project goals and outcomes.

By contrast, the primary objective of R is more specific: the statistical analysis of data. R was created with statisticians in mind; the language is organized around statistical models rather than general programming conventions. As a result, it is used more by scholars and researchers. Due to its statistics-oriented nature, R offers field-specific advantages over Python, such as more advanced and interactive data visualization capabilities (e.g., graphs). R’s graphics are more informative and customizable, and can be presented directly. Python lacks equivalents for some of R’s popular data science libraries, and its visualizations are generally considered less informative, less interactive, and less pleasing to look at.

Due to its general-purpose nature, Python can be integrated with other software and programming tools, while R is better suited to running locally. Python is also better equipped to handle massive quantities of data; it can even source information directly from the internet. R is better designed for analyzing one specific dataset at a time and is better suited for importing and working with files directly (e.g., Excel).
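To make the data-handling point a little more concrete, here is a minimal sketch of reading tabular data in Python using only the standard library. The sample text, its column names, and the function name `mean_sales` are all made up for illustration; in a real project the text might come from a downloaded file or a spreadsheet exported as CSV.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a file downloaded from the web
# or exported from Excel as CSV. The columns are invented for this sketch.
SAMPLE_CSV = """region,sales
North,120
South,95
East,143
West,110
"""


def mean_sales(csv_text):
    """Parse CSV text and return the average of its 'sales' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row["sales"]) for row in reader]
    return statistics.mean(values)


print(mean_sales(SAMPLE_CSV))
```

Many analysts would reach for a dedicated data library for this kind of task; the standard library is used here only to keep the sketch self-contained.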

 

Learning Curve:

Python is a general-purpose programming language with a readable syntax. R, however, was built by statisticians and reflects their specialized vocabulary. The general consensus is that while R is more difficult to learn at first, it is easy to navigate and use once you pass that initial hurdle. Python, by comparison, has a smoother and more linear learning trajectory.

 

Recommendation:

To summarize, knowing how to use Python gives you the versatility to work on a broad range of data-centric projects, while learning R gives you a stronger hold on the specific statistics and data science principles needed when analyzing information. Knowing how to use both Python and R is invaluable in the data science field and gives you an upper hand in data science projects. Neither Python nor R is too difficult to learn, but if you have no previous programming experience, the task can seem daunting. So which should you learn first?

I recommend that you consider your current skill set. For instance, someone who has experience conducting research and using programs like MATLAB may find R much simpler to learn. Alternatively, as Python is an object-oriented programming language like C++ and Java, someone familiar with programming languages would find Python more intuitive than R. I also recommend that you think about your long-term career goals. If you’re passionate about the statistical calculation and data visualization portions of data analysis, mastering R would be more useful in your professional endeavors. If you have interests beyond this and see yourself working with big data, artificial intelligence, deep learning algorithms, and the like, mastering Python would be more useful. They are both excellent tools, and any development of your coding skills will be worthwhile in our data-soaked world!
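If the phrase "object-oriented" is new to you, the short sketch below shows the basic idea in Python: data and the operations on that data are bundled together in a class. The class name `Measurements` and its methods are hypothetical, chosen only for this illustration.

```python
from statistics import mean, median


class Measurements:
    """A hypothetical container that bundles data with operations on it."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def add(self, value):
        """Record one more observation."""
        self.values.append(value)

    def summary(self):
        """Return a few basic descriptive statistics as a dictionary."""
        return {
            "n": len(self.values),
            "mean": mean(self.values),
            "median": median(self.values),
        }


temps = Measurements("temperature", [20, 22, 21])
temps.add(25)
print(temps.summary())
```

Readers coming from C++ or Java will likely recognize this pattern, which is part of why Python tends to feel familiar to them.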

So, what is your vote?  R or Python?!