In a strange twist of fate I have been tasked to introduce my overseas colleagues to R. LOL
I shall sneak some python bits in.
Before you begin, ask questions.
Ask if they have heard of R before. Ask if there is any reason anyone has asked them to learn about R. Ask what tools they are currently using, and what are their pain points.
From here you can establish a starting point, and an end point. What knowledge should they walk away with?
What is R? R is a language and environment for statistical computing and graphics. It is free. It is open source - meaning people can modify and share the source code. Think of it as a public collaboration.
What isn’t R? R is not a general programming language - it has its roots in statistics.
You can do minimal general programming in R such as automating mail sending in R (see below) or build a webapp using Shiny for data visualisation/displaying your Machine Learining output (see https://www.r-bloggers.com/tutorial-build-webapp-in-r-using-shiny/)).
library (RDCOMClient) OutApp <- COMCreate("Outlook.Application") outMail = OutApp$CreateItem(0) outMail[["To"]] = "[email protected]" outMail[["subject"]] = "Test Subject" outMail[["body"]] = "Body of email" outMail$Send()
The conclusion: if you are already operating in R for all your data use, by all means continue with it to build simple apps for your own personal use - you don’t need to know JS, HTML or CSS!
See here for more comments: https://www.quora.com/Do-you-prefer-using-Rs-Shiny-or-Pythons-Flask-to-make-data-focused-web-apps
Why not VBA/Excel?
Reproduction of previously conducted analyses on new datasets is not easy. In Excel, to replicate a report you did last month/quarter/year, you copy and paste the new data into the worksheets with formulas. There are a million and one excel sheets and references, and if you forget to drag the formula to cover sufficient rows, or someone accidentally presses a button that edits the formula…
It’s hard to detect such errors.
But fine, there is VBA. However, VBA is only used within Excel/Access. And at times, Excel is slow/unable to handle large datasets.
Also, here we are talking about basic data manipulation.
R can allow you to store a series of data manipulation steps in a script, and re-apply it across similar data. It can create data tables, but also graphs for visualisation. On top of that, hello machine learning algorithms!
Why not (just) R? After processing the data in R, you deliver a nice, clean set of numbers in a report to the end user.
They want to know why the numbers have changed. They want to cut, slice, dice the data, interact with it. Sure, you have data visualisation tools, but 1) where did those numbers come from? 2) how can I filter it by my own requirements?
- Where did those numbers come from? People want to know how you calculated those numbers (what do these numbers actaully represent?), see the intermediate steps.
It can be very convenient to just click on the cell to see how each number is actually derived.
- I want to filter/slice/dice it my way People want to interact with their numbers. Just for fun. This is where Tableau shines (can’t believe I’m saying this!), because the data it reads from is basically just a plain table, and you fix the visualisation type. Voila - the user, based on that type of visualisation, can select, filter, and change the data however they wish because Tableau is helping them to execute the backend querying from the plain table, and immediately visualising it for them. Point and click - boom!
(I am still not the biggest fan of Tableau, but I NEED TO FIND AN ALTERNATIVE FIRST)
Where can I get R? Install it at http://cran.r-project.org/ Also install R Studio from http://www.rstudio.com/ which provides a user friendly environment to work in.
How do I learn more about R?
A quick scroll reveals the basic functions, to the more complex machine capabilities of R. https://www.analyticsvidhya.com/blog/2016/02/complete-tutorial-learn-data-science-scratch/
For a more in depth tutorial to begin learning from scratch, I would suggest the below: http://paldhous.github.io/ucb/2016/dataviz/week7.html https://www.listendata.com/2016/08/dplyr-tutorial.html
Alternatives to R? Python! Highly recommended for readability and ability to extend to general programming ;)
It can do 95% of what R can do (some say in a more clunky manner as R’s syntax is really clean; but Python is more readable!!).