The goal of this tutorial was to demonstrate how to use SparkR, the R frontend for Apache Spark, to analyze large-scale datasets in R. For demonstration purposes, data extracted from the Bitcoin blockchain were used to produce a time series plot similar to this one.
The talk was structured as follows:
- Intro + Spark Architecture [Roland]
- Intro Bitcoin Use Case [Bernhard]
- Demo Standard R (+ some extra packages) [Bernhard]
- Demo SparkR [Bernhard]
- Demo SparkR - Cluster [Roland]
Materials are available at https://github.com/behas/sparkR-tutorial.
Best,
-ViennaR