What I learned from processing big data with Spark

During my semester project, I was faced with the task of processing a large (6 TB) data set consisting of all revisions of the English Wikipedia through October 2016. We chose Apache Spark as our cluster-computing framework, and so I ended up spending a lot of time working with... [Read More]