This paper highlights how Stratio’s connector for Apache Spark implements the PrunedFilteredScan API instead of the TableScan API which effectively allows you to avoid scanning the entire collection.
It’s been almost two months since we introduced Stratio Sparkta at Strata London 2015, showing a demo for real-time insights on twitter hashtags. During this time we added some new features to the real-time aggregation engine based on Spark Streaming.
Once the Data Sources API has been released, we’ve wanted to take advantage of these new features and, for this reason, we have developed a Spark-MongoDB library. With this new connector we help the growing MongoDB community to simplify the interaction with this datasource via Spark.
In the next tutorial you will learn how to migrate data from MySQL to MongoDB. We will show you how to do it using Spark step by step. From creating a configuration for the player RDD to the installation guide for prerequisites components. Easy and intuitive!