{"id":482,"date":"2015-07-23T07:17:28","date_gmt":"2015-07-23T07:17:28","guid":{"rendered":"http:\/\/blog.stratio.com\/?p=482"},"modified":"2023-09-20T13:37:03","modified_gmt":"2023-09-20T13:37:03","slug":"mongodb-spark-connector-whitepaper","status":"publish","type":"post","link":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/","title":{"rendered":"MongoDB &#8211; Spark Connector Whitepaper"},"content":{"rendered":"<p dir=\"ltr\" style=\"text-align: justify;\">We recently worked with<strong>\u00a0<a href=\"https:\/\/www.mongodb.com\/blog\/post\/leaf-in-the-wild-stratio-integrates-apache-spark-and-mongodb-to-unlock-new-customer-insights-for-one-of-worlds-largest-banks\" target=\"_blank\" rel=\"noopener\">MongoDB<\/a>\u00a0<\/strong>and their developer team to compare their Hadoop-based connector with our native connector. The paper highlights how\u00a0<strong>Stratio&#8217;s connector for\u00a0<a href=\"http:\/\/spark.apache.org\/\" target=\"_blank\" rel=\"noopener\">Apache Spark<\/a><\/strong>\u00a0implements the PrunedFilteredScan API instead of the TableScan API, so that column pruning and filters are pushed down to MongoDB and you avoid scanning the entire collection.<\/p>\n<p dir=\"ltr\"><!--more--><\/p>\n<p dir=\"ltr\" style=\"text-align: justify;\">Our connector supports the\u00a0<strong>Spark Catalyst<\/strong>\u00a0optimizer for both rule-based and cost-based query optimization. To operate against multi-structured data, the connector infers the schema by sampling documents from the MongoDB collection. This process is controlled by the samplingRatio parameter. 
If the schema is known, the developer can provide it to the connector, avoiding the need for any inference.<strong>\u00a0Once data is stored in MongoDB, Stratio provides an ODBC\/JDBC connector for integrating results with any BI tool.<\/strong><\/p>\n<p dir=\"ltr\" style=\"text-align: justify;\">The connector can be downloaded from the community\u00a0<a href=\"http:\/\/spark-packages.org\/package\/Stratio\/spark-mongodb\" target=\"_blank\" rel=\"noopener\">Spark Packages repository<\/a>. Installation is simple \u2013 the connector can be included in a Spark application with a single command. One of the main advantages of implementing Spark&#8217;s DataFrame API is that you can combine different data sources; for example, you could join a\u00a0<strong>MongoDB<\/strong>\u00a0collection with an\u00a0<strong>Elasticsearch<\/strong>\u00a0index.<\/p>\n<p dir=\"ltr\" style=\"text-align: justify;\">Many thanks to\u00a0<strong>Mat Keep<\/strong>\u00a0and\u00a0<strong>Sam Weaver<\/strong>\u00a0from\u00a0<strong>MongoDB<\/strong>, and to our team of developers, for carrying out the analysis. 
Download the whitepaper\u00a0<a href=\"https:\/\/www.mongodb.com\/collateral\/apache-spark-and-mongodb-turning-analytics-into-real-time-action\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This paper highlights how\u00a0Stratio&#8217;s connector for\u00a0Apache Spark\u00a0implements the PrunedFilteredScan API instead of the TableScan API which effectively allows you to avoid scanning the entire collection.<\/p>\n","protected":false},"author":2,"featured_media":498,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[597],"tags":[19],"ppma_author":[794],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v22.9 (Yoast SEO v22.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>MongoDB-Spark connector Whitepaper - Stratio Blog<\/title>\n<meta name=\"description\" content=\"Stratio recently worked with MongoDB and their developer team for the analysis of their Hadoop based connector Vs our native connector solution.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"MongoDB - Spark Connector Whitepaper\" \/>\n<meta property=\"og:description\" content=\"Stratio recently worked with MongoDB and their developer team for the analysis of their Hadoop based connector Vs our native connector solution.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/\" \/>\n<meta property=\"og:site_name\" content=\"Stratio\" \/>\n<meta property=\"article:published_time\" content=\"2015-07-23T07:17:28+00:00\" \/>\n<meta 
property=\"article:modified_time\" content=\"2023-09-20T13:37:03+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2015\/07\/Connector-1-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"730\" \/>\n\t<meta property=\"og:image:height\" content=\"312\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@stratiobd\" \/>\n<meta name=\"twitter:site\" content=\"@stratiobd\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/\"},\"author\":{\"name\":\"admin\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/af4f5fbbeb95bd7d55f79d9a677e615d\"},\"headline\":\"MongoDB &#8211; Spark Connector Whitepaper\",\"datePublished\":\"2015-07-23T07:17:28+00:00\",\"dateModified\":\"2023-09-20T13:37:03+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/\"},\"wordCount\":218,\"commentCount\":4,\"publisher\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2015\/07\/Connector-1-1.png\",\"keywords\":[\"Big 
Data\"],\"articleSection\":[\"Whitepapers\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/\",\"url\":\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/\",\"name\":\"MongoDB-Spark connector Whitepaper - Stratio Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2015\/07\/Connector-1-1.png\",\"datePublished\":\"2015-07-23T07:17:28+00:00\",\"dateModified\":\"2023-09-20T13:37:03+00:00\",\"description\":\"Stratio recently worked with MongoDB and their developer team for the analysis of their Hadoop based connector Vs our native connector 
solution.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#primaryimage\",\"url\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2015\/07\/Connector-1-1.png\",\"contentUrl\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2015\/07\/Connector-1-1.png\",\"width\":730,\"height\":312},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.stratio.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"MongoDB &#8211; Spark Connector Whitepaper\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#website\",\"url\":\"https:\/\/www.stratio.com\/blog\/\",\"name\":\"Stratio Blog\",\"description\":\"Corporate blog\",\"publisher\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.stratio.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#organization\",\"name\":\"Stratio\",\"url\":\"https:\/\/www.stratio.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png\",\"contentUrl\":\"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png\",\"width\":260,\"height\":55,\"caption\":\"Stratio\"},\"image\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/stratiobd\",\"https:\/\/es.linkedin.com\/company\/stratiobd\",\"https:\/\/www.youtube.com\/c\/StratioBD\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/af4f5fbbeb95bd7d55f79d9a677e615d\",\"name\":\"admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/image\/589aaf4b404b1fe099b09564062c4563\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/9b181ae4395243dccaf1c3e3a4749d81?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/9b181ae4395243dccaf1c3e3a4749d81?s=96&d=mm&r=g\",\"caption\":\"admin\"}}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"MongoDB-Spark connector Whitepaper - Stratio Blog","description":"Stratio recently worked with MongoDB and their developer team for the analysis of their Hadoop based connector Vs our native connector solution.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/","og_locale":"en_US","og_type":"article","og_title":"MongoDB - Spark Connector Whitepaper","og_description":"Stratio recently worked with MongoDB and their developer team for the analysis of their Hadoop based connector Vs our native connector solution.","og_url":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/","og_site_name":"Stratio","article_published_time":"2015-07-23T07:17:28+00:00","article_modified_time":"2023-09-20T13:37:03+00:00","og_image":[{"width":730,"height":312,"url":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2015\/07\/Connector-1-1.png","type":"image\/png"}],"author":"admin","twitter_card":"summary_large_image","twitter_creator":"@stratiobd","twitter_site":"@stratiobd","twitter_misc":{"Written by":"admin","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#article","isPartOf":{"@id":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/"},"author":{"name":"admin","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/af4f5fbbeb95bd7d55f79d9a677e615d"},"headline":"MongoDB &#8211; Spark Connector Whitepaper","datePublished":"2015-07-23T07:17:28+00:00","dateModified":"2023-09-20T13:37:03+00:00","mainEntityOfPage":{"@id":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/"},"wordCount":218,"commentCount":4,"publisher":{"@id":"https:\/\/www.stratio.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#primaryimage"},"thumbnailUrl":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2015\/07\/Connector-1-1.png","keywords":["Big Data"],"articleSection":["Whitepapers"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/","url":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/","name":"MongoDB-Spark connector Whitepaper - Stratio Blog","isPartOf":{"@id":"https:\/\/www.stratio.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#primaryimage"},"image":{"@id":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#primaryimage"},"thumbnailUrl":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2015\/07\/Connector-1-1.png","datePublished":"2015-07-23T07:17:28+00:00","dateModified":"2023-09-20T13:37:03+00:00","description":"Stratio recently worked with MongoDB and their developer team for the analysis of their Hadoop based connector Vs 
our native connector solution.","breadcrumb":{"@id":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#primaryimage","url":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2015\/07\/Connector-1-1.png","contentUrl":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2015\/07\/Connector-1-1.png","width":730,"height":312},{"@type":"BreadcrumbList","@id":"https:\/\/www.stratio.com\/blog\/mongodb-spark-connector-whitepaper\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.stratio.com\/blog\/"},{"@type":"ListItem","position":2,"name":"MongoDB &#8211; Spark Connector Whitepaper"}]},{"@type":"WebSite","@id":"https:\/\/www.stratio.com\/blog\/#website","url":"https:\/\/www.stratio.com\/blog\/","name":"Stratio Blog","description":"Corporate blog","publisher":{"@id":"https:\/\/www.stratio.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.stratio.com\/blog\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.stratio.com\/blog\/#organization","name":"Stratio","url":"https:\/\/www.stratio.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png","contentUrl":"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png","width":260,"height":55,"caption":"Stratio"},"image":{"@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/stratiobd","https:\/\/es.linkedin.com\/company\/stratiobd","https:\/\/www.youtube.com\/c\/StratioBD"]},{"@type":"Person","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/af4f5fbbeb95bd7d55f79d9a677e615d","name":"admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/image\/589aaf4b404b1fe099b09564062c4563","url":"https:\/\/secure.gravatar.com\/avatar\/9b181ae4395243dccaf1c3e3a4749d81?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/9b181ae4395243dccaf1c3e3a4749d81?s=96&d=mm&r=g","caption":"admin"}}]}},"authors":[{"term_id":794,"user_id":2,"is_guest":0,"slug":"admin","display_name":"admin","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/9b181ae4395243dccaf1c3e3a4749d81?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts\/482"}],"collection":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/comments?post=482"}],"version-history":[
{"count":10,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts\/482\/revisions"}],"predecessor-version":[{"id":13943,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts\/482\/revisions\/13943"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/media\/498"}],"wp:attachment":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/media?parent=482"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/categories?post=482"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/tags?post=482"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=482"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}