{"id":719,"date":"2016-04-13T10:12:19","date_gmt":"2016-04-13T10:12:19","guid":{"rendered":"http:\/\/blog.stratio.com\/?p=719"},"modified":"2023-09-20T13:44:03","modified_gmt":"2023-09-20T13:44:03","slug":"using-spark-sqlcontext-hivecontext-spark-dataframes-api","status":"publish","type":"post","link":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/","title":{"rendered":"Using Spark SQLContext, HiveContext &amp; Spark Dataframes API with ElasticSearch, MongoDB &amp; Cassandra"},"content":{"rendered":"<p><span style=\"font-size: inherit; text-align: justify;\">In this post we will show how to use the different SQL contexts for data query on Spark.<\/span><!--more--><\/p>\n<p style=\"text-align: justify;\"><span style=\"line-height: 1.5;\">We will begin with Spark SQL and follow up with HiveContext. In addition to this, we will conduct queries on various NoSQL databases and analyze the advantages \/ disadvantages of using them, so without further ado, let\u2019s get started!<\/span><\/p>\n<p style=\"text-align: justify;\">First of all we need to create a context that will add Spark to the configuration options for connecting to Cassandra:<\/p>\n<pre class=\"lang:default decode:true\">val sparkConf = new SparkConf().setAppName(\"sparkSQLExamples\").setMaster(\"local[*]\")\n    .setIfMissing(\"hive.execution.engine\", \"spark\")\n    .setIfMissing(\"spark.cassandra.connection.host\", \"127.0.0.1\")\n    .setIfMissing(\"spark.cassandra.connection.port\", \"9042\")\nval sparkContext = new SparkContext(sparkConf)\n<\/pre>\n<p style=\"text-align: justify;\">Spark SQLContext allows us to connect to different Data Sources to write or read data from them, but it has limitations, namely that when the program ends or the Spark shell is closed, all links to the datasoruces we have created are temporary and will not be available in the next session.<\/p>\n<p style=\"text-align: justify;\">This limitation is solved with HiveContext, since 
it uses a MetaStore to store the information of those &#8220;external&#8221; tables. In our example, this MetaStore is MySQL. This configuration is included in a resource file (hive-site.xml) used by Hive. You can see the different properties, such as the user, in the <a href=\"https:\/\/github.com\/compae\/SparkSqlExamples\"><strong>GitHub<\/strong><\/a> project; be careful if you are also setting environment variables such as HADOOP_USER_NAME and HADOOP_CONF_DIR.<\/p>\n<p style=\"text-align: justify;\">At first glance it seems that everything is solved, but we have lost high availability. If you do not want to miss out on this, you can use XDContext with Stratio Crossdata, which is capable of storing the MetaStore in Zookeeper.<\/p>\n<pre class=\"lang:default decode:true\">val sqlContext = new SQLContext(sparkContext)\nval hiveContext = new HiveContext(sparkContext)\n<\/pre>\n<p>To use the data sources\u2019 API we need to know how to create DataFrames. There are two basic concepts:<\/p>\n<ol>\n<li>Schema:<\/li>\n<\/ol>\n<p>A DataFrame in Spark is nothing more than an RDD composed of Rows that have a schema, in which we indicate the name and type of each column of the Rows.<\/p>\n<pre class=\"lang:default decode:true \">val schema = new StructType(Array(StructField(\"id\", StringType, false)))<\/pre>\n<ol start=\"2\">\n<li>RDD[Row]:<\/li>\n<\/ol>\n<p style=\"text-align: justify;\">Each element of the RDD has to be a Row, which is a set of values. 
Normally we have to transform an RDD of another type into an RDD of Rows.<\/p>\n<pre class=\"lang:default decode:true \">val registers = for (a &lt;- 0 to 10000) yield a.toString\nval rdd = sparkContext.parallelize(registers)\nval rddOfRow = rdd.map(Row(_))\n<\/pre>\n<p>With all this, we are able to create a DataFrame with both SQLContext and HiveContext:<\/p>\n<pre class=\"lang:default decode:true \">val dataFrame = sqlContext.createDataFrame(rddOfRow, schema)<\/pre>\n<p style=\"text-align: justify;\">Another option to create DataFrames is using RDD[Case Class]. Each element of the RDD has to be a case class, so normally we have to transform an RDD of another type into an RDD of our case class.<\/p>\n<pre class=\"lang:default decode:true \">case class IdClass(id: String)\nval registers = for (a &lt;- 0 to 10000) yield a.toString\nval rdd = sparkContext.parallelize(registers)\nval rddOfClass = rdd.map(IdClass(_))\nval dataFrame = sqlContext.createDataFrame(rddOfClass)\n<\/pre>\n<p style=\"text-align: justify;\">With simple configuration parameters, we will be able to store any DataFrame we have created in tables, indexes or collections in Cassandra, Elasticsearch, or MongoDB, respectively. 
This is thanks to the different implementations that the Spark packages from DataStax, Elastic and Stratio offer us.<\/p>\n<p style=\"text-align: justify;\">MongoDB:<\/p>\n<pre class=\"lang:default decode:true\">val mongoDbOptions = Map(\n    \"host\" -&gt; \"localhost:27017\",\n    \"database\" -&gt; \"mongodatabase\",\n    \"collection\" -&gt; \"mongoclient\"\n)\n\ndataFrame.write\n    .format(\"com.stratio.datasource.mongodb\")\n    .mode(SaveMode.Append)\n    .options(mongoDbOptions)\n    .save()\n<\/pre>\n<p>Note: We can also insert items in a collection using the functions that the Stratio library offers us.<\/p>\n<pre class=\"lang:default decode:true\">val mongoDbOptionsLib = Map(\n    \"host\" -&gt; \"localhost:27017\",\n    \"database\" -&gt; \"mongodatabase\",\n    \"collection\" -&gt; \"mongoclientlib\"\n)\nval libraryConfig = MongodbConfigBuilder(mongoDbOptionsLib)\ndataFrame.saveToMongodb(libraryConfig.build)\n<\/pre>\n<p>ElasticSearch:<\/p>\n<pre class=\"lang:default decode:true\">val elasticOptions = Map(\"es.mapping.id\" -&gt; \"id\",\n    \"es.nodes\" -&gt; \"localhost\",\n    \"es.port\" -&gt; \"9200\",\n    \"es.index.auto.create\" -&gt; \"yes\"\n)\n\ndataFrame.write.format(\"org.elasticsearch.spark.sql\")\n    .mode(SaveMode.Append)\n    .options(elasticOptions)\n    .save(s\"$elasticIndex\/$elasticMapping\")\n<\/pre>\n<p>Note: We can also insert items in an index using the functions that the Elastic library offers us.<\/p>\n<pre class=\"lang:default decode:true \">dataFrame.saveToEs(s\"$elasticIndex\/$elasticMappingLib\", elasticOptions)<\/pre>\n<p>Cassandra:<\/p>\n<pre class=\"lang:default decode:true\">val cassandraOptions = Map(\"table\" -&gt; cassandraTable, \"keyspace\" -&gt; cassandraKeyspace)\n\ndataFrame.write\n    .format(\"org.apache.spark.sql.cassandra\")\n    .mode(SaveMode.Append)\n    .options(cassandraOptions)\n    .save()\n<\/pre>\n<p>Now that we know how to write information in each of the NoSQL databases, let\u2019s see how we can query 
and read from each of them:<\/p>\n<p>1. Using the new API functions of DataFrames.<\/p>\n<pre class=\"lang:default decode:true\">val dataFrameSelectElastic = sqlContext.read.format(\"org.elasticsearch.spark.sql\")\n    .options(elasticOptions)\n    .load()\n    .select(\"id\")\n\nval dataFrameSelectMongo = sqlContext.read.format(\"com.stratio.datasource.mongodb\")\n    .options(mongoDbOptions)\n    .load()\n    .select(\"id\")\n\nval dataFrameSelectCassandra = sqlContext.read.format(\"org.apache.spark.sql.cassandra\")\n    .options(cassandraOptions)\n    .load()\n    .select(\"id\")\n\ndataFrameSelectElastic.registerTempTable(\"tempelastic\")\ndataFrameSelectMongo.registerTempTable(\"tempmongo\")\ndataFrameSelectCassandra.registerTempTable(\"tempcassandra\")\n\nsqlContext.sql(\"select * from tempelastic\")\nsqlContext.sql(\"select * from tempmongo\")\nsqlContext.sql(\"select * from tempcassandra\")\n<\/pre>\n<p style=\"text-align: justify;\">2. Creating physical tables and temporary external tables. Within the Spark SQLContext this is experimental, and HiveContext only creates the temporary table; to use this feature correctly you can use the CrossdataContext (XDContext).<\/p>\n<pre class=\"lang:default decode:true\">XDContext.createExternalTable(\"externalelastic\", \"org.elasticsearch.spark.sql\", schema, elasticOptions)\nXDContext.createExternalTable(\"externalmongo\", \"com.stratio.datasource.mongodb\", schema, mongoDbOptions)\nXDContext.createExternalTable(\"externalcassandra\", \"org.apache.spark.sql.cassandra\", schema, cassandraOptions)\n\nXDContext.sql(\"select * from externalelastic\")\nXDContext.sql(\"select * from externalmongo\")\nXDContext.sql(\"select * from externalcassandra\")\n<\/pre>\n<p>3. 
Using HiveContext, creating a link to the physical tables and storing it in Hive\u2019s MetaStore.<\/p>\n<pre class=\"lang:default decode:true \">hiveContext.sql(s\"\"\"CREATE TABLE IF NOT EXISTS testElastic(id STRING)\n    |USING org.elasticsearch.spark.sql\n    |OPTIONS (\n    |   path '$elasticIndex\/$elasticMapping', readMetadata 'true', nodes '127.0.0.1', port '9200', cluster 'default'\n    | )\n  \"\"\".stripMargin)\n\nhiveContext.sql(s\"\"\"CREATE TABLE IF NOT EXISTS testCassandra(id STRING)\n    |USING org.apache.spark.sql.cassandra\n    |OPTIONS (\n    |   table 'cassandraclient', keyspace 'testkeyspace'\n    | )\n  \"\"\".stripMargin)\n\nhiveContext.sql(s\"\"\"CREATE TABLE IF NOT EXISTS testMongo(id STRING)\n    |USING com.stratio.datasource.mongodb\n    |OPTIONS (\n    |   host 'localhost:27017', database 'mongodatabase', collection 'mongoclient'\n    | )\n  \"\"\".stripMargin)\n\nval queryElastic = hiveContext.sql(s\"SELECT id FROM testElastic LIMIT 100\")\nval queryMongo = hiveContext.sql(s\"SELECT id FROM testMongo LIMIT 100\")\nval queryCassandra = hiveContext.sql(s\"SELECT id FROM testCassandra LIMIT 100\")\n<\/pre>\n<p style=\"text-align: justify;\">In this way we have access to a SQL language with more functionality than each data source provides natively; for optimal access I recommend using Crossdata, as it optimizes the queries so that they run natively on each of the three NoSQL databases.<\/p>\n<p style=\"text-align: justify;\">The biggest advantage it offers, apart from the in-memory execution of queries in a Spark cluster, is that we can do JOINs across the various NoSQL databases:<\/p>\n<pre class=\"lang:default decode:true\">val joinElasticCassandraMongo = hiveContext.sql(s\"SELECT tc.id FROM testCassandra AS tc\" +\n    s\" JOIN testElastic AS te ON tc.id = te.id\" +\n    s\" JOIN testMongo tm ON tm.id = 
te.id\")\n<\/pre>\n<p>In the following link you can see all the code of the project, in <a href=\"https:\/\/github.com\/compae\/SparkSqlExamples\"><strong>GitHub<\/strong><\/a><strong>.<\/strong><\/p>\n<p>In order to run it is necessary to have Elasticsearch, Cassandra and MongoDB installed and running.<\/p>\n<p>Used versions:<\/p>\n<p>* Scala 2.10.4<\/p>\n<p>* Spark 1.5.2<\/p>\n<p>* Spark-MongoDb 0.11.1<\/p>\n<p>* Spark-ElasticSearch 2.2.0<\/p>\n<p>* Spark-Cassandra 1.5.0<\/p>\n<p>* Elasticsearch 1.7.2<\/p>\n<p>* Cassandra 2.2.5<\/p>\n<p>* MongoDB 3.0.7<\/p>\n<p>I hope I have clarified the different ways to access and write data with Spark in each of the three major NoSQL databases.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post we will show how to use the different SQL contexts for data query on Spark. We will begin with Spark SQL and follow up with HiveContext. In addition to this, we will conduct queries on various NoSQL databases and analyze the advantages \/ disadvantages of using them.<\/p>\n","protected":false},"author":1,"featured_media":721,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[686],"tags":[85],"ppma_author":[795],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v22.9 (Yoast SEO v22.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Using Spark SQLContext, HiveContext &amp; Spark Dataframes API with ElasticSearch, MongoDB &amp; Cassandra - Stratio Blog<\/title>\n<meta name=\"description\" content=\"Discover how to use the different SQL contexts for data query on Spark. From Spark SQL to HiveContext. 
In addition, queries on NoSQL databases.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Using Spark SQLContext, HiveContext &amp; Spark Dataframes API with ElasticSearch, MongoDB &amp; Cassandra\" \/>\n<meta property=\"og:description\" content=\"Discover how to use the different SQL contexts for data query on Spark. From Spark SQL to HiveContext. In addition, queries on NoSQL databases.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/\" \/>\n<meta property=\"og:site_name\" content=\"Stratio\" \/>\n<meta property=\"article:published_time\" content=\"2016-04-13T10:12:19+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-09-20T13:44:03+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/04\/post-2-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"730\" \/>\n\t<meta property=\"og:image:height\" content=\"312\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Stratio\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@stratiobd\" \/>\n<meta name=\"twitter:site\" content=\"@stratiobd\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Stratio\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/\"},\"author\":{\"name\":\"Stratio\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/d0377b199cd052b17e15c9ba44c45ab7\"},\"headline\":\"Using Spark SQLContext, HiveContext &amp; Spark Dataframes API with ElasticSearch, MongoDB &amp; Cassandra\",\"datePublished\":\"2016-04-13T10:12:19+00:00\",\"dateModified\":\"2023-09-20T13:44:03+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/\"},\"wordCount\":689,\"publisher\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/04\/post-2-1.png\",\"keywords\":[\"spark\"],\"articleSection\":[\"Product\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/\",\"url\":\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/\",\"name\":\"Using Spark SQLContext, HiveContext & Spark Dataframes API with ElasticSearch, MongoDB & Cassandra - Stratio 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/04\/post-2-1.png\",\"datePublished\":\"2016-04-13T10:12:19+00:00\",\"dateModified\":\"2023-09-20T13:44:03+00:00\",\"description\":\"Discover how to use the different SQL contexts for data query on Spark. From Spark SQL to HiveContext. In addition, queries on NoSQL databases.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#primaryimage\",\"url\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/04\/post-2-1.png\",\"contentUrl\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/04\/post-2-1.png\",\"width\":730,\"height\":312},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.stratio.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Using Spark SQLContext, HiveContext &amp; Spark Dataframes API with ElasticSearch, MongoDB &amp; Cassandra\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#website\",\"url\":\"https:\/\/www.stratio.com\/blog\/\",\"name\":\"Stratio 
Blog\",\"description\":\"Corporate blog\",\"publisher\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.stratio.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#organization\",\"name\":\"Stratio\",\"url\":\"https:\/\/www.stratio.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png\",\"contentUrl\":\"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png\",\"width\":260,\"height\":55,\"caption\":\"Stratio\"},\"image\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/stratiobd\",\"https:\/\/es.linkedin.com\/company\/stratiobd\",\"https:\/\/www.youtube.com\/c\/StratioBD\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/d0377b199cd052b17e15c9ba44c45ab7\",\"name\":\"Stratio\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/image\/bb38888f58c2bb664646155f78ae6ccc\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e3387ad00609f34a56d6796400eb8191?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e3387ad00609f34a56d6796400eb8191?s=96&d=mm&r=g\",\"caption\":\"Stratio\"},\"description\":\"Stratio guides businesses on their journey through complete #DigitalTransformation with #BigData and #AI. Stratio works worldwide for large companies and multinationals in the sectors of banking, insurance, healthcare, telco, retail, energy and media.\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Using Spark SQLContext, HiveContext & Spark Dataframes API with ElasticSearch, MongoDB & Cassandra - Stratio Blog","description":"Discover how to use the different SQL contexts for data query on Spark. From Spark SQL to HiveContext. In addition, queries on NoSQL databases.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/","og_locale":"en_US","og_type":"article","og_title":"Using Spark SQLContext, HiveContext &amp; Spark Dataframes API with ElasticSearch, MongoDB &amp; Cassandra","og_description":"Discover how to use the different SQL contexts for data query on Spark. From Spark SQL to HiveContext. In addition, queries on NoSQL databases.","og_url":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/","og_site_name":"Stratio","article_published_time":"2016-04-13T10:12:19+00:00","article_modified_time":"2023-09-20T13:44:03+00:00","og_image":[{"width":730,"height":312,"url":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/04\/post-2-1.png","type":"image\/png"}],"author":"Stratio","twitter_card":"summary_large_image","twitter_creator":"@stratiobd","twitter_site":"@stratiobd","twitter_misc":{"Written by":"Stratio","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#article","isPartOf":{"@id":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/"},"author":{"name":"Stratio","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/d0377b199cd052b17e15c9ba44c45ab7"},"headline":"Using Spark SQLContext, HiveContext &amp; Spark Dataframes API with ElasticSearch, MongoDB &amp; Cassandra","datePublished":"2016-04-13T10:12:19+00:00","dateModified":"2023-09-20T13:44:03+00:00","mainEntityOfPage":{"@id":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/"},"wordCount":689,"publisher":{"@id":"https:\/\/www.stratio.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#primaryimage"},"thumbnailUrl":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/04\/post-2-1.png","keywords":["spark"],"articleSection":["Product"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/","url":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/","name":"Using Spark SQLContext, HiveContext & Spark Dataframes API with ElasticSearch, MongoDB & Cassandra - Stratio 
Blog","isPartOf":{"@id":"https:\/\/www.stratio.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#primaryimage"},"image":{"@id":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#primaryimage"},"thumbnailUrl":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/04\/post-2-1.png","datePublished":"2016-04-13T10:12:19+00:00","dateModified":"2023-09-20T13:44:03+00:00","description":"Discover how to use the different SQL contexts for data query on Spark. From Spark SQL to HiveContext. In addition, queries on NoSQL databases.","breadcrumb":{"@id":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#primaryimage","url":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/04\/post-2-1.png","contentUrl":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/04\/post-2-1.png","width":730,"height":312},{"@type":"BreadcrumbList","@id":"https:\/\/www.stratio.com\/blog\/using-spark-sqlcontext-hivecontext-spark-dataframes-api\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.stratio.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Using Spark SQLContext, HiveContext &amp; Spark Dataframes API with ElasticSearch, MongoDB &amp; Cassandra"}]},{"@type":"WebSite","@id":"https:\/\/www.stratio.com\/blog\/#website","url":"https:\/\/www.stratio.com\/blog\/","name":"Stratio Blog","description":"Corporate 
blog","publisher":{"@id":"https:\/\/www.stratio.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.stratio.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.stratio.com\/blog\/#organization","name":"Stratio","url":"https:\/\/www.stratio.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png","contentUrl":"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png","width":260,"height":55,"caption":"Stratio"},"image":{"@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/stratiobd","https:\/\/es.linkedin.com\/company\/stratiobd","https:\/\/www.youtube.com\/c\/StratioBD"]},{"@type":"Person","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/d0377b199cd052b17e15c9ba44c45ab7","name":"Stratio","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/image\/bb38888f58c2bb664646155f78ae6ccc","url":"https:\/\/secure.gravatar.com\/avatar\/e3387ad00609f34a56d6796400eb8191?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e3387ad00609f34a56d6796400eb8191?s=96&d=mm&r=g","caption":"Stratio"},"description":"Stratio guides businesses on their journey through complete #DigitalTransformation with #BigData and #AI. 
Stratio works worldwide for large companies and multinationals in the sectors of banking, insurance, healthcare, telco, retail, energy and media."}]}},"authors":[{"term_id":795,"user_id":1,"is_guest":0,"slug":"stratioadmin","display_name":"Stratio","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/e3387ad00609f34a56d6796400eb8191?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts\/719"}],"collection":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/comments?post=719"}],"version-history":[{"count":10,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts\/719\/revisions"}],"predecessor-version":[{"id":13974,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts\/719\/revisions\/13974"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/media\/721"}],"wp:attachment":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/media?parent=719"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/categories?post=719"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/tags?post=719"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=719"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}