{"id":968,"date":"2016-11-07T18:05:33","date_gmt":"2016-11-07T18:05:33","guid":{"rendered":"http:\/\/blog.stratio.com\/?p=968"},"modified":"2023-09-20T13:47:59","modified_gmt":"2023-09-20T13:47:59","slug":"stratio-crossdata-vs-presto","status":"publish","type":"post","link":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/","title":{"rendered":"Stratio Crossdata vs Presto"},"content":{"rendered":"<h2>Introduction<\/h2>\n<p style=\"text-align: justify;\">Nowadays, there are a lot of Big Data query engines available. Some companies struggle to choose which one to use. Benchmarks exist, but results can be contradictory and thus difficult to trust.<\/p>\n<p style=\"text-align: justify;\">One Big Data query engine that is frequently mentioned\u00a0is Presto. We wanted to find out more about its potential and decided to compare it with Crossdata in a controlled environment, given that Crossdata is a data hub that extends the capabilities of Apache Spark. We found that\u00a0the most popular persistence layers in our projects are Apache Cassandra, MongoDB and HDFS+Parquet, but that\u00a0MongoDB is not supported by Presto. The benchmark was therefore carried out with Apache Cassandra and HDFS+Parquet only.<\/p>\n<p style=\"text-align: justify;\">Crossdata provides additional features and optimizations to the SQLContext of Spark through the XDContext. 
It can be deployed as a\u00a0<a href=\"http:\/\/spark-packages.org\/package\/Stratio\/spark-crossdata\">library of Apache Spark<\/a>\u00a0or using a client-server architecture where the cluster of servers forms a P2P structure.<\/p>\n<p><!--more--><\/p>\n<p>When deployed as a library, these additional features include:<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Native access to Apache Cassandra, MongoDB and Elasticsearch (resolve queries without using Spark resources)<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Mixing data from different data sources<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Mixing data from batch and streaming with an SQL-like language<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Metadata discovery (importing all tables from a datastore using only one command)<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Logical views<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Persistent metadata catalog (no need to register all tables in every session)<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Creating tables in the datastores<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Data insertions<\/p>\n<p>When using the P2P deployment, Crossdata offers:<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Self-contained JDBC\/ODBC<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Flat view of subdocuments and arrays<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 High availability<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Load balancing<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 User groups<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Apache Zeppelin interpreter<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Query builder<\/p>\n<p>In the near future, the Crossdata team will be implementing more features such as:<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Global indexes (usage of inverted indexes)<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 REST API<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Spark procedures from an SQL-like language<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Dropping tables in datastores<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Usage of statistics for 
query planning optimization<\/p>\n<h2>Why compare?<\/h2>\n<p style=\"text-align: justify;\">One of the main goals of the comparison between Crossdata and Presto is to observe how they behave when running queries derived from TPC-DS, which lets us gauge their performance in Big Data environments. Two persistence layers were used for the benchmark: Cassandra and HDFS+Parquet. Cassandra is a NoSQL database ideal for high-speed, online transactional data, while the combination of HDFS+Parquet focuses on data warehousing and data lake use cases.\u00a0The benchmark covers these\u00a0different\u00a0scenarios, but both fall within the Big Data landscape.<\/p>\n<p style=\"text-align: justify;\">It is important to highlight that these tests cover a single-user scenario and that no secondary indexes of Cassandra were used. Currently, Crossdata takes advantage of Stratio\u2019s Cassandra Lucene Index in order to resolve the queries natively when possible.<\/p>\n<p style=\"text-align: justify;\">The queries used for this benchmark give some insight into how Crossdata and Presto work when large volumes of data are examined, queries of various operational requirements and complexities (e.g., ad-hoc, reporting, iterative OLAP, data mining) are executed, and CPU and I\/O load is high.<\/p>\n<h2 style=\"text-align: justify;\">Environment<\/h2>\n<p style=\"text-align: justify;\">As mentioned before, the benchmark was\u00a0executed in a controlled environment. The Crossdata team made use of 8 Huawei XH628 servers located in our European offices. 
The following hardware was used:<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 2 Intel Xeon E5-2630V3 processors with 8 cores each at 2.4 GHz<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 64 GB RAM DDR4<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 4 SATA disks of 1 TB at 7.5k<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 2 10GbE interfaces<\/p>\n<p style=\"text-align: justify;\">Before\u00a0the benchmark, some configurations were tested with Presto and Crossdata, adjusting their parameters for the best stability and throughput results.<\/p>\n<p style=\"text-align: justify;\">Both Presto and Spark had 32 GB available in the JVM per node, with one server acting as master and the other seven acting as workers.<\/p>\n<p>In the end,\u00a0this was the configuration used:<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Presto<\/p>\n<p>\u25cb\u00a0\u00a0\u00a0\u00a0 112 cores in total (shared with Cassandra)<\/p>\n<p>\u25cb\u00a0\u00a0\u00a0\u00a0 134.4 GB available for executing queries<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 Crossdata<\/p>\n<p>\u25cb\u00a0\u00a0\u00a0\u00a0 Spark Cluster (Standalone deployment)<\/p>\n<p>\u25a0\u00a0\u00a0\u00a0\u00a0 66 cores in Spark + 66 cores in Cassandra<\/p>\n<p>\u25a0\u00a0\u00a0\u00a0\u00a0 224 GB available for executing queries<\/p>\n<p style=\"text-align: justify;\">Presto has less\u00a0memory available than Spark because it automatically reserves 40% of the available JVM memory as heap for internal use in the cluster (7 * 32 GB = 224 GB -&gt; 224 GB * 0.6 = 134.4 GB available for all queries executed in the cluster). Memory management in Spark works differently:\u00a0it doesn\u2019t statically reserve part of the memory to manage the cluster resources or operations. 
It simply reserves and frees memory as the cluster requests it, which makes its memory management more elastic.<\/p>\n<p style=\"text-align: justify;\">The TPC-DS dataset used had a size of 840 GB. It was created with tools that respect the official metadata of the specification and create all the tables required by the standard.<\/p>\n<p style=\"text-align: justify;\">The benchmark subset consists of the following 21 queries: q7, q11, q13, q15, q19, q25, q26, q29, q31, q43, q48, q55, q59, q66, q74, q76, q78, q84, q85, q91 and q93. These 21 queries were chosen because their syntax is supported by both frameworks and because they represent a heterogeneous subset of the TPC-DS queries. Thus, different aspects and capabilities of resolving different types of queries were tested.<\/p>\n<p style=\"text-align: justify;\">However, for the comparison with Cassandra, a different set of queries was used:<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 q84 from the derived TPC-DS set of queries mentioned above<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 n1: select * from customer<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 n2: select * from customer where c_customer_sk = 9808094<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 n3: select * from store_sales limit 5000<\/p>\n<p>\u25cf\u00a0\u00a0\u00a0\u00a0 n4: select * from store_sales where ss_item_sk = 51097 and ss_ticket_number = 36448921<\/p>\n<h2 style=\"text-align: justify;\">Results<\/h2>\n<p style=\"text-align: justify;\">The first thing to notice is that Presto didn\u2019t resolve all queries because it threw\u00a0an OutOfMemoryException and, therefore, only some of the 21 initial queries were valid for the comparison. 
It\u2019s important to mention that different configuration parameters were tried in order to maximize the number of queries resolved by Presto and, finally, the configuration mentioned above was the most successful one.<\/p>\n<p style=\"text-align: center;\"><span style=\"color: #ff0000;\"><a href=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2016\/11\/Table-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-992\" src=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2016\/11\/Table-1.png\" alt=\"Table\" width=\"557\" height=\"844\" srcset=\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Table-1.png 557w, https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Table-1-198x300.png 198w\" sizes=\"(max-width: 557px) 100vw, 557px\" \/><\/a><\/span><\/p>\n<p>Only the following queries were used for the Benchmark with HDFS+Parquet: q15, q19, q31, q43, q55, q76, q78 and q84.<\/p>\n<p style=\"text-align: center;\"><a href=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2016\/11\/Table2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-993\" src=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2016\/11\/Table2.png\" alt=\"Table2\" width=\"557\" height=\"237\" srcset=\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Table2.png 557w, https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Table2-300x128.png 300w\" sizes=\"(max-width: 557px) 100vw, 557px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">Only the\u00a0following queries were used for the Benchmark with Cassandra: n2 and n4.<\/p>\n<p style=\"text-align: justify;\">Again, we have to keep in mind that this benchmark was performed in a single-user environment.<\/p>\n<p style=\"text-align: justify;\">The following\u00a0graph represents the results with HDFS+Parquet with the aforementioned set of queries:<\/p>\n<p style=\"text-align: center;\"><a 
href=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2016\/11\/1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-988\" src=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2016\/11\/1.png\" alt=\"1\" width=\"600\" height=\"371\" srcset=\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/1.png 600w, https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/1-300x186.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">Because resolution times differ widely between queries, the above graph normalizes each query: the Presto time is taken as the reference (fixed to 1) and the Crossdata resolution time is shown as a ratio of it. As shown, Crossdata resolves queries faster 5 out of 9 times\u00a0and, on average, its overall ratio is approximately half that of Presto.<\/p>\n<p style=\"text-align: center;\"><a href=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2016\/11\/2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-989\" src=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2016\/11\/2.png\" alt=\"2\" width=\"600\" height=\"371\" srcset=\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/2.png 600w, https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/2-300x186.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">The above chart represents the total time of the 9 queries in both systems. 
Again, this figure shows that the accumulated time for resolving the queries in Presto is twice that of resolving the queries in Crossdata.<\/p>\n<p style=\"text-align: justify;\">The\u00a0following graph represents the results with Cassandra with the aforementioned set of queries:<\/p>\n<p style=\"text-align: center;\"><a href=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2016\/11\/Crossdata.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-995\" src=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2016\/11\/Crossdata.jpg\" alt=\"Crossdata\" width=\"600\" height=\"371\" srcset=\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Crossdata.jpg 600w, https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Crossdata-300x186.jpg 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">The above chart represents the resolution time of Presto and Crossdata with the 2 aforementioned queries on Cassandra. In this case, Crossdata executes these 2 queries natively, that is, Crossdata doesn\u2019t make use of the Spark cluster because the Crossdata planner identifies that both queries can be resolved through direct access to Cassandra.<\/p>\n<h2>Conclusions<\/h2>\n<p style=\"text-align: justify;\">The first thing to highlight is that this benchmark doesn\u2019t take into account the queries that couldn\u2019t be resolved by Presto. In most cases, Presto threw an OutOfMemoryException, which is quite frustrating given how aggressively Presto uses memory. On the other hand, Apache Spark behaves much more reliably given that, before running out of memory, it makes very efficient use of disk space. We admit that Presto might have been configured in a more effective way, but, even after spending days reading the documentation, we didn\u2019t find any better way to do it. 
This makes the tuning of Presto a very difficult task and, in many cases, requires very powerful environments with a large amount of resources, which contradicts the mantra in Big Data systems about using commodity hardware.<\/p>\n<p style=\"text-align: justify;\">In addition, even when Presto is able to resolve the query, the overall resolution time is longer than with Crossdata when data is stored in HDFS in Parquet format. If you look closely, Crossdata is a little slower than Presto when resolving low-latency queries, by a few milliseconds. However, when executing iterative OLAP queries, Presto takes much longer than Crossdata to resolve the query, so with heavy calculations the penalty is on the order of seconds.<\/p>\n<p style=\"text-align: justify;\">When Crossdata and Presto are used with Apache Cassandra, the results show that Crossdata, with native access, and Apache Spark, with filter push-downs, make better use of the datastore capabilities and achieve better throughput.<\/p>\n<p style=\"text-align: justify;\">Crossdata is therefore the more suitable option in most cases and is more reliable than Presto in all of the scenarios. Crossdata is also faster in cases where native access can be used.<\/p>\n<p style=\"text-align: justify;\">This post was written by Stratio&#8217;s Crossdata team: Miguel Angel Fernandez (<a href=\"https:\/\/twitter.com\/miguel_afd\" target=\"_blank\" rel=\"noopener\">@miguel_afd<\/a>),\u00a0Pablo Francisco Perez (<a href=\"https:\/\/twitter.com\/pfcoperez\" target=\"_blank\" rel=\"noopener\">@pfcoperez<\/a>),\u00a0Unai Sarasola,\u00a0David Arroyo, Juanjo Lopez (<a href=\"https:\/\/twitter.com\/Orcrsit\" target=\"_blank\" rel=\"noopener\">@Orcrsit<\/a>) and\u00a0Hugo Dominguez (<a href=\"https:\/\/twitter.com\/Huguito1906\" target=\"_blank\" rel=\"noopener\">@Huguito1906<\/a>)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Nowadays, there are a lot of Big Data query engines available. 
Some companies struggle to choose which one to use. Benchmarks exist, but results can be contradictory and thus difficult to trust.<\/p>\n","protected":false},"author":1,"featured_media":997,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[686],"tags":[85],"ppma_author":[795],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v22.9 (Yoast SEO v22.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Stratio Crossdata vs Presto - Stratio Blog<\/title>\n<meta name=\"description\" content=\"Nowadays, there are a lot of Big Data query engines available. Presto or Crossdata? Find out the benchmark results executed in a controlled environment.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Stratio Crossdata vs Presto\" \/>\n<meta property=\"og:description\" content=\"Nowadays, there are a lot of Big Data query engines available. Presto or Crossdata? 
Find out the benchmark results executed in a controlled environment.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/\" \/>\n<meta property=\"og:site_name\" content=\"Stratio\" \/>\n<meta property=\"article:published_time\" content=\"2016-11-07T18:05:33+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-09-20T13:47:59+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Presto_Crossdata.gif\" \/>\n\t<meta property=\"og:image:width\" content=\"730\" \/>\n\t<meta property=\"og:image:height\" content=\"312\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/gif\" \/>\n<meta name=\"author\" content=\"Stratio\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@stratiobd\" \/>\n<meta name=\"twitter:site\" content=\"@stratiobd\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Stratio\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/\"},\"author\":{\"name\":\"Stratio\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/d0377b199cd052b17e15c9ba44c45ab7\"},\"headline\":\"Stratio Crossdata vs Presto\",\"datePublished\":\"2016-11-07T18:05:33+00:00\",\"dateModified\":\"2023-09-20T13:47:59+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/\"},\"wordCount\":1468,\"publisher\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Presto_Crossdata.gif\",\"keywords\":[\"spark\"],\"articleSection\":[\"Product\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/\",\"url\":\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/\",\"name\":\"Stratio Crossdata vs Presto - Stratio Blog\",\"isPartOf\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Presto_Crossdata.gif\",\"datePublished\":\"2016-11-07T18:05:33+00:00\",\"dateModified\":\"2023-09-20T13:47:59+00:00\",\"description\":\"Nowadays, there are a lot of Big Data query engines available. Presto or Crossdata? 
Find out the benchmark results executed in a controlled environment.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#primaryimage\",\"url\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Presto_Crossdata.gif\",\"contentUrl\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Presto_Crossdata.gif\",\"width\":730,\"height\":312},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.stratio.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Stratio Crossdata vs Presto\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#website\",\"url\":\"https:\/\/www.stratio.com\/blog\/\",\"name\":\"Stratio Blog\",\"description\":\"Corporate blog\",\"publisher\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.stratio.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#organization\",\"name\":\"Stratio\",\"url\":\"https:\/\/www.stratio.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png\",\"contentUrl\":\"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png\",\"width\":260,\"height\":55,\"caption\":\"Stratio\"},\"image\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/stratiobd\",\"https:\/\/es.linkedin.com\/company\/stratiobd\",\"https:\/\/www.youtube.com\/c\/StratioBD\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/d0377b199cd052b17e15c9ba44c45ab7\",\"name\":\"Stratio\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/image\/bb38888f58c2bb664646155f78ae6ccc\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e3387ad00609f34a56d6796400eb8191?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e3387ad00609f34a56d6796400eb8191?s=96&d=mm&r=g\",\"caption\":\"Stratio\"},\"description\":\"Stratio guides businesses on their journey through complete #DigitalTransformation with #BigData and #AI. Stratio works worldwide for large companies and multinationals in the sectors of banking, insurance, healthcare, telco, retail, energy and media.\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Stratio Crossdata vs Presto - Stratio Blog","description":"Nowadays, there are a lot of Big Data query engines available. Presto or Crossdata? 
Find out the benchmark results executed in a controlled environment.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/","og_locale":"en_US","og_type":"article","og_title":"Stratio Crossdata vs Presto","og_description":"Nowadays, there are a lot of Big Data query engines available. Presto or Crossdata? Find out the benchmark results executed in a controlled environment.","og_url":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/","og_site_name":"Stratio","article_published_time":"2016-11-07T18:05:33+00:00","article_modified_time":"2023-09-20T13:47:59+00:00","og_image":[{"width":730,"height":312,"url":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Presto_Crossdata.gif","type":"image\/gif"}],"author":"Stratio","twitter_card":"summary_large_image","twitter_creator":"@stratiobd","twitter_site":"@stratiobd","twitter_misc":{"Written by":"Stratio","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#article","isPartOf":{"@id":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/"},"author":{"name":"Stratio","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/d0377b199cd052b17e15c9ba44c45ab7"},"headline":"Stratio Crossdata vs Presto","datePublished":"2016-11-07T18:05:33+00:00","dateModified":"2023-09-20T13:47:59+00:00","mainEntityOfPage":{"@id":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/"},"wordCount":1468,"publisher":{"@id":"https:\/\/www.stratio.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#primaryimage"},"thumbnailUrl":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Presto_Crossdata.gif","keywords":["spark"],"articleSection":["Product"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/","url":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/","name":"Stratio Crossdata vs Presto - Stratio Blog","isPartOf":{"@id":"https:\/\/www.stratio.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#primaryimage"},"image":{"@id":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#primaryimage"},"thumbnailUrl":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Presto_Crossdata.gif","datePublished":"2016-11-07T18:05:33+00:00","dateModified":"2023-09-20T13:47:59+00:00","description":"Nowadays, there are a lot of Big Data query engines available. Presto or Crossdata? 
Find out the benchmark results executed in a controlled environment.","breadcrumb":{"@id":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#primaryimage","url":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Presto_Crossdata.gif","contentUrl":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2016\/11\/Presto_Crossdata.gif","width":730,"height":312},{"@type":"BreadcrumbList","@id":"https:\/\/www.stratio.com\/blog\/stratio-crossdata-vs-presto\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.stratio.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Stratio Crossdata vs Presto"}]},{"@type":"WebSite","@id":"https:\/\/www.stratio.com\/blog\/#website","url":"https:\/\/www.stratio.com\/blog\/","name":"Stratio Blog","description":"Corporate blog","publisher":{"@id":"https:\/\/www.stratio.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.stratio.com\/blog\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.stratio.com\/blog\/#organization","name":"Stratio","url":"https:\/\/www.stratio.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png","contentUrl":"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png","width":260,"height":55,"caption":"Stratio"},"image":{"@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/stratiobd","https:\/\/es.linkedin.com\/company\/stratiobd","https:\/\/www.youtube.com\/c\/StratioBD"]},{"@type":"Person","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/d0377b199cd052b17e15c9ba44c45ab7","name":"Stratio","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/image\/bb38888f58c2bb664646155f78ae6ccc","url":"https:\/\/secure.gravatar.com\/avatar\/e3387ad00609f34a56d6796400eb8191?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e3387ad00609f34a56d6796400eb8191?s=96&d=mm&r=g","caption":"Stratio"},"description":"Stratio guides businesses on their journey through complete #DigitalTransformation with #BigData and #AI. 
Stratio works worldwide for large companies and multinationals in the sectors of banking, insurance, healthcare, telco, retail, energy and media."}]}},"authors":[{"term_id":795,"user_id":1,"is_guest":0,"slug":"stratioadmin","display_name":"Stratio","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/e3387ad00609f34a56d6796400eb8191?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts\/968"}],"collection":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/comments?post=968"}],"version-history":[{"count":7,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts\/968\/revisions"}],"predecessor-version":[{"id":13942,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts\/968\/revisions\/13942"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/media\/997"}],"wp:attachment":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/media?parent=968"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/categories?post=968"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/tags?post=968"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=968"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}