Discover the 3 most-read posts on our blog
Since Stratio’s creation in 2014, we have published a total of 86 posts on our blog. We would like to congratulate and thank all the Stratians who have written posts and taught us about their specialities and discoveries in Spark, Machine Learning, Deep Learning, Scala, business, Kafka… We know that it is hard to find time to read every blog post, so here is a recap of the 3 most-read posts published on our blog!
Using Spark SQLContext, HiveContext & Spark Dataframes API with ElasticSearch, MongoDB & Cassandra
In this post we show how to use the different SQL contexts to query data in Spark. We begin with Spark SQL and follow up with HiveContext. We also run queries against various NoSQL databases and analyze the advantages and disadvantages of using each of them.
Optimizing Spark Streaming applications reading data from Apache Kafka
Spark Streaming is one of the most widely used frameworks for real-time processing, along with Apache Flink, Apache Storm and Kafka Streams. However, compared to the others, Spark Streaming has more performance problems, and it processes data in time windows instead of event by event, which results in delay.
Profiling and segmentation: A graph database clustering solution
This post is about an exciting journey that starts with a problem and ends with a solution. One of the top banks in Europe came to us with a request: they needed a better profiling system.
Silvia Mariscal is part of the Communications and Marketing team at Stratio. She studied Journalism and Media Studies at CEU San Pablo University and has an MA in Corporate Communications and PR from the University of Leeds (UK). She is passionate about writing, cinema and yoga!