Paper of the week: “BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data” [1]

Mar 14, 2014 | Developer

This paper was presented at the EuroSys 2013 conference and is available for download from the conference website. It presents BlinkDB which, despite its name, is not a database but a query engine built on top of Hive and Shark. BlinkDB runs interactive SQL queries over very large volumes of data by executing them on data samples instead of the full dataset.

BlinkDB is built on two key ideas: an adaptive optimization framework that builds and maintains stratified samples, and a dynamic sample selection strategy that picks an appropriately sized sample based on a query's accuracy or response-time requirements.
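To make the first idea more concrete, here is a minimal Python sketch of a stratified sample with a per-group cap, assuming the table fits in memory as a list of dicts. Each group (here, each distinct value of a hypothetical `city` column) keeps at most `cap` rows, so rare groups are stored whole instead of being missed by a uniform sample, and every kept row carries a weight so that aggregates can be scaled back up. The function names `stratified_sample` and `estimated_count_by` are hypothetical; this is not BlinkDB's API or its exact sampling procedure.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, cap, seed=0):
    """Return (row, weight) pairs, keeping at most `cap` rows per value of `key`.

    Groups with at most `cap` rows are kept whole (weight 1.0), so rare groups
    survive. Larger groups are uniformly down-sampled to `cap` rows, and each
    kept row gets weight group_size / cap so weighted sums estimate totals.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for group_rows in groups.values():
        n = len(group_rows)
        if n <= cap:
            sample.extend((r, 1.0) for r in group_rows)
        else:
            weight = n / cap
            sample.extend((r, weight) for r in rng.sample(group_rows, cap))
    return sample


def estimated_count_by(sample, key):
    """Estimate COUNT(*) ... GROUP BY key from a weighted sample."""
    counts = defaultdict(float)
    for row, weight in sample:
        counts[row[key]] += weight
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical skewed table: one very common city and one rare one.
    rows = [{"city": "NYC", "latency": i % 100} for i in range(10_000)]
    rows += [{"city": "Oslo", "latency": i} for i in range(5)]
    sample = stratified_sample(rows, key="city", cap=100)
    print(len(sample))                         # 105
    print(estimated_count_by(sample, "city"))  # {'NYC': 10000.0, 'Oslo': 5.0}
```

In this sketch the cap is the main knob: a larger cap means bigger samples and tighter estimates for large groups, at the cost of more storage and longer scans.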
 
This paper offers an interesting introduction to applying statistical inference techniques to Big Data and makes clear that there is always a trade-off between accuracy and performance. In that regard, BlinkDB reports how accurate each query result is, so the user can make informed decisions. Although the paper does not make clear how costly it is to maintain the stratified samples, it provides a good seed for future work in the area.
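The accuracy/performance trade-off can be illustrated with the textbook central-limit-theorem bound for an AVG computed on a uniform random sample. This is a simplified sketch, not necessarily how BlinkDB computes its error bounds; the column, its distribution and the helper names `approx_avg` and `rows_needed` are assumptions for illustration. The half-width of the ~95% confidence interval shrinks with the square root of the sample size, so 100 times more rows buys only about 10 times less error.

```python
import math
import random
import statistics

Z_95 = 1.96  # two-sided ~95% quantile of the standard normal distribution


def approx_avg(sample):
    """Return (estimate, half_width) of a ~95% CLT confidence interval for the mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / math.sqrt(n)
    return mean, Z_95 * stderr


def rows_needed(stdev, max_error):
    """Smallest n such that the ~95% half-width Z * stdev / sqrt(n) <= max_error."""
    return math.ceil((Z_95 * stdev / max_error) ** 2)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical column: 200,000 latencies drawn from a normal distribution.
    column = [rng.gauss(200.0, 50.0) for _ in range(200_000)]
    for n in (1_000, 100_000):
        est, err = approx_avg(rng.sample(column, n))
        print(f"n={n:>7}: AVG ~ {est:.2f} +/- {err:.2f}")
    # Rows needed for +/- 1.0 at ~95% confidence if the stdev is about 50:
    print(rows_needed(stdev=50.0, max_error=1.0))  # 9604
```

Inverting the same formula, as `rows_needed` does, is one simple way to pick a sample size from an error target, which is the kind of decision the paper's dynamic sample selection has to make.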
 
[1] Agarwal, Sameer, et al. “BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data.” Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys '13). ACM, 2013.