{"id":11361,"date":"2018-06-05T11:01:07","date_gmt":"2018-06-05T11:01:07","guid":{"rendered":"http:\/\/stblog.lunaeme.com\/?p=11361"},"modified":"2023-09-20T13:02:49","modified_gmt":"2023-09-20T13:02:49","slug":"statistical-comparison-of-machine-learning-algorithms-part-2","status":"publish","type":"post","link":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/","title":{"rendered":"Statistical Comparison of Machine Learning Algorithms (Part 2)"},"content":{"rendered":"<p>This is the second (and last) part of the series dealing with the formal comparison of Machine Learning (ML) algorithms from a statistical point of view. In this post, we examine how statistical tests are applied to performance data of ML algorithms.<\/p>\n<p><!--more--><\/p>\n<h2>Application of statistical tests to algorithm performance data<\/h2>\n<p>Let\u2019s assume that we have performance data (for instance, the AUC or the F-measure) of multiple algorithms (instead of medicines) on multiple datasets (instead of patients), like in Table 1 (reproduced from [3]). The column names represent four different ML algorithms, while the rows are datasets on which they were tested. In research articles, it is common to use well-known datasets (most of them with real-world data collected in real studies) from open databases, such as the <a href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets.html\" target=\"_blank\" rel=\"noopener noreferrer\">UCI ML Repository<\/a>. 
Parentheses indicate the rank of each algorithm within each dataset, which Friedman\u2019s test requires.<\/p>\n<p><figure id=\"attachment_11368\" aria-describedby=\"caption-attachment-11368\" style=\"width: 1268px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11368 size-full\" src=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2018\/06\/Foto-1.jpg\" alt=\"Graph\" width=\"1268\" height=\"650\"><figcaption id=\"caption-attachment-11368\" class=\"wp-caption-text\">Table 1: average accuracy on test examples of four classification algorithms [3]<\/figcaption><\/figure>Two very important points arise here. First of all, how is each datum obtained and what exactly does it represent? Although this point deserves a long discussion, in our case each number is an average of the performance (here, classification accuracy) of the algorithm over different test folds (for instance) of a dataset, as in a<a href=\"https:\/\/en.wikipedia.org\/wiki\/Cross-validation_(statistics)\" target=\"_blank\" rel=\"noopener noreferrer\"> Cross Validation<\/a> scheme. Of course, for the comparison to be fair, this average must be computed under exactly the same conditions for all algorithms within the same dataset (i.e. the cross-validation folds of a dataset are exactly the same for all algorithms).<\/p>\n<h2>Friedman\u2019s two-way (repeated measures) analysis of variance by ranks<\/h2>\n<p>Secondly, we are applying the algorithms to exactly the same datasets (the latter are the <em>experimental units<\/em> here), so the results are <em>paired<\/em> per dataset (per row of the table). The statistical test commonly used in this case is called <em>Friedman\u2019s two-way analysis of variance by ranks<\/em> [see 3, 4]. 
Friedman\u2019s test statistic is:<\/p>\n<p style=\"text-align: right;\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11371 aligncenter\" src=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2018\/06\/Formula-1.jpg\" alt=\"Formula 1\" width=\"270\" height=\"65\">(1)<\/p>\n<p>A corrected version was later proposed by Iman and Davenport:<\/p>\n<p style=\"text-align: right;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-11389\" src=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2018\/06\/formula-2.jpg\" alt=\"\" width=\"202\" height=\"54\">&nbsp;(2)<\/p>\n<p>In Eq. (1), the test statistic follows a Chi-square distribution with <em>k<\/em>-1 degrees of freedom under the null hypothesis when <em>n<\/em> and <em>k<\/em> are big enough (<em>n<\/em> &gt; 10, <em>k<\/em> &gt; 5), as stated in [3], where <em>n<\/em> is the number of datasets and <em>k<\/em> is the number of algorithms being compared. R<sub>j<\/sub> stands for the average rank of algorithm <em>j<\/em> (in Table 1, R<sub>1<\/sub> = 1.771, R<sub>2<\/sub> = 2.479, R<sub>3<\/sub> = 2.479, R<sub>4<\/sub> = 3.271). In Eq. (2), the test statistic follows an F distribution with k-1 and (k-1)(n-1) degrees of freedom (in our case, 3 and 69) under the null hypothesis of equal performance of all algorithms. If you do the maths, you will find that X<sup>2<\/sup><sub>F<\/sub> = 16.225 and F<sub>F<\/sub> = 6.691, the latter having a very low p-value according to an F<sub>3,69<\/sub> distribution. Hence we reject the null hypothesis stating that all algorithms have the same performance, and conclude that at least one of them behaves differently.<\/p>\n<h2>Friedman\u2019s aligned ranks test and Quade\u2019s test<\/h2>\n<p>As stated in [3], Friedman\u2019s test ignores a lot of information: since a separate ranking is built per dataset, ranks cannot be compared across datasets. 
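As an illustration (not part of the original article), the figures above can be reproduced in plain Python from the average ranks of Table 1; the variable names are our own:

```python
# Average ranks from Table 1: n = 24 datasets, k = 4 algorithms
n, k = 24, 4
R = [1.771, 2.479, 2.479, 3.271]

# Eq. (1): Friedman's chi-square statistic (chi-square with k-1 d.o.f. under H0)
chi2_f = (12 * n) / (k * (k + 1)) * (sum(r ** 2 for r in R) - k * (k + 1) ** 2 / 4)

# Eq. (2): Iman-Davenport correction (F with k-1 and (k-1)(n-1) d.o.f. under H0);
# its p-value can then be read off an F table or, e.g., scipy.stats.f.sf
f_f = (n - 1) * chi2_f / (n * (k - 1) - chi2_f)

print(round(chi2_f, 3), round(f_f, 3))
```

The tiny discrepancy with the article's 16.225 comes from the rounding of the average ranks above.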
A variant known as <em>Friedman\u2019s aligned ranks test<\/em> first computes the average performance of all algorithms in a dataset, and subtracts this value from the individual performances to see which algorithms are above or below the average performance in each dataset. A single global ranking is built from these differences, which are called <em>aligned ranks<\/em> (Table 2). According to [3], when using the aligned ranks \u201c&#8230;<em>the ranking scheme is the same as that employed by a multiple comparison procedure which employs independent samples<\/em>\u201d. The Friedman aligned ranks test statistic is:<\/p>\n<p style=\"text-align: right;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-11374\" src=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2018\/06\/Formula-3.jpg\" alt=\"\" width=\"393\" height=\"72\">&nbsp;(3)<\/p>\n<p>where R\u0302<sub>i.<\/sub> is the rank total of the <em>i<\/em>-th dataset and R\u0302<sub>.j<\/sub> is the rank total of the <em>j<\/em>-th algorithm. Under the null hypothesis of equal performance of all algorithms, T follows a chi-square distribution with k-1 degrees of freedom.<\/p>\n<p><figure id=\"attachment_11376\" aria-describedby=\"caption-attachment-11376\" style=\"width: 1269px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11376 size-full\" src=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2018\/06\/im-2.jpg\" alt=\"Table 2\" width=\"1269\" height=\"673\"><figcaption id=\"caption-attachment-11376\" class=\"wp-caption-text\">Table 2. Aligned observations, and corresponding global rank (parentheses) from [3]<\/figcaption><\/figure>There is another, similar test known as Quade\u2019s test, which assigns weights to the datasets to account for their different difficulty. In this way, the differences in performance achieved in some datasets are considered more informative than those in others. 
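To make the aligned-ranks construction concrete, here is a minimal pure-Python sketch of the alignment, the global ranking, and the T statistic of Eq. (3). The 3×4 performance table is our own toy data (not from [3]), and the expression for T assumes no ties:

```python
# Hypothetical performance table: rows = datasets, columns = algorithms
perf = [
    [0.80, 0.75, 0.78, 0.70],
    [0.90, 0.88, 0.85, 0.84],
    [0.60, 0.65, 0.58, 0.55],
]
n, k = len(perf), len(perf[0])

# Align: subtract each dataset's average performance from its entries
aligned = [[x - sum(row) / k for x in row] for row in perf]

# Build one global ranking over all n*k aligned values (rank 1 = largest)
flat = sorted(((v, i, j) for i, row in enumerate(aligned)
               for j, v in enumerate(row)), reverse=True)
rank = [[0] * k for _ in range(n)]
for r, (_, i, j) in enumerate(flat, start=1):
    rank[i][j] = r

# Eq. (3): rank totals per dataset (Ri.) and per algorithm (R.j)
Ri = [sum(row) for row in rank]
Rj = [sum(rank[i][j] for i in range(n)) for j in range(k)]
num = (k - 1) * (sum(r ** 2 for r in Rj) - (k * n ** 2 / 4) * (k * n + 1) ** 2)
den = k * n * (k * n + 1) * (2 * k * n + 1) / 6 - sum(r ** 2 for r in Ri) / k
T = num / den  # ~ chi-square with k-1 degrees of freedom under H0
```

With real data containing ties, average (mid) ranks should be assigned instead of the simple ordering above.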
Please refer to [5] for further details.<\/p>\n<p>When Friedman\u2019s test rejects the null hypothesis, we must go ahead and determine which algorithms are better or worse than which others by performing pairwise statistical comparisons. Since we compare two samples each time (one vs. one, for all combinations), we use a two-sample statistical test and apply it repeatedly to every pair of samples &#8211; this is known as conducting a post-hoc test.<\/p>\n<h2>Post-hoc analysis after Friedman\u2019s test, and the concept of p-value<\/h2>\n<p>The problem of comparing two paired populations that do not meet parametric assumptions can be solved in various ways. One can, for instance, apply a conventional non-parametric two-sample test for paired samples, such as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Wilcoxon_signed-rank_test\" target=\"_blank\" rel=\"noopener noreferrer\">Wilcoxon\u2019s signed rank test<\/a>, to every pairwise comparison. However, if either Quade\u2019s test or any variant of Friedman\u2019s test was applied previously, one can re-use its computations to obtain a statistic suitable for pairwise comparisons in a post-hoc test. We provide below the expressions to compute this statistic after Friedman\u2019s test when comparing algorithm i vs algorithm j. 
The reader interested in the expression of a post-hoc test after Quade\u2019s test can find it in [3].<\/p>\n<figure id=\"attachment_11378\" aria-describedby=\"caption-attachment-11378\" style=\"width: 221px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11378\" src=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2018\/06\/Formula-4.jpg\" alt=\"\" width=\"221\" height=\"61\"><figcaption id=\"caption-attachment-11378\" class=\"wp-caption-text\">(a) Post-hoc statistic for Friedman\u2019s test<\/figcaption><\/figure>\n<figure id=\"attachment_11379\" aria-describedby=\"caption-attachment-11379\" style=\"width: 221px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11379\" src=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2018\/06\/formula-5.jpg\" alt=\"\" width=\"221\" height=\"57\"><figcaption id=\"caption-attachment-11379\" class=\"wp-caption-text\">(b) Post-hoc statistic for Friedman\u2019s Aligned Ranks test<\/figcaption><\/figure>\n<p>In both cases, the <em>z<\/em> statistic follows a N(0, 1), i.e. a normal distribution with mean 0 and variance 1 (called the <em>standard normal distribution<\/em>), when the null hypothesis is true (a.k.a. <em>under the null hypothesis<\/em>).<\/p>\n<p>Although we have already mentioned it when discussing the result of Friedman\u2019s test, we have to stop here and discuss the concept of a p-value. In a statistical test, we build a statistic (basically an expression involving our data), which is actually a random variable known to follow a certain probability distribution when the null hypothesis of the test (i.e. the hypothesis stating the opposite of what we want to prove) is true. 
In the post-hoc analysis example, the null hypothesis states that both samples come from the same population, and we are interested in rejecting that hypothesis because it is not compatible with our data.<\/p>\n<p>Since the value of our statistic in our problem (i.e. when the expression is evaluated with our specific data) can be considered a realization of the random variable constituted by the statistic, we can check how likely such a value is according to the probability distribution of our statistic under the null hypothesis. If it is extremely unlikely to obtain such a value when sampling from that distribution, then we conclude that we were wrong in our initial assumption and that the statistic does not follow such a distribution. In practice, this means that our null hypothesis does not hold (it is false).<\/p>\n<p>A p-value is the probability of obtaining a value that is equal to or more extreme than our observed value when sampling from the probability distribution followed by our statistic under the null hypothesis. Formally, it is Pr(our statistic &gt;= our observed value). When should we consider the p-value low enough to reject the null hypothesis? A general threshold of 0.05 is established so that, when the null hypothesis is actually true, we mistakenly reject it only 5 % of the time. 
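For illustration, the post-hoc statistic for Friedman\u2019s test (figure (a) above) and its two-sided p-value can be sketched in plain Python, assuming the usual form z = (R<sub>i<\/sub> - R<sub>j<\/sub>) \/ sqrt(k(k+1)\/(6n)); the ranks are those of Table 1 and the helper names are our own:

```python
import math

# Average ranks from Table 1 and the experiment size
n, k = 24, 4
R = [1.771, 2.479, 2.479, 3.271]

def posthoc_z(i, j):
    """z statistic comparing algorithms i and j after Friedman's test."""
    se = math.sqrt(k * (k + 1) / (6 * n))
    return (R[j] - R[i]) / se

def two_sided_p(z):
    """Two-sided p-value under N(0, 1), via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

z = posthoc_z(0, 3)  # best-ranked vs. worst-ranked algorithm
p = two_sided_p(z)
```

This p-value must still be adjusted for multiple comparisons before drawing conclusions, as discussed next.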
It is also very common to pre-compute the critical value of the statistic at the 0.05 level, because under the null hypothesis any value greater than this limit has a probability of less than 0.05 of appearing.<\/p>\n<figure id=\"attachment_11380\" aria-describedby=\"caption-attachment-11380\" style=\"width: 571px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-11380 \" src=\"http:\/\/blog.stratio.com\/wp-content\/uploads\/2018\/06\/Im-3.jpg\" alt=\"\" width=\"571\" height=\"418\"><figcaption id=\"caption-attachment-11380\" class=\"wp-caption-text\">Figure 1. Graphical depiction of the concept of p-value as the probability that the test statistic (as a random variable) is greater than or equal to the observed value.<\/figcaption><\/figure>\n<h2>P-value adjustment for multiple pairwise comparisons<\/h2>\n<p>When we set a threshold of 0.05 to consider the p-value low enough, we admit that there is a 5 % chance of incorrectly rejecting the null hypothesis when it is actually true (this is known as <em>committing a<a href=\"https:\/\/en.wikipedia.org\/wiki\/Type_I_and_type_II_errors#Type_I_error\" target=\"_blank\" rel=\"noopener noreferrer\"> type I error<\/a><\/em>). But if we do multiple pairwise comparisons and want a 5 % chance of making a mistake globally, then we must adjust the p-values for multiple comparisons. This must be done whenever we conduct multiple comparisons, regardless of the statistical test the p-values came from. There are basically two approaches: adjusting the p-value of each comparison, or leaving the p-values unchanged but adjusting the significance threshold to be much less than 0.05. 
The exact procedures to achieve this are beyond the scope of this article, but the interested reader can search for methods like <em>Hochberg<\/em> (the most common), <em>Holm<\/em>, and <em>Bonferroni<\/em> among others (again, see [3] for more details).<\/p>\n<p>What we must keep in mind is that any problem in which we want to do pairwise statistical comparisons and get an ordering always needs p-value adjustment.<\/p>\n<p>With this, we can determine, from a statistical point of view, which algorithm works better or worse than any other, and our conclusions will not be subjective but supported by the strength of inferential statistics.<\/p>\n<h2>References<\/h2>\n<p>[1] Dem\u0161ar, J. <a href=\"http:\/\/www.jmlr.org\/papers\/volume7\/demsar06a\/demsar06a.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Statistical comparisons of classifiers over multiple data sets.<\/a> Journal of Machine Learning Research 7:1-30 (2006).<\/p>\n<p>[2] Garc\u00eda, S., and Herrera, F. <a href=\"http:\/\/www.jmlr.org\/papers\/volume9\/garcia08a\/garcia08a.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons<\/a>. Journal of Machine Learning Research 9:2677-2694 (2008).<\/p>\n<p>[3] Garc\u00eda, S., Fern\u00e1ndez, A., Luengo, J., and Herrera, F. <a href=\"http:\/\/sci2s.ugr.es\/sites\/default\/files\/files\/TematicWebSites\/sicidm\/2010-Garcia-INS.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power<\/a>. Information Sciences 180:2044\u20132064 (2010).<\/p>\n<p>[4] Daniel, W. W. &#8220;Friedman two-way analysis of variance by ranks&#8221;. Applied Nonparametric Statistics (2nd ed.). Boston: PWS-Kent. pp. 262\u201374. ISBN 0-534-91976-6 (1990).<\/p>\n<p>[5] Quade, D. 
<a href=\"https:\/\/www.jstor.org\/stable\/2286991?seq=1#page_scan_tab_contents\" target=\"_blank\" rel=\"noopener noreferrer\">Using weighted rankings in the analysis of complete blocks with additive block effects<\/a>, Journal of the American Statistical Association 74:680\u2013683 (1979).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is the second (and last) part of the series dealing with the formal comparison of Machine Learning (ML) algorithms from a statistical point of view. In this post, we examine how statistical tests are applied to performance data of ML algorithms.<\/p>\n","protected":false},"author":2,"featured_media":13557,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[686],"tags":[297],"ppma_author":[794],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v22.9 (Yoast SEO v22.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Statistical Comparison - Machine Learning<\/title>\n<meta name=\"description\" content=\"This is the second part of the series dealing with the formal comparison of Machine Learning algorithms from a statistical point of view. In this post, we examine how statistical tests are applied to performance data of ML algorithms.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Statistical Comparison of Machine Learning Algorithms (Part 2)\" \/>\n<meta property=\"og:description\" content=\"This is the second part of the series dealing with the formal comparison of Machine Learning algorithms from a statistical point of view. 
In this post, we examine how statistical tests are applied to performance data of ML algorithms.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/\" \/>\n<meta property=\"og:site_name\" content=\"Stratio\" \/>\n<meta property=\"article:published_time\" content=\"2018-06-05T11:01:07+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-09-20T13:02:49+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2018\/06\/Statistical-Comparison-of-Machine-Learning-Algorithms-Part-2.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1300\" \/>\n\t<meta property=\"og:image:height\" content=\"820\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@stratiobd\" \/>\n<meta name=\"twitter:site\" content=\"@stratiobd\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/\"},\"author\":{\"name\":\"admin\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/af4f5fbbeb95bd7d55f79d9a677e615d\"},\"headline\":\"Statistical Comparison of Machine Learning Algorithms (Part 2)\",\"datePublished\":\"2018-06-05T11:01:07+00:00\",\"dateModified\":\"2023-09-20T13:02:49+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/\"},\"wordCount\":1718,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2018\/06\/Statistical-Comparison-of-Machine-Learning-Algorithms-Part-2.jpg\",\"keywords\":[\"Algorithms\"],\"articleSection\":[\"Product\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/\",\"url\":\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/\",\"name\":\"Statistical Comparison - Machine 
Learning\",\"isPartOf\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2018\/06\/Statistical-Comparison-of-Machine-Learning-Algorithms-Part-2.jpg\",\"datePublished\":\"2018-06-05T11:01:07+00:00\",\"dateModified\":\"2023-09-20T13:02:49+00:00\",\"description\":\"This is the second part of the series dealing with the formal comparison of Machine Learning algorithms from a statistical point of view. In this post, we examine how statistical tests are applied to performance data of ML algorithms.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#primaryimage\",\"url\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2018\/06\/Statistical-Comparison-of-Machine-Learning-Algorithms-Part-2.jpg\",\"contentUrl\":\"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2018\/06\/Statistical-Comparison-of-Machine-Learning-Algorithms-Part-2.jpg\",\"width\":1300,\"height\":820},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.stratio.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\
"Statistical Comparison of Machine Learning Algorithms (Part 2)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#website\",\"url\":\"https:\/\/www.stratio.com\/blog\/\",\"name\":\"Stratio Blog\",\"description\":\"Corporate blog\",\"publisher\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.stratio.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#organization\",\"name\":\"Stratio\",\"url\":\"https:\/\/www.stratio.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png\",\"contentUrl\":\"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png\",\"width\":260,\"height\":55,\"caption\":\"Stratio\"},\"image\":{\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/stratiobd\",\"https:\/\/es.linkedin.com\/company\/stratiobd\",\"https:\/\/www.youtube.com\/c\/StratioBD\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/af4f5fbbeb95bd7d55f79d9a677e615d\",\"name\":\"admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/image\/589aaf4b404b1fe099b09564062c4563\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/9b181ae4395243dccaf1c3e3a4749d81?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/9b181ae4395243dccaf1c3e3a4749d81?s=96&d=mm&r=g\",\"caption\":\"admin\"}}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Statistical Comparison - Machine Learning","description":"This is the second part of the series dealing with the formal comparison of Machine Learning algorithms from a statistical point of view. In this post, we examine how statistical tests are applied to performance data of ML algorithms.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/","og_locale":"en_US","og_type":"article","og_title":"Statistical Comparison of Machine Learning Algorithms (Part 2)","og_description":"This is the second part of the series dealing with the formal comparison of Machine Learning algorithms from a statistical point of view. In this post, we examine how statistical tests are applied to performance data of ML algorithms.","og_url":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/","og_site_name":"Stratio","article_published_time":"2018-06-05T11:01:07+00:00","article_modified_time":"2023-09-20T13:02:49+00:00","og_image":[{"width":1300,"height":820,"url":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2018\/06\/Statistical-Comparison-of-Machine-Learning-Algorithms-Part-2.jpg","type":"image\/jpeg"}],"author":"admin","twitter_card":"summary_large_image","twitter_creator":"@stratiobd","twitter_site":"@stratiobd","twitter_misc":{"Written by":"admin","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#article","isPartOf":{"@id":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/"},"author":{"name":"admin","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/af4f5fbbeb95bd7d55f79d9a677e615d"},"headline":"Statistical Comparison of Machine Learning Algorithms (Part 2)","datePublished":"2018-06-05T11:01:07+00:00","dateModified":"2023-09-20T13:02:49+00:00","mainEntityOfPage":{"@id":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/"},"wordCount":1718,"commentCount":0,"publisher":{"@id":"https:\/\/www.stratio.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#primaryimage"},"thumbnailUrl":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2018\/06\/Statistical-Comparison-of-Machine-Learning-Algorithms-Part-2.jpg","keywords":["Algorithms"],"articleSection":["Product"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/","url":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/","name":"Statistical Comparison - Machine 
Learning","isPartOf":{"@id":"https:\/\/www.stratio.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#primaryimage"},"image":{"@id":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#primaryimage"},"thumbnailUrl":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2018\/06\/Statistical-Comparison-of-Machine-Learning-Algorithms-Part-2.jpg","datePublished":"2018-06-05T11:01:07+00:00","dateModified":"2023-09-20T13:02:49+00:00","description":"This is the second part of the series dealing with the formal comparison of Machine Learning algorithms from a statistical point of view. In this post, we examine how statistical tests are applied to performance data of ML algorithms.","breadcrumb":{"@id":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#primaryimage","url":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2018\/06\/Statistical-Comparison-of-Machine-Learning-Algorithms-Part-2.jpg","contentUrl":"https:\/\/www.stratio.com\/blog\/wp-content\/uploads\/2018\/06\/Statistical-Comparison-of-Machine-Learning-Algorithms-Part-2.jpg","width":1300,"height":820},{"@type":"BreadcrumbList","@id":"https:\/\/www.stratio.com\/blog\/statistical-comparison-of-machine-learning-algorithms-part-2\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.stratio.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Statistical Comparison of Machine Learning Algorithms (Part 
2)"}]},{"@type":"WebSite","@id":"https:\/\/www.stratio.com\/blog\/#website","url":"https:\/\/www.stratio.com\/blog\/","name":"Stratio Blog","description":"Corporate blog","publisher":{"@id":"https:\/\/www.stratio.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.stratio.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.stratio.com\/blog\/#organization","name":"Stratio","url":"https:\/\/www.stratio.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png","contentUrl":"https:\/\/stratio.com\/blog\/wp-content\/uploads\/2020\/06\/stratio-web-logo-1.png","width":260,"height":55,"caption":"Stratio"},"image":{"@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/stratiobd","https:\/\/es.linkedin.com\/company\/stratiobd","https:\/\/www.youtube.com\/c\/StratioBD"]},{"@type":"Person","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/af4f5fbbeb95bd7d55f79d9a677e615d","name":"admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.stratio.com\/blog\/#\/schema\/person\/image\/589aaf4b404b1fe099b09564062c4563","url":"https:\/\/secure.gravatar.com\/avatar\/9b181ae4395243dccaf1c3e3a4749d81?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/9b181ae4395243dccaf1c3e3a4749d81?s=96&d=mm&r=g","caption":"admin"}}]}},"authors":[{"term_id":794,"user_id":2,"is_guest":0,"slug":"admin","display_name":"admin","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/9b181ae4395243dccaf1c3e3a4749d81?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp
\/v2\/posts\/11361"}],"collection":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/comments?post=11361"}],"version-history":[{"count":34,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts\/11361\/revisions"}],"predecessor-version":[{"id":13560,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/posts\/11361\/revisions\/13560"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/media\/13557"}],"wp:attachment":[{"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/media?parent=11361"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/categories?post=11361"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/tags?post=11361"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.stratio.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=11361"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}