Cost-based choice between subquery materialization and EXISTS

In a previous post, I had demonstrated how subquery materialization, introduced in MySQL 5.6.5, improves the performance of certain queries, like query Q16 of DBT3. Such improvement was easily explained:
  • Subquery materialization has a high start up cost (it needs to create and fill the temporary table).
  • But afterwards it has fast lookups (temporary table has a hash index, no duplicates, and is in memory).
  • In other words, compared to EXISTS, the first evaluation of the IN predicate is slow (high start up cost) and all following evaluations are fast (just a hash lookup).
  • In the DBT 3 setup, one outer table (named "part") has 200,000 rows, so there are 200,000 evaluations of IN, thus subquery materialization wins over EXISTS because the time it loses in the first evaluation is more than compensated by the many faster following evaluations.
However, if there were only few outer rows, then subquery materialization should logically be slower than EXISTS (the compensation would not happen anymore)... MySQL 5.6.5, by blindly always choosing subquery materialization, takes the risk of making certain queries slower. There needs to be a cost-based choice between the two strategies, to pick the best, depending on the situation! That is what I have implemented in MySQL 5.6.7.

To show it in action, I will use query Q16 again. First I will run it with the normal "part" table which has 200,000 rows. Then I will reduce this table to only 200 rows, and run the query again. Each time, I will run EXPLAIN to see what subquery strategy is chosen by the optimizer. I will also, by tweaking the optimizer_switch variable, force the optimizer to use the other strategy which it didn't like, in order to verify that it is indeed worse.

For brevity, let me jump directly to the results, obtained with a release build of MySQL 5.6.7 on my machine:

Rows in part Optimizer chooses Execution time If I force alternative
200,000 Materialization 550 ms 830 ms
200 EXISTS 1 ms 10 ms

We can see that in both cases the optimizer has made the right choice!

Commentaires

  1. So MySQL 5.6.7 does have this feature after all. Will the docs/changelog be updated to include it?

    It will be interesting compare with MariaDB's implementation. I've made some observations here: http://s.petrunia.net/blog/?p=73. I'm wondering whether these are all the differences, or there is something else...

    RépondreSupprimer
  2. Hello Sergey, the change log is now here:
    http://dev.mysql.com/doc/refman/5.6/en/news-5-6-7.html
    "Performance: Certain instances of subquery materialization etc".
    Thanks for all your detailed observations, I have on my todo to study them after finishing some urgent review work.

    RépondreSupprimer
  3. Hello Sergey. I have queued patches for bug#67511 and for the second problem. Probably they will be included in some 5.6 release. Thanks again for sharing your findings!

    RépondreSupprimer
  4. The price is 7258, source. Additionally, with a blood test, it will be detectable in the bloodstream within seconds of inhalation or ingestion. It's reabsorbed into the blood and broken down very quickly. Finally, there is the saliva drug test, which takes a sample from your oral fluid. Marijuana can enter your saliva through smoking. You could even be exposed to someone else who is smoking. Though the latter is rare, it's something that should be kept in the back of your mind with an upcoming drug test. Weed, specifically, is usually detectable in your oral fluid for the following amounts of time after your last use: • 1-3 days: occasional users • 1-29 days: chronic users Click Here to Detox Your System in 5 Days > The active ingredient in marijuana is a chemical compound called THC, or tetrahydrocannabinol. It enters the body via the bloodstream and is then temporarily stored in different organs, fatty tissues, and other elements of the body. In the kidneys, it can be reabsorbed in the bloodstream. The liver breaks down THC. It has over 80 different metabolites, which linger in the body much longer than THC itself. Drug tests will look for traces of these metabolites in your system. Some of the more robust metabolites include 11-OH-THC and THCCOOH. Eventually, your liver will break down these elements and excrete them via urine and stool. Other than the different methods of drug testing and frequency of use, there are a few other factors that will play a major role when determining how long weed will stay in your system. The three major ones being: • Metabolism • Body mass (BMI) • THC levels in your system The amount of time it takes for THC to leave your body varies from person to person. It will depend on some personal health factors - such as age, body mass, and metabolic rate. This natural timeline will be contingent on the duration of exposure, frequency of exposure, and the overall potency of the marijuana. For the most part, frequent marijuana users will have traces of THC and its metabolites for a longer period than infrequent users. Inconsistent, uncommon use results in a shorter amount of time until your body has flushed the drug out and is completely cleansed.

    RépondreSupprimer

Enregistrer un commentaire

Posts les plus consultés de ce blog

WITH RECURSIVE and MySQL

Faster subqueries with materialization