I made a few improvements to the JOIN version; see below.
I vote for the JOIN approach for speed. Here is how I determined that:
HAVING, version 1
mysql> FLUSH STATUS;
mysql> SELECT city
-> FROM us_vch200
-> WHERE state IN ('IL', 'MO', 'PA')
-> GROUP BY city
-> HAVING count(DISTINCT state) >= 3;
+-------------+
| city |
+-------------+
| Springfield |
| Washington |
+-------------+
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_external_lock | 2 |
| Handler_read_first | 1 |
| Handler_read_key | 2 |
| Handler_read_last | 1 |
| Handler_read_next | 4175 | -- full index scan
(etc)
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
| 1 | SIMPLE | us_vch200 | range | state_city,city_state | city_state | 769 | NULL | 4176 | Using where; Using index for group-by (scanning) |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
The "Extra" column shows that the optimizer decided to tackle the GROUP BY and use INDEX(city, state), even though INDEX(state, city) might make sense.
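The plan table above looks like EXPLAIN output, but the statement that produced it is not shown. Presumably it was the same query prefixed with EXPLAIN; the sketch below is that assumption, not a command taken from the session:

```sql
-- Assumed command; only the resulting plan appears in the session output.
EXPLAIN
SELECT city
    FROM us_vch200
    WHERE state IN ('IL', 'MO', 'PA')
    GROUP BY city
    HAVING COUNT(DISTINCT state) >= 3;
```

The `key` column of the result names the index the optimizer picked (`city_state` here), and `Extra` describes how it is used.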
HAVING, version 2
Switching to INDEX(state, city) yields:
mysql> FLUSH STATUS;
mysql> SELECT city
-> FROM us_vch200 IGNORE INDEX(city_state)
-> WHERE state IN ('IL', 'MO', 'PA')
-> GROUP BY city
-> HAVING count(DISTINCT state) >= 3;
+-------------+
| city |
+-------------+
| Springfield |
| Washington |
+-------------+
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_external_lock | 2 |
| Handler_read_key | 401 |
| Handler_read_next | 398 |
| Handler_read_rnd | 398 |
(etc)
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
| 1 | SIMPLE | us_vch200 | range | state_city,city_state | state_city | 2 | NULL | 397 | Using where; Using index; Using filesort |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
JOIN
mysql> SELECT x.city
-> FROM us_vch200 x
-> JOIN us_vch200 y ON y.city= x.city AND y.state = 'MO'
-> JOIN us_vch200 z ON z.city= x.city AND z.state = 'PA'
-> WHERE x.state = 'IL';
+-------------+
| city |
+-------------+
| Springfield |
| Washington |
+-------------+
2 rows in set (0.00 sec)
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_external_lock | 6 |
| Handler_read_key | 86 |
| Handler_read_next | 87 |
(etc)
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
| 1 | SIMPLE | y | ref | state_city,city_state | state_city | 2 | const | 81 | Using where; Using index |
| 1 | SIMPLE | z | ref | state_city,city_state | state_city | 769 | const,world.y.city | 1 | Using where; Using index |
| 1 | SIMPLE | x | ref | state_city,city_state | state_city | 769 | const,world.y.city | 1 | Using where; Using index |
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
Only INDEX(state, city) is needed. The Handler counts are the smallest for this formulation, so I deduce that it is the fastest.
Notice how the optimizer decided on its own which table to start with, probably because of:
+-------+----------+
| state | COUNT(*) |
+-------+----------+
| IL | 221 |
| MO | 81 | -- smallest
| PA | 96 |
+-------+----------+
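The per-state counts above are shown without the query behind them. A query of roughly this shape would produce such a table; it is an assumption, since only the output is given:

```sql
-- Assumed query; rows per state, to see which state is the cheapest starting point.
SELECT state, COUNT(*)
    FROM us_vch200
    WHERE state IN ('IL', 'MO', 'PA')
    GROUP BY state;
```

The state with the fewest rows (MO, 81) matches the `rows` estimate for table `y` in the EXPLAIN output, which is consistent with the optimizer starting there.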
Conclusions
The JOIN (without the unnecessary t table) is probably the fastest. It also needs this composite index: INDEX(state, city).
To translate back to your use case:
city --> documentid
state --> termid
Caveat: YMMV, because the distribution of values for documentid and termid may be quite different from the test case I used.
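Applying that mapping, the JOIN formulation might look like the sketch below. The table name `doc_term`, the index name `term_doc`, the INT column types, and the term ids 10, 20, 30 are all hypothetical, chosen only for illustration. It also assumes each (documentid, termid) pair is unique; if pairs can repeat, the JOIN can return a document more than once and would need `SELECT DISTINCT`.

```sql
-- Hypothetical table; the real name and column types are not given.
CREATE TABLE doc_term (
    documentid INT NOT NULL,
    termid     INT NOT NULL,
    PRIMARY KEY (documentid, termid),     -- enforces the uniqueness assumption
    INDEX term_doc (termid, documentid)   -- plays the role of INDEX(state, city)
);

-- Toy data: documents 1 and 3 contain all three terms.
INSERT INTO doc_term (documentid, termid) VALUES
    (1, 10), (1, 20), (1, 30),
    (2, 10), (2, 20),
    (3, 10), (3, 20), (3, 30),
    (4, 30);

-- Documents that contain every one of the terms 10, 20 and 30.
SELECT x.documentid
    FROM doc_term x
    JOIN doc_term y ON y.documentid = x.documentid AND y.termid = 20
    JOIN doc_term z ON z.documentid = x.documentid AND z.termid = 30
    WHERE x.termid = 10;
```

On this toy data the query returns documents 1 and 3. To compare formulations on real data, run `FLUSH STATUS;` before each query and `SHOW SESSION STATUS LIKE 'Handler%';` after it, as in the sessions above.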