Experiments on TPC-H Benchmarks¶
Note (Updated Frequently!)
This page is maintained to keep track of the translation from SQL queries to HorseIR programs.
TPC-H benchmarks¶
Query profile data: Database Schema
ID | Tables* | Pred. | Join | Aggr. | Group | Sort | Return |
---|---|---|---|---|---|---|---|
1 | L | 1 | 0 | 8 | 2 | 2 | 10 |
2 | P,S,PS,N,R | 13 | 8 | 1 | 0 | 4 | 9 |
3 | C,O,L | 5 | 2 | 1 | 3 | 2 | 4 |
4 | O,L | 5 | 1 | 1 | 1 | 1 | 3 |
5 | C,O,L,S,N,R | 9 | 5 | 1 | 1 | 1 | 2 |
6 | L | 4 | 0 | 1 | 0 | 0 | 1 |
7 | S,L,O,C,N | 9 | 5 | 1 | 3 | 3 | 8 |
8 | P,S,L,O,C,N,R | 10 | 7 | 1 | 1 | 1 | 5 |
9 | P,S,L,PS,O,N | 7 | 5 | 1 | 2 | 2 | 6 |
10 | C,O,L,N | 6 | 3 | 1 | 7 | 1 | 8 |
11 | PS,S,N | 6 | 5 | 2 | 1(big) | 1 | 3 |
12 | O,L | 6 | 1 | 2 | 1 | 1 | 3 |
13 | C,O | 2 | 1(o) | 2 | 2 | 2 | 4 |
14 | L,P | 3 | 1 | 1 | 0 | 0 | 1 |
15 | S,L | 2 | 2 | 2 | 2 | 1 | 5 |
16 | PS,P,S | 6 | 2 | 1 | 3 | 4 | 5 |
17 | L,P | 4 | 3 | 2 | 0 | 0 | 2 |
18 | C,O,L | 3 | 3 | 1 | 5 | 2 | 7 |
19 | L,P | 21 | 1 | 1 | 0 | 0 | 1 |
20 | S,N,PS,P,L | 9 | 4 | 1 | 0 | 1 | 5 |
21 | S,L,O,N | 13 | 5 | 1 | 1 | 2 | 4 |
22 | C,O | 6 | 2 | 3 | 1 | 1 | 7 |
* List of tables (On GitHub)
Short | Long | Short | Long |
---|---|---|---|
N | Nation | R | Region |
P | Part | S | Supplier |
PS | PartSupp | C | Customer |
O | Orders | L | Lineitem |
Basic classification of queries
- 0 join : 6,1
- 1 join : 4,12,13,14,19
- 2 joins: 3,15,16,22
- 3 joins: 10,17,18
- 4 joins: 20
- 5 joins: 5,7,9,11,21
- Others : 2,8 (>5)
Translation¶
Automatic Translation¶
Manual Translation (Expired)¶
The translation from SQL to HorseIR can be done in the following steps.
module default{
import Bultin.*;
def main() : table{
// step 0: load table
// step 1: where clause
// step 2: group by
// step 3: select
// step 4: order by
// step 5: materialization
return z;
}
}
Performance issues¶
TO-DO List
- Short floating numbers to integers, e.g. DECIMAL(15,2) to INT
DECIMAL(6,2)
: 6 digits with 2 decimal places, range from 9999.99 to -9999.99 (mysql decimal)
Q1¶
PROFILE(15, pfnGroup(g3,g2));
...
PROFILE(17, pfnEachRight(s0, t3, g7, pfnIndex)); //l_quantity
... other Each functions
Q3¶
Porblems
PROFILE(35, pfnEachRight(g5, p8, g3, pfnIndex)); // <--- slow
PROFILE(36, pfnEach(g6, g5, pfnSum)); // <--- slow
Explanation
- After groupby, the size of each cell has "min = 1, max = 7"
- However, there are 11620 cells
Q6¶
Loop fusion
- Place 1: line [6, 17] return t15 (good)
- Place 2: line [18,20] return t18 (optional)