Skip to content

Experiments on TPC-H Benchmarks

Note (Updated Frequently!)

This page is maintained to keep track of the translation from SQL queries to HorseIR programs.

TPC-H benchmarks

Query profile data: Database Schema

ID Tables* Pred. Join Aggr. Group Sort Return
1 L 1 0 8 2 2 10
2 P,S,PS,N,R 13 8 1 0 4 9
3 C,O,L 5 2 1 3 2 4
4 O,L 5 1 1 1 1 3
5 C,O,L,S,N,R 9 5 1 1 1 2
6 L 4 0 1 0 0 1
7 S,L,O,C,N 9 5 1 3 3 8
8 P,S,L,O,C,N,R 10 7 1 1 1 5
9 P,S,L,PS,O,N 7 5 1 2 2 6
10 C,O,L,N 6 3 1 7 1 8
11 PS,S,N 6 5 2 1(big) 1 3
12 O,L 6 1 2 1 1 3
13 C,O 2 1(o) 2 2 2 4
14 L,P 3 1 1 0 0 1
15 S,L 2 2 2 2 1 5
16 PS,P,S 6 2 1 3 4 5
17 L,P 4 3 2 0 0 2
18 C,O,L 3 3 1 5 2 7
19 L,P 21 1 1 0 0 1
20 S,N,PS,P,L 9 4 1 0 1 5
21 S,L,O,N 13 5 1 1 2 4
22 C,O 6 2 3 1 1 7

* List of tables (On GitHub)

Short Long Short Long
N Nation R Region
P Part S Supplier
PS PartSupp C Customer
O Orders L Lineitem

Basic classification of queries

  • 0 join : 6,1
  • 1 join : 4,12,13,14,19
  • 2 joins: 3,15,16,22
  • 3 joins: 10,17,18
  • 4 joins: 20
  • 5 joins: 5,7,9,11,21
  • Others : 2,8 (>5)

Translation

Automatic Translation

Manual Translation (Expired)

The translation from SQL to HorseIR can be done in the following steps.

module default{
    import Bultin.*;
    def main() : table{
        // step 0: load table
        // step 1: where clause
        // step 2: group by
        // step 3: select
        // step 4: order by
        // step 5: materialization
        return z;
   }
}

Performance issues

TO-DO List

  • Short floating numbers to integers, e.g. DECIMAL(15,2) to INT
    • DECIMAL(6,2): 6 digits with 2 decimal places, range from 9999.99 to -9999.99 (mysql decimal)

Q1

PROFILE(15, pfnGroup(g3,g2));
...
PROFILE(17, pfnEachRight(s0, t3, g7, pfnIndex)); //l_quantity
... other Each functions

Q3

Porblems

PROFILE(35, pfnEachRight(g5, p8, g3, pfnIndex));  // <--- slow
PROFILE(36, pfnEach(g6, g5, pfnSum));             // <--- slow

Explanation

  • After groupby, the size of each cell has "min = 1, max = 7"
  • However, there are 11620 cells

Q6

Loop fusion

  • Place 1: line [6, 17] return t15 (good)
  • Place 2: line [18,20] return t18 (optional)