Skip to content

Database

Database related

A collection of database functions and operations. (Currently not maintained)

Database Functions

MySQL functions

MySQL functions HorseIR builtins Description
date_add date_add Returns the new date after given an interval
date_sub date_sub Returns the difference in days between two date values
NOW time_stamp 2014-11-22 12:45:34
CURDATE 2014-11-22
CURTIME 12:45:34

See more about MySQL reference.

Important Database Operations

A couple of important database operations are listed as follows.

Join

A join may have various kinds of joins which can be implemented with different efficient algorithms. We consider the following joins which are included in our implementation. By default, a typical hash join algorithm is implemented.

SQL joins (StackOverflow)

  • Nature join
  • Inner join (Equi-join)
  • Outer join
  • Outer join less
  • Left join
  • Left join less
  • Right join
  • Right join less

Order

There are a couple of built-in functions designed for handling ordering problem. If only one order condition is specified, either asc or desc is used. Otherwise, a more general function sort is used for multiple-column sorts.

For example, ORDER BY salary DESC can be represented as follows.

// t0 is a partial or full set of the salary column
t1:i64 = desc(t0);

A list of integers (indices) is returned and saved into t1;

Group by

This clause aggregates items with the same value.

For example, GROUP BY lastname

// t0 is a partial or full set of the lastname column
t1:list<i64> = @group_by(t0);

The returned t1 is a list of lists. The internal list has integers (indices) which point to their original position in the list t0.

Limit / Top

Limit (or top) is useful in selecting a limited amount of items out of a long list.

For example, LIMIT 10 takes the first 10 elements.

// t0 is a partial or full set of the column
// @take is a primitive function
t1:? = @take(t0, 10:i32);

Another similar function, limit_range for range queries, can select items within a range (m,n).

// t0 is a partial or full set of the column
t1:i64 = @substract(n,m);  // range size
t2:i64 = @range(t0);       // iota(t0)
t3:i64 = @add(t2, m);      // addition
t4:?   = @index(t0, t3);   // indexing

Implementing Database Operations Using Array Operations

References:

  • Zhou, Jingren and Kenneth A. Ross. “Implementing database operations using SIMD instructions.” SIGMOD Conference (2002).
  • Govindaraju, Naga K., Brandon Lloyd, Wei Wang, Ming C. Lin and Dinesh Manocha. “Fast computation of database operations using graphics processors.” SIGGRAPH Courses (2004).

Scan-like Operators

Problem description:

Supposedly, we have two vectors x {x1,x2,...,xn} and y {y1,y2,...,ym}, where n and m are the number of elements in x and y respectively. A match operation is to find out the same element y[i] in x. There are a couple of senarios we should consider as follows.

Return the first match

The values of x elements should be distinct.

t0:list<i32> = index_of(x,y);

For example,

   x:i64 = 25 36 15
   y:i64 = 36 17 25
   t0:list<i64> = @index_of(x,y);
   t1:i64       = @print(t0);
> 1 3 0

We see

  • y[0] (36) is the 2nd element in x (return 1)
  • y[1] (17) is not in x (return 3, the length of x)
  • y[2] (25) is the 1st element in x (return 0)

Return all matches

t0:list<bool> = outer(@eq,x,y);
t1:list<i64>  = each(@where,t0);

For example,

     x:i32         = 25 36 36;
     y:i32         = 36 17 25;
     t0:list<bool> = @outer(@eq,x,y);
(0 1 1;0 0 0;1 0 0)
     t1:list<i64>  = @each(@where, t0);
     t2:i64        = @print(t1);
(1 2; ;0)

We see

  • (0 1 1;0 0 0;1 0 0) is the result of comparing (eq) x and y in an outer-product way
  • each(where, t0) operates where in the each list of t0

Aggregation

Basic operations for aggregation

  • count
  • min and max
  • sum and avg

We provide a general function reduce to aggregate a list T by an aggregation function f. Its syntax is reduce(@f, T).

For example,

   t0:list<i64> = (1 1 1;2 2);
   t1:list<i64> = @reduce(@count, t0);
   t2:i64       = @print(t1);
> (3;2)

Aggregation is a perfect candidate for data parallelism despite it may take different aggregation functions.

Examples for database operations

GROUP BY

Source

Table

CustomerID   : i32
CustomerName : str
ContactName  : str
Address      : str
City         : str
PostalCode   : str
Country      : sym

SQL

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;

HorseIR

modeul _default{
    c0:dict<sym,i32> = column(Customers:table, `CustomerID:sym);
    c1:dict<sym,str> = column(Customers:table, `CustomerName:sym);
    c2:dict<sym,str> = column(Customers:table, `ContactName:sym);
    c3:dict<sym,str> = column(Customers:table, `Address:sym);
    c4:dict<sym,str> = column(Customers:table, `City:sym);
    c5:dict<sym,str> = column(Customers:table, `PostalCode:sym);
    c6:dict<sym,sym> = column(Customers:table, `Country:sym);

    t0:list<i32> = value(c6);
    t1:list<i32> = unique(t0);
    t2:list<list<i32>,list<i32>> = outer(`equal:fun, t1, t0);
    t3:list<i32> = reduce(`plus:fun, t2);
    t4:list<i32> = reduce(`plus:fun, t2);
    t5:list<i32> = desc(t4);

    c7:dict<sym,i32> = dict(`count_customerid:sym;, t3); //group by
    c8:dict<sym,i32> = c7[t5];
    c9:dict<sym,sym> = c6[t5];

    z0:list<?> = list(c8,c9);
    z:table = createTable(z0);
}

ORDER BY

Source

Note: the same table Customers in GROUP BY.

SQL

SELECT * FROM Customers
ORDER BY Country ASC, CustomerName DESC;

HorseIR

modeul _default{
    c0:dict<sym,i32> = column(Customers:table, `CustomerID:sym);
    c1:dict<sym,str> = column(Customers:table, `CustomerName:sym);
    c2:dict<sym,str> = column(Customers:table, `ContactName:sym);
    c3:dict<sym,str> = column(Customers:table, `Address:sym);
    c4:dict<sym,str> = column(Customers:table, `City:sym);
    c5:dict<sym,str> = column(Customers:table, `PostalCode:sym);
    c6:dict<sym,sym> = column(Customers:table, `Country:sym);

    t0:list<i32> = value(c6);
    t1:list<str> = value(c1);
    t2:list<?>   = list(t0, t1);
    t3:list<bool>= list(1:bool, 0:bool); // ASC:true; DESC: false
    t4:list<i32> = order(t2, t3);

     c7:dict<sym,i32> = c0[t4];
     c8:dict<sym,i32> = c1[t4];
     c9:dict<sym,i32> = c2[t4];
    c10:dict<sym,i32> = c3[t4];
    c11:dict<sym,i32> = c4[t4];
    c12:dict<sym,i32> = c5[t4];
    c13:dict<sym,i32> = c6[t4];

    z0:list<?> = list(c7,c8,c9,c10,c11,c12,c13);
    z:table = createTable(z0);
}

ELI code for order by

    x <- 1 2 3 2 2
    y <- 5 3 7 1 2
    <x,[1.5]y
1 4 5 2 3
    x[1 4 5 2 3]
1 2 2 2 3
    y[1 4 5 2 3]
5 1 2 3 7

MIN and MAX

Definition of MIN and MAX

Table Products

ProductID   : i32
ProductName : str
SupplierID  : i32
CategoryID  : i32
Unit        : str
Price       : f64

SQL

SELECT MIN(Price) AS SmallestPrice
FROM Products;

HorseIR

modeul default{
    import Builtin.*;
    def main():table{
        c0:i32 = column(Products:table, `ProductID:sym);
        c1:str = column(Products:table, `ProductName:sym);
        c2:i32 = column(Products:table, `SupplierID:sym);
        c3:i32 = column(Products:table, `CategoryID:sym);
        c4:str = column(Products:table, `Unit:sym);
        c5:f64 = column(Products:table, `Price:sym);

        t0:f64 = min(c5);

        z0:sym = `SmallestPrice:sym;
        z1:list<f64> = list(t0)
        z:table = @table(z0, z1); 
        return z;
    }
}