McGill University:
School of Computer Science
Winter 1999 Projects for 308-251B

DATA STRUCTURES AND ALGORITHMS

Project #25: SKIP LISTS

Last update: March 2, 1999


History

Skip lists were invented by William Pugh in 1989.  They were proposed by him in 1990 as an alternative to the BST (Binary Search Tree) and other balanced trees.


Definition

A skip list is a probabilistic data structure where elements are kept sorted by key.  It allows quick search, insertions and deletions of elements with simple algorithms.  It is basically a linked list with additional pointers such that intermediate nodes can be skipped.  It uses a random number generator to make some decisions.


Use

Skip Lists are used as an alternative to balanced trees.  A skip list is a practical data structure that gives good results while keeping the implementation simple.


Theory

A linked list is a relatively easy data structure to implement.  It is simple to keep a linked list of n elements sorted.  To perform a search n comparisons are required in the worst case.  Now, suppose a second pointer pointing two nodes ahead is added to every other node, the number of comparisons required goes down to ceil(n/2)+1 (in the worst case).  Adding one more pointer to every fourth node and making them point to the fourth node ahead reduces the number of comparisons to ceil(n/2)+2.   If that strategy is continued so every node with i pointers points to 2i-1 nodes ahead, O(log n) performance is obtained and the number of pointers has only doubled (n + n/2 + n/4 + n/8 + n/16 + ....  = 2n).



 
 

Terminology :
A forward pointer is a pointer that points to a node ahead in the list.
A level i node is a node that has i forward pointers.
 

Skip List : A Probabilistic Data Structure

The nodes of the data structure described above are in the following proportions:
50% of them are level 1 nodes, 25% are level 2 nodes, 12.5% are level 3 nodes, etc.

The number of levels required is log2 n where n is the expected number of elements.

The data structure described above is great for searching but insertions and deletions would be really difficult to implement and inefficient as almost all the pointers must be modified.  Maintaining perfect balancing is time consuming.  This data structure can be made more flexible for simple insert and delete operations.  This can be done quite easily while keeping a good (though not perfect) balancing as follows:

The level of every new node to be inserted is chosen randomly with a probability distribution that keeps approximately the proportions of nodes of level i  as described above.  Probabilistic analysis shows that the average cost of insert, delete and search operations is O(log n).
 
 

A Skip List

 
 

Skip lists operations are O(n) in the worst case.  It happens when all (or almost all) the nodes are given level 1 at insertion.  In that case, the skip list becomes an ordinary linked list.  The Skip List motto is Don't worry, be happy!  as this case is highly unlikely to happen as n increases.  For example, the probability that 10 nodes in a row are at level 1 is 1/1024!

Skip lists may be viewed as  trees where the header is the root, the nodes levels correspond to the levels in a tree and level 1 nodes are the leaves.


Algorithms

The following algorithms were written by William Pugh.
 

Remarks on algorithms
- forward pointers of a level i node are stored in an array called forward indexed from 1 to i
- there is a constant called MaxLevel.  All levels must be smaller than or equal to this constant
- the level of a node is not stored
- the level of a list = maximum {levels of the nodes in the list}
 

Initialization :
An new list is initialized as follows :
1) a node called NIL is created and its key is set to a value greater than the greatest key that could possibly be used in the list (i.e. if the list will contain only keys between 1 and 999, then 1000 may be taken as the key in NIL).  Every level ends with NIL.
2) the level of a new list is 1
3) all forward pointers of the header point to NIL
 

Searching
1) Start at the highest level of the list.
2) Move forward following the pointers at the same level until the next key is greater than the searched key
3) If the current level is not the lowest, go down one level and repeat the search at that level from the current node
4) Stop when the level is 1 and the next key is greater than the searched key
5) If the current key is the searched key return the value of that node.  Otherwise, return a failure.
 

SEARCH(list, searchKey)

  1. x <- list.header
  2. for i <- list.level downto 1
  3.     do    while x.forward[i].key < searchKey
  4.               do    x <- x.forward[i]
  5. x <- x.forward[1]
  6. if x.key = searchKey
  7.     then return x.value
  8.     else return failure




 
 

Insertion / Deletion :
-The insertion or deletion of a node consists mainly in a search followed by pointers update.
-An array update is used to store the last node reached at every level.  It is used when changing the pointers after a node has been inserted or deleted.
-The level of the new inserted node is determined randomly by the function RANDOM-LEVEL.
 

INSERT(list, searchKey, newValue)

  1. x <- list.header
  2. for i <- list.level downto 1
  3.     do    while x.forward[i].key < searchKey
  4.                  do     x <- x.forward[i]
  5.             update[i] <- x
  6. x <- x.forward[1]
  7. if x.key = searchKey
  8.     then    x.value <- newValue
  9.     else    newLevel <- RANDOM-LEVEL()
  10.                if newLevel > list.level
  11.                     then    for i <- list.level + 1 to newLevel
  12.                                     do    update[i] <- list.header
  13.                                 list.level <- newLevel
  14.                 x <- MAKE-NODE(newLevel, searchKey, newValue)
  15.                 for i <- 1 to newLevel
  16.                     do     x.forward[i] <- update[i].forward[i]
  17.                              update[i].forward[i] <- x



 
 

DELETE(list, searchKey)

  1. x <- list.header
  2. for i <- list.level downto 1
  3.     do    while x.forward[i].key < searchKey
  4.                  do     x <- x.forward[i]
  5.             update[i] <- x
  6. x <- x.forward[1]
  7. if x.key = searchKey
  8.     then    for i <- 1 to list.level
  9.                     do     if update[i].forward[i] <> x
  10.                                     then break
  11.                             update[i].forward[i] <- x.forward[i]
  12.                 FREE(x)
  13.                 while list.level > 1 and list.header.forward[list.level] = NIL
  14.                     do list.level <- list.level - 1



 
 

RANDOM-LEVEL()

  1. newLevel <- 1
  2. while RANDOM() < p
  3.     do     newLevel <- newLevel + 1
  4. return MIN(newLevel, MaxLevel)


RandomLevel
This last algorithm deserves some explanations.

-The function RANDOM() returns a number between 0 and 1.0.
-p is a constant between 0 and 1.0 (suppose p = 0.5).

RandomLevel works like flipping a coin.  Say head is a win and tail is a lost.  The electronic coin is flipped until a tail is obtained.  Every time it is head the level is increased by 1 and the coin is flipped again.

Note : if p = 1/4, there is an average of 1.33 pointers per node.  This saves space without reducing significally the search time.


Benefits


The BST is efficient but may become easily unbalanced after several insertions and deletions.  Balanced trees (2-3 tree, red-black trees, ...) are guaranteed to remain balanced and thus does the basic operations in O(log n) in the worst case.  Theoretically, they are efficient but their implementation is complicated.  On the other hand skip lists are easier to implement.  The algorithms for insertions and deletions are simpler and faster.  They do not guarantee O(log n) performance but they do have an  O(log n) performance in the average case (for insert, delete, search) and the probability of a high deviation from the average is quite high.  Therefore, a very bad performance (O(n)) is extremely unlikely and its probability decreases exponentially as n increases.  For most applications they are as efficient as balanced trees structures (see timings table below).  They are also space efficient as no balance information needs to be stored in the nodes and they can work well with an average of only 1 1/3 pointers per node.  Unlike the BST, the performance of a skip list does not depend in the order of insertion.
 

Their implementation simplicity makes them a good alternative to BST and other tree structures.
 

Timings of Implementations of Different Algorithms (W. Pugh, Communications of the ACM, June 1990, p. 675.)
Implementation Search Time Insertion Time Deletion Time
Skip lists
0.051 msec (1.0) 0.065 msec (1.0) 0.059 msec (1.0)
non-recursive AVL trees
0.046 msec (0.91) 0.10 msec (1.55) 0.085 msec (1.46)
recursive 2-3 trees
0.054 msec (1.05) 0.21 msec(3.2) 0.21 msec (3.65)
Self-adjusting trees :
top-down splaying
0.15 msec (3.0) 0.16 msec (2.5) 0.18 msec (3.1)
bottom-up splaying
0.49 msec (9.6) 0.51 msec (7.8) 0.53 msec (9.0)


Links to related material


 
The University of Texas at Austin
CS328 : Abstract Data Types
Lecture notes on skip lists (postscript file).
Sorting and Searching Algorithms : A Cookbook
by Thomas Niemann
Theory and implementations of skip lists in C.  Click here to go directly to the skip lists page.
FTP site : ftp://ftp.cs.umd.edu/pub/skipLists A FTP site where the article : Skip Lists : A Probabilistic Alternative to Balanced Trees by William Pugh and other papers on skip lists can be downloaded.  Implementations of skip lists in C and in Pascal are also available.
Roger Cattin's Home Page Theory and algorithms.  Some links are broken, the Java implementation is available here.
San Diego State University CS 660 : Combinatorial Algorithms - Skip Lists Lecture notes on skip lists.


References

  1. Clifford A. Shaffer. A Practical Introduction to Data Structures and Algorithm Analysis -- Java Edition. Prentice-Hall, Inc., 1998: 365-371.
  2. William Pugh. Skip Lists : A Probabillistic Alternative to Balanced Trees. Communications of the ACM, v. 33, June 1990: 668-676.
  3. Sandeep Sen. Some observations on skip lists. Information Processing Letters, v. 39, n 4, Aug 30 1991: 173-176.
  4. B. Schneier. Skip lists. Dr. Dobb's Journal, v. 19, Jan. 1994: 50, 52.
  5. Mark A. Weiss. Data Structures and Algorithm Analysis. The Benjamin/Cummings Publishing Company, Inc., 1995.
  6. Rajeev Motwani, Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, 1995


Web page creator

This web page was created by David Bélanger (dbelan2(à)cs.mcgill.ca). The figures were drawn with xfig.  The Grab utility of Xview was used to capture the figures and save them as gif files.


Last updated March 2, 1999 by David Bélanger. Copyright © 1999, David Bélanger.