Skip lists were invented by William Pugh in 1989. They were proposed by him in 1990 as an alternative to the BST (Binary Search Tree) and other balanced trees.
A skip list is a probabilistic data structure where elements are kept sorted by key. It allows quick search, insertions and deletions of elements with simple algorithms. It is basically a linked list with additional pointers such that intermediate nodes can be skipped. It uses a random number generator to make some decisions.
Skip Lists are used as an alternative to balanced trees. A skip list is a practical data structure that gives good results while keeping the implementation simple.
A linked list is a relatively easy data structure to implement. It is simple to keep a linked list of n elements sorted. To perform a search n comparisons are required in the worst case. Now, suppose a second pointer pointing two nodes ahead is added to every other node, the number of comparisons required goes down to ceil(n/2)+1 (in the worst case). Adding one more pointer to every fourth node and making them point to the fourth node ahead reduces the number of comparisons to ceil(n/2)+2. If that strategy is continued so every node with i pointers points to 2i-1 nodes ahead, O(log n) performance is obtained and the number of pointers has only doubled (n + n/2 + n/4 + n/8 + n/16 + .... = 2n).
Terminology :
A forward pointer is a pointer that points to a node ahead in the
list.
A level i node is a node that has i forward pointers.
Skip List : A Probabilistic Data Structure
The nodes of the data structure described above are in the following proportions:
50% of them are level 1 nodes, 25% are level 2 nodes, 12.5% are level 3 nodes,
etc.
The number of levels required is log2 n where n is the expected number of elements.
The data structure described above is great for searching but insertions and deletions would be really difficult to implement and inefficient as almost all the pointers must be modified. Maintaining perfect balancing is time consuming. This data structure can be made more flexible for simple insert and delete operations. This can be done quite easily while keeping a good (though not perfect) balancing as follows:
The level of every new node to be inserted is chosen randomly with a probability
distribution that keeps approximately the proportions of nodes of
level i as described above. Probabilistic analysis shows that
the average cost of insert, delete and search operations is
O(log n).
A Skip List
Skip lists operations are O(n) in the worst case. It happens when all (or almost all) the nodes are given level 1 at insertion. In that case, the skip list becomes an ordinary linked list. The Skip List motto is Don't worry, be happy! as this case is highly unlikely to happen as n increases. For example, the probability that 10 nodes in a row are at level 1 is 1/1024!
Skip lists may be viewed as trees where the header is the root, the nodes levels correspond to the levels in a tree and level 1 nodes are the leaves.
The following algorithms were written by William Pugh.
Remarks on algorithms
- forward pointers of a level i node are stored in an array called
forward indexed from 1 to i
- there is a constant called MaxLevel. All levels must be smaller
than or equal to this constant
- the level of a node is not stored
- the level of a list = maximum {levels of the nodes in the list}
Initialization :
An new list is initialized as follows :
1) a node called NIL is created and its key is set to a value greater
than the greatest key that could possibly be used in the list (i.e. if the
list will contain only keys between 1 and 999, then 1000 may be taken as
the key in NIL). Every level ends with NIL.
2) the level of a new list is 1
3) all forward pointers of the header point to NIL
Searching
1) Start at the highest level of the list.
2) Move forward following the pointers at the same level until the next key
is greater than the searched key
3) If the current level is not the lowest, go down one level and repeat the
search at that level from the current node
4) Stop when the level is 1 and the next key is greater than the searched
key
5) If the current key is the searched key return the value of that node.
Otherwise, return a failure.
SEARCH(list, searchKey)
Insertion / Deletion :
-The insertion or deletion of a node consists mainly in a search followed
by pointers update.
-An array update is used to store the last node reached at every
level. It is used when changing the pointers after a node has been
inserted or deleted.
-The level of the new inserted node is determined randomly by the function
RANDOM-LEVEL.
INSERT(list, searchKey, newValue)
DELETE(list, searchKey)
RANDOM-LEVEL()
RandomLevel
This last algorithm deserves some explanations.
-The function RANDOM() returns a number between 0 and 1.0.
-p is a constant between 0 and 1.0 (suppose p = 0.5).
RandomLevel works like flipping a coin. Say head is a win and tail is a lost. The electronic coin is flipped until a tail is obtained. Every time it is head the level is increased by 1 and the coin is flipped again.
Note : if p = 1/4, there is an average of 1.33 pointers per node. This saves space without reducing significally the search time.
The BST is efficient but may become easily unbalanced after several insertions
and deletions. Balanced trees (2-3 tree, red-black trees, ...) are
guaranteed to remain balanced and thus does the basic operations in O(log
n) in the worst case. Theoretically, they are efficient but their
implementation is complicated. On the other hand skip lists are easier
to implement. The algorithms for insertions and deletions are simpler
and faster. They do not guarantee O(log n) performance but they do
have an O(log n) performance in the average case (for insert, delete,
search) and the probability of a high deviation from the average is quite
high. Therefore, a very bad performance (O(n)) is extremely unlikely
and its probability decreases exponentially as n increases. For most
applications they are as efficient as balanced trees structures (see timings
table below). They are also space efficient as no balance information
needs to be stored in the nodes and they can work well with an average of
only 1 1/3 pointers per node. Unlike the BST, the performance of a
skip list does not depend in the order of insertion.
Their implementation simplicity makes them a good alternative to BST and
other tree structures.
Timings of Implementations of Different Algorithms (W. Pugh, Communications of the ACM, June 1990, p. 675.)
| Implementation | Search Time | Insertion Time | Deletion Time |
Skip lists
|
0.051 msec (1.0) | 0.065 msec (1.0) | 0.059 msec (1.0) |
non-recursive AVL trees
|
0.046 msec (0.91) | 0.10 msec (1.55) | 0.085 msec (1.46) |
recursive 2-3 trees
|
0.054 msec (1.05) | 0.21 msec(3.2) | 0.21 msec (3.65) |
| Self-adjusting trees : | |||
top-down splaying
|
0.15 msec (3.0) | 0.16 msec (2.5) | 0.18 msec (3.1) |
bottom-up splaying
|
0.49 msec (9.6) | 0.51 msec (7.8) | 0.53 msec (9.0) |
| The University of
Texas at Austin CS328 : Abstract Data Types |
Lecture notes on skip lists (postscript file). |
| Sorting and Searching
Algorithms : A Cookbook by Thomas Niemann |
Theory and implementations of skip lists in C. Click here to go directly to the skip lists page. |
| FTP site : ftp://ftp.cs.umd.edu/pub/skipLists | A FTP site where the article : Skip Lists : A Probabilistic Alternative to Balanced Trees by William Pugh and other papers on skip lists can be downloaded. Implementations of skip lists in C and in Pascal are also available. |
| Roger Cattin's Home Page | Theory and algorithms. Some links are broken, the Java implementation is available here. |
| San Diego State University CS 660 : Combinatorial Algorithms - Skip Lists | Lecture notes on skip lists. |
This web page was created by David
Bélanger
(dbelan2(à)cs.mcgill.ca). The figures
were drawn with