Representations: Hash Table vs Binary Search Tree (self-balancing)
There are two main efficient data structures used to represent associative
arrays, the hash table and the self-balancing binary search tree. Skip lists
are also an alternative, though relatively new and not as widely used.
Relative advantages and disadvantages include:
* Hash tables have faster average lookup and insertion time (O(1)),
while some kinds of binary search tree have faster worst-case lookup and
insertion time (O(log n) instead of O(n)). Hash tables have seen extensive
use in real time systems, but trees can be useful in high-security real time
systems where untrusted users may deliberately supply information that
triggers worst-case performance in a hash table, although careful design can
remove that issue. Hash tables shine in very large arrays, where O(1)
performance is important. Skip lists have worst-case operation time of O(n),
but average-case of O(log n), with much less insertion and deletion
overhead than balanced binary trees.
* Hash tables can have more compact storage for small value types,
especially when the values are bits.
* There are simple persistent versions of balanced binary trees, which
are especially prominent in functional languages.
* Building a hash table requires a reasonable hash function for the key
type, which can be difficult to write well, while balanced binary trees and
skip lists only require a total ordering on the keys. On the other hand,
with hash tables the data may be cyclically or partially ordered without any
problems.
* Balanced binary trees and skip lists preserve ordering — allowing one
to efficiently iterate over the keys in order or to efficiently locate an
association whose key is nearest to a given value. Hash tables do not
preserve ordering and therefore cannot perform these operations as
efficiently.
* Balanced binary trees can be easily adapted to efficiently assign a
single value to a large ordered range of keys, or to count the number of
keys in an ordered range.