Hash table access is now available in DB2. Hash organized tables allow for direct access to a specific row based on a hashing algorithm that is applied to the primary key of the table, without the need for an index. This can greatly improve performance for certain kinds of tables, in that “traditional” access via an index can typically involve up to four bufferpool getpages, and possibly up to four physical I/O’s to disk. With a well designed and well organized hash table, these numbers can ideally be reduced to one bufferpool getpage and zero or one physical I/O.
Hash tables should generally only be considered for tables that are stable or predictable in size and where the data is mostly accessed by its primary key (in particular, the entire primary key, if the PK consists of multiple columns). The primary key data should be diverse enough to minimize the probability of collisions that result from too many rows hashing to the same location. When a collision occurs, subsequently inserted rows will need to be relocated from their targeted hashed location, with a pointer inserted in the target location that points to the new location. The more relocated rows there are in the hash table, the less benefit hash table access will provide as additional getpages and I/O’s may be needed to retrieve data, thus negating the intent of hashing.
Since the nature of hash tables dictates that rows are randomly distributed throughout the physical tablespace, clustering indexes are not allowed to be defined on them, though non-clustered indexes can be. Therefore, hash organization for a table is not a viable choice if sequential processing of large portions of the table is desired.