libdacav 0.9.0
Hash Tables

This module provides a simple hash table implementation. More...

Typedefs

typedef struct dhash dhash_t
 Opaque type for hash table.
typedef uintptr_t(* dhash_cb_t )(const void *key)
 Callback for hashing of a generic object.
typedef struct pair dhash_pair_t
 Opaque structure that contains the <key,value> pair.

Enumerations

enum  dhash_result_t { DHASH_FOUND = 0, DHASH_NOTFOUND = 1 }
 Return values for hash lookup functions. More...

Functions

const void * dhash_key (dhash_pair_t *p)
 Given a dhash_pair_t element, returns the corresponding key.
void * dhash_val (dhash_pair_t *p)
 Given a dhash_pair_t element, returns the corresponding value.
dhash_tdhash_new (unsigned nbuckets, dhash_cb_t hf, dcmp_cb_t cmp, const dcprm_t *key_cprm, const dcprm_t *val_cprm)
 Complete constructor for a hash table.
void dhash_free (dhash_t *htab)
 Destructor for a hash table.
dhash_result_t dhash_insert (dhash_t *htab, const void *key, const void *value)
 Value insertion procedure.
dhash_result_t dhash_search_default (dhash_t *htab, const void *key, void **found, dcreate_cb_t create, void *ctx)
 Search for a value by providing a key.
dhash_result_t dhash_search (dhash_t *htab, const void *key, void **found)
 Search for a value by providing a key.
dhash_result_t dhash_delete (dhash_t *htab, const void *key, void **found)
 Delete the value associated to the given key.
diter_tdhash_iter_new (dhash_t *t)
 Build an iterator for the hash table.
void dhash_iter_free (diter_t *i)
 Destroy the iterator.

Detailed Description

This module provides a simple hash table implementation.

When allocating a new hash table, five parameters are usually needed:

The number of buckets provided as first parameter is a trade-off between performances in time and space required. It should depend on the predicted number of stored element. The more objects are sparse among buckets, the more data retrieving will be optimized. Choosing a prime number will help in reducing hash clashes.

Providing no hash function will result in the library using directly the value of the pointer as hash. Providing no comparison function will result in comparison of stored object addresses and search keys addresses. This can be useful, for instance, if you are storing primitive values and don't want to allocate memory just for them: just use the pointer as it were an integer.

Providing no copy-constructor and destructor for the keys will result in the library directly storing the user-provided keys. This is usually a good idea only if you plan to store primitive types! You may do it also for allocated composite types, but this may result in memory leaks if you overwrite a stored value with dhash_insert(), as only the value (and not the key) will be overwritten, and the key you provided is simply not used. If you are crazy enough to do something like that, please check the return value of dhash_insert() in this case to decide what to do with your local copy of the key.

Allocation and deallocation:

Example of hash table allocation:


dcprm_t cprm_k = {
    .cp = copy_key, // copy-constructor for key
    .rm = free_key  // destructor for key
};

dcprm_t cprm_v = {
    .cp = copy_key, // copy-constructor for value
    .rm = free_key  // destructor for value
};

dhash_t *tab = dhash_new(N_BUCKETS, hash_function, cmp_function,
                         &cprm_k, &cprm_v);
See also:
dhash_cb_t
dcmp_cb_t
dprm_t
dhash_new

Example using integers as keys

In this example we show to use integer values as keys.

A good hashing function for integer (the one used as default hashing function) may be:

static unsigned hash_int (const void *key)
{
    return (int) key;
}

Since the required semantic for the comparison is the same used by strcmp(3), a simple subtraction between two integers behaves correctly:

static int cmp_int (const void *v0, const void *v1)
{
    return (int) v0 - (int) v1;
}

Since integers are primitive types, we don't need to specify how to copy and destroy them. The declaration of such a hash table is so far the following:

dhash_t *tab = dhash_new(N_BUCKETS, hash_int, cmp_int, NULL, NULL);

Actually the shown hash and comparison functions are already defined internally, so in order to obtain the same result, the hash table initialization can be done by passing NULL as second and third argument.

dhash_t *tab = dhash_new(N_BUCKETS, NULL, NULL, NULL, NULL);

Note that this kind of behavior can also be exploited to achieve comparison based on object identity instead of object equality.

Using strings as keys

Suppose you want to map strings to strings with an hash table.

A good hashing function may be the following (which btw is the one standardized by the ELF specification):

static unsigned hash_str (const void *key)
{
    const char *name;
    unsigned h = 0, g;

    while (*name) {
        h = (h << 4) + *name++;
        if ((g = h & 0xf00000000) != 0) {
            h &= g >> 24;
        }
        h &= ~g;
    }
    return h;
}

A good destructor is just free(3), while a good copy-constructor for strings is strcpy(3), except it doesn't work with NULL pointers. We can wrap it, just beware of zero-termination:

static void * copy_str (const void *key)
{
    if (key == NULL) return NULL;
    return strdup((const char *)key);
}

The string comparison function is simply supplied by strcmp(3), however since the signature of strcmp(3) is slightly different from the dcmp_func_t data type, a casting is needed.

The initialization would be the following

dhash_cprm_t cprm = {
    .cp = copy_str,
    .rm = free
};
dhash_t *tab = dhash_new(N_BUCKETS, hash_str, (dcmp_func_t)strcmp, &cprm,
                         &cprm);

Iterators for hash tables:

Because of the particular nature of a hash table, for which every entry is composed by <key,value> couples, the return value of a diter_next() function called on a hash table iterator returns a pointer to an object of type dhash_pair_t

In order to obtain the correct key and the correct value for a dhash_pair_t object, the functions dhash_key() and dhash_val() must be called respectively.

By example, the following procedure will print all the <key, value> couples contained in a hash table and will remove them as long as they are listed:

void print_key_values ()
{
    diter_t *iter = dhash_iter_new(table);
    while (diter_hasnext(iter)) {
        dhash_pair_t *pair = diter_next(iter);
        int k = (int)dhash_key(pair);
        int v = (int)dhash_val(pair);
        printf("key=%d, value=%d\n", k, v);
        dhash_remove(iter);
    }
    dhash_iter_free(iter);
}
Note:
Keys and values yielded through iterators are exactly the ones stored into the hash table. Keys are constant, but values are not, thus if you stored pointers you may modified stored objects. Of course this is not true if you stored primitve types.
See also:
diter_next
dhash_pair_t
dhash_key
dhash_val
Iterators

Typedef Documentation

typedef uintptr_t(* dhash_cb_t)(const void *key)

Callback for hashing of a generic object.

typedef struct pair dhash_pair_t

Opaque structure that contains the <key,value> pair.

See also:
dhash_key is a function for key extraction;
dhash_val is a function for value extraction;
Hash Tables contains a section with further information about iterator semantics for hash tables.
typedef struct dhash dhash_t

Opaque type for hash table.


Enumeration Type Documentation

Return values for hash lookup functions.

See also:
dhash_search;
dhash_delete;
dhash_insert.
Enumerator:
DHASH_FOUND 

Object found;.

DHASH_NOTFOUND 

Object not found.


Function Documentation

dhash_result_t dhash_delete ( dhash_t htab,
const void *  key,
void **  found 
)

Delete the value associated to the given key.

This function behaves exactly as dhash_search() except that, if the object is found, it gets removed.

Parameters:
[in]htabThe hash table to search in;
[in]keyThe key to be searched;
[out]foundA pointer to a location in which the value will be stored; You may pass NULL as found parameter in order to simply delete the entry without having it back.
Return values:
dhash_result_t::DHASH_FOUNDon success;
dhash_result_t::DHASH_NOTFOUNDon lookup failure.
Note:
In case of failure the table remains unchanged.
void dhash_free ( dhash_t htab)

Destructor for a hash table.

Parameters:
[in]htabThe table to destroy;
dhash_result_t dhash_insert ( dhash_t htab,
const void *  key,
const void *  value 
)

Value insertion procedure.

Depending on how the constructor has been parametrized, this function may directly store the given pointers for key and value, or may build an internal copy of them. In the latter case the copies will also be freed when calling dhash_free().

Parameters:
[in]htabThe table in which insertion must be achieved;
[in]keyKey for the inserted value;
[in]valueThe value to be inserted.
Return values:
dhash_result_t::DHASH_FOUNDif there has been a replacement;
dhash_result_t::DHASH_NOTFOUNDif a new object has been added;
void dhash_iter_free ( diter_t i)

Destroy the iterator.

Parameters:
[in]iThe iterator to destroy.
See also:
diter_t.
diter_t* dhash_iter_new ( dhash_t t)

Build an iterator for the hash table.

Parameters:
[in]tThe table on which the iterator must be built.
Returns:
The iterator.
See also:
Hash Tables contains a section with further information about iterator semantics for hash tables.
const void* dhash_key ( dhash_pair_t p)

Given a dhash_pair_t element, returns the corresponding key.

This is used to extract the key from pairs yielded by the iterator. The returned key is internal to the table. As such it cannot be modified.

See also:
Hash Tables contains a section with further information about iterator semantics for hash tables.
Parameters:
[in]pThe <key,value> pair.
Returns:
The key.
dhash_t* dhash_new ( unsigned  nbuckets,
dhash_cb_t  hf,
dcmp_cb_t  cmp,
const dcprm_t key_cprm,
const dcprm_t val_cprm 
)

Complete constructor for a hash table.

Parameters:
[in]nbucketsThe number of buckets of the table (should be somehow proportional to the number of inserted objects). A prime number would be a good choice;
[in]hfThe callback used to retrieve the hash value for the keys. Provide NULL if the key pointer should be directly used as hash;
[in]cmpThe callback used to compare two keys, must behave like the strcmp(3) function. Provide NULL if the key pointer should be directly used as hash;
[in]key_cprmA pointer to a copy-constructor and destructor pair to be used with keys. If NULL is provided, the inserted keys shall be directly stored, and no internal copy for keys shall be allocated.
[in]val_cprmA pointer to a copy-constructor and destructor pair to be used with values. If NULL is provided, the inserted keys shall be directly stored, and no internal copy for values shall be allocated.
See also:
dhash_new_simple;
dcprm_t.
Returns:
A pointer to the allocated hash table.
dhash_result_t dhash_search ( dhash_t htab,
const void *  key,
void **  found 
)

Search for a value by providing a key.

Parameters:
[in]htabThe hash table to search in;
[in]keyThe key to be searched;
[out]foundA pointer to a location in which a pointer to the value will be stored. You may pass NULL as found parameter in order to simply test the presence of the required <key,value> pair.
Return values:
dhash_result_t::DHASH_FOUNDon success;
dhash_result_t::DHASH_NOTFOUNDon lookup failure.
Note:
In case of success *found will point to the internally stored value, thus modification will result in internal modification. In case of lookup failure, the *found pointer won't be modified.
dhash_result_t dhash_search_default ( dhash_t htab,
const void *  key,
void **  found,
dcreate_cb_t  create,
void *  ctx 
)

Search for a value by providing a key.

If it doesn't exist create it.

This function works as dhash_search(), but also allows to specify a callback to be called when the object you are searching cannot be found. The callback shall return an object which will be directly inserted as value.

Parameters:
[in]htabThe table in which insertion must be achieved;
[in]keyKey for the inserted value;
[out]foundThe value to be inserted.
[in]createThe callback for object creation;
[in]ctxThe context for the callback.
Return values:
dhash_result_t::DHASH_FOUNDif the element has been found;
dhash_result_t::DHASH_NOTFOUNDif the element has not been found and the new element has been allocated.
See also:
dcreate_cb_t.
void* dhash_val ( dhash_pair_t p)

Given a dhash_pair_t element, returns the corresponding value.

This is used to extract the value from pairs yielded by the iterator. The returned value is internal to the table, and unlike for the key, modifying it is allowed (it will result in changing the stored value).

See also:
Hash Tables contains a section with further information about iterator semantics for hash tables.
Parameters:
[in]pThe <key,value> pair.
Returns:
The value.
 All Data Structures Files Functions Variables Typedefs Enumerations Enumerator