JudyHS(3) JudyHS(3)

NAME
JudyHS macros - C library for creating and accessing a dynamic array, using an array-of-bytes of Length as an Index and a word as a Value.

SYNOPSIS
cc [flags] sourcefiles -lJudy

#include <Judy.h>

Word_t  * PValue;                           // JudyHS array element
int       Rc_int;                           // return flag
Word_t    Rc_word;                          // full word return value
Pvoid_t   PJHSArray = (Pvoid_t) NULL;       // initialize JudyHS array
uint8_t * Index;                            // array-of-bytes pointer
Word_t    Length;                           // number of bytes in Index
Pvoid_t   PJHSIter = (Pvoid_t) NULL;        // initialize JudyHSIter

JHSI( PValue,  PJHSArray, Index, Length);   // JudyHSIns()
JHSD( Rc_int,  PJHSArray, Index, Length);   // JudyHSDel()
JHSG( PValue,  PJHSArray, Index, Length);   // JudyHSGet()
JHSFA(Rc_word, PJHSArray);                  // JudyHSFreeArray()
JHSIF( PValue,  PJHSArray, PJHSIter, Index, Length);   // JudyHSIterFirst()
JHSIN( PValue,  PJHSArray, PJHSIter, Index, Length);   // JudyHSNIterext()
JHSIL( PValue,  PJHSArray, PJHSIter, Index, Length);   // JudyHSLIterast()
JHSIP( PValue,  PJHSArray, PJHSIter, Index, Length);   // JudyHSPIterrev()
JHSFI(Rc_word, PJHSIter);                   // JudyHSFreeIter()
DESCRIPTION
A JudyHS array is the equivalent of an array of word-sized value/pointers. An Index is a pointer to an array-of-bytes of specified length: Length. Rather than using a null terminated string, this difference from JudySL(3) allows strings to contain all bits (specifically the null character). This new addition (May 2004) to Judy arrays is a hybird using the best capabilities of hashing and Judy methods. JudyHS does not have a poor performance case where knowledge of the hash algorithm can be used to degrade the performance.

The Length of each array-of-bytes can be from 0 to the limits of malloc() (about 2GB). Since JudyHS is based on a hash method, Indexes are not stored in any externally useful order; therefore the JudyHSIterFirst(), JudyHSIterNext(), JudyHSIterPrev() and JudyHSIterLast() neighbor search functions should be considered as providing an unordered enumeration of keys and values.

The hallmark of JudyHS is speed with scalability, but memory efficiency is excellent. The speed is very competitive with the best hashing methods. The memory efficiency is similar to a linked list of the same Indexes and Values. JudyHS is designed to scale from 0 to billions of Indexes.

A JudyHS array is allocated with a NULL pointer

Pvoid_t PJHSArray = (Pvoid_t) NULL;

Because the macro forms of the API have a simpler error handling interface than the equivalent functions, they are the preferred way to use JudyHS.

JHSI(PValue, PJHSArray, Index, Length) // JudyHSIns()
Given a pointer to a JudyHS array (PJHSArray), insert an Index string of length: Length and a Value into the JudyHS array: PJHSArray. If the Index is successfully inserted, the Value is initialized to 0. If the Index was already present, the Value is not modified.

Return PValue pointing to Value. Your program should use this pointer to read or modify the Value, for example:

Value = *PValue;
*PValue = 1234;

Note: JHSI() and JHSD can reorganize the JudyHS array. Therefore, pointers returned from previous JudyHS calls become invalid and must be re-acquired (using JHSG()).

JHSD(Rc_int, PJHSArray, Index, Length) // JudyHSDel()
Given a pointer to a JudyHS array (PJHSArray), delete the specified Index along with the Value from the JudyHS array.

Return Rc_int set to 1 if successfully removed from the array. Return Rc_int set to 0 if Index was not present.

JHSG(PValue, PJHSArray, Index, Length) // JudyHSGet()
Given a pointer to a JudyHS array (PJHSArray), find Value associated with Index.

Return PValue pointing to Index's Value. Return PValue set to NULL if the Index was not present.

JHSFA(Rc_word, PJHSArray) // JudyHSFreeArray()
Given a pointer to a JudyHS array (PJHSArray), free the entire array.

Return Rc_word set to the number of bytes freed and PJHSArray set to NULL.

JudyHS Search Functions: JudyHSIter
The JudyHSIter search functions allow you to search for indexes in the array, using an opaque iterator structure to hold search state, which is passed into and out of each call. Supplying a null pointer for the iterator will cause a new active structure to be allocated; thereafter, passing the structure, index and length as last returned into the next call will optimally continue the search; when done searching, free the iterator structure with JHSFI. Supplying a null pointer for the index string, and a 0 length, causes the search to begin at the start of the internal hash order; supplying a null pointer for the index string, and a length of -1 (very large), causes the search to begin at the end of the internal hash order. You may search inclusively or exclusively, in either forward or reverse directions.

If successful, Index is returned set to a pointer to the found index (in a read-only buffer owned by the iterator which will be overwritten by later search operations), Length is returned set to the length of the found index, and PValue is returned set to a pointer to Index's Value. If unsuccessful, PValue is returned set to NULL, and Index and Length are returned set to NULL and 0 respectively. PValue must be tested for non-NULL prior to using Index and Length, since a search failure is possible.

A JudyHS iterator is allocated with a NULL pointer

Pvoid_t PJHSIter = (Pvoid_t) NULL;
and when freed is reset to a NULL pointer.

Example of use:

Pvoid_t PJHSIter;
uint8_t *Index;
Word_t Length;
Word_t Rc_word;

PJHSIter = (Pvoid_t) NULL;
Index = (uint8_t) NULL;
Length = 0;
JHSIF(PValue, PJHSArray, PJHSIter, Index, Length);
while (PValue) {
  use(Index, Length, PValue);
  JHSIN(PValue, PJHSArray, PJHSIter, Index, Length);
}
JHSFI(Rc_word, PJHSIter);

JHSIF(PValue, PJHSArray, PJHSIter, Index, Length) // JudyHSIterFirst()
Search (inclusive) for the first index present that is equal to or greater than the passed Index string in the internal hash order. (Use a NULL string and a Length of 0 to find the first index in the array.) JHSIF() is typically used to begin a hash-order scan of the valid indexes in a JudyHS array.

JHSIN(PValue, PJHSArray, PJHSIter, Index, Length) // JudyHSIterNext()
Search (exclusive) for the next index present that is greater than the passed Index string. JHSIN() is typically used to continue a hash-order scan of the valid indexes in a JudyHS array, or to locate a "neighbor" of a given index in the internal hash order.

JHSIL(PValue, PJHSArray, PJHSIter, Index, Length) // JudyHSIterLast()
Search (inclusive) for the last index present that is equal to or less than the passed Index string. (Use a NULL string and a Length of -1 to find the last index in the array.) JHSIL() is typically used to begin a reverse-hash-order scan of the valid indexes in a JudyHS array.

JHSIP(PValue, PJHSArray, PJHSIter, Index, Length) // JudyHSIterPrev()
Search (exclusive) for the previous index present that is less than the passed Index string. JHSIP() is typically used to continue a reverse-hash-order scan of the valid indexes in a JudyHS array, or to locate a "neighbor" of a given index.

JHSFI(Rc_word, PJHSIter) // JudyHSFreeIter()
Given a pointer to a JudyHS iterator (PJHSIter), free the entire iterator structure.

Return Rc_word set to the number of bytes freed and PJHSIter set to NULL.

ERRORS: See: Judy_3.htm#ERRORS

EXAMPLES
Show how to program with the JudyHS macros. This program will print duplicate lines and their line number from stdin.

#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <Judy.h>

//  Compiled:
//  cc -O PrintDupLines.c -lJudy -o PrintDupLines

#define MAXLINE 1000000                 /* max fgets length of line */
uint8_t   Index[MAXLINE];               // string to check

int     // Usage:  PrintDupLines < file
main()
{
    Pvoid_t   PJArray = (PWord_t)NULL;  // Judy array.
    PWord_t   PValue;                   // Judy array element pointer.
    Word_t    Bytes;                    // size of JudyHS array.
    Word_t    LineNumb = 0;             // current line number
    Word_t    Dups = 0;                 // number of duplicate lines

    while (fgets(Index, MAXLINE, stdin) != (char *)NULL)
    {
        LineNumb++;                     // line number

        // store string into array
        JHSI(PValue, PJArray, Index, strlen(Index)); 
        if (PValue == PJERR)            // See ERRORS section
        {
            fprintf(stderr, "Out of memory -- exit\n");
            exit(1);
        }
        if (*PValue != 0)               // check if duplicate
        {
            Dups++;
            printf("Duplicate lines %lu:%lu:%s", *PValue, LineNumb, Index);
        }
        else
        {
            *PValue = LineNumb;         // store Line number
        }
    }
    printf("%lu Duplicates, JudyHS array of %lu Lines\n", 
		    Dups, LineNumb - Dups);

    // dump lines in hash order
    printf("Lines in hash order:\n");
    {
	Pvoid_t PJHSIter = (Pvoid_t) NULL;  // JudyHS iterator
        uint8_t *Index2 = (uint8_t) NULL;   // JudyHS key: line
	Word_t Length2 = 0;                 // length of key
	PWord_t PValue2;                    // pointer to value
        Word_t IterBytes;                   // size of iterator
	
        JHSIF(PValue2, PJHSArray, PJHSIter, Index2, Length2);
        while (PValue2) {
            printf(" line %lu: %*.*s", *PValue2, Index2, Length2, Length2);
            JHSIN(PValue2, PJHSArray, PJHSIter, Index2, Length2);
        }
	JHSFI(IterBytes, PJHSIter);
	printf("JudyHSFreeIter() freed %lu bytes of memory\n", IterBytes);
    }
    
    JHSFA(Bytes, PJArray);              // free JudyHS array
    printf("JudyHSFreeArray() freed %lu bytes of memory\n", Bytes);
    return (0);
}

AUTHOR
JudyHS was invented and implemented by Doug Baskins after retiring from Hewlett-Packard.

SEE ALSO
Judy(3), Judy1(3), JudyL(3), JudySL(3),
malloc(),
the Judy website, http://judy.sourceforge.net, for further information and Application Notes.