Some thoughts about tsearch.

David Anderson
Updated 15 February 2014.

Tsearch() was defined and implemented fairly early in
the history of UNIX, but I don't know exactly when.
The earliest documentation I have is from the 1991
edition of UNIX System V Release 4 Programmer's Reference
Manual.

It is useful because it enables one to make a searchable tree
of, well, any data one might have need to search. Tsearch never needs
to understand the format of the data.

The functions defined there are tsearch(), tfind(),
tdelete(), and twalk(). The description is just
as difficult to understand as the documentation in release 3.54
of the Linux man-pages project (the most recent I have on hand).

The Single Unix Specification page on tsearch() etc. is slightly
different text from the Programmer's Reference Manual and the
Linux man page, but no clearer.

The confusion is partly due to the interface definition. Until the
late 20th century only limited attention was given to interface
design. By the end of the century it had become clear that any
newly designed tsearch-like functionality would have a significantly
different interface. An example of an improved interface is
the GNU/Linux function hsearch_r(). Making the return value
a small integer defining what the function actually did
(and using other arguments to return additional values as
necessary) significantly simplifies the discussion.

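To make the contrast concrete, here is a minimal sketch of the
hsearch_r() calling style on a GNU/Linux system. The function name
hsearch_r_demo and the sample data are made up for illustration;
hcreate_r(), hsearch_r() and hdestroy_r() are the GNU extensions
themselves. Note how the int result only reports success or failure
while the entry comes back through a separate argument.

```c
#define _GNU_SOURCE /* hsearch_r() and friends are GNU extensions */
#include <assert.h>
#include <search.h>
#include <stdint.h>
#include <string.h>

/* Enter three name/number pairs into a private hash table, then
   look up 'wanted'. Returns the number found, or -1 if the name
   is absent or something failed. */
long hsearch_r_demo(const char *wanted)
{
    static char *names[] = { "alpha", "beta", "gamma" };
    struct hsearch_data tab;
    ENTRY item;
    ENTRY *found = NULL;
    long result = -1;
    int i;

    assert(wanted != NULL);
    memset(&tab, 0, sizeof(tab)); /* required before hcreate_r() */
    if (!hcreate_r(16, &tab)) {
        return -1;
    }
    for (i = 0; i < 3; ++i) {
        item.key = names[i];
        item.data = (void *)(intptr_t)(i + 1);
        /* The int result only says success or failure.
           The entry itself comes back through &found. */
        if (!hsearch_r(item, ENTER, &found, &tab)) {
            hdestroy_r(&tab);
            return -1;
        }
    }
    item.key = (char *)wanted;
    item.data = NULL;
    if (hsearch_r(item, FIND, &found, &tab)) {
        result = (long)(intptr_t)found->data;
    }
    hdestroy_r(&tab);
    return result;
}
```

Calling hsearch_r_demo("beta") should give 2, and a name that was
never entered should give -1.
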
Here though we are talking about the old and standard interface.
We attempt to clarify the messy parts. See the examples provided
for actual code.

The functions and tables of tsearch are neither thread-safe
nor thread-aware. Any access to one of the tables at the
same time the table is being updated will lead to chaos
eventually.

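One common way to cope, sketched here under the assumption that a
single lock per tree is acceptable, is to route every access through
wrappers that hold a mutex. The names shared_root, shared_lock,
locked_insert and locked_find are hypothetical, and the sketch uses
the POSIX tsearch()/tfind() with plain int keys.

```c
#include <assert.h>
#include <pthread.h>
#include <search.h>
#include <stddef.h>

/* One tree, one lock. Nothing touches shared_root without it. */
static void *shared_root = NULL;
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

static int int_compare(const void *l, const void *r)
{
    const int *lp = l;
    const int *rp = r;
    if (*lp < *rp) return -1;
    if (*lp > *rp) return 1;
    return 0;
}

/* Add key unless an equal one is present. Returns the KEY the
   tree holds afterwards, or NULL if tsearch() failed. */
const int *locked_insert(const int *key)
{
    const int *recorded = NULL;
    void *ret;

    assert(key != NULL);
    pthread_mutex_lock(&shared_lock);
    ret = tsearch(key, &shared_root, int_compare);
    if (ret) {
        /* Dereference while the lock is held: once it is released
           another thread may delete the record ret points into. */
        recorded = *(const int **)ret;
    }
    pthread_mutex_unlock(&shared_lock);
    return recorded;
}

/* Returns the recorded KEY equal to *key, or NULL if none. */
const int *locked_find(const int *key)
{
    const int *recorded = NULL;
    void *ret;

    assert(key != NULL);
    pthread_mutex_lock(&shared_lock);
    ret = tfind(key, &shared_root, int_compare);
    if (ret) {
        recorded = *(const int **)ret;
    }
    pthread_mutex_unlock(&shared_lock);
    return recorded;
}
```

The lock protects only the tree's own records. Keeping the objects
the KEYs point to alive while other threads use them remains the
caller's problem.
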
Any table tsearch maintains consists of records
whose content is undefined (meaning the definition
is local to the internals). Each record
contains, as one element, a copy of a KEY pointer.

In the following I freely use tsearch for dwarf_tsearch
etc. The functionality and interfaces are identical.

=========
Further reading

The canonical reference for binary trees and the
algorithm reference for most of the tsearch code is
Donald E. Knuth,
"The Art of Computer Programming" Volume 3 Sorting and Searching,
Second Edition.

"Algorithms" Fourth Edition, Robert Sedgewick and Kevin Wayne,
is the basis for the red-black tree coding.

Wikipedia has quite a few entries of interest.
http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree
http://en.wikipedia.org/wiki/Tree_traversal
http://en.wikipedia.org/wiki/Red–black_tree
are just a few.

After implementing tsearch I looked briefly at
some other projects.

freecode.net: See the 'GNU libavl' project.
The libavl library is a GNU project in C with each tree type
having a uniquely-named but consistent set of interfaces.
The interface design is much more sensible than
Unix TSEARCH. You will probably find it in almost any
Linux distribution. The version mentioned on freecode.net,
ftp://ftp.gnu.org/pub/gnu/avl/avl-2.0.3.tar.gz
is rather old but shows internal evidence of
last being updated in 2007 (*.pdf and other file timestamps
in the tar file). The 2.0.3 tarfile source has no
configure step, and plain 'make' failed for me: the Makefile
placed -lm too early on the command line when building texitree.
Doing 'make program' to just build the source works fine.
The interface definitions are more sensible than tsearch:
rb_delete returns the deleted item when it succeeds, for example.

freecode.net: See the 'libredblack' project.
This project tarfile was last updated in 2003.
It is on sourceforge at http://sourceforge.net/projects/redblacktree/
Implemented in C.
Configure worked (version 1.3) but the make step failed
running the python app ./rbgen due to the presence
of the string %prefix on line 9 of rbgen.
Removing that one line of python commentary from Eric Raymond
fixed the build!
There is a good amount of C #define magic making it harder to
read the source than one might expect.
The interface definitions are more sensible than tsearch:
rb_delete returns the deleted item when it succeeds, for example.

freecode.net: See the project 'Template based B+ Tree'.
It is in C++ and can be used for searches which are file or
memory based through use of C++ templates by the implementation.
I have not attempted to build it.

=========
tsearch:
    void *tsearch(const void *key, void **rootp,
        int (*compar)(const void *l, const void *r));

Our terminology comes from the above declaration.
tsearch() returns a pointer. If tsearch
fails due to an out of memory or internal error
it returns NULL. Otherwise it returns a non-null
pointer (see Return value, below).

KEY:
First we will define KEY as it is ordinarily used.
KEY is a pointer to an object you define and initialize.
The object must remain in stable
storage for as long as the table has a copy of the KEY
pointing to the object. Example:
    struct mystruct *key_a = malloc(sizeof(struct mystruct));
    initialize(key_a, my data to fill in struct...);
    void *r = tsearch(key_a,&treeroot,comparfunc);

ROOTP:
This must be the address of a void* datum.
Before the first call of tsearch the value of *rootp must
be NULL (set by you somehow).
The contents are maintained by tsearch thereafter.

COMPAR:
A function you write whose arguments (when tsearch internals call it)
are KEY pointers from two records (your records). The function should
return -1 if the record KEY l points to is considered less than
the record KEY r points to. Return zero if the values
are considered to match. Return 1 otherwise.
Example:
    int comparfunc(const void *l,const void *r) {
        const struct mystruct *lp = l;
        const struct mystruct *rp = r;
        if(lp->myv < rp->myv) return -1;
        if(lp->myv == rp->myv) return 0;
        return 1;
    }
Of course the comparison need not be of simple values; it could involve
anything. This is just a simple sketch.

Return value:
If tsearch() returns NULL something went wrong. There was insufficient memory
or an internal error of some kind.

Otherwise tsearch() returns a pointer to a KEY for this object.
Dereference to get an actual KEY (a pointer to your object).
Let's call the KEY passed in KEYa (key_a in our example)
and the dereferenced KEY key_deref.
    struct mystruct *key_deref = *(struct mystruct **)r;
    if KEYa == key_deref then:
        KEYa was added to the tree.
        Hence the tree now has a copy of
        the value of key_a.
    else:
        KEYa matched a record in the tree and
        you need to free any space you allocated
        to build the key for this tsearch call.
        In our example:
            free(key_a).

I hope the above clarifies the use of tsearch somewhat.

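Putting the KEY, COMPAR and Return value rules together, a complete
insert routine might look like the following sketch. It uses the
POSIX tsearch() from <search.h>; the single-field struct mystruct
and the name insert_value are assumptions made for illustration.

```c
#include <assert.h>
#include <search.h>
#include <stdlib.h>

struct mystruct {
    int myv;
};

static int comparfunc(const void *l, const void *r)
{
    const struct mystruct *lp = l;
    const struct mystruct *rp = r;
    if (lp->myv < rp->myv) return -1;
    if (lp->myv == rp->myv) return 0;
    return 1;
}

/* Insert the value v. Returns 1 if a new record was added,
   0 if an equal record was already present, -1 on failure. */
int insert_value(void **rootp, int v)
{
    struct mystruct *key_a;
    struct mystruct *key_deref;
    void *r;

    assert(rootp != NULL);
    key_a = malloc(sizeof(*key_a));
    if (!key_a) {
        return -1;
    }
    key_a->myv = v;
    r = tsearch(key_a, rootp, comparfunc);
    if (!r) {
        free(key_a); /* nothing was recorded */
        return -1;
    }
    key_deref = *(struct mystruct **)r;
    if (key_deref == key_a) {
        return 1; /* the tree now holds key_a: do not free it */
    }
    free(key_a); /* an equal record existed: ours is surplus */
    return 0;
}
```

Starting from void *root = NULL, inserting 10 and then 20 should
each return 1, and inserting 10 a second time should return 0.
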
============
Tfind:
    void *tfind(const void *key, void *const *rootp,
        int (*compar)(const void *l, const void *r));

The arguments are used as for tsearch, but the return value is
(in contrast with tsearch) reasonably clear:

A non-null return is a pointer to a KEY, to be dereferenced
just as with tsearch.
It never adds a new record, it simply tries to find a record.
A NULL return means either that the searched-for record did not exist or that
something internal went wrong or perhaps that something incorrect
was detected at runtime in the arguments passed in.

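Because tfind never records the key it is given, the key can be a
short-lived local object. A minimal sketch using the POSIX tfind();
the two-field struct mystruct and the name lookup_name are
assumptions made for illustration:

```c
#include <assert.h>
#include <search.h>
#include <stddef.h>

struct mystruct {
    int myv;
    const char *name;
};

static int comparfunc(const void *l, const void *r)
{
    const struct mystruct *lp = l;
    const struct mystruct *rp = r;
    if (lp->myv < rp->myv) return -1;
    if (lp->myv == rp->myv) return 0;
    return 1;
}

/* Returns the name recorded for myv, or NULL if there is no
   such record. The tree is never modified. */
const char *lookup_name(void *const *rootp, int myv)
{
    struct mystruct probe;
    const struct mystruct *found;
    void *r;

    assert(rootp != NULL);
    probe.myv = myv;   /* only the compared field matters */
    probe.name = NULL;
    r = tfind(&probe, rootp, comparfunc);
    if (!r) {
        return NULL;
    }
    /* r points to a KEY. Dereference to get the recorded KEY,
       which points to the stored object, not to probe. */
    found = *(struct mystruct *const *)r;
    return found->name;
}
```
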
============
Tdelete:
    void *tdelete(const void *key, void **rootp,
        int (*compar)(const void *l, const void *r));

The arguments are the same as tsearch/tfind,
but the return value is a bit odd.

If something goes wrong or if the record does not exist,
it returns NULL. That is straightforward.

If the record is deleted tdelete is supposed to return
a pointer to the parent of the deleted record.
The object the deleted record's KEY points to is NOT freed.

If the record deleted was the last record in the
tree, a NULL is returned and *rootp is set to NULL,
thus restoring the initial conditions of an empty tree.

If the record deleted was the root (of the records
in the tree as of the call) then what is this supposed
to return? No available documentation suggests an
answer. The current dwarf_tdelete implementations
return some record or other in this case.

In the case of dwarf_tdelete() for a hash 'tree'
if the node was the last in a hash chain then
NULL is returned. (Ugly, but it is difficult to
figure out what else to do.)

All this means that it is a waste of time to
inspect the return value from tdelete.

So to do a tdelete and do any necessary
memory free do:
    key = xxxx
    t = tfind(key,&root,compar_func)
    if (t) {
        r = *t   /* the KEY recorded in the tree; fetch it
                    before tdelete invalidates t */
        tdelete(key,&root,compar_func)
        free r
    }
    free key

where what 'free key' means is to free whatever 'key = xxxx'
allocated, and 'free r' means to free whatever was allocated
for the object the recorded KEY points to.

Of course if your keys point into a static array of data
then free is unnecessary, but how often will the data
be in static memory?
And if it is in static storage, why do any free at all?

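The same recipe as compilable C, as a sketch using the POSIX
tfind() and tdelete(). The struct mystruct layout and the name
delete_value are assumptions made for illustration; here the search
key lives on the stack, so only the recorded object needs freeing.

```c
#include <assert.h>
#include <search.h>
#include <stdlib.h>

struct mystruct {
    int myv;
};

static int comparfunc(const void *l, const void *r)
{
    const struct mystruct *lp = l;
    const struct mystruct *rp = r;
    if (lp->myv < rp->myv) return -1;
    if (lp->myv == rp->myv) return 0;
    return 1;
}

/* Remove the record matching myv and free the malloc'ed object
   its KEY points to. Returns 1 if a record was deleted, 0 if
   there was no such record. */
int delete_value(void **rootp, int myv)
{
    struct mystruct probe;
    struct mystruct *recorded;
    void *t;

    assert(rootp != NULL);
    probe.myv = myv;
    t = tfind(&probe, rootp, comparfunc);
    if (!t) {
        return 0;
    }
    /* Save the recorded KEY first: t is useless after tdelete. */
    recorded = *(struct mystruct **)t;
    /* Return value deliberately ignored, as discussed above. */
    tdelete(&probe, rootp, comparfunc);
    free(recorded);
    return 1;
}
```
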
============
Tdestroy:
    void tdestroy(void *root,
        void (*free_node)(void *key));

This frees all memory the tree system has and
along the way it calls 'free_node' for each
node in the tree.

It empties the tree, but, oddly, the root argument
is a simple pointer to the root, not a pointer-to-pointer.
Hence this routine can free the tree, but cannot
assign a NULL to the root. So you should do
    root = 0;
after calling tdestroy().

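A minimal sketch of the whole life cycle, assuming glibc, where
tdestroy() is a GNU extension that needs _GNU_SOURCE. The names
build_and_destroy and freed_count are hypothetical; freed_count is
there only so the effect of free_node can be observed.

```c
#define _GNU_SOURCE /* tdestroy() is a GNU extension, not POSIX */
#include <assert.h>
#include <search.h>
#include <stdlib.h>

static int freed_count = 0;

/* Called once per record with the KEY that record holds. */
static void free_node(void *key)
{
    free(key);
    ++freed_count;
}

static int int_compare(const void *l, const void *r)
{
    const int *lp = l;
    const int *rp = r;
    if (*lp < *rp) return -1;
    if (*lp > *rp) return 1;
    return 0;
}

/* Build a tree of n malloc'ed ints, then destroy it.
   Returns how many times free_node was called. */
int build_and_destroy(int n)
{
    void *root = NULL;
    int i;

    assert(n >= 0);
    freed_count = 0;
    for (i = 0; i < n; ++i) {
        int *key = malloc(sizeof(*key));
        if (!key) {
            break;
        }
        *key = i;
        if (!tsearch(key, &root, int_compare)) {
            free(key); /* not recorded, so still ours to free */
            break;
        }
    }
    tdestroy(root, free_node);
    root = NULL; /* tdestroy could not do this for us */
    return freed_count;
}
```
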
============
Twalk:
    void twalk(const void *root,
        void (*action)(const void *nodep,
            const DW_VISIT which,
            const int depth));

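As a sketch of how an action function is typically written, the
following uses the POSIX twalk() and its VISIT type (preorder,
postorder, endorder, leaf) from <search.h>. That DW_VISIT behaves
the same way is an assumption here. The names seen, collect and
sorted_walk are hypothetical.

```c
#include <assert.h>
#include <search.h>
#include <stddef.h>

#define MAX_SEEN 16
static int seen[MAX_SEEN]; /* keys in the order the walk reports them */
static int seen_count = 0;

static int int_compare(const void *l, const void *r)
{
    const int *lp = l;
    const int *rp = r;
    if (*lp < *rp) return -1;
    if (*lp > *rp) return 1;
    return 0;
}

/* twalk() calls this. nodep points to a KEY, as with tsearch(). */
static void collect(const void *nodep, VISIT which, int depth)
{
    (void)depth; /* the root is depth 0; unused here */
    /* A node with children is visited three times (preorder,
       postorder, endorder); a childless node once (leaf).
       Acting on postorder and leaf yields ascending key order. */
    if (which == postorder || which == leaf) {
        const int *key = *(const int *const *)nodep;
        if (seen_count < MAX_SEEN) {
            seen[seen_count++] = *key;
        }
    }
}

/* Build a tree over values[0..n-1], walk it, then empty it.
   Returns the number of keys recorded in seen[]. */
int sorted_walk(const int *values, int n)
{
    void *root = NULL;
    int i;

    assert(values != NULL && n >= 0);
    seen_count = 0;
    for (i = 0; i < n; ++i) {
        if (!tsearch(&values[i], &root, int_compare)) {
            break; /* out of memory: walk what we have */
        }
    }
    twalk(root, collect);
    for (i = 0; i < n; ++i) {
        tdelete(&values[i], &root, int_compare);
    }
    return seen_count;
}
```

Given the values 30, 10, 20 the walk should leave 10, 20, 30 in
seen[], whatever shape the tree happens to have.
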
============
Tdump:
    void tdump(const void *rootp,
        char *(*keyprint)(const void *key),
        const char *msg);

This is unique to the dwarf_tsearch library.
It prints (to standard out) a representation of the tree.
The output is indented one space per level and ordered such
that turning the output 90 degrees clockwise shows a kind
of picture of the tree structure in the way trees
are normally shown in books.
It may be of slight interest for debugging trees if the
trees are not too large.

The 'msg' argument is just a character string of interest to you, something
identifying the output. It is printed once and otherwise ignored.

The 'keyprint' argument is a pointer to
a function you write. The function
is called for each node in turn. It
should return a pointer to a string with whatever
data from the passed-in-by-tdump key you wish to show.
Normally the pointer returned from keyprint should be pointing to a
static area so there is no issue with memory leakage to worry about.
tdump will print the returned string immediately and will not refer
to it again.

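A keyprint function along those lines might look like the sketch
below. The struct layout and the name example_keyprint are
hypothetical, and the sketch assumes that the argument tdump passes
is the KEY itself (a pointer to your object).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record layout, for illustration only. */
struct example_tentry {
    unsigned mt_key;
    const char *mt_data;
};

/* Returns a printable form of the key for tdump to print.
   The static buffer is overwritten on every call, which is fine
   because tdump prints the string at once and never looks again. */
char *example_keyprint(const void *key)
{
    static char buf[64];
    const struct example_tentry *mt = key;

    assert(mt != NULL);
    snprintf(buf, sizeof(buf), "0x%08x %s", mt->mt_key,
        mt->mt_data ? mt->mt_data : "(none)");
    return buf;
}
```
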
============
tsearch use with pointers (declarations left out):
The struct here is entirely ours: tsearch neither knows nor cares
how it is laid out.

    struct example_tentry *mt =
        (struct example_tentry *)calloc(1,sizeof(struct example_tentry));
    mt->mt_key = keyvalue;
    mt->mt_data = datavalue;
    errno = 0;
    /* tsearch adds an entry if it's not present already. */
    retval = dwarf_tsearch(mt,tree, mt_compare_func );
    if(retval == 0) {
        printf("FAIL ENOMEM in search on %d, give up insertrecsbypointer\n",i);
        exit(1);
    } else {
        struct example_tentry *re = 0;
        re = *(struct example_tentry **)retval;
        if(re != mt) {
            /* found existing entry. */
            mt_free_func(mt);
        } else {
            /* New entry mt was added. */
        }
    }

In this case the mt_compare_func(), if using a struct, might look like:
    int
    mt_compare_func(const void *l, const void *r)
    {
        const struct example_tentry *ml = l;
        const struct example_tentry *mr = r;
        /* If the key were a string, one could use strcmp()
           instead of the comparisons here */
        if(ml->mt_key < mr->mt_key) {
            return -1;
        }
        if(ml->mt_key > mr->mt_key) {
            return 1;
        }
        return 0;
    }

============
tsearch use with values (declarations left out):
    void *mt = (void *)key;
    errno = 0;
    /* tsearch adds an entry if it's not present already. */
    retval = dwarf_tsearch(mt,tree, value_compare_func );
    if(retval == 0) {
        printf("FAIL ENOMEM in search on %d, give up insertrecsbypointer\n",i);
        exit(1);
    } else {
        /* Successful search. mt might have been added by the call,
           or maybe it was already in the tree.
           It is impossible to tell which. */
    }
In this case the value_compare_func might look like:
    int
    value_compare_func(const void *l, const void *r)
    {
        VALTYPE lp = (VALTYPE)l;
        VALTYPE rp = (VALTYPE)r;
        if(lp < rp) {
            return -1;
        }
        if(lp > rp) {
            return 1;
        }
        return 0;
    }
In this case the free_node function would look like
    void free_node(void *key) { /* Do nothing. */ }

In this case, there are never any free() calls you need to make
as the key is not a pointer to anything.

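For a compilable version of the value-as-key idea, the sketch below
takes VALTYPE to be uintptr_t (an assumption: any integer type that
fits in a pointer would do) and uses the POSIX tsearch() and
tfind(). The names value_set_add and value_set_contains are
hypothetical.

```c
#include <assert.h>
#include <search.h>
#include <stddef.h>
#include <stdint.h>

/* The "pointers" are really values: compare them, never
   dereference them. */
static int value_compare_func(const void *l, const void *r)
{
    uintptr_t lv = (uintptr_t)l;
    uintptr_t rv = (uintptr_t)r;
    if (lv < rv) return -1;
    if (lv > rv) return 1;
    return 0;
}

/* Record value in the set. Returns 0 if the value is in the set
   afterwards (newly added or already there), -1 on failure. */
int value_set_add(void **rootp, uintptr_t value)
{
    assert(rootp != NULL);
    return tsearch((void *)value, rootp, value_compare_func) ? 0 : -1;
}

/* Returns 1 if value is in the set, 0 if not. */
int value_set_contains(void *const *rootp, uintptr_t value)
{
    assert(rootp != NULL);
    return tfind((void *)value, rootp, value_compare_func) != NULL;
}
```

Since nothing was allocated for the keys, emptying the tree needs
only tdelete calls (or tdestroy with a do-nothing free_node).
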
===================
If I were designing the interfaces in 2014 I might do
as follows.

I think the GNU hsearch_r interfaces are a fine
approach and could be extended to tsearch(). But
I propose something slightly different.

    struct tsearch_base; /* opaque struct, content defined by
        the search code, content not made public */
    int compare_func(void *l, void *r); /* you define comparison function */
    int free_func(void *n); /* you define free function */

None of the proposed interfaces touch errno.

All functions return 0 on success and an errno value on failure.
They return EINVAL if an argument is incorrect somehow.

int tcreate(struct tsearch_base **base,compare_func,free_func)
    Allocates a struct tsearch_base record and puts a pointer to
    it in *base. Records the compare_func and free_func pointers.
    Users are required to call this before any other function.

    returns:
        ENOMEM if out of memory.
        EINVAL if something wrong with an argument or an internal problem.

int tsearch(void *key,struct tsearch_base *base);
    returns:
        ENOMEM if out of memory.
        EINVAL if something wrong with an argument or an internal problem.

int tfind(void *key,struct tsearch_base *base);
    returns:
        ESRCH if not found.
        EINVAL if something wrong with an argument or an internal problem.

int tdelete(void *key,struct tsearch_base *base,void **keyout);
    If successful, tdelete deletes the record key identifies
    and returns that record's recorded key through *keyout.
    So if a free() or other action is required you can take that action.

    The 'base' struct tsearch_base record is NOT freed,
    it remains, whether the tree has any records left or not.

    returns:
        ESRCH if key not found.
        EINVAL if something wrong with an argument or an internal problem.

int tsize(struct tsearch_base *base,unsigned long *count_out);
    Stores a count of the records currently recorded in the tree in *count_out.

    returns:
        EINVAL if something wrong with an argument or an internal problem.

int tdestroy(struct tsearch_base **base);
    A call is a requirement on users -- to free up the tree space.
    Calls free_func on each node in the tree and
    frees all tree memory and does
        *base = NULL
    to reset your tree pointer.

    returns:
        EINVAL if something wrong with an argument or an internal problem.

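To show that the proposal is implementable, here is a rough sketch
layered on the POSIX tsearch(), tfind() and tdelete(). Everything
in it is an assumption rather than existing code: the xt_ prefix
(chosen to avoid clashing with the standard names), the struct
layout, the const-qualified comparison arguments, and the choice
that searching for a key already present simply returns 0.

```c
#include <assert.h>
#include <errno.h>
#include <search.h>
#include <stdlib.h>

typedef int (*xt_compare_func)(const void *l, const void *r);
typedef int (*xt_free_func)(void *n);

struct xt_base { /* would be opaque to callers in a real library */
    void *root;
    xt_compare_func compare;
    xt_free_func free_key;
    unsigned long count;
};

/* Example comparison for int keys, used only for illustration. */
int xt_int_compare(const void *l, const void *r)
{
    const int *lp = l, *rp = r;
    return (*lp > *rp) - (*lp < *rp);
}

int xt_create(struct xt_base **base, xt_compare_func cmp, xt_free_func fr)
{
    struct xt_base *b;
    if (!base || !cmp) return EINVAL;
    b = calloc(1, sizeof(*b));
    if (!b) return ENOMEM;
    b->compare = cmp;
    b->free_key = fr;
    *base = b;
    return 0;
}

int xt_search(void *key, struct xt_base *base)
{
    if (!base) return EINVAL;
    if (tfind(key, &base->root, base->compare)) return 0; /* present */
    if (!tsearch(key, &base->root, base->compare)) return ENOMEM;
    base->count++;
    return 0;
}

int xt_find(void *key, struct xt_base *base)
{
    if (!base) return EINVAL;
    return tfind(key, &base->root, base->compare) ? 0 : ESRCH;
}

int xt_delete(void *key, struct xt_base *base, void **keyout)
{
    void *t;
    if (!base || !keyout) return EINVAL;
    t = tfind(key, &base->root, base->compare);
    if (!t) return ESRCH;
    *keyout = *(void **)t; /* fetch before the record goes away */
    tdelete(key, &base->root, base->compare);
    base->count--;
    return 0;
}

int xt_size(struct xt_base *base, unsigned long *count_out)
{
    if (!base || !count_out) return EINVAL;
    *count_out = base->count;
    return 0;
}

int xt_destroy(struct xt_base **base)
{
    struct xt_base *b;
    if (!base || !*base) return EINVAL;
    b = *base;
    while (b->root) { /* delete the root until the tree is empty */
        void *key = *(void **)b->root; /* KEY held by the root */
        tdelete(key, &b->root, b->compare);
        if (b->free_key) b->free_key(key);
    }
    assert(b->root == NULL);
    free(b);
    *base = NULL;
    return 0;
}
```

Every call reports what happened through its int result, and the
one value a caller may need back (the recorded key, on delete)
comes through an explicit argument.
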
=================