440 lines
15 KiB
Text
440 lines
15 KiB
Text
|
Some thoughts about tsearch.
|
|||
|
|
|||
|
David Anderson
|
|||
|
Updated 15 February 2014.
|
|||
|
|
|||
|
Tsearch() was defined and implemented fairly early in
|
|||
|
the history of UNIX, but I don't know exactly when.
|
|||
|
The earliest documentation I have is from the 1991
|
|||
|
edition of UNIX System V Release 4 Programmer's Reference
|
|||
|
Manual.
|
|||
|
|
|||
|
It is useful because it enables one to make a searchable tree
|
|||
|
of, well, any data one might have need to search. Tsearch never needs
|
|||
|
to understand the format of the data.
|
|||
|
|
|||
|
The functions defined there are tsearch(), tfind(),
|
|||
|
tdelete(), and twalk(). The description is just
|
|||
|
as difficult to understand as the documentation in release 3.54
|
|||
|
of the Linux man-pages project (the most recent I have on hand).
|
|||
|
|
|||
|
The Single Unix Specification page on tsearch() etc is slightly
|
|||
|
different text from the Programmer's Reference Manual and the
|
|||
|
Linux man page, but no clearer.
|
|||
|
|
|||
|
The confusion is partly due to the interface definition. Until the
|
|||
|
late 20th century only limited attention was given to interface
|
|||
|
design. By the late 20th century it seems clear that any
|
|||
|
newly designed tsearch-like functionality would have a significantly
|
|||
|
different interface. An example of an improved interface is
|
|||
|
the GNU/Linux function hsearch_r(). Making the return value
|
|||
|
a small integer defining what the function actually did
|
|||
|
(and using other arguments to return additional values as
|
|||
|
necessary) significantly simplifies the discussion.
|
|||
|
|
|||
|
Here though we are talking about the old and standard interface.
|
|||
|
We attempt to clarify the messy parts. See the examples provided
|
|||
|
for actual code.
|
|||
|
|
|||
|
The functions and tables of tsearch are not thread safe.
|
|||
|
Nor thread-aware. Any access to one of tables at the
|
|||
|
same time the table is being updated will lead to chaos
|
|||
|
eventually.
|
|||
|
|
|||
|
Any table tsearch maintains consists of records
|
|||
|
of undefined (meaning the definition
|
|||
|
is local to the internals) content each record
|
|||
|
of which contains, as one element, a copy of a KEY pointer.
|
|||
|
|
|||
|
In the following I freely use tsearch for dwarf_tsearch
|
|||
|
etc. The functionality and interfaces are identical.
|
|||
|
|
|||
|
=========
|
|||
|
Further reading
|
|||
|
|
|||
|
The canonical reference for binary trees and the
|
|||
|
algorithm reference for most of the tsearch code is
|
|||
|
Donald E. Knuth,
|
|||
|
"The Art of Computer Programming" Volume 3 Sorting and Searching,
|
|||
|
Second Edition.
|
|||
|
|
|||
|
"Algorithms" Fourth Edition, Robert Sedgewick and Kevin Wayne,
|
|||
|
is the basis for the red-black tree coding.
|
|||
|
|
|||
|
|
|||
|
Wikipedia has quite a few entries of interest.
|
|||
|
http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree
|
|||
|
http://en.wikipedia.org/wiki/Tree_traversal
|
|||
|
http://en.wikipedia.org/wiki/Red–black_tree
|
|||
|
are just a few.
|
|||
|
|
|||
|
After implementing tsearch I looked briefly at
|
|||
|
some other projects.
|
|||
|
|
|||
|
freecode.net: See the 'GNU libavl' project.
|
|||
|
The libavl libary is a GNU project in C with each tree type
|
|||
|
having a uniquely-named but consistent set of interfaces.
|
|||
|
The interface design is much more sensible than
|
|||
|
Unix TSEARCH. You will probably find it in most any
|
|||
|
Linux distribution. The 2.0.3 tarfile source has no
|
|||
|
configure step and the 2.0.3 Makefile failed for me,
|
|||
|
it placed -lm too early on the command line building texitree
|
|||
|
means the version mentioned on freecode.net:
|
|||
|
ftp://ftp.gnu.org/pub/gnu/avl/avl-2.0.3.tar.gz
|
|||
|
is rather old but shows internal evidence of
|
|||
|
last being updated in 2007 (*.pdf and other file timestamps
|
|||
|
in the tar file). The Makefile fails for me doing plain 'make'.
|
|||
|
Doing 'make program' to just build the source works fine.
|
|||
|
The interface definitions are more sensible than tsearch,
|
|||
|
rb_delete returns the deleted item when it succeeds, for example.
|
|||
|
|
|||
|
freecode.net: See the 'libredblack' project.
|
|||
|
This project tarfile was last updated in 2003.
|
|||
|
It is on sourceforge at http://sourceforge.net/projects/redblacktree/
|
|||
|
Implemented in C.
|
|||
|
Configure worked (version 1.3) but the make step failed
|
|||
|
running the python app ./rbgen due to the presence
|
|||
|
of the string %prefix on line 9 of rbgen.
|
|||
|
Removing that one line of python commentary from Eric Raymond
|
|||
|
fixed the build!
|
|||
|
There is a good amount of C #define magic making it harder to
|
|||
|
read the source than one might expect.
|
|||
|
The interface definitions are more sensible than tsearch,
|
|||
|
rb_delete returns the deleted item when it succeeds, for example.
|
|||
|
|
|||
|
freecode.net: See the project 'Template based B+ Tree'.
|
|||
|
Is in C++ and can be used for searches which are file or
|
|||
|
memory based through use of C++ templates by the implemenation.
|
|||
|
Have not attempted to build it.
|
|||
|
|
|||
|
=========
|
|||
|
tsearch:
|
|||
|
void *tsearch(const void *key, void **rootp,
|
|||
|
int (*compar)(const void *l, const void *r));
|
|||
|
|
|||
|
Our terminology comes from the above declaration.
|
|||
|
tsearch() returns a pointer. If tsearch
|
|||
|
fails due to an out of memory or internal error
|
|||
|
it returns NULL. Otherwise it returns a non-null
|
|||
|
pointer (see Return value, below).
|
|||
|
|
|||
|
KEY:
|
|||
|
First we will define KEY as it is ordinarily used.
|
|||
|
KEY is a pointer to an object you define and initialize.
|
|||
|
The object must remain in stable
|
|||
|
storage for as long as the table has a copy of the KEY
|
|||
|
pointing to the object. Example:
|
|||
|
struct mystruct *key_a = malloc(sizeof(struct mystruct));
|
|||
|
initialize(key_a, my data to fill in struct...);
|
|||
|
void *r = tsearch(key_a,&treeroot,comparfunc);
|
|||
|
|
|||
|
ROOTP:
|
|||
|
This must be the address of a void* datum.
|
|||
|
Before the first call of tsearch the value of *rootp must
|
|||
|
be NULL (set by you somehow).
|
|||
|
The contents are maintained by tsearch thereafter.
|
|||
|
|
|||
|
COMPAR:
|
|||
|
A function you write whose argments (when tsearch internals call it)
|
|||
|
are KEY pointers from two records (your records). The function should
|
|||
|
return -1 if the record KEY l points to is considered less than
|
|||
|
the recorda KEY r points to. Return zero of the values
|
|||
|
are considered to match. Return 1 otherwise.
|
|||
|
Example: int comparefunc(const void *l,const void *r) {
|
|||
|
struct mystruct *lp = l;
|
|||
|
struct mystruct *rp = r;
|
|||
|
if(lp->myv < rp->myv) return -1;
|
|||
|
if(lp->myv == rp->myv) return 0;
|
|||
|
return 1; }
|
|||
|
Of course the comparison need not be of simple values, it could involve
|
|||
|
anything. This is just a simple sketch.
|
|||
|
|
|||
|
Return value:
|
|||
|
If tsearch() returns NULL something went wrong. There was insufficient memory
|
|||
|
or an internal error of some kind.
|
|||
|
|
|||
|
Lets call the KEY passed in KEYa.
|
|||
|
Otherwise tsearch() returns a pointer to a KEY for this object.
|
|||
|
Dereference to get an actual KEY (a pointer to your object).
|
|||
|
Lets call this dereferenced KEY key_deref.
|
|||
|
struct mystruct *key_deref = *(struct mystruct *)r;
|
|||
|
if KEYa == key_deref then:
|
|||
|
KEYa was added to the tree.
|
|||
|
Hence the tree now has a copy of
|
|||
|
the value of key_a.
|
|||
|
else:
|
|||
|
KEYa matched a record in the tree and
|
|||
|
you need to free any space you allocated
|
|||
|
to build the key for this tsearch call.
|
|||
|
In our example:
|
|||
|
free(key_a).
|
|||
|
|
|||
|
|
|||
|
I hope the above clarifies the use of tsearch somewhat.
|
|||
|
============
|
|||
|
Tfind:
|
|||
|
void *find(const void *key, void **rootp,
|
|||
|
int (*compar)(const void *l, const void *r));
|
|||
|
|
|||
|
The interfaces are the same but the return value is
|
|||
|
(in contrast with tsearch) reasonably clear:
|
|||
|
|
|||
|
A non-null return is a key (pointer).
|
|||
|
It never adds a new record, it simple tries to find a record.
|
|||
|
A NULL return means either that the search-ed for record did not exist or that
|
|||
|
something internal went wrong or perhaps that something incorrect
|
|||
|
was detected at runtime in the arguments passed in.
|
|||
|
|
|||
|
============
|
|||
|
Tdelete:
|
|||
|
void *tdelete(const void *key, void **rootp,
|
|||
|
int (*compar)(const void *l, const void *r));
|
|||
|
|
|||
|
The arguments are the same as tsearch/tfind,
|
|||
|
but the return value is a bit odd.
|
|||
|
|
|||
|
If something goes wrong or if the record does not exist,
|
|||
|
it returns NULL. That is straightforward.
|
|||
|
|
|||
|
If the record is deleted tdelete is supposed to return
|
|||
|
a pointer to the parent of the deleted record.
|
|||
|
The content of the deleted record is NOT deleted.
|
|||
|
|
|||
|
If the record deleted was the last record in the
|
|||
|
tree, a NULL is returned and *rootp is set to NULL.
|
|||
|
Thus restoring initial conditions of an empty tree.
|
|||
|
|
|||
|
If the record deleted was the root (of the records
|
|||
|
in the tree as of the call) then what is this supposed
|
|||
|
to return? No available documentation suggests an
|
|||
|
answer. The current dwarf_tdelete implementations
|
|||
|
return some record or other in this case.
|
|||
|
|
|||
|
In the case of dwarf_tdelete() for a hash 'tree'
|
|||
|
if the node was the last in a hash chain then
|
|||
|
NULL is returned. (ugly, but it is difficult to
|
|||
|
figure out what else to do).
|
|||
|
|
|||
|
All this means that it is a waste of time to
|
|||
|
inspect the return value from tdelete.
|
|||
|
|
|||
|
So to do a tdelete and do any necessary
|
|||
|
memory free do:
|
|||
|
key = xxxx
|
|||
|
t = tfind(key,&root,compar_func)
|
|||
|
|
|||
|
if (t) {
|
|||
|
tdelete(key,&root,compar_func)
|
|||
|
}
|
|||
|
free key
|
|||
|
free r
|
|||
|
|
|||
|
where what 'free key' means is to free whatever 'key = xxxx'
|
|||
|
allocated. Which means do the same for 'free r'.
|
|||
|
|
|||
|
Of course if your keys point into to a static array of data
|
|||
|
then free is unnecessary, but how often will the data
|
|||
|
be in static memory?
|
|||
|
And if it is in static storage, why do any free at all?
|
|||
|
|
|||
|
|
|||
|
|
|||
|
============
|
|||
|
Tdestroy:
|
|||
|
void *tdestroy((void * root,
|
|||
|
void (*free_node)(void * key));
|
|||
|
|
|||
|
This frees all memory the tree system has and
|
|||
|
along the way it calls 'free_node' for each
|
|||
|
node in the tree.
|
|||
|
|
|||
|
It empties the tree, but, oddly, the root argument
|
|||
|
is a simple pointer to the root, not a pointer-to-pointer.
|
|||
|
Hence this routine can free the tree, but cannot
|
|||
|
assign a NULL to the root. So you should do
|
|||
|
root = 0;
|
|||
|
after calling tdestroy().
|
|||
|
|
|||
|
|
|||
|
============
|
|||
|
Twalk:
|
|||
|
void *twalk((const void * /*root*/,
|
|||
|
void (* /*action*/)(const void * /*nodep*/,
|
|||
|
const DW_VISIT /*which*/,
|
|||
|
const int /*depth*/));
|
|||
|
|
|||
|
============
|
|||
|
Tdump:
|
|||
|
void tdump(const void *rootp,
|
|||
|
char *(*keyprint)(const void *key),
|
|||
|
const char * msg);
|
|||
|
|
|||
|
This is unique to the dwarf_tsearch library.
|
|||
|
It prints (to standard out) a representation of the tree.
|
|||
|
The output is indented one space per level and ordered such
|
|||
|
that turning the output 90 degrees clockwise shows a kind
|
|||
|
of picture of the tree structure in the way trees
|
|||
|
are normally shown in books.
|
|||
|
It may be of slight interest for debugging trees if the
|
|||
|
trees are not too large.
|
|||
|
|
|||
|
The 'msg' argument is just a character string of interest to you, something
|
|||
|
identifying the output. It is printed once and otherwise ignored.
|
|||
|
|
|||
|
The 'keyprint' argument is a pointer to
|
|||
|
a function you write. The function
|
|||
|
is called for each node in turn. It
|
|||
|
should return a pointer to a string with whatever
|
|||
|
data from the passed-in-by-tdump key you wish to show.
|
|||
|
Normally the pointer returned from keyprint should be pointing to a
|
|||
|
static area so there is no issue with memory leakage to worry about.
|
|||
|
tdump will print the returned string immediately and will not refer
|
|||
|
to it again.
|
|||
|
|
|||
|
|
|||
|
============
|
|||
|
tsearch use using pointers (declarations left out):
|
|||
|
The struct here is entirely ours: tsearch neither knows nor cares
|
|||
|
how it is laid out.
|
|||
|
|
|||
|
mt = struct example_tentry *mt =
|
|||
|
(struct example_tentry *)calloc(sizeof(struct example_tentry),1);
|
|||
|
mt->mt_key = keyvalue;
|
|||
|
mt->mt_data = datavalue;
|
|||
|
errno = 0;
|
|||
|
/* tsearch adds an entry if its not present already. */
|
|||
|
retval = dwarf_tsearch(mt,tree, mt_compare_func );
|
|||
|
if(retval == 0) {
|
|||
|
printf("FAIL ENOMEM in search on %d, give up insertrecsbypointer\n",i);
|
|||
|
exit(1);
|
|||
|
} else {
|
|||
|
struct example_tentry *re = 0;
|
|||
|
re = *(struct example_tentry **)retval;
|
|||
|
if(re != mt) {
|
|||
|
/* found existing entry. */
|
|||
|
mt_free_func(mt);
|
|||
|
} else {
|
|||
|
/* New entry mt was added. */
|
|||
|
}
|
|||
|
}
|
|||
|
|
|||
|
In this case the mt_compare_func() might if using a struct look like:
|
|||
|
int
|
|||
|
mt_compare_func(const void *l, const void *r)
|
|||
|
{
|
|||
|
const struct example_tentry *ml = l;
|
|||
|
const struct example_tentry *mr = r;
|
|||
|
/* If the key were a string, one could use strcmp()
|
|||
|
instead of the comparisons here */
|
|||
|
if(ml->mt_key < mr->mt_key) {
|
|||
|
return -1;
|
|||
|
}
|
|||
|
if(ml->mt_key > mr->mt_key) {
|
|||
|
return 1;
|
|||
|
}
|
|||
|
return 0;
|
|||
|
}
|
|||
|
|
|||
|
|
|||
|
|
|||
|
|
|||
|
============
|
|||
|
tsearch use using values (declarations left out):
|
|||
|
void *mt = (void *)key;
|
|||
|
errno = 0;
|
|||
|
/* tsearch adds an entry if its not present already. */
|
|||
|
retval = dwarf_tsearch(mt,tree, value_compare_func );
|
|||
|
if(retval == 0) {
|
|||
|
printf("FAIL ENOMEM in search on %d, give up insertrecsbypointer\n",i);
|
|||
|
exit(1);
|
|||
|
} else {
|
|||
|
/* successful search.mt might have been added by the call,
|
|||
|
or maybe it was already in the tree.
|
|||
|
It is impossible to tell which */
|
|||
|
}
|
|||
|
In this case the value_compare_func might look like:
|
|||
|
int
|
|||
|
value_compare_func(const void *l, const void *r)
|
|||
|
{
|
|||
|
VALTYPE lp = (VALTYPE)l;
|
|||
|
VALTYPE rp = (VALTYPE)r;
|
|||
|
if(lp < rp) {
|
|||
|
return -1;
|
|||
|
}
|
|||
|
if(lp > rp) {
|
|||
|
return 1;
|
|||
|
}
|
|||
|
return 0;
|
|||
|
}
|
|||
|
In this case the free_node function would look like
|
|||
|
void free_node { /* Do nothing. */ }
|
|||
|
|
|||
|
|
|||
|
In this case, there is never any free() calls you need to make
|
|||
|
as the key is not a pointer to anything.
|
|||
|
===================
|
|||
|
If I were designing the interfaces in 2014 I might do
|
|||
|
as follows.
|
|||
|
|
|||
|
I think the GNU hsearch_r interfaces are a fine
|
|||
|
approach and could be extended to tsearch(). But
|
|||
|
I propose something slightly different.
|
|||
|
|
|||
|
struct tsearch_base; /* opaque struct, content defined by
|
|||
|
the search code, content not made public */
|
|||
|
int compare_func(void *l, void*r); /* you define comparison function */
|
|||
|
int free_func(void *n); /* you define free function */
|
|||
|
|
|||
|
None of the proposed interfaces touch errno.
|
|||
|
|
|||
|
All functions return 0 on success and an errno on failure.
|
|||
|
They return EINVAL if an argument is incorrect somehow.
|
|||
|
|
|||
|
int tcreate(struct tsearch_base **base,compare_func,free_func)
|
|||
|
allocate a struct_tsearch_base record and puts a pointer to
|
|||
|
it in *base. Records the compare and free_func pointers.
|
|||
|
A call on this by uses is a requirement .
|
|||
|
|
|||
|
returns:
|
|||
|
ENOMEM if out of memory.
|
|||
|
EINVAL if something wrong with an argument or an internal problem.
|
|||
|
|
|||
|
int tsearch(void *key,struct tsearch_base *base);
|
|||
|
returns:
|
|||
|
ENOMEM if out of memory.
|
|||
|
EINVAL if something wrong with an argument or an internal problem.
|
|||
|
|
|||
|
int tfind(void *key,struct tsearch_base *base);
|
|||
|
returns:
|
|||
|
ESRCH if not found.
|
|||
|
EINVAL if something wrong with an argument or an internal problem.
|
|||
|
|
|||
|
int tdelete(void *key,struct tsearch_base *base,void **keyout);
|
|||
|
If successful, tdelete deletes the record key identifies
|
|||
|
and returns that record's recorded key through *keyout.
|
|||
|
So if a free()or other action is required you can take that action.
|
|||
|
|
|||
|
The 'base' struct tsearch_base record is NOT freed,
|
|||
|
it remains, whether the tree has any records left or not.
|
|||
|
|
|||
|
returns:
|
|||
|
ESRCH if key not found.
|
|||
|
EINVAL if something wrong with an argument or an internal problem.
|
|||
|
|
|||
|
int tsize(struct tsearch_base *base,unsigned long*count_out);
|
|||
|
Stores a count of the records currently recorded in the tree in *count_out.
|
|||
|
|
|||
|
EINVAL if something wrong with an argument or an internal problem.
|
|||
|
|
|||
|
|
|||
|
int tdestroy(struct tsearch_base **base);
|
|||
|
A call is a requirement on users -- to free up the tree space.
|
|||
|
Calls free_func on each node in the tree and
|
|||
|
frees all tree memory and does
|
|||
|
*base = NULL
|
|||
|
to reset your tree pointer.
|
|||
|
|
|||
|
returns:
|
|||
|
EINVAL if something wrong with an argument or an internal problem.
|
|||
|
=================
|