Specification for refactoring an example function:

Author: Pavly G.

Important

Functions Declarations and Type Defintions	Definition

The main quest is to find problems in this code, classify them, and refactor them accordingly. Follow the software specification to understand what are the problems of this code, why they are problems, and whether there is a refactor.

Preface:

The software specification identifies the problem first, and decomposes it for the sake of simplicity; then it groups the problems in sets by common properties. Eventually, it maps the problems from the problem domain to solutions in solution sets. Solution sets can be mapped in an injection, surjective, or bijective fashion. This strategy uses a couple of maneuvers from discrete mathematics (e.g., Axiomatic Set Theory - Automata Theory - Relations and Function Theory).

Problem identification and decomposition: Recall that a problem P exists such that P
1. Insert node routine is doing dynamic memory allocation on its own (Catastrophic).
2. Signaling for a failure is not implemented nor planned (Major).
3. Handling failure with external routines is not implemented nor planned (Minor).
4. Too much nested blocks and indirections (Minor - Readability issues).
5. Uses naive printing functions for logging (Minor - Portability issues).
6. Compares using concrete integers (Minor - Portability issues).
7. Uses concrete values (integers) (Minor - Portability issues).
8. Could lead to extensive recursion and StackOverflows (Catastrophic?Major?).
Problems regrouping based on common properties:
1. Problem $$P_{lifecycle}$$ set can group those problems that could be solved by callbacks. For example, problems $$\{1, 3, 6\}$$.
2. Problem $$P_{APIs}$$ set can group those problems that require specialized APIs to delegate to them. For example, problems $$\{1, 5, 7\}$$.
3. Problem $$P_{Struct}$$ set can group those problems that require a data structure refactor. For example, problems $$\{2, 4, 6, 7\}$$.
Finding Relations among problem sets and problem domain members:
- $$R_{Allocation} = \{(P_1, P_2), (P_2, P_1), (P_2, P_3), (P_1, P_3)\}$$, provides a relation set between problems concerning memory allocation.
- $$R_{Failure} = \{(P_2, P_3), (P_2, P_4), (P_3, P_4)\}$$, provides a relation set among problems concerning handling failure routines.
- $$R_{Structure} = \{(P_7, P_6), (-, P_7)\}$$, provides a relation set among problems concerning the structure of the ADT.
- $$R_{problem-problem} = R_{Allocation} \cup R_{Failure} \cup R_{Structure} \cup R_{Recursion}$$ $$= \{(P_1, P_2), (P_2, P_1), (P_2, P_3), (P_1, P_3), (P_7, P_6), (P_2, P_4), (P_3, P_4) ... \}$$.

Note

Notes:

The $$R_{problem-problem}$$ describes the way problems synthesize other problems in the problem domain (i.e., entailment or consequence relationship; such that $$p_0$$ leads to $$p_1$$).
The $$R_{problem-problem}$$ is a symmetric, and transitive relation.
The symmetry and transitivity entail that multiple problems could have a common solution.

Solutions synthesis and mapping to the problem domain:
1. Concrete use of Malloc: A Lifecycle on_operation_failure(...) Callback together with other metadata (e.g., _Caller _Address, and failure cause) will suffice for signaling for memory allocation needs (main quest).
2. Concrete use of Malloc: An Enum Structure or a Set of Error Codes to indicate the type of failure evoked by the API.
3. Concrete use of Malloc: A Lifecycle on_operation_failure(...) Callback will suffice for signaling to handle failures evoked by this API.
4. Concrete use of Malloc: Provide a factory pattern that selects an Allocator API for the Tree (Advanced Solution/Design Practice).
5. Indirections problems: Remove the double indirection unless necessary required by the implementation.
6. Blocks problems: Use statically linked functions that are linked to compilation-unit of internal uses.
7. Naive printing functions: Use a Logging API that can be programmed in compile-time (i.e., Debug and Production Binaries).
8. Concrete Data problems: Use generic pointers (aka. variables to memory addresses), and safe-access APIs.
9. Conrete Data Comparison problems: Use a lifecycle bool is_levo_subtree(...) and bool is_dextro_subtree(...) or bool is_left_subtree and bool is_right_subtree(...) predicate functions that abstract the comparison routines.
10. Recursive StackOverflows problems: Refactor and use simple loops or a for-each callback or the library traversal functions to traverse.
11. Concrete use of traversing routines: Use a lifecycle callback on_node_traversed(...) with some useful arguments; including but not limited to: the current node, the root node, the allocator, and others depending on the system design and the detailed design paradigm.

Implementation of binary tree data structure using linked buffers:

[binary_tree.h]

/**
 * @brief This code shows a better way to implement a binary tree using Self-referential 
 * data structures (i.e., linked buffers). The API uses short routines with a lifecycle pattern.
 * The lifecycle pattern could be omitted with simple return error codes for simplicity.
 *  
 * @copyright Electrostat-Lab BSD-3 Clause License.
 * @file binary_tree.h
 * @author pavl_g
 */
 
#ifndef _BINARY_LINKED_BUFFER_H_
#define _BINARY_LINKED_BUFFER_H_

#include <stdbool.h>

// disables function name mangling for C++ Compilers
#ifdef __cplusplus
extern "C" {
#endif

// declare incomplete types
typedef struct tree_lifecycle (tree_lifecycle);
typedef struct linked_buffer (linked_buffer);
typedef enum status_code (status_code); // used as a return value for the operations
typedef enum failure_cause (failure_cause); // used, among the rest of metadata, 
                                            // to determine the type of failure
// declare function prototypes 
// pass an address to the address buffer of the [linked_buffer] instead of an address to the object
// enables R/W on the address buffer of the [linked_buffer] as well as the [linked_buffer] object
// which enables the object allocation/reallocation with a protection to its association with the address buffer
status_code insert_node(linked_buffer **root, linked_buffer **node, tree_lifecycle *lifecycle);
status_code find_node(linked_buffer **src, linked_buffer **node, tree_lifecycle *lifecycle);
status_code retrieve_node(linked_buffer **src, void *data);
status_code remove_node(linked_buffer **src, void *data, tree_lifecycle *lifecycle);
status_code traverse_nodes(linked_buffer **src);

// ... the rest of functions for traversing data
// No allocation and destruction function, this is not the job of this API!
// Allocation can be triggered via the implementation for the lifecycle callbacks.
// Allocation is delegated to a specialized factory API that would select the appropriate 
// allocator for the Application (among which the GNU Allocator).

/**
 * @brief Provides an abstract lifecycle functions to handle callbacks from the library 
 * functions.
 * @note This API is memory-safe (i.e., addresses are nullable), but not thread-safe (i.e.,
 * concurrency modelling and synchronization design is required for multi-threaded access).
 */
struct tree_lifecycle {
    
    /**
    * @brief Dispatched when an operation has succeeded with a status code
    * and a pointer to the caller source. 
    */
    void (*on_operation_success)(linked_buffer *, void *, status_code);
    
    /**
     * @brief Dispatched when an operation has failed with a status code
     * and a pointer to the failure source. 
     */
    void (*on_operation_failure)(linked_buffer **, void *, status_code, failure_cause);
    
    /**
     * @brief Dispatched when a node is found in the [find_node] routine.
     * @param A Virtual Address to the direct parent root to the found node.
     * @param A Virtual Address to the found node.
     * @note This routine could be used for anything including node address nullification
     * which holds as a removal. Removal should be done by dereferencing the root node using 
     * the indirection operator, and writing on either the left or the right buffer after 
     * a conditional operator to test for the data.
     */
    void (*on_node_found)(linked_buffer *root, linked_buffer *node, void *caller, status_code);
    
    /**
     * @brief Dispatched when a node is successfully traversed from any 
     * API function. Check for the caller before executing subroutines.
     */
    void (*on_node_traversed)(linked_buffer *, void *caller, status_code);
    
    /**
     * @brief A predicate function to access left subtree 
     * in a n-level tree.
     */
    bool (*is_levo_link)(linked_buffer *, linked_buffer *);
    
    /**
     * @brief A predicate function to access right subtree 
     * in a n-level tree.
     */
    bool (*is_dextro_link)(linked_buffer *, linked_buffer *);
};

/**
 * @brief Provides a skeleton for the binary tree (a specialized ADT) using 
 * linked buffers. When allocated an object of this type, 
 * this implementation provides a fixed-size (24 bytes on 64-bit binaries) 
 * buffer to skeletonize binary trees.
 */
struct linked_buffer { 
  // no alignment issues since we are using pointers only ...
   linked_buffer *right_buffer;
   linked_buffer *left_buffer;
   void *data;
};

/**
 * @brief Provides a concrete structure for status codes that is 
 * highly specialized for this library. Status codes 
 */
enum status_code {
   OPERATION_SUCCESS = 0xa8,
   ALLOCATION_REQUEST = (OPERATION_SUCCESS + 1),
   DEALLOCATION_REQUEST = (ALLOCATION_REQUEST + 1),
   NODE_TRAVERSAL = (DEALLOCATION_REQUEST + 1),
   INSERTION_FAILURE = (NODE_TRAVERSAL + 1),
   REMOVAL_FAILURE = (INSERTION_FAILURE + 1),
   RETRIVAL_FAILURE = (REMOVAL_FAILURE + 1),
   SEARCH_FAILURE = (RETRIVAL_FAILURE + 1),
   SIZE_UPDATE_FAILURE = (SEARCH_FAILURE + 1),
   UNKNOWN = (SIZE_UPDATE_FAILURE + 1),
};

/**
 * @brief Provides metadata for failure routines to test for 
 * the failure cause.
 * @note the [failure_cause] buffer has values that 
 * are well-aligned with the [status_code] for traversal 
 * compatibilities.
 */
enum failure_cause {
    ILLEGAL_STATE = (UNKNOWN + 1),
    ILLEGAL_ARG = (ILLEGAL_STATE + 1),
    ARITHMETIC_OVERFLOW = (ILLEGAL_ARG + 1),
    NULLPOINTER = (ARITHMETIC_OVERFLOW + 1),
    LINK_NOT_FOUND = (NULLPOINTER + 1),
    UNDEFINED_BEHAVIOR = (LINK_NOT_FOUND + 1),
    NODE_NOT_FOUND = (UNDEFINED_BEHAVIOR + 1)
};

#ifdef __cplusplus
};
#endif

#endif

[binary_tree.c]

/**
 * @brief This code shows a better way to implement a binary tree using Self-referential 
 * data structures (i.e., linked buffers). The API uses short routines with a lifecycle pattern.
 * The lifecycle pattern could be omitted with simple return error codes for simplicity.
 *  
 * @copyright Electrostat-Lab BSD-3 Clause License. 
 * @file binary_tree.c
 * @author pavl_g
 */

#include <binary_tree.h>

//.................BEGINNING of Statically-linked Routines.................

static inline void dispatch_operation_status(linked_buffer **src, void *caller, tree_lifecycle *lifecycle, 
                                                                status_code status,
                                                                failure_cause cause) {
    if (NULL == src || NULL == *src || 
          NULL == lifecycle || NULL == caller) {
        return;
    }
    // More safety could be provided via functional bounding to the API functions only!
    // ...
    if (OPERATION_SUCCESS == status && NULL != lifecycle->on_operation_success) {
        lifecycle->on_operation_success(*src, caller, status);
        // failure_cause is ignored in this case ...
        return;
    }
    
    if (OPERATION_SUCCESS != status && NULL != lifecycle->on_operation_failure) {
        lifecycle->on_operation_failure(src, caller, status, cause);
        return;
    }
}

static inline void dispatch_on_node_found(linked_buffer *root, linked_buffer *node, 
                                                  void *caller, tree_lifecycle *lifecycle, 
                                                  status_code status) {
    // preprocessing -- validation phase
    if (NULL == root || NULL == node || NULL == caller ||
          NULL == lifecycle || NULL == lifecycle->on_node_found) {
        return ;
    }
    
    // processing -- invoke
    lifecycle->on_node_found(root, node, caller, status);
    
    // postprocessing -- KO
}

static inline void dispatch_on_node_traveresed(linked_buffer *src,
                                               tree_lifecycle *lifecycle,
                                               void *caller,
                                               status_code status) {
    // sanity check the input
    if (NULL == src || NULL == lifecycle 
          || NULL == lifecycle->on_node_traversed
          || NULL == caller) {
        return;
    }
    
    // dispatch node traversal callback
    lifecycle->on_node_traversed(src, caller, status);
}

static inline bool dispatch_comparison_test(linked_buffer *src, linked_buffer *node,
                                            tree_lifecycle *lifecycle, bool call_left_test) {
      // sanity checks for nullary associations.
      if (NULL == src || NULL == node || NULL == lifecycle) {
          return false;
      }
      if (call_left_test && (NULL == lifecycle->is_levo_link)) {
          return false;
      }
      if (!call_left_test && (NULL == lifecycle->is_dextro_link)) {
          return false;
      }
      
      // dispatch the link test
      if (!call_left_test) {
          return lifecycle->is_dextro_link(src, node);    
      } 
      return lifecycle->is_levo_link(src, node);
}

static inline void on_node_found_removal_impl(linked_buffer *root, linked_buffer *node, void *caller, status_code) {
    // preprocessing automata -- validation phase
    if (NULL == root || NULL == node 
                  || NULL == caller) {
        return ;
    }
    if (&find_node != caller) {
        return ;
    }
    // preprocessing automata accepts here!

    // processing automata -- nullify the memory address
    // test against the direct parent
    if (root->left_buffer->data == node->data) {
        root->left_buffer->data = NULL;
    } else if (root->right_buffer->data == node->data) {
        root->right_buffer->data = NULL;
    }
    
    // postprocessing automata -- delegated to the caller routine
}

//.................END of Statically-linked Routines.................

status_code insert_node(linked_buffer **root, linked_buffer **node, tree_lifecycle *lifecycle) {
    // sanity check the input
    if (NULL == root || NULL == *root) {
        dispatch_operation_status(root, &insert_node, lifecycle, 
                                    ALLOCATION_REQUEST, ILLEGAL_STATE);
        return ALLOCATION_REQUEST;
    } 
    if (NULL == node || NULL == *node) {
        dispatch_operation_status(root, &insert_node, lifecycle, 
                                    INSERTION_FAILURE, ILLEGAL_ARG);
        return INSERTION_FAILURE;
    }
    
    // starts the tracker at the root
    // move the tracker along the tree in Levo- and Dextro- links
    // as long as there are buffer addresses in these links!
    // once a link is being hit without a buffer address, apply 
    // insertion and terminate immediately.
    for (linked_buffer *tracker = *root; NULL != tracker;) {
      dispatch_on_node_traversed(tracker, lifecycle, 
                              &insert_node, NODE_TRAVERSAL);
      // test for the correct link
      if (dispatch_comparison_test(tracker, *node, lifecycle, true)) {
          if (NULL == tracker->left_buffer) {
              tracker->left_buffer = node; // insert and exit immediately
              break;
          }
          // note that this assignment validates
          // the next test ('tracker != NULL')
          // as ('tracker->left_buffer') cannot be null at this stage
          tracker = tracker->left_buffer; // move the root tracker
      } else if (dispatch_comparison_test(tracker, *node, lifecycle, false)) {
          // tests for the dextro link
          if (NULL == tracker->right_buffer) {
              tracker->right_buffer = node; // insert and exit immediately
              break;
          }
          tracker = tracker->right_buffer; // move the root tracker
      } else {
          // signal failure to insert and terminate!
          dispatch_operation_status(tracker, &insert_node, lifecycle, 
                                    INSERTION_FAILURE, LINK_NOT_FOUND);
          return INSERTION_FAILURE;
      }
    }

    dispatch_operation_status(root, &insert_node, lifecycle, OPERATION_SUCCESS, 0x00);
    
    return OPERATION_SUCCESS;
}

status_code find_node(linked_buffer **src, linked_buffer **node, tree_lifecycle *lifecycle) {
    // preprocessing code
    // sanity check the input
    if (NULL == root || NULL == *root || 
                NULL == node || NULL == *node) {
        dispatch_operation_status(root, &find_node, lifecycle, 
                                    SEARCH_FAILURE, ILLEGAL_ARG);
        return SEARCH_FAILURE;
    }
    
    // processing code
    bool is_found = false;
    linked_buffer *tracker = *root; 
    linked_buffer *antecedent = *root;
    
    while (NULL != tracker && !is_found) {
        dispatch_on_node_traversed(tracker, lifecycle, 
                              &find_node, NODE_TRAVERSAL);
        if (dispatch_comparison_test(tracker, *node, lifecycle, true)) {
            if (tracker->data != (*node)->data) {
                // move tracker
                antecedent = tracker;
                tracker = tracker->left_buffer;
                continue;
            }
            is_found = true;
        } else if (dispatch_comparison_test(tracker, *node, lifecycle, false)) {
            if (tracker->data != (*node)->data) {
                // move tracker
                antecedent = tracker;
                tracker = tracker->right_buffer;
                continue;
            }
            is_found = true;
        } else {
            is_found = false;
            break;
        }
    }
    
    // post processing 
    if (!is_found) {
      // call failure
      dispatch_operation_status(root, &find_node, lifecycle, 
                                  SEARCH_FAILURE, NODE_NOT_FOUND);
      return OPERATION_FAILURE;
    }
    // else call success
    dispatch_on_node_found(antecedent, tracker, 
                    &find_node, lifecycle, OPERATION_SUCCESS, 0x00);
    
    return OPERATION_SUCCESS;
}

status_code remove_node(linked_buffer **src, void *data, tree_lifecycle *lifecycle) {
    // preprocessing code -- validation phase
    if (NULL == src || NULL == *src || 
        NULL == data) {
        dispatch_operation_status(root, &remove_node, lifecycle, 
                                    REMOVAL_FAILURE, ILLEGAL_ARG);
        return REMOVAL_FAILURE;
    }
    
    // preprocessing automata -- initialization phase
    tree_lifecycle embed_lifecycle = {
        .on_node_found = &on_node_found_removal_impl;
    };
    linked_buffer node = {
        .data = data;
    };
    linked_buffer *_node = &node;
   
    // processing automata -- searching the node in O(log_2(n)) clocks 
    // processing automata -- a callback [on_node_found_removal_impl]
                // removing the address of a matched node 
    status_code status = find_node(src, &_node, &embed_lifecycle);
    
    // post-processing automata -- execute callbacks 
    if (SEARCH_FAILURE == status || ALLOCATION_REQUEST == status) {
        dispatch_operation_status(root, &remove_node, lifecycle, 
                            SEARCH_FAILURE, NODE_NOT_FOUND);
        return OPERATION_FAILURE;
    } 
    
     dispatch_operation_status(root, &remove_node, lifecycle, 
                    OPERATION_SUCCESS, 0x00);
     
     return OPERATION_SUCCESS;
}

Potential Changes to aid for Malfunctions of this code:

Passing addresses to the data buffer of the allocated objects by their addresses enables R/W operations on the address buffer associated with the data buffer of the object (e.g., Reallocation) [Code-fragment (1, 2)], unlike passing object references (i.e., references to the data buffer) by their addresses [Code-fragment (3)].

[(1) Using A Long Address variable to the Address Buffer of the Data Buffer]

static inline void allocate(long address, size_t size) {
   void **__address = (void **) address;
   if (0 != *__address) {
      return ;
   }

   *__address = malloc (size);
   printf("%d\n", *__address); // prints the value of the [address] from the caller stack
   // no memory leaks!
}

[(2) Using an Address Variable to the Address Buffer of the Data Buffer]

static inline void allocate(void **address, size_t size) {
   // More type-safety, removed typecasting by using pointer types!
   if (NULL != *address) {
      return ;
   }

   *address = malloc (size);
   printf("%d\n", *address); // prints the value of the [address] from the caller stack
   // no memory leaks!
}

[(3) Using an Address Variable to the Data Buffer]

static inline void allocate(void *address, size_t size) {
   if (NULL != address) {
      return ;
   }
   address = malloc (size);
   printf("%d\n", address); // prints a copy of the [address] from the caller stack
   // newly allocated character cannot be accessed 
   // anymore; as the address is assigned to a local
   // automatically allocated address buffer leading to address trap
   // Thus, A heap memory leak!
}

See this post.

Important

Some C idioms:

Virtual Memory Address Space:
Ker

Take home messages:

Removing the concrete use of Malloc is a sort of abstraction in C programming (i.e., Memory Allocation Abstractions).
Providing an API that is highly specialized to allocate memory or delegate the job to GNU Allocators (e.g., Malloc, Calloc, and others) is badly required especially for systems engineering.
Systems Engineering is a much larger domain, and quite different than OS Application Development, in which failures are accpeted at some points.
General purpose C programming books should provide vigilance for all types of students (e.g., Application Engineers and Systems Engineers) or at least contemplations for writing code that is compliant with system design principles.
Since C programming is not required for Application Development, but it's highly required for Systems Engineering; then these information about memory management outside the non-specialized APIs using the command-state and the lifecycle patterns must be explained rigorously.
For simplicity we're using Linked buffers here as a concrete data structure. However, one should use an Abstract Data Structure that has a factory API that provides a way to choose the implementation of List (e.g., Contiguous Buffers or Linked Lists).
Binary trees could be implemented using Arrays (aka. Contiguous Buffers), and it sometimes provides much more performance over non-contiguous buffers for some systems; therefore using an ADT is the most performant and robust solution ever.
[Discrete Mathematics] Designing data structures and algorithms are not physically bound to coding, but in fact, software design, and discrete mathematical structures play the largest role.
[Calculus - Limits and Continunity] In all the implementations of the Binary Tree ADTs, it's contemplated that the clock complexity rises by an asymptotic function of $$C(n) = Log_2(n) + \epsilon$$; where $$n$$ is the number of elements, while memory complexity rises by an asymptotic function (i.e., Growth Function) of $$M(n) = n * [Constant]$$; where the $$Constant$$ factor represents the total size of the structural object used.
[Different types of implementations] The Contiguous buffers implementations will be much more compatible with cache mamory algorithms (e.g., Set-Associative Mapping) rendering the second part of $$C(n)$$ function (i.e., $$\epsilon$$) much smaller.
[Different types of implementations] The Contiguous buffers implementations will also remove the 2 pointers (i.e., left_buffer and right_buffer), hence decreasing the total size by $$(2) * (8\ bytes) * n$$; where $$[Constant] = (2) * (8\ bytes)$$ for 64-bit binaries than the Linked buffers implementations. However, Contiguous Buffers will increase the complexity of the traversal code.

pavly-gerges/refactor-tree.md

Specification for refactoring an example function:

Preface:

Implementation of binary tree data structure using linked buffers:

Potential Changes to aid for Malfunctions of this code:

Take home messages:

Appendix-A: References: