Idea #4 – Information Technology – Binary Sort

I happen to have two (slightly functionally different) descriptions of this idea stored away, so I’ll just copy both.

My (novel?) idea for a sorting algorithm. It doesn’t do any comparisons, and it works in O(n) time.

We do not compare any elements. We create a binary tree and when navigating the tree we use the binary pattern of the element we’re currently sorting as the path. When we come to a 1 in the path and there’s no 1 branch, or a 0 in the path and there’s no 0 branch, we create branches until we get to our number. A node which contains 2 branches might also represent an element, as not all elements will necessarily have the same number of bits (for example, if we’re sorting strings). Each node would store a count of how many times its value (i.e. what sorted element it encodes if it’s taken as an end node) occurs in the list. That’s how we’d have duplicate values and how a branch can also be a value.

This has a worst-case scenario of O(n*l) where n is the length of the list and l is the average length of a member in bits, or O(t) where n is the total length of all the members in bits. An O(n log n) sort algorithm actually has to compare member values byte by byte, so for comparison it’s really O(n l/8 * log n), or O(n * l/16 * log n) if they’re comparing words, or O(n * l/32 * log n) if they’re comparing double-words.

It might be more advantageous than other sorting methods to implement this one in assembly.

The algorithm could alternatively use separate binary trees for each number of bits in the given elements, but I don’t how that would have any benefits.

A sorting algorithm that sorts in constant amortized time, but which is proportional to the average length, in binary, of the elements being sorted

Let’s say we have a function that returns a list of {0, 1} values which is a given element’s binary representation. Now we make an array of the size of our input list, which we will use as a binary tree. Each node will be a struct instance. For each element in our list to be sorted, we navigate this tree using the element’s binary sequence as the path, creating new branches as necessary. When we get to the end, we insert a reference to the original object into the beginning of a linked list of references to objects belonging at that node (i.e., equivalent elements). Then at the end of our sorting we simply walk the tree and traverse its linked lists to return our sorted list. If we want equivalent objects to appear in the same order in which they appeared in the original list, we can append them to the end of the linked lists instead and have the node store a pointer to the current last linked list item.

This is only the naive version, though.. we don’t really have to make the full path for every element. Instead of having a function that returns an object’s binary representation, we’ll have a function that returns an iterator for it. Then instead of navigating/creating nodes to the end of the path, we navigate until we find a node who simply stores a pointer to an iterator for an element that’s somewhere underneath it. Now, since a node can only contain one such iterator pointer, when you come to such a node you must poll values from both iterators (your current element’s and the node’s) and create a path until the values diverge, at which point you create two children and put your respective iterator and element pointers in those. Of course you’d nullify the pointers in the original node. You have a 50/50 chance of this happening on the first poll, a 3/4 chance of it happening by the second poll, etc. When an iterator exhausts itself you delete the iterator object, nullify the iterator pointer and start the linked list of equivalent elements for that node.

Alternatively to the linked lists we could simply have a pointer to the first instance of an equivalent member and a count value that specifies how many times it repeats, but that only works with members that don’t have any ‘hidden variables’ with respect to their binary representations. Plain old strings or numbers would work just fine that way. Sorting person objects by last name, wouldn’t.

This algorithm isn’t generally suitable for mixed-type collections or any type that uses a special comparison function. Strings, ints, floating points and fixed decimals should work fine. The above is optimized for strings. For a numerical type, all iterators are exhausted at the same branching level (8, 16, 32, or 64), although that doesn’t change the algorithm a whole lot — maybe provides for some minor optimizations. But because comparing two numbers might be less expensive than grabbing a value from a binary iterator, we could forgo the binary representations altogether for numbers. Instead of a node storing a pointer to an iterator and an element, it would simply store a number. That could be a huge improvement. And even if we still used binary representations, most numeric values probably start with lots of 0’s, so we could speed up the process by performing a BSF assembly command on it to determine the number of leading zeros, then jump directly to a node, skipping the first portion of the path constituting zeros, based on a table of pointers for numbers of leading zeros up to 8, 16, 32, or 64.

Leave a Reply