Uniform Hashing of Arbitrary Input Into Key-Exclusive Segments by Paul Dorfman and Don Henderson
Wed, Jun 12 | Webinar
Don Henderson presents a method for using hash functions to split an arbitrarily large dataset into manageable chunks for processing.


Time & Location
Jun 12, 2024, 12:00 PM – 1:00 PM EDT
Webinar
Aggregating or combining large data volumes can strain computing resources. For example, the process may hit system limits on utility space or memory and, as a result, either fail or run too long to be useful. A natural response is to split the input records into a number of smaller segments, process them independently, and combine the results. However, for such a divide-and-conquer tactic to work, two seemingly contradictory criteria must be met: first, to aggregate or combine the data correctly, no segment can share any key value with another segment; second, the segments must be roughly equal in size. In this presentation, we show how a hash function can be used to meet both criteria for arbitrary input, with no prior knowledge of how the key values are distributed among its records. Effectively, the method renders…
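
To make the idea concrete, here is a minimal Python sketch of hash-based segmentation. It is an illustration under stated assumptions, not the presenters' implementation: the choice of MD5, the function names (segment_of, split_by_key, sum_by_key, aggregate_in_segments), and the (key, value) record layout are all hypothetical. Each record is routed to a segment computed from a hash of its key alone, so every record with a given key lands in the same segment, and a well-mixing hash tends to spread distinct keys evenly across segments.

```python
import hashlib
from collections import defaultdict


def segment_of(key, n_segments):
    """Map a key to a segment number in range(n_segments).

    MD5 is an assumption made for this sketch; any hash that mixes
    its input well would serve the same purpose.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_segments


def split_by_key(records, n_segments):
    """Route (key, value) records into segments that share no keys."""
    segments = [[] for _ in range(n_segments)]
    for key, value in records:
        segments[segment_of(key, n_segments)].append((key, value))
    return segments


def sum_by_key(records):
    """Aggregate one batch of records: total of the values per key."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


def aggregate_in_segments(records, n_segments):
    """Aggregate each segment independently, then combine the results.

    Because no key appears in more than one segment, combining is a
    plain union of the per-segment results.
    """
    combined = {}
    for segment in split_by_key(records, n_segments):
        combined.update(sum_by_key(segment))
    return combined
```

Because the segment number depends only on the key, the per-segment results can be combined without any cross-segment reconciliation. How close the segments come to equal size depends on the hash spreading distinct keys evenly and on no single key accounting for a large share of the records.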