A very simple but useful PHP function for performing work on large arrays of information. Basically, it's pagination for callbacks. For example, it can be used to perform bulk database INSERT operations instead of inserting each item individually. See below for a quick real-world example.
$numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
$return = doChunks($numbers, 3, function($numbersChunk) {
    return array_sum($numbersChunk);
});
var_dump($return);
This should produce four return values, one per chunk:
array (size=4)
0 => int 6
1 => int 15
2 => int 24
3 => int 10
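The doChunks implementation itself isn't shown here, but a minimal sketch built on PHP's array_chunk() might look like this (the actual implementation may differ):

/**
 * Split $items into chunks of $size, invoke $callback on each chunk,
 * and collect the return values. A minimal sketch only -- the original
 * implementation isn't shown and may differ.
 *
 * @param array    $items    Items to process.
 * @param int      $size     Maximum items per chunk.
 * @param callable $callback Receives one chunk per call.
 * @return array One return value per chunk.
 */
function doChunks(array $items, $size, callable $callback)
{
    $results = [];
    foreach (array_chunk($items, $size) as $chunk) {
        $results[] = $callback($chunk);
    }
    return $results;
}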
Here's a more complex real-world example of the power of breaking generic work into smaller portions. In this case, the doChunks function is helping optimize database operations. This use case originally involved inserting large numbers of records through the SilverStripe ORM, one at a time. SilverStripe's framework is quite powerful, but that also means individual operations can take quite a long time when repeated thousands of times. The doChunks function comes into play because, when the total number of records is not known, it's important to limit the maximum number of values you INSERT at any given moment. This is because MySQL has an upper limit on the size of an individual query (which can vary by server via max_allowed_packet); in our case, it was roughly 16MB. To be safe, we capped our maximum rows at 500 per query.
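If you want to confirm the limit on your own server, MySQL exposes it as a variable. A quick sketch, assuming SilverStripe's DB facade (the same one used in the code below); any MySQL client would work the same way:

// Ask MySQL for its max_allowed_packet setting, in bytes. Sketch only --
// assumes SilverStripe's DB facade is available.
$row = DB::query("SHOW VARIABLES LIKE 'max_allowed_packet'")->first();
echo $row['Value']; // e.g. 16777216 (16MB)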
This optimization (moving from the ORM to bulk INSERT) reduced import time from about 75s down to only about 2.5s (a 30x speed increase). Less time waiting for an API to complete means a reduced likelihood of a failed and/or unnecessarily repeated import caused by network connectivity issues. It also means dramatically improved performance, since it reduces the CPU hit on the server cluster and offloads the majority of the work to the database itself (a best practice). Note that this is only one of three separate usages of doChunks (which in total inserted approximately 6.7k records).
// Insert patents in bulk into database directly, breaking up into chunks of 500. While we're here,
// we'll make sure to also build a flat de-duplicated array of inventors for insertion later.
/** @var RawInventorData[] $inventors */
$inventors = [];
doChunks($patents, 500, function($patents) use (&$inventors) {
    $dbValues = [];
    /** @var RawPatentData[] $patents */
    foreach ($patents as $rawPatentData) {
        $dbValues[] = [
            $rawPatentData->matter_file_number,
            $rawPatentData->grant_date,
            $rawPatentData->patent_number,
            $rawPatentData->country,
            $rawPatentData->title,
            $rawPatentData->abstract,
        ];
        // Build flattened array of de-duplicated inventors (for bulk insertion later).
        foreach ($rawPatentData->inventors as $rawInventorData) {
            if (!isset($inventors[$rawInventorData->qid])) {
                $inventors[$rawInventorData->qid] = $rawInventorData;
            }
        }
    }
    // Pass values into insertFieldsBulk(), which will generate the concatenated string of
    // escaped field values in parenthetical groups, e.g. ('val1', 'val2'), ('val1', 'val2'), ...
    $dbFields = [
        'MatterFileNumber',
        'GrantDate',
        'PatentNumber',
        'CountryCode',
        'Title',
        'Abstract',
    ];
    $insertFields = insertFieldsBulk($dbFields, $dbValues);
    $query = "INSERT INTO Patent $insertFields";
    DB::query($query);
});
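The insertFieldsBulk() helper isn't shown in the original either. A hypothetical sketch of what it might look like, assuming SilverStripe's Convert::raw2sql() for escaping:

/**
 * Build the "(`Field1`, `Field2`) VALUES ('v1', 'v2'), ('v1', 'v2'), ..."
 * portion of a bulk INSERT statement. A hypothetical sketch of the helper
 * used above -- the original implementation isn't shown.
 *
 * @param string[] $fields Column names.
 * @param array[]  $values One array of raw values per row.
 * @return string
 */
function insertFieldsBulk(array $fields, array $values)
{
    $columns = '(`' . implode('`, `', $fields) . '`)';
    $rows = [];
    foreach ($values as $row) {
        // Escape each raw value and wrap it in quotes.
        $escaped = array_map(function ($value) {
            return "'" . Convert::raw2sql($value) . "'";
        }, $row);
        $rows[] = '(' . implode(', ', $escaped) . ')';
    }
    return $columns . ' VALUES ' . implode(', ', $rows);
}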