Skip to content

Instantly share code, notes, and snippets.

@ochafik
Last active February 24, 2023 12:55
Show Gist options
  • Select an option

  • Save ochafik/ccb4f54757ef5fea7e0d6003eae891f5 to your computer and use it in GitHub Desktop.

Select an option

Save ochafik/ccb4f54757ef5fea7e0d6003eae891f5 to your computer and use it in GitHub Desktop.

Octree Slicing for Massively Parallel OpenSCAD rendering

TL;DR:

  • We split the scene in an octree and make sure we can restrict the rendering of the entire scene to a given cell (either w/ a custom CLI parameter or by transforming the input .scad to a rendering script able to render a single cell or to assemble cells)
  • We generate a Makefile that orchestrates the rendering of each cell (and the final reunion, which is free as we rely on lazy-union)
  • We let Make parallelize and cache the rendering
  • We abuse lazy unions to not pay the cost of final reunion of all cells

About exact numbers

They don't matter as much as one might think.

OpenSCAD already uses non-exact doubles when doing Hull, when doing Minkowski, and when doing... transforms.

TODO: what about extrusions?

The approach we suggest is to push transforms down, then do all the operations on double precision inputs which can be efficiently deserialized from disk. CSG operations can even use the elalish/manifold library.

Detailed process

  • Read original model's CSG export
  • Push transforms down to just above each scene leaf (using openscad/openscad#3637) (leaves are minkowski, polyhedron, linear_extrude, etc). The scene is now a massive union of intersections of whatever of... transformed scene leaves.
  • Compute the world bbox of each scene leaf, and the entire scene's bbox
  • Create and refine a CGAL Octree:
    • ensure an octree leaf cell intersects with at most 2 leaves' bboxes OR has at most N facets (Note: hard to know before doing the intersections, so just doing an estimation by multiplying the scene leaf's # vertices by the ratio of intersecting bounding volume over the scene leaf's bounding volume!).
    • for each octree leaf cell, keep track of all the scene leaves that have bboxes that intersect with it. Build the inverse index of that as we go (from scene leaf to index of octree leaf cells)
  • We keep track of identical nodes and build a dependency graph. We write a Makefile w/ rendering graph between nodes intertwined w/ cells & assembly nodes (see below)
  • Generate an %input%.cells.scad intermediate file:
    • includes octree-slicing.scad above
    • Defines variable bbox = list of octree leaves's bounding boxes (min / max points of said boxes in world coordinates)
    • Calls render_cells with bbox, node (stage) and the original file path (used as a prefix for the outputs, being %input%.cell_${N}.stl). This will either run the final assembly, or print the # of leaves, or render an
    • AST w/ each scene leaf wrapped in a render_cell(bboxes, [...indices_of_potentially_intersecting_leaves...]) call just above the multmatrix node
    • Each non-final node (for reused tree) is staged w/o its transform. Nodes are only rendered if they match the global dependency node index.

There's two processes at play here:

  • A slicer: can parallelize the push down of transforms and computation of bboxes but there's little point. Octree construction should be fast. Code generation should also be fast.
  • A rendering orchestrator: can be a bash script. Calls GNU parallel with each cell being an openscad render task w/ -Dcell=$i. Or creates a Makefile that can encapsulate any graph dependencies and support parallel execution. Then there's a final step with -Dcell=assembly that uses --feature=lazy-union to generate a single STL output at zero union cost.

This should quickly max out core utilization on complex models. In the worst cases it will create a lot more work (splitting solids for nothing).

Optionally we could have a feature that bails out of the minkowski's final union and leaves it in the CSG AST, so that that part can be split too.

Orchestration of reused subtrees vs. cells

We recognize reused subtrees and create a dependency tree w/ full intermediate renderings of reused nodes. Each such node is refered to by its STL output (or serialized nef) in the nodes that depend on it. Cell rendering of downstream reverse dependencies depend on the assembly of the reused subtree. All of those nodes are parallelizable together by Make! Non-final nodes have their own octree sub process and may have a different number of cells.

Todo:

  • each 3D scene leaf is as stored in binary STL (pre-transformed) so each cell rendering spends less time loading geometries. All these files will end up in the OS file cache anyway.
  • Extra dimension = animation frames (more shared subtrees, start w/ N roots)
# example.scad -> output.renderer.scad, output.Makefile (this file)
# node0 is reused twice (with different transforms) in node1 = world scene and somehow was split in 3 cells.
# note1 depends on node1 and was split in 2 cells. Each of these cells depend on node1's assembly output

output.node0.cell0.stl:
  openscad output.renderer.scad -Dnode=0 -Dcell=0
  
output.node0.cell1.stl:
  openscad output.renderer.scad -Dnode=0 -Dcell=1
  
output.node0.cell2.stl:
  openscad output.renderer.scad -Dnode=0 -Dcell=2
  
output.node0.stl: output.node0.cell0.stl output.node0.cell1.stl output.node0.cell2.stl
  openscad output.renderer.scad -Dnode=0 -Dcell=all

output.node1.cell0.stl: output.node0.stl
  openscad output.renderer.scad -Dnode=1 -Dcell=0
  
output.node1.cell1.stl: output.node0.stl
  openscad output.renderer.scad -Dnode=1 -Dcell=1
  
output.stl: output.node1.cell0.stl output.node1.cell1.stl
  openscad output.renderer.scad -Dnode=1 -Dcell=all

Simpler option

Pushing down the transforms and wrapping them with a render_cell call works, but it's a lot of AST fiddling. Instead, we can add a clip-to-bounds feature that's easy to implement in the 3D rendering logic: before processing operands of any operation, it checks their bounds against the clip bounds. If fully outside, discards them. If partially inside, clips them with the BBox before performing the operation. If fully inside no change. Do the same with top-level result (say, in case there was no operation whatsoever). Visitation needs to keep track of the current transform, and numerical inaccuracy might creep in (works better with transforms pushed down by openscad/openscad#3637)

  • Transform all the points to world coordinates (how to do that w/ minkowksi?? just improvise)

    • Simply create a global hash map of point -> solid it's been seen in
  • Use those points to refine an octree (w/ at most N points from at least 2 solids, unsure how to track that)

  • Run a the full model w/ clip bounds for each cell of the octree

    OPENSCAD_CLIP_BOUNDS=1.0,0.4,0.2;10.0,13.0,20.0 openscad - -o out.stl --enable=clip-to-bounds --output-format=binarystl  
    
  • Keep track of whether we're in a repeated subtree (or inside a minkowski): disable the bounds check

// Copyright 2023 Google LLC.
// SPDX-License-Identifier: Apache-2.0
/*
Original: example.scad
minkowski() {
cube(10);
sphere($fn=20);
}
translate([1, 2, -30]) {
cube(20);
translate([10, 2, -3])
sphere(20);
}
*/
include <octree_rendering.scad>;
// Min-max of each leaf's bbox bbox
bboxes=[
[[1, 0, -1], [20, 20, 10]],
[[1, 0, -10], [20, 20, 11]]
];
render_cells(bboxes, "example");
render_cell(bboxes, [1, 2, 3])
minkowski() {
cube(10);
sphere($fn=20);
}
render_cell(bboxes, [1, 2])
translate([1, 2, -30])
cube(20);
render_cell(bboxes, [1, 2])
translate([1, 2, -30]) translate([10, 2, -3])
sphere(20);
// //leaf=-1;
// leaf="assemble";
// leaves=20;
// // Min-max of each leaf's bbox bbox
// bboxes=[
// [[1, 0, -1], [20, 20, 10]],
// ];
// render_cells(leaf, bboxes, "octree-slicer");
// render_cell(leaf, bboxes, [1, 2, 3])
// minkowski() {
// cube(10);
// sphere($fn=20);
// }
// render_cell(leaf, bboxes, [1, 2])
// translate([1, 2, -30])
// cube(20);
// render_cell(leaf, bboxes, [1, 2])
// translate([1, 2, -30]) translate([10, 2, -3])
// sphere(20);
// Copyright 2023 Google LLC.
// SPDX-License-Identifier: Apache-2.0
// Can be "assembly", -1 to display the # of render leaves, or an integer >=0 to render a given leaf.
cell=-1;
function contains(list, item) =
len([for (i=list) if (i == item) true]) > 0;
module box(min_pt, max_pt) {
min_x = min_pt[0];
min_y = min_pt[1];
min_z = min_pt[2];
max_x = max_pt[0];
max_y = max_pt[1];
max_z = max_pt[2];
// Note: we use an explicit polyhedron to avoid numerical inaccuracy from the following "equivalent":
// translate(bbox[0]) cube(bbox[1] - bbox[0]);
// See https://en.wikibooks.org/wiki/OpenSCAD_User_Manual/Primitive_Solids#polyhedron
polyhedron(
points = [
[min_x, min_y, min_z], //0
[max_x, min_y, min_z], //1
[max_x, max_y, min_z], //2
[min_x, max_y, min_z], //3
[min_x, min_y, max_z], //4
[max_x, min_y, max_z], //5
[max_x, max_y, max_z], //6
[min_x, max_y, max_z] //7
],
faces = [
[0,1,2,3], // bottom
[4,5,1,0], // front
[7,6,5,4], // top
[5,6,2,1], // right
[6,7,3,2], // back
[7,4,0,3] // left
]
);
}
module render_cell(bboxes, potentially_intersecting_cells=undef) {
if (is_num(cell) && cell >= 0) {
//if (contains(potentially_intersecting_cells, cell)) {
for (c=potentially_intersecting_cells) if (c==cell) {
intersection() {
box(bboxes[cell][0], bboxes[cell][1]);
children();
}
}
}
}
module render_cells(bboxes, file_prefix) {
cell_count = len(bboxes);
if (cell=="assembly") {
for (i=[0:cell_count-1])
import(str(file_prefix, "_", i, ".stl"));
} else if (cell < 0) {
echo("cells:", cell_count);
} else if (cell >= cell_count) {
error(str("Invalid cell parameter (got ", cell, " but there are only ", cell_count, " cells)"));
}
}
#include "cgal.h"
#include "CGAL_Nef_polyhedron.h"
#include "cgalutils.h"
#include "printutils.h"
#include <CGAL/Octree.h>
#include "linalg.h"
#include "../IndexedMesh.h"
#include <unordered_set>
#include <unordered_map>
#include <flat_set>
#include "node.h"
#include "TransformNode.h"
#include "OffsetNode.h"
// #include "LeafNode.h"
class SharedNodesDetector : public NodeVisitor
{
const Tree& tree;
std::unordered_map<std::string, size_t>& key_occurrences;
SharedNodesDetector(const Tree& tree, std::unordered_map<std::string, size_t>& key_occurrences) : tree(tree), key_occurrences(key_occurrences) {}
Response visit(State& state, const AbstractNode& node) override {
if (state.isPrefix()) {
auto key = this->tree.getIdString(node);
key_occurrences[key]++;
}
return Response::ContinueTraversal;
}
public:
static std::unordered_set<std::string> detect(const std::vector<Tree>& trees) {
std::unordered_map<std::string, size_t> key_occurrences;
for (const auto &tree : trees) {
if (tree.root()) {
SharedNodesDetector detector(tree, key_occurrences);
detector.traverse(*tree.root());
}
}
std::unordered_set<std::string> set;
for (auto &pair : key_occurrences) {
if (pair.second > 1) {
set.insert(pair.first);//, set.end());
}
}
return std::move(set);
}
};
struct RenderStep {
// Same as a cache key (string source for subtree).
using ArtefactId = std::string;
// We split repeated subtrees out to separate render steps.
// Optionally, can also convert 3D imports to formats that are faster to deserialize, and put that conversion as an extra render step.
std::unordered_map<ArtefactId, std::shared_ptr<RenderStep>> dependencies;
std::shared_ptr<const AbstractNode> root;
// Each render step has its own partition
std::vector<BoundingBox> cells;
// Mapping from any node to the set of cell (indices) the node's bbox intersects with.
std::unordered_map<shared_ptr<const AbstractNode>, std::flat_set<size_t>> node_cell_indices;
// Any node -> artefact id (if node is shared or generally the result of a computation)
std::unordered_map<shared_ptr<const AbstractNode>, ArtefactId> node_artefact_ids;
std::string output_file;
};
class RenderStepsBuilder
{
const std::unordered_set<std::string>& sharedKeys;
// std::unordered_map<std::string, std::shared_ptr<RenderStep>> stepBySharedKey;
std::stack<std::shared_ptr<RenderStep>> currentStep;
std::unordered_map<ArtefactId, std::shared_ptr<RenderStep>> &allDependencies;
RenderStepsBuilder(const std::unordered_set<std::string>& sharedNodesCacheKeys) : sharedNodesCacheKeys(sharedNodesCacheKeys) {}
void collect(const std::shared_ptr<const AbstractNode> &node) {
}
public:
void collect(const Tree &tree) {
if (!node) return;
auto key = tree.getIdString(*node);
if (sharedKeys.find(key) == sharedKeys.end()) {
// Not a shared node, just recurse!
for (const auto &child : node.children) {
collectRenderSteps(tree, child, currentStep, allDependencies, sharedKeys);
}
} else {
// Shared node.
auto &sharedStep = allDependencies[key];
if (!sharedStep) {
// Haven't built that dependency yet!
sharedStep = make_shared<RenderStep>();
Tree subTree(node);
collectRenderSteps(subTree, node, sharedStep, allDependencies, sharedKeys);
}
currentStep.dependencies[key] = sharedStep;
}
}
};
class PointsCollector
{
typedef std::unordered_map<Vector3d, size_t> PointMap;
std::unordered_set<std::string> sharedNodesCacheKeys;
std::unordered_map<std::string, PointMap> sharedNodesPointMaps;
const Tree& tree;
void collectWithoutCaching(const AbstractNode &node, const Transform3d& world_transform, PointMap& out) {
auto visitChildren = [&](const Transform3d& transform) {
for (auto &child : node.children) {
if (child) {
collect(*child, transform, out);
}
}
};
if (auto transformNode = dynamic_cast<const TransformNode*>(&node)) {
visitChildren(world_transform * transformNode->matrix);
} else if (auto leafNode = dynamic_cast<const LeafNode*>(&node)) {
shared_ptr<const Geometry> geom(leafNode->createGeometry());
if (geom) {
IndexedMesh im;
im.append_geometry(geom);
// TODO: detect if im.isEmpty() (e.g. 2D case)
auto &vec = im.vertices.getArray();
for (const auto &pt : vec) {
out[world_transform * pt]++;
}
}
} else if (auto offsetNode = dynamic_cast<const OffsetNode*>(&node)) {
// TODO
} else {
visitChildren(world_transform);
}
}
bool shouldBeShared(const std::string &key) const {
return sharedNodesCacheKeys.find(key) != sharedNodesCacheKeys.end();
}
protected:
PointsCollector(const Tree& tree)
: tree(tree), sharedNodesCacheKeys(std::move(SharedNodesDetector::detect(tree)))
{}
void collect(const AbstractNode &node, const Transform3d& world_transform, PointMap& out) {
auto key = this->tree.getIdString(node);
if (shouldBeShared(key)) {
PointMap *pointMap = nullptr;
auto sharedIt = sharedNodesPointMaps.find(key);
if (sharedIt != sharedNodesPointMaps.end()) {
// Points of shared node have already been collected.
pointMap = &sharedIt->second;
} else {
// Collect points of shared node and cache them.
pointMap = &sharedNodesPointMaps[key];
collectWithoutCaching(node, Transform3d::Identity(), *pointMap);
}
assert(pointMap);
for (const auto &p : *pointMap) {
out[world_transform * p.first]++;
}
} else {
collectWithoutCaching(node, world_transform, out);
}
}
public:
static PointMap collect(const Tree& tree) {
PointsCollector::PointMap points;
if (tree.root()) {
PointsCollector pointsCollector(tree);
pointsCollector.collect(*tree.root(), Transform3d::Identity(), points);
}
return std::move(points);
}
};
struct RenderingOptions {
};
std::vector<std::shared_ptr<RenderStep>> createRenderSteps(const std::vector<std::pair<std::shared_ptr<const AbstractNode>, std::string>>& rootsAndOutputNames, const RenderingOptions &options) {
std::vector<std::shared_ptr<RenderStep>> result;
return std::move(results);
}
struct PartitionResult {
// std::unordered_map<shared_ptr<const AbstractNode>, std::string> shared_subtrees_ids;
// //
// std::unordered_map<std::string, std::unordered_set<std::string>> shared_subtrees_dependencies;
// std::unordered_map<shared_ptr<const AbstractNode>, std::string> shared_subtrees_ids;
};
PartitionResult partitionTree(const Tree &tree, size_t maxPointsInCellsWithOperations) {
std::unordered_map<std::string, shared_ptr<const AbstractNode>> example_instance_for_each_shared_key;
// std::unordered_set<std::string> shared_keys;
std::unordered_map<shared_ptr<const AbstractNode>, std::flat_set<size_t>> cell_indices_by_node;
PartitionResult result;
return std::move(result);
}
// {
// this->bbox.setNull();
// for (const auto& poly : polygons) {
// for (const auto& p : poly) {
// this->bbox.extend(p);
// }
// typedef CGAL::Simple_cartesian<double> Kernel;
// typedef Kernel::Point_3 Point;
// typedef CGAL::Octree<Kernel, Point_vector> Octree;
// CGAL::Octree createOctree(const Tree& tree) {
// }
// // TODO(ochafik): Have GeometryEvaluator inherit this
// class AbstractGeometryEvaluator : public NodeVisitor
// {
// const Tree& tree;
// public:
// [[nodiscard]] const Tree& getTree() const { return this->tree; }
// protected:
// AbstractGeometryEvaluator(const Tree& tree) : tree(tree) {}
// const std::string& getCacheKey(const AbstractNode& node) {
// return this->tree.getIdString(node);
// }
// void smartCacheInsert(const AbstractNode& node,
// const shared_ptr<const Geometry>& geom)
// {
// const std::string& key = this->getCacheKey(node);
// if (CGALCache::acceptsGeometry(geom)) {
// if (!CGALCache::instance()->contains(key)) CGALCache::instance()->insert(key, geom);
// } else {
// if (!GeometryCache::instance()->contains(key)) {
// if (!GeometryCache::instance()->insert(key, geom)) {
// LOG(message_group::Warning, Location::NONE, "", "GeometryEvaluator: Node didn't fit into cache.");
// }
// }
// }
// }
// bool isSmartCached(const AbstractNode& node)
// {
// const std::string& key = this->getCacheKey(node);
// return (GeometryCache::instance()->contains(key) ||
// CGALCache::instance()->contains(key));
// }
// shared_ptr<const Geometry> smartCacheGet(const AbstractNode& node, bool preferNef)
// {
// const std::string& key = this->getCacheKey(node);
// shared_ptr<const Geometry> geom;
// bool hasgeom = GeometryCache::instance()->contains(key);
// bool hascgal = CGALCache::instance()->contains(key);
// if (hascgal && (preferNef || !hasgeom)) geom = CGALCache::instance()->get(key);
// else if (hasgeom) geom = GeometryCache::instance()->get(key);
// return geom;
// }
// };
// class PointsCollector : public AbstractGeometryEvaluator
// {
// // typedef CGAL::Simple_cartesian<double> Kernel;
// // typedef Kernel::Point_3 Point;
// // typedef CGAL::Octree<Kernel, Point_vector> Octree;
// // Octree octree_;
// void pushTransform(const Transform3d& transform) {
// assert(!world_transforms.empty());
// world_transforms.emplace(world_transforms.top() * transform);
// }
// void popTransform() {
// assert(!world_transforms.empty());
// world_transforms.pop();
// assert(!world_transforms.empty());
// }
// template <class iter>
// void addPoints(const iter& start, const iter& end) {
// const auto &transform = world_transform.top();
// for (auto it = start; it != end; ++it) {
// const auto &pt = *it;
// points[pt]++;
// }
// }
// void processGeometry(const AbstractNode &node) {
// shared_ptr<const Geometry> geom;
// if (isSmartCached(node)) {
// geom = smartCacheGet(node, /* preferNef= */ false);
// } else {
// geom = node.createGeometry();
// smartCacheInsert(node, geom);
// }
// // auto ps = CGALUtils::getGeometryAsPolySet(geom);
// if (geom) {
// IndexedMesh im;
// im.append_geometry(geom);
// auto &vec = im.vertices.getArray();
// addPoints(vec.begin(), veg.end());
// }
// }
// typedef std::unordered_map<Vector3d, size_t> PointMap;
// std::stack<PointMap> point_maps;
// std::stack<Transform3d> world_transforms;
// std::unordered_set<std::string> sharedNodesCacheKeys;
// public:
// PointsCollector(const Tree& tree, const std::unordered_set<std::string> &sharedNodesCacheKeys)
// : AbstractGeometryEvaluator(tree),
// sharedNodesCacheKeys(sharedNodesCacheKeys)
// {
// world_transforms.emplace(Transform3d::Identity());
// point_maps.emplace(PointMap());
// }
// // shared_ptr<const Geometry> evaluateGeometry(const AbstractNode& node, bool allownef);
// // Response visit(State& state, const AbstractNode& node) override;
// // Response visit(State& state, const AbstractIntersectionNode& node) override;
// // Response visit(State& state, const AbstractPolyNode& node) override;
// // Response visit(State& state, const LinearExtrudeNode& node) override {
// // // TODO
// // }
// // Response visit(State& state, const RotateExtrudeNode& node) override {
// // if (state.isPrefix()) {
// // point_maps.emplace(PointMap());
// // } else if (state.isPostfix()) {
// // auto nested_map(std::move(point_maps.pop());
// // // TODO
// // }
// // }
// // Response visit(State& state, const RoofNode& node) override;
// // Response visit(State& state, const ListNode& node) override;
// // Response visit(State& state, const GroupNode& node) override;
// // Response visit(State& state, const RootNode& node) override;
// Response visit(State& state, const LeafNode& node) override {
// if (state.isPrefix()) {
// processGeometry(node);
// }
// return Response::ContinueTraversal;
// }
// Response visit(State& state, const TransformNode& node) override {
// if (state.isPrefix()) {
// pushTransform(node.matrix);
// } else if (state.isPostfix()) {
// popTransform();
// }
// return Response::ContinueTraversal;
// }
// Response visit(State& state, const CsgOpNode& node) override {
// if (state.isPrefix()) {
// if (node.type == OpenSCADOperator::MINKOWSKI) {
// // TODO
// }
// }
// return Response::ContinueTraversal;
// }
// // Response visit(State& state, const CgalAdvNode& node) override;
// // Response visit(State& state, const ProjectionNode& node) override;
// // Response visit(State& state, const RenderNode& node) override;
// Response visit(State& state, const TextNode& node) override {
// if (state.isPrefix()) {
// processGeometry(node);
// }
// return Response::ContinueTraversal;
// }
// Response visit(State& state, const OffsetNode& node) override {
// if (state.isPrefix()) {
// // TODO
// }
// return Response::ContinueTraversal;
// }
// // void addToParent(const State& state, const AbstractNode& node, const shared_ptr<const Geometry>& geom);
// // Response lazyEvaluateRootNode(State& state, const AbstractNode& node);
// // std::map<int, Geometry::Geometries> visitedchildren;
// // const Tree& tree;
// // shared_ptr<const Geometry> root;
// public:
// };
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment