TRELLIS.2 for Apple Silicon: Native 3D Generation on Mac
## AI Summary
I've ported Microsoft's TRELLIS.2, a cutting-edge image-to-3D model, to run natively on Apple Silicon using PyTorch MPS, eliminating the need for an NVIDIA GPU. This adaptation allows users to generate detailed 3D meshes with over 400,000 vertices from single images in approximately 3.5 minutes on an M4 Pro Mac. The output includes textured OBJ and GLB files with PBR materials, ready for integration into 3D applications.
## Requirements and Setup
To use this port, you'll need a Mac with Apple Silicon (M1 or later), Python 3.11+, and at least 24GB of unified memory. The setup involves cloning the repository, logging into HuggingFace for model weights, and running a setup script that installs dependencies and patches TRELLIS.2. Once set up, you can generate 3D models from images using simple command-line instructions.
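The setup described above might look roughly like the following on the command line. The script and entry-point names here are illustrative guesses, not taken from the repository, so check its README for the exact commands.

```shell
# Illustrative setup sketch; the script and flag names below are
# assumptions, not verified against the repository.
git clone https://github.com/shivampkumar/trellis-mac
cd trellis-mac

# Log in so gated model weights can be downloaded from HuggingFace
huggingface-cli login

# Hypothetical setup script that installs dependencies and applies
# the TRELLIS.2 patches described above
./setup.sh

# Hypothetical generation command: single image in, textured mesh out
python generate.py --image input.jpg --output out/
```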
## Technical Details
The port replaces several CUDA-only libraries with pure-PyTorch and Python alternatives. For instance, sparse 3D convolution is implemented via a spatial hash of active voxels, and mesh extraction is reimplemented with Python dictionaries. The attention module now uses PyTorch's built-in scaled dot-product attention, which runs on the MPS backend. Despite these adaptations, the pure-PyTorch version is slower than its CUDA counterpart, particularly in sparse convolution.
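To make the spatial-hash idea concrete, here is a minimal sketch (not the port's actual code) of sparse 3D convolution over a dictionary keyed by voxel coordinates: only occupied voxels are visited, and each scatters its contribution to the output sites its kernel covers.

```python
# Sketch of dict-based sparse 3D convolution: the "spatial hash" is a
# plain dict mapping (x, y, z) -> feature, so work scales with the
# number of active voxels, not the size of the bounding grid.
from collections import defaultdict

def sparse_conv3d(active, kernel):
    """active: dict {(x, y, z): feature}; kernel: dict {(dx, dy, dz): weight}."""
    out = defaultdict(float)
    for (x, y, z), feat in active.items():
        for (dx, dy, dz), w in kernel.items():
            # Scatter this voxel's contribution to each covered site
            out[(x + dx, y + dy, z + dz)] += feat * w
    return dict(out)

# Uniform 3x3x3 box kernel
kernel = {(dx, dy, dz): 1.0
          for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)}

# Two active voxels in an otherwise empty volume
voxels = {(0, 0, 0): 1.0, (5, 5, 5): 2.0}
result = sparse_conv3d(voxels, kernel)
# Only 2 * 27 = 54 output sites are ever touched, however large the grid.
```

A real implementation would carry feature vectors and batched tensors rather than scalars, but the lookup structure is the same: coordinate tuples hashed into a dictionary instead of a dense 3D array.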
## Performance and Limitations
On an M4 Pro, the process from model loading to mesh decoding takes about 3.5 minutes, with memory usage peaking at 18GB. However, the port has some limitations: it doesn't support texture export or hole filling due to the absence of CUDA-specific libraries, and it's slower than the original CUDA version. Additionally, this port supports inference only, not training.
## Licensing and Credits
The ported code is available under the MIT License, while the model weights have their own licenses. Credits go to Microsoft Research for TRELLIS.2, Meta for DINOv3, and BRIA AI for RMBG-2.0.
## Key Concepts
Image-to-3D generation is a process that converts 2D images into 3D models, allowing for the creation of three-dimensional representations from flat visuals. This involves complex algorithms that interpret depth, texture, and structure from a single image.
Sparse 3D convolution is a technique used in 3D data processing to efficiently handle sparse data by focusing computations only on non-empty regions. This reduces computational load and memory usage compared to dense convolutions.
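The saving from restricting computation to non-empty regions can be shown with simple arithmetic; the grid size and voxel count below are illustrative, not measurements from TRELLIS.2.

```python
# Toy work comparison for dense vs sparse 3D convolution on a
# mostly-empty grid (numbers are illustrative).
grid = 64          # dense 64^3 voxel grid
active = 5_000     # occupied voxels, e.g. a thin surface shell
kernel = 27        # 3x3x3 neighborhood per site

dense_ops = grid ** 3 * kernel   # a dense conv visits every grid site
sparse_ops = active * kernel     # a sparse conv visits only occupied sites

print(dense_ops // sparse_ops)   # rough speedup factor from sparsity
```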
## Category
Technology

## Original source
https://github.com/shivampkumar/trellis-mac