Rezo Accelerates Drug Discovery while Saving >90% on Compute Costs with Union.ai
Overview
Rezo Therapeutics is a pioneering biotech company turning protein-protein interactions into oncology therapeutics. Rezo combines AI, high-throughput experimentation, and cellular validation to understand how protein networks drive disease. A key part of this work is performing structural predictions and analyses at scale. These tasks demand immense computational resources, efficient workflow orchestration, and streamlined infrastructure management.
By partnering with Union.ai, Rezo gained a powerful ally that helped them scale faster, cut costs, and stay focused on science instead of infrastructure.
Challenge: Scaling science without scaling overhead
Rezo’s early-stage drug discovery work depended on operational excellence as much as on scientific innovation. The team faced four key challenges:
- Scalable Structural Predictions: Modeling thousands of protein-protein interactions with tools like AlphaFold required high-throughput, compute-intensive workloads.
- Cost Efficiency at Scale: As a growth-stage biotech, Rezo needed to stretch their compute budget while maintaining performance.
- Workflow Complexity: Rezo’s workflows required different types of cloud hardware at different points, creating an orchestration challenge.
- Minimal Operational Overhead: With a small team, they wanted to avoid the burden of managing infrastructure or hiring a dedicated DevOps engineer.
Solution: Union.ai helps power Rezo’s breakthroughs
Rezo turned to Union.ai, the AI development platform for Flyte, to overcome these barriers and unlock new velocity in their drug discovery pipeline.
1. Running Biomolecular Structural Inference at Scale with TPUs
To run inference efficiently, Rezo wanted to use Tensor Processing Units (TPUs) on Google Cloud. But using TPUs dynamically wasn’t straightforward—until Union.ai stepped in.
"We get significant cost efficiency from running biomolecular structural AI inference on TPUs. Having the ability to scale dynamically—to go from zero to 500 TPUs across four regions—is unique and highly valuable. We get that from Union.ai, and I don’t know who else could give us that."
— Greg Friedland, Principal ML Engineer, Rezo
Union.ai’s team rapidly built and deployed TPU orchestration capabilities tailored to Rezo’s needs, giving them access to high-performance hardware without the typical engineering lift.
2. Resilient, Low-Cost Compute via Spot Instances
Spot instances (heavily discounted compute resources from GCP) can dramatically reduce costs, but they can be reclaimed at any time, which disrupts running workflows. Union.ai’s platform and retry mechanisms allowed Rezo to use thousands of spot instances without those interruptions derailing their workloads.
"Union.ai gives us the fault tolerance we need to run our workflows on spot instances. We can configure the number of retries and fall back to on-demand if needed."
This allowed Rezo to run massive, complex protein prediction workloads using map tasks at a fraction of the standard compute cost, with no compromise in reliability.
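The retry-then-fall-back behavior described in the quote can be sketched in plain Python. This is a toy simulation, not Union.ai's or Flyte's API: the names `SpotPreempted`, `run_with_fallback`, and `map_with_fallback`, the preemption probability, and the retry count are all hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, List, Tuple


class SpotPreempted(Exception):
    """Raised when the simulated spot instance is reclaimed mid-task."""


def run_with_fallback(
    work: Callable[[], int],
    rng: random.Random,
    spot_retries: int = 3,      # hypothetical retry budget
    preempt_prob: float = 0.3,  # hypothetical preemption rate
) -> Tuple[int, str, int]:
    """Try `work` on simulated spot capacity, then fall back to on-demand.

    Returns (result, capacity_used, attempt_number).
    """
    for attempt in range(1, spot_retries + 1):
        try:
            # Simulated preemption: the instance is reclaimed with
            # probability `preempt_prob` before the work completes.
            if rng.random() < preempt_prob:
                raise SpotPreempted()
            return work(), "spot", attempt
        except SpotPreempted:
            continue  # retry on a fresh spot instance
    # Every spot attempt was preempted. On-demand capacity is never
    # preempted in this toy model, so the final attempt always completes.
    return work(), "on-demand", spot_retries + 1


def map_with_fallback(
    fn: Callable[[int], int],
    inputs: List[int],
    seed: int = 0,
    **kwargs,
) -> List[Tuple[int, str, int]]:
    """Apply `fn` to every input, each with its own retry/fallback budget."""
    rng = random.Random(seed)
    return [run_with_fallback(lambda x=x: fn(x), rng, **kwargs) for x in inputs]
```

Calling `map_with_fallback(lambda x: x * x, [1, 2, 3])` returns one `(result, capacity, attempt)` tuple per input. The results are the same whichever capacity ends up running the work, which is why this pattern suits tasks that can safely be rerun from the start.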
3. Cost Reduction at Every Layer
Whether it was physics-based simulations or multiple sequence alignment (MSA) creation, Rezo used Union.ai to fine-tune their infrastructure for each use case.
- Esoteric CPU instances: Union.ai allowed Rezo to discover and deploy high-efficiency instance types that reduced costs by 2–3x.
"We've been able to save a factor of two or three by using this esoteric instance type at high scale with Union.ai. Scaling out to a thousand and hopefully soon even more of these 48 and 60 CPU instances and scaling dynamically."
- On-the-fly SSD provisioning: For protein sequence MSAs, Union.ai enabled dynamic use of local SSDs to hold terabytes of data, cutting the monthly cost of that work by over 90%.
"We have been able to use Union.ai to configure local SSDs to download one and a half terabytes of data on the fly from GCS, store this on a local SSD, and run a batch operation. This allowed us to reduce the cost of doing MSAs from around $10,000 a month to hundreds of dollars."
4. Eliminating DevOps Costs
Perhaps most impressively, Rezo did all this without increasing DevOps overhead. Union.ai’s managed service and responsive engineering team allowed Rezo to operate like a team twice their size.
"The Union.ai team is responsive on Slack and open to brainstorming, getting guidance on how to use Flyte, taking feature requests, and conveying them to the engineering team. We haven't had to hire a DevOps engineer on the team. I've been able to lean on Union.ai to abstract out that problem and handle the infrastructure. If I have a request, they do it the right way. I don't have to worry about: Am I following security best practices? Am I going to overload my API server? Those kinds of things."
Union.ai’s team acted as an extension of Rezo’s own, providing guidance, incorporating feature requests, and ensuring Rezo could focus on core research and drug discovery efforts—not system maintenance and overhead.
- 90% — cost savings on storage
- 67% — cost savings on compute
- 500 TPUs — scaling dynamically across 4 regions
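The headline percentages can be related to the figures in the quotes with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and because the source says only that MSA costs fell to "hundreds of dollars", the $999 and $500 values are assumed bounds rather than reported numbers.

```python
def savings_from_factor(factor: float) -> float:
    """Fraction saved when cost drops by `factor` (3 means one third the cost)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 - 1.0 / factor


def savings_fraction(old_cost: float, new_cost: float) -> float:
    """Fraction saved when cost goes from `old_cost` to `new_cost`."""
    if old_cost <= 0:
        raise ValueError("old_cost must be positive")
    return 1.0 - new_cost / old_cost


if __name__ == "__main__":
    # "A factor of two or three" on CPU instances, from the quote.
    for factor in (2, 3):
        print(f"{factor}x cheaper: {savings_from_factor(factor):.0%} saved")

    # MSA cost: about $10,000 per month down to "hundreds of dollars".
    # The new-cost values here are assumptions, not figures from the source.
    for assumed_new_cost in (999, 500):
        saved = savings_fraction(10_000, assumed_new_cost)
        print(f"$10,000 to ${assumed_new_cost}: {saved:.1%} saved")
```

A threefold reduction works out to roughly 67% saved, which appears consistent with the compute figure, and any monthly cost under $1,000 against a $10,000 baseline is a saving of more than 90%.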
Results: Union.ai accelerated drug discovery while saving on compute costs
Thanks to Union.ai, Rezo is now operating at an entirely new level of efficiency and scale. The partnership has delivered:
- Accelerated drug discovery through scalable workflows and compute
- Massive cost savings across storage (up to 90%), compute (up to 67%), and staffing, enabling Rezo to quickly and cost-effectively deliver medicines to patients
- Focus on scientific discovery preserved by removing infrastructure burdens
- A trusted partner in Union.ai that evolves alongside Rezo’s needs
Rezo’s journey shows how biotech innovators can tackle highly complex computational challenges and accelerate the development of life-saving therapeutics for diseases like cancer by partnering with an AI development platform.