AI Performance Engineering Done in a New Way

A more skillful approach to performance

Our team of experts has improved the performance of every codebase it has ever touched, regardless of the brilliance of the original authors. This has held true time and again, in every application area we have worked in.

This kind of success is possible because our team looks beyond the specific algorithm and across the layers of the stack. Sometimes the opportunity for superior performance lies in reconsidering algorithms that were discarded as “slow” and optimizing them for the problem at hand. Developers often choose between algorithms based on the assumed performance of the alternatives, and may settle on the lesser one without asking whether the “slow” algorithm could be made fast by exploiting the specific characteristics of the workload.
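To make that idea concrete, here is a minimal sketch, our own illustration rather than code from any particular project: a merge sort that hands small sub-ranges to an O(n²) insertion sort, because below a certain size the “slow” algorithm’s low overhead and sequential memory access make it the faster choice. The cutoff value and function names are assumptions for the example; in practice the threshold comes from measuring on the target hardware.

```python
# Illustrative sketch: a "slow" O(n^2) insertion sort, tuned for the
# characteristics of small inputs, beats the asymptotically better
# algorithm below a cutoff -- so the hybrid uses both.

INSERTION_CUTOFF = 32  # assumed value; real thresholds come from measurement


def insertion_sort(a, lo, hi):
    """Sort a[lo:hi] in place. O(n^2), but low overhead and a
    cache-friendly sequential access pattern on small ranges."""
    for i in range(lo + 1, hi):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def hybrid_merge_sort(a, lo=0, hi=None):
    """Merge sort that delegates small sub-ranges to the 'slow' algorithm."""
    if hi is None:
        hi = len(a)
    if hi - lo <= INSERTION_CUTOFF:
        insertion_sort(a, lo, hi)
        return
    mid = (lo + hi) // 2
    hybrid_merge_sort(a, lo, mid)
    hybrid_merge_sort(a, mid, hi)
    # Merge the two sorted halves back into a[lo:hi].
    merged, i, j = [], lo, mid
    while i < mid and j < hi:
        if a[i] <= a[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(a[j])
            j += 1
    merged.extend(a[i:mid])
    merged.extend(a[j:hi])
    a[lo:hi] = merged


if __name__ == "__main__":
    import random
    original = [random.randint(0, 10**6) for _ in range(10_000)]
    data = list(original)
    hybrid_merge_sort(data)
    assert data == sorted(original)
```

The same trade-off shows up in widely used sort implementations, which routinely fall back to insertion sort for small runs for exactly this reason.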

We work across the stack and are always thinking about how applications will perform when distributed and at scale. We also have a deep understanding of a wide variety of architectures, both those in current use and those being revisited today, a combination of skills that is rare. We usually work embedded with subject-matter experts, collaborating to find innovations and additional optimizations.

We are passionate about this work. We are the team that is never satisfied with current performance and that takes the initiative to improve it. Most teams stop when preconceptions tell them performance cannot be improved further; that is precisely the time to innovate. Innovations come in many forms: some come from optimizing at a different layer of the stack, others from using existing functions for purposes they were never intended for.

Not every optimization can be done in software alone, and future hardware will increasingly be steered towards AI-focused workloads. Because of our innovative spirit and our ability to work across both hardware and software groups, our team has often been involved in HW/SW codesign.

A few examples of these codesign experiences include a 3D torus for solving non-linear PDEs using AutoDiff along with long accumulators and interval math; SPARC DAX (in-memory columnar SW-in-Silicon); and future designs for accelerating DNNs.
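For readers unfamiliar with the term, the sketch below illustrates the long-accumulator idea in software: every floating-point input is converted to an integer count of one fixed, tiny unit, so additions are exact and rounding happens only once, when the result is read out. This is our own toy illustration of the concept, not a description of the codesign work above, and it leaves out the interval-arithmetic side; the class and method names are ours.

```python
import math


class LongAccumulator:
    """Toy software model of a long accumulator: each double is
    represented exactly as an integer multiple of one tiny unit,
    so sums accumulate without any rounding error."""

    # 2**MIN_EXP divides the ulp of every finite double, so every input
    # mantissa lands on an exact integer multiple of it.
    MIN_EXP = -1126  # frexp exponent of the smallest subnormal (-1073) - 53

    def __init__(self):
        self.acc = 0  # arbitrary-precision integer, in units of 2**MIN_EXP

    def add(self, x: float) -> None:
        if x == 0.0:
            return
        m, e = math.frexp(x)        # x == m * 2**e with 0.5 <= |m| < 1
        mant = int(m * (1 << 53))   # signed 53-bit integer mantissa, exact
        self.acc += mant << (e - 53 - self.MIN_EXP)  # exact, no rounding

    def value(self) -> float:
        # Exact integer division, rounded to a double exactly once
        # (assumes the final sum fits in a double).
        return self.acc / (1 << -self.MIN_EXP)


if __name__ == "__main__":
    inputs = [1.0, 1e100, 1.0, -1e100]
    acc = LongAccumulator()
    for x in inputs:
        acc.add(x)
    print(sum(inputs))   # 0.0 -- naive left-to-right summation loses the 1.0s
    print(acc.value())   # 2.0 -- exact accumulation keeps them
```

In hardware, the same idea is typically realized as one very wide fixed-point register that also absorbs full-width products, which is part of what makes it an attractive codesign target.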