Old Ways of looking at AI Performance Engineering

Two typical approaches to optimizing performance

Performance QA. Typically, performance engineering groups in the industry can be described as Performance QA. This engineering is characterized by focused “run, test, repeat” cycles. Performance is accurately quantified, but little performance improvement in the product is achieved.

Domain-expert focus. Coders/algorithmists are responsible for performance. Typically this will come down to coders pushing algorithm limits within their field of expertise (ex: DL algorithms). If specific optimal implementation is needed, such as writing CUDA for GPUs, then those CUDA engineers are brought in to optimize that level. While each optimizes the code for that domain-specific layer, such myopic vision will miss systemic optimizations.

Both of these approaches overlook optimizations, which leads to inefficient and under-performing codes, which miss performance objectives, which waste costly distributed compute resources.

It is surprising to us the some DL optimizations were not considered much earlier.