
Paul Chan
Hi, I'm Paul. I like high-performance computing and machine learning, mostly how machine learning and hardware architectures influence each other.
The New Frontier of GPU Performance: From Memory Bound to Communication Bound
For decades, GPU performance optimization has been dominated by the memory wall problem. As we scale to multi-GPU and multi-node systems, a fundamental shift is occurring: the bottleneck is moving from memory bandwidth to inter-GPU communication.
The Demise of CUDA has been Greatly Exaggerated
Twitter threads, articles, and podcasts frequently declare the end of CUDA and NVIDIA’s dominance. The arguments typically hinge on three main claims: that the rise of ASICs will render GPUs obsolete, that a new software ecosystem will erode the CUDA moat, and that LLM-based agents will make knowledge of CUDA and low-level implementations irrelevant. Yet closer examination reveals that these predictions fail to capture the nuance and ongoing innovation within NVIDIA’s ecosystem.
Ramp's Marketing Strategy
How Ramp transformed B2B SaaS marketing by focusing on efficiency over rewards, executing viral campaigns, and building owned media channels.