FP8 is ~100 TFLOPS faster when the kernel name has "cutlass" in it

#JackDongarra Takes a Stand for Traditional #HPC: "US still doesn’t have a clear, long-term plan for what comes next.... U.S. risks falling behind."
Challenges to high-performance computing threaten #US #innovation
The #AI boom has led chip makers to focus on #FP16 and #FP8, not the #FP64 used by scientific research. If chip companies stop making the parts that #scientists need, then it could become harder to do important research.
https://theconversation.com/challenges-to-high-performance-computing-threaten-us-innovation-255188
DeepSeek Open Sources DeepGEMM: Clean and efficient FP8 GEMM kernels — https://github.com/deepseek-ai/DeepGEMM
#HackerNews #DeepSeek #DeepGEMM #FP8 #AI #Kernels #OpenSource
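FP8 GEMM kernels like DeepGEMM's take E4M3 inputs together with scaling factors, because E4M3's range (max finite value 448) is far too narrow to hold raw activations. As an illustration of that idea only — this is a hedged pure-Python sketch, not DeepGEMM's actual API — here is a scaled FP8-style GEMM with a simple E4M3 rounding simulation:

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest representable FP8 E4M3 value (saturating;
    NaN/subnormal edge cases ignored for brevity)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    # Exponent of the leading bit, clamped to E4M3's normal range [-6, 8]
    e = max(min(math.floor(math.log2(mag)), 8), -6)
    step = 2.0 ** (e - 3)  # 3 mantissa bits -> value spacing 2^(e-3)
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)

def fp8_gemm(a, b):
    """Scaled FP8 GEMM: quantize each input with a per-tensor scale,
    multiply in FP8, accumulate and dequantize in higher precision."""
    sa = max(max(abs(v) for v in row) for row in a) / FP8_E4M3_MAX
    sb = max(max(abs(v) for v in row) for row in b) / FP8_E4M3_MAX
    qa = [[quantize_e4m3(v / sa) for v in row] for row in a]
    qb = [[quantize_e4m3(v / sb) for v in row] for row in b]
    n, k, m = len(a), len(b), len(b[0])
    # The product of the two scales dequantizes the accumulated dot products
    return [[sa * sb * sum(qa[i][t] * qb[t][j] for t in range(k))
             for j in range(m)] for i in range(n)]
```

Production kernels use hardware FP8 tensor cores and typically finer-grained (per-block rather than per-tensor) scales, but the scale-multiply-dequantize structure is the same.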
Introducing Phind-405B and faster, high quality #AI answers for everyone
Phind-405B: New flagship #llm, based on Meta Llama 3.1 405B, designed for programming & technical tasks. #Phind405B
128K-token context window (32K available at launch), 92% on HumanEval, great for web app design. #Programming #AIModel
Trained on 256 H100 GPUs with FP8 mixed precision, 40% memory reduction. #DeepSpeed #FP8
Phind Instant Model: Super fast, 350 tokens/sec, based on Meta Llama 3.1 8B. #PhindInstant
Runs on NVIDIA TensorRT-LLM with flash decoding, fused CUDA kernels. #NVIDIA #GPUs
Faster Search: Prefetches results, saving up to 800 ms of latency; better embeddings. #FastSearch
Goal: Help developers experiment faster, new features coming soon! #DevTools #Innovation
https://www.phind.com/blog/introducing-phind-405b-and-better-faster-searches
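For a sense of scale on that memory claim, here is an assumed back-of-envelope calculation (not Phind's actual accounting): FP8 stores one byte per value versus two for BF16/FP16, so raw weight storage for a 405B-parameter model roughly halves, while the reported 40% end-to-end saving is lower because optimizer state and master weights typically remain in higher precision.

```python
# Back-of-envelope weight storage for a 405B-parameter model.
# Illustrative arithmetic only; the 40% figure in the post is the
# reported end-to-end training-memory saving, which also covers
# activations and optimizer state kept in higher precision.
params = 405e9

bf16_gb = params * 2 / 1e9       # 2 bytes/param in BF16
fp8_gb = params * 1 / 1e9        # 1 byte/param in FP8
saving = 1 - fp8_gb / bf16_gb    # fraction saved on weights alone
print(f"BF16: {bf16_gb:.0f} GB, FP8: {fp8_gb:.0f} GB, saving: {saving:.0%}")
```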
Glad to be on here! My #introduction:
I'm an AI researcher in the UK, working at Graphcore, a semiconductor company that develops the #IPU (a #GPU alternative). I joined last year, having previously been at Oxford for my MSc.
My interests are in #numerics (especially #fp8), #LLMs, mixture-of-experts models, and anything to do with #solitaire
Thanks to @thegradient for making this happen