FP8 is ~100 TFLOPS faster when the kernel name has "cutlass" in it

#JackDongarra Takes a Stand for Traditional #HPC: "US still doesn’t have a clear, long-term plan for what comes next.... U.S. risks falling behind."
Challenges to high-performance computing threaten #US #innovation
The #AI boom has led chip makers to focus on #FP16 and #FP8, not the #FP64 used by scientific research. If chip companies stop making the parts that #scientists need, then it could become harder to do important research.
https://theconversation.com/challenges-to-high-performance-computing-threaten-us-innovation-255188
DeepSeek Open Sources DeepGEMM: Clean and efficient FP8 GEMM kernels — https://github.com/deepseek-ai/DeepGEMM
#HackerNews #DeepSeek #DeepGEMM #FP8 #AI #Kernels #OpenSource
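FP8 GEMM kernels like DeepGEMM's take E4M3 inputs together with scaling factors, because E4M3's range (max finite value 448) is far too narrow to hold raw activations. As an illustration of that idea only — this is a hedged pure-Python sketch, not DeepGEMM's actual API — here is a scaled FP8-style GEMM with a simple E4M3 rounding simulation:

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest representable FP8 E4M3 value (saturating;
    NaN/subnormal edge cases ignored for brevity)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    # Exponent of the leading bit, clamped to E4M3's normal range [-6, 8]
    e = max(min(math.floor(math.log2(mag)), 8), -6)
    step = 2.0 ** (e - 3)  # 3 mantissa bits -> value spacing 2^(e-3)
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)

def fp8_gemm(a, b):
    """Scaled FP8 GEMM: quantize each input with a per-tensor scale,
    multiply in FP8, accumulate and dequantize in higher precision."""
    sa = max(max(abs(v) for v in row) for row in a) / FP8_E4M3_MAX
    sb = max(max(abs(v) for v in row) for row in b) / FP8_E4M3_MAX
    qa = [[quantize_e4m3(v / sa) for v in row] for row in a]
    qb = [[quantize_e4m3(v / sb) for v in row] for row in b]
    n, k, m = len(a), len(b), len(b[0])
    # The product of the two scales dequantizes the accumulated dot products
    return [[sa * sb * sum(qa[i][t] * qb[t][j] for t in range(k))
             for j in range(m)] for i in range(n)]
```

Production kernels use hardware FP8 tensor cores and typically finer-grained (per-block rather than per-tensor) scales, but the scale-multiply-dequantize structure is the same.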
Introducing Phind-405B and faster, high quality #AI answers for everyone
Phind-405B: New flagship #llm, based on Meta Llama 3.1 405B, designed for programming & technical tasks. #Phind405B
128K-token context window (32K available at launch), 92% on HumanEval, great for web app design. #Programming #AIModel
Trained on 256 H100 GPUs with FP8 mixed precision, 40% memory reduction. #DeepSpeed #FP8
Phind Instant Model: Super fast, 350 tokens/sec, based on Meta Llama 3.1 8B. #PhindInstant
Runs on NVIDIA TensorRT-LLM with flash decoding, fused CUDA kernels. #NVIDIA #GPUs
Faster Search: Prefetches results, saving up to 800 ms of latency; better embeddings. #FastSearch
Goal: Help developers experiment faster, new features coming soon! #DevTools #Innovation
https://www.phind.com/blog/introducing-phind-405b-and-better-faster-searches
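For a sense of scale on that memory claim, here is an assumed back-of-envelope calculation (not Phind's actual accounting): FP8 stores one byte per value versus two for BF16/FP16, so raw weight storage for a 405B-parameter model roughly halves, while the reported 40% end-to-end saving is lower because optimizer state and master weights typically remain in higher precision.

```python
# Back-of-envelope weight storage for a 405B-parameter model.
# Illustrative arithmetic only; the 40% figure in the post is the
# reported end-to-end training-memory saving, which also covers
# activations and optimizer state kept in higher precision.
params = 405e9

bf16_gb = params * 2 / 1e9       # 2 bytes/param in BF16
fp8_gb = params * 1 / 1e9        # 1 byte/param in FP8
saving = 1 - fp8_gb / bf16_gb    # fraction saved on weights alone
print(f"BF16: {bf16_gb:.0f} GB, FP8: {fp8_gb:.0f} GB, saving: {saving:.0%}")
```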
Glad to be on here! My #introduction:
I'm an AI researcher in the UK, working at Graphcore, a semiconductor company that develops the #IPU (a #GPU alternative). I joined last year, having previously been at Oxford for my MSc.
My interests are in #numerics (especially #fp8), #LLMs, mixture-of-experts models, and anything to do with #solitaire
Thanks to @thegradient for making this happen