𝐙𝐢𝐩𝐍𝐍 v0.5.0 introduces compression with CPU multithreading!
Keep models 𝐜𝐨𝐦𝐩𝐫𝐞𝐬𝐬𝐞𝐝 𝐚𝐥𝐥 𝐭𝐡𝐞 𝐰𝐚𝐲 𝐭𝐨 𝐭𝐡𝐞 𝐆𝐏𝐔
saving both 𝐭𝐫𝐚𝐧𝐬𝐟𝐞𝐫 𝐭𝐢𝐦𝐞 and 𝐬𝐭𝐨𝐫𝐚𝐠𝐞 𝐬𝐩𝐚𝐜𝐞:
from @huggingface to storage, and from storage to GPU.
Git: github.com/zipnn/zipnn
Next, GPU!
And compression is now super fast! Performance on a Mac M1:
𝐂𝐨𝐦𝐩𝐫𝐞𝐬𝐬𝐢𝐨𝐧: 7 GB/s
𝐃𝐞𝐜𝐨𝐦𝐩𝐫𝐞𝐬𝐬𝐢𝐨𝐧: 8 GB/s
Wait till multithreading lands on the GPU and you decompress only on demand!
#compression
#llms
#GPUComputing
#ai
𝐏𝐚𝐩𝐞𝐫: alphaxiv.org/abs/2411.05239
@LChoshen Cool! I haven't read the whole paper yet, but I thought the source of the compressibility was interesting. Here is a quote from the paper:
"We identify the source of model compressibility as the floating point range that actually exists in models. Specifically, we find that the exponent component in a floating point parameter is highly skewed and therefore very compressible."