𝐙𝐢𝐩𝐍𝐍 v0.5.0 introduces compression with CPU multithreading!
Keep models 𝐜𝐨𝐦𝐩𝐫𝐞𝐬𝐬𝐞𝐝 𝐚𝐥𝐥 𝐭𝐡𝐞 𝐰𝐚𝐲 𝐭𝐨 𝐭𝐡𝐞 𝐆𝐏𝐔
saving both 𝐭𝐫𝐚𝐧𝐬𝐟𝐞𝐫 𝐭𝐢𝐦𝐞 and 𝐬𝐭𝐨𝐫𝐚𝐠𝐞 𝐬𝐩𝐚𝐜𝐞:
from @huggingface to storage, and from storage to GPU.
Git: github.com/zipnn/zipnn
Next, GPU!
And compression is now super fast! Performance on a Mac M1:
𝐂𝐨𝐦𝐩𝐫𝐞𝐬𝐬𝐢𝐨𝐧: 7 GB/s
𝐃𝐞𝐜𝐨𝐦𝐩𝐫𝐞𝐬𝐬𝐢𝐨𝐧: 8 GB/s
Wait till multithreading lands on the GPU and you decompress only on demand!
#compression
#llms
#GPUComputing
#ai
𝐏𝐚𝐩𝐞𝐫: alphaxiv.org/abs/2411.05239
@LChoshen Cool! I haven't read the whole paper yet, but I thought the source of the compressibility was interesting. Here is a quote from the paper:
"We identify the source of model compressibility as the floating point range that actually exists in models. Specifically, we find that the exponent component in a floating point parameter is highly skewed and therefore very compressible."