#pca


#programming #engineering #statistics #PCA in #commonLisp #lisp #blog #easy #reference screwlisp.small-web.org/progra

While I am just feeding no-other-obvious-source lisp pieces of my brain to my young kitten, here is principal component analysis in Common Lisp, using an openly available #ML package, #clml, from a Japanese telco changing hands 15 years ago, actively developed by a lisp lone wolf up until five years ago.

My demo is in English (#emacs #eev #eepitch), in contrast to the Japanese internals.

screwlisp.small-web.org: Lisp Principal Component Analysis (PCA) (eigendecomposition)

Continuing to work through #GCP training – having gotten my #ACE, I now need to get my #PCA.

At any rate, one of the training-modules was talking about data transfer times and popped up a size/rate table to show transfer-times. It instantly transported me back to when I was doing DR-related delivery-consulting.

I still remember one customer being pissed when I'd finished setting up their storage-replication and declared, "well, I'll see you in about 21 days to verify that we're actually finished and functional". They were incredulous, sputtering about "how can you possibly say 21 days". I reminded them that I'd noted that their replication-bandwidth was objectively too small (while they had a T3, their data set was large and they couldn't get their networking people to remove the session-limits that were constraining the replication to a fraction of that T3's capability). We'd spent a couple days benchmarking and working with their networking people to try to address the discrepancy between their theoretical bandwidth and the observed bandwidth. In fairness, they probably would have been ill-advised to dedicate the entirety of the T3's bandwidth to replication, since they presumably had other projects that needed some of its bandwidth, but the amount they were able to allocate to the storage replication was way too low. I'd given them an initial transfer-time estimate, the day prior, based on my benchmarking and the first couple hours of the replication's sync-up. My return the next day was mostly to confirm that things were moving along as expected, so that I didn't have to revise my estimate (fortunately, over the intervening 16 hours, my numbers had stayed dead-on).

Just to add some gas to the fire:

• I reminded them that they had had the option to speed up the initial sync by seeding it with a tape-based restore, which would have required them to ship tapes from NJ to AZ, import the tapes into their DR site's tape library system, then do a restore to the DR site's storage array.
• I pointed out to them that, with such a low transfer-rate, if they ever had link-loss, it could take them hours to days to get back in sync. I further pointed out that, worst case (i.e., on a long-enough link-outage), the replication-software might declare the sync "stale" and they'd have to re-initialize and re-do the transfer.

This was probably 2005 or 2006. So, it's not that they had a ton of data to transfer, it's just that cross-country private circuits weren't available with nearly as high transfer speeds back then, and what options were available were silly-expensive.
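For anyone wanting to reproduce that kind of size/rate arithmetic, here is a minimal Python sketch. The only fixed figure is the nominal T3 (DS3) line rate of 44.736 Mbit/s. The data-set size and the usable fraction of the link are hypothetical placeholders, not the customer's actual numbers, and protocol overhead is ignored.

```python
T3_BITS_PER_SECOND = 44.736e6  # nominal T3 (DS3) line rate


def transfer_days(size_bytes, rate_bits_per_second):
    """Days needed to move size_bytes at a sustained rate in bits per second."""
    seconds = size_bytes * 8 / rate_bits_per_second
    return seconds / 86_400  # seconds per day


# Hypothetical inputs, for illustration only (assumptions, not from the story).
data_set_bytes = 2e12   # assumed 2 TB data set
usable_fraction = 0.25  # assumed share of the T3 the replication actually gets

days = transfer_days(data_set_bytes, T3_BITS_PER_SECOND * usable_fraction)
print(f"Estimated initial sync: {days:.1f} days")
```

Plugging in the observed (benchmarked) rate rather than the theoretical one is the whole point: the estimate scales inversely with whatever fraction of the circuit the replication is really allowed to use.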

Principal Component Analysis (PCA) is a powerful technique for reducing the dimensionality of your data while maintaining its essential structure. A key advantage of PCA is its ability to transform high-dimensional data into a lower-dimensional space, enabling new ways to visualize complex data. Two crucial plots that help visualize PCA are loading plots and biplots.

I've created a video to walk you through these visualizations: youtube.com/watch?v=f34zq2jErK
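As a rough companion to the description above, here is a minimal NumPy sketch of the quantities that loading plots and biplots are drawn from. It is not taken from the video. The toy data, the `pca` helper name, and the convention of scaling loadings by the square root of the explained variance are all assumptions made for illustration (other loading conventions exist).

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the centered data; returns scores, loadings, variance ratio."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = S**2 / (X.shape[0] - 1)  # variance along each component
    ratio = explained_variance / explained_variance.sum()
    components = Vt[:n_components]                # principal axes (rows)
    scores = Xc @ components.T                    # points drawn in a biplot
    # One common convention: loadings = axes scaled by component std. deviation.
    loadings = components.T * np.sqrt(explained_variance[:n_components])
    return scores, loadings, ratio[:n_components]


# Toy data (assumed): two correlated features plus one independent feature.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2 * latent + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

scores, loadings, ratio = pca(X)
print("explained variance ratio:", ratio)
print("loadings (features x components):\n", loadings)
```

A loading plot would draw each row of `loadings` as an arrow from the origin; a biplot overlays those arrows on a scatter of `scores`.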

In case you want to know why I've completely given up on #Christianity (no reason that you *should*), this is a good summary:

youtu.be/isp4243WPO0?si=k5kaQH

BTW: the interviewer question at about the 12m55s mark is spot on.

I used to be a Christian, but I just can't be any more. At least not here anymore...


This next paper is about #stylometry in a #translation setting involving novels in #Swedish and #Dutch:

Martje Wijers (2023), “Why the Daisy sisters are different. A stylometric study on the oeuvre of Swedish author Henning #Mankell and the Dutch translations of his work”, Journal of Computational Literary Studies 2 (1), 1–27. doi: doi.org/10.48694/jcls.3585

Keywords: #stylometry, #cluster analysis, #PCA, #delta, #zeta, #translation