Redressing #Bias: "Correlation Constraints for Regression Models":
Treder et al (2021) https://doi.org/10.3389/fpsyt.2021.615754
"Feature importance helps in understanding which features contribute most to the prediction"
A few lines with #sklearn: https://mljourney.com/sklearn-linear-regression-feature-importance/
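A minimal sketch of the idea (not taken from the linked article): if the features are standardized to a common scale, the absolute values of `LinearRegression` coefficients can serve as a rough importance ranking. The synthetic data below is just for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem where only 2 of 5 features matter;
# coef=True returns the true generating coefficients for reference.
X, y, true_coef = make_regression(
    n_samples=200, n_features=5, n_informative=2, coef=True, random_state=0
)

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_scaled, y)

importance = np.abs(model.coef_)
ranking = np.argsort(importance)[::-1]  # feature indices, most important first
print("features ranked by |coefficient|:", ranking)
```

Without the standardization step, a coefficient's size mixes the feature's effect with its unit, so the ranking would be misleading.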
#Lasso #LinearRegression "is useful in some contexts due to its tendency to prefer solutions with fewer non-zero coefficients, effectively reducing the number of features upon which the given solution is dependent"
https://scikit-learn.org/stable/modules/linear_model.html#lasso
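A small illustration of that sparsity on synthetic data (a sketch, not from the linked docs): with only a few informative features, Lasso drives many coefficients to exactly zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 20 features, only 3 of which actually influence y.
X, y = make_regression(
    n_samples=100, n_features=20, n_informative=3, noise=1.0, random_state=0
)

lasso = Lasso(alpha=1.0).fit(X, y)
n_nonzero = np.count_nonzero(lasso.coef_)
print(f"{n_nonzero} of {lasso.coef_.size} coefficients are non-zero")
```

Larger `alpha` values push more coefficients to zero; `LassoCV` can pick `alpha` by cross-validation.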
I'm playing with the California Housing dataset built into sklearn.
One census block group has an average number of bedrooms per household of 0.83 and an average number of household members of 1243.
Huh?
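One quick way to surface rows like that is a simple threshold filter. The frame below is a tiny hand-made stand-in (the second row mimics that block group); for the real data you would load `sklearn.datasets.fetch_california_housing(as_frame=True)` and filter its frame the same way. The threshold of 50 is an arbitrary choice for illustration.

```python
import pandas as pd

# Stand-in for the California Housing frame; real column names are kept.
df = pd.DataFrame({
    "AveBedrms": [1.02, 0.83, 0.97],
    "AveOccup":  [2.5, 1243.3, 3.1],  # second row mimics the odd block group
})

# Flag block groups with implausible average occupancy.
suspicious = df[df["AveOccup"] > 50]
print(suspicious)
```

Extreme `AveOccup` values in this dataset are usually worth inspecting before modeling, since a handful of such rows can dominate a linear fit.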
I just did my first project using the #mlflow library to track metrics across iterations of manually tuning an #sklearn pipeline. It works great and gives me some idea of the search space before moving on to automated hyperparameter tuning.
I'm using it in a super basic way, as an alternative to creating a gazillion cells with comments tracking metrics. Does anyone have any favorite features to check out for taking mlflow to the next level?
#machinelearning #python #MLOps #scikitlearn
Uhm... if I get a decision tree like the one shown in the picture, does that mean I only need the columns that appear in the tree for training and validation? That is, I would only need columns 2 and 3 (x[2], x[3]), right? Or am I missing something?
#LinearRegression #Python #Sklearn
Dive into predictive modeling with our comprehensive guide on linear regression using Python and sklearn. Learn step-by-step implementation, result interpretation, and data visualization techniques. Perfect for beginners.
https://teguhteja.id/mastering-linear-regression-with-python-and-sklearn-a-step-by-step-guide/
When training a model, it turns out that I get better results with a small dataset than with a bigger one. Is this what's called overfitting?
#MachineLearning #Sklearn
Dear Machine Learning people: when a problem can be solved with both a regressor and a classifier, which would you choose? Or do you simply try both and pick whichever works better? Is there any rule, or set of rules, for determining which method should work better?
In my job as a data analyst, I come across many different types of problems to solve. Some are relatively easy, others not so much. Recently, though, I ran into a problem I had never given much thought to before.
What is the problem? Finding multiple peaks in a dataset.
You might think, this sounds […]
https://jrashford.com/2024/03/25/finding-peaks-in-a-dataset-and-why-it-is-not-straightforward/
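A common starting point (a sketch, not necessarily the approach taken in the linked post) is `scipy.signal.find_peaks`, which lets you filter candidate peaks by prominence so small noise bumps don't count.

```python
import numpy as np
from scipy.signal import find_peaks

# A signal with two bumps of different heights.
x = np.linspace(0, 10, 500)
y = np.exp(-((x - 3) ** 2)) + 0.5 * np.exp(-((x - 7) ** 2))

# Keep only peaks that stand out by at least 0.1 from surrounding valleys.
peak_idx, props = find_peaks(y, prominence=0.1)
print("peaks at x =", x[peak_idx])
```

The hard part in practice, as the post suggests, is choosing thresholds like `prominence`, `height`, or `distance` so that noise is excluded without dropping genuine peaks.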
anyone know of a FOSS lib equiv to Python's Scikit-learn (sklearn) but in/for Go?
(and to forestall an obvious suggestion that's likely a non-starter for my needs: yes, I'm aware of the idea of wrapping it or otherwise linking out to it from Go. That's my worst-case fallback, but I'm avoiding it. The ideal is a 100% pure Go source-to-binary solution)
#Golang
#Python
#sklearn
#ScikitLearn
#ML
#stats
#statistics
#math
#FOSS
Registration is now open for our February meetup: Operational efficiency with LLMs and scikit-learn pipelines, this month at the Adyen offices
https://www.meetup.com/pydata-madrid/events/299189759/
See you Thursday the 22nd at 19:00! Networking afterwards
@buck The feature has landed! FormaK now supports hyper-parameter selection and cross validation with a new structured state machine interface. Under the hood it’s using scikit-learn. As always, it can be built into a #Python or #Cpp model or #KalmanFilter
Discover scikit-learn 1.4 and its: 5 major features & 13 features
14 efficiency improvements & 23 enhancements
15 API changes
38 fixes
More details in the changelog: https://bit.ly/3tWlZA3
or in the release highlights: https://bit.ly/3Hsoddm
You can upgrade with pip as usual:
pip install -U scikit-learn
Or using the conda-forge builds:
conda install -c conda-forge scikit-learn
Thanks again to all 80+ contributors!
I ran a quick Gradient Boosted Trees vs Neural Nets check using scikit-learn's dev branch, which makes it more convenient to work with tabular datasets that mix numerical and categorical features (e.g. the Adult Census dataset).
Let's start with the GBRT model. It's now possible to reproduce the SOTA number for this dataset in a few lines of code, in 2 s (CV included) on my laptop.
1/n
I’m starting a new feature for @formak: semi-automated hyper-parameter selection for models and Kalman Filters.
You can read the design doc for the feature here: https://github.com/buckbaskin/formak/blob/hyperparameter-selection/docs/designs/hyperparameter_selection.md
Feedback on the design is welcome here or on GitHub
scikit-learn 1.3.1 is out!
This release fixes a bunch of annoying bugs. Here is the changelog:
https://scikit-learn.org/stable/whats_new/v1.3.html#version-1-3-1
Thanks very much to all bug reporters, PR authors and reviewers and thanks in particular to @glemaitre, the release manager of 1.3.1.
I recently went down a rabbit hole while attempting to fix the tests for #sklearn's OLS and Ridge regression solvers.
On the theoretical side, I now understand that the minimum-norm solution of the centered problem without intercept is also the minimum-norm solution of the original problem (with intercept). Ridge/OLS on centered X & y followed by intercept computation is the approach (hereafter named type "a") that we have been using for years.
https://raw.githubusercontent.com/ogrisel/minimum-norm-ols/main/minimum-norm-ols-intercept.pdf