Stacked Shipping Containers: A Simplistic Analogy for Objects in Memory
It’s almost 2021. Memory is inexpensive, and it’s easy to access cloud platforms like Amazon Web Services (AWS) or Google Cloud Platform (GCP) and throw vast amounts of resources at a data problem. As a result, we usually don’t worry about memory (RAM) these days. But there are at least two problems with this line of thinking:

i) if we can use our resources efficiently, we can do more with the same amount of resources (i.e. save money!); and,
ii) “data has mass” in the sense that large volumes of data move more slowly than smaller volumes…

Using the popular Penguins dataset to demonstrate the use of a scikit-learn Pipeline and GridSearchCV.
Building even the most basic machine learning model involves several steps. Features have to be selected, data needs to be standardized, and an estimator has to be chosen and then fitted to training data. Once we have a working model, the next step is finding and optimizing parameters.
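
The steps listed above can be chained into a single object. The following is a minimal sketch, not the article's own code: it assumes synthetic data from make_classification in place of the Penguins dataset, and the step names ("select", "scale", "clf"), the choice of SelectKBest and LogisticRegression, and the grid values are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; this is not the Penguins dataset).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One pipeline step per stage: select features, standardize, fit an estimator.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Candidate settings to search over, addressed as "<step name>__<argument>".
param_grid = {
    "select__k": [2, 4, 6],
    "clf__C": [0.1, 1.0, 10.0],
}

# 5-fold cross-validated search over every combination in the grid.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, test_accuracy)
```

Because the search wraps the whole pipeline, feature selection and scaling are refitted inside each cross-validation fold, so the validation folds do not influence the preprocessing.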

Models involve two types of parameters:
i) model parameters: configuration variables that are internal to the model and can be estimated from data; and,
ii) model hyperparameters: configuration variables that are external to the model and cannot be estimated from data.[1]
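
To make the distinction concrete, here is a small sketch under stated assumptions: the data is randomly generated for illustration, and Ridge regression is chosen only as a convenient example. Its alpha is a hyperparameter set before fitting, while coef_ is a model parameter estimated from the data.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Illustrative synthetic data: y depends linearly on two features plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Hyperparameter: chosen by us, before the model sees any data.
model = Ridge(alpha=1.0)
has_coef_before_fit = hasattr(model, "coef_")  # no learned parameters yet

# Model parameters: estimated from the data during fitting.
model.fit(X, y)
learned_coef = model.coef_

print("hyperparameter alpha:", model.get_params()["alpha"])
print("learned coefficients:", learned_coef)
```

Fitting changes coef_ but leaves alpha untouched, which is why hyperparameters have to be tuned by an outer procedure such as a grid search.
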
Throughout the model building process there are many steps where…

