Stacked Shipping Containers: A Simplistic Analogy for Objects in Memory
Picture Credit: Teng Yuhong (From Unsplash at

It’s almost 2021. Memory is inexpensive, and it’s easy to access cloud platforms like Amazon Web Services (AWS) or Google Cloud Platform (GCP) and throw vast amounts of resources at a data problem. And so, we usually don’t worry about memory (RAM) these days. But there are at least two problems with this line of thinking:

i) if we use our resources efficiently, we can do more with the same amount of resources (i.e. save money!); and,
ii) “data has mass”, in the sense that large volumes of data move more slowly than smaller volumes…

Using the popular Penguins dataset to demonstrate the use of a Scikit-learn Pipeline and GridSearchCV.
Picture Credit: Alin-Andersen (

Building even the most basic machine learning model involves several steps. Features have to be selected, the data needs to be standardized, and an estimator has to be chosen and then fitted to the training data. Once we have a working model, the next step is finding and optimizing its parameters.
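The steps above can be chained into a single scikit-learn `Pipeline`. The following is a minimal sketch that assumes scikit-learn is installed. Synthetic data from `make_classification` stands in for the Penguins dataset. The specific choice of `SelectKBest`, `StandardScaler` and `LogisticRegression` is illustrative and is not necessarily what the article uses.

```python
# Minimal sketch: feature selection -> standardization -> estimator, fitted as one object.
# Assumption: synthetic data stands in for the Penguins dataset.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 300 samples, 6 features, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=4)),  # feature selection
    ("scale", StandardScaler()),                          # standardization
    ("clf", LogisticRegression(max_iter=1000)),           # the chosen estimator
])

# Every step is fitted on the training data only, then applied in order at predict time.
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the steps this way means the scaler and selector never see the test data during fitting. It also lets the whole chain be passed to tools like `GridSearchCV` as a single estimator.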

Models involve two types of parameters:
i) model parameters: configuration variables that are internal to the model and can be estimated from data; and,
ii) model hyperparameters: configuration variables that are external to the model and cannot be estimated from data.[1]
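Here is a minimal sketch of the distinction, assuming scikit-learn and synthetic stand-in data. In `LogisticRegression`, the regularization strength `C` is a hyperparameter that we set, while `coef_` and `intercept_` are model parameters estimated by `fit()`. Because `fit()` cannot learn `C`, the sketch searches over a hypothetical grid of candidate values with `GridSearchCV`.

```python
# Minimal sketch: model parameters vs. hyperparameters in scikit-learn.
# Assumption: synthetic data stands in for the Penguins dataset; the C grid is hypothetical.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

# Hyperparameter: C (inverse regularization strength) is chosen by us before fitting.
model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X, y)

# Model parameters: weights and intercepts estimated from the data by fit().
print(model.coef_.shape)       # (3, 6): one row of weights per class
print(model.intercept_.shape)  # (3,)

# Hyperparameters are tuned by cross-validated search instead of by fit() itself.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}  # "<step name>__<parameter>"
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.2f}")
```

The `clf__C` key uses scikit-learn's `<step>__<parameter>` naming convention, which is how a grid search reaches a hyperparameter nested inside a pipeline step.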
Throughout the model building process there are many steps where…

Apratim Biswas

I’m a data scientist. I love working with data, seeking patterns, building models and translating findings into compelling stories.
