Preparing For Your Data Science Bootcamp

Rami S.
5 min read · Dec 14, 2020

A Data Science Bootcamp can be an incredibly efficient way to switch careers and land a job as a data scientist, which is why so many people have opted for this style of education. In 3–6 months, a bootcamp can transform you into a data scientist, no matter your previous background. On top of that, bootcamps offer almost around-the-clock assistance, grow your Data Science network, and teach you how to market yourself to potential employers. Compare that to a typical graduate program, which lasts 1–3 years, costs more money, offers limited career assistance, and may not provide the practical skills that are used in industry.

So why would anyone consider graduate school for Data Science? The quickest answer is that graduate school gives you the luxury of time to really dive into Data Science topics and gain a full understanding of the principles. Bootcamps do not have that luxury. They cram 3–6 semesters' worth of information into 3–6 months, with the expectation that you dedicate around 50–70 hours a week. The only way to get the most out of any bootcamp is to realize that the bootcamp does NOT start on day 1; it starts at least 2–3 months before the first day.

Preparation Checklist

Most important skills (in no particular order):

  1. Programming (Python/R) — By far the most important skill you should learn is how to program. Bootcamps go from 0 to 60 mph within the first few hours. The concepts you learn are not easy and demand a high degree of focus. You want to spend your time understanding the concepts, not waste your energy Googling how to write a for-loop or create a dictionary. Be comfortable with the basics: data types, loops, conditionals, and functions.
  2. Pandas — Pandas is a widely used Python library for manipulating, cleaning, and analyzing data. Most bootcamps include it in their pre-work curriculum; however, Pandas can be difficult to learn, so I want to emphasize the importance of being comfortable with the most common operations. The only way to learn Pandas is by doing! With any dataset, practice the workflow of understanding, cleaning, and manipulating the data until you can do so without any aids. Repeat the process 5–10 times until it becomes muscle memory. Beyond that, it is crucial to be aware of more complex functionality such as reshaping, pivoting, and plotting the dataframe.
  3. Plotting- Visualizing your data is key during the exploratory phase of your analysis. Similar to Pandas, I found Matplotlib, Seaborn, and Plotly useful but sometimes unintuitive. They each have tricky syntax and can only be mastered by doing. At the very least, you should know how to generate different types of plots with a title, x/y labels, and a legend. In addition, be aware of the more advanced functionality of these tools, as you will need it later on.
  4. Create 2 to 3 MVPs — An MVP is a Minimum Viable Product, which here means a few non-trivial visualizations and a simple model (for example Linear Regression, or Logistic Regression when the target is a yes/no outcome) that you can glean some information from. This could be as simple as using the Titanic dataset to predict who survives. The benefit is a solid understanding of the workflow for creating a data science model: Defining a problem -> Extracting the key data -> Preparing the data -> Building a model -> Evaluating the results -> Presenting your repository and README. Important: You don’t need to optimize the model or understand why it is performing badly (this will be taught in your program).
  5. Clear your schedule — Make sure that your commitments during this 3–6 month stretch are as minimal as possible. To get the most out of a bootcamp, you do NOT want to have your mind wander into tasks that are not critical and can wait. For example, don’t put an offer on a home a week before the program starts! (Lol, not from experience 😝!)
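
To make the first item on the list concrete, here is a minimal sketch of the plain-Python fluency I mean: a list of dictionaries, a loop, a conditional, a function, and a list comprehension. The passenger records are invented for illustration (they are not rows from the real Titanic dataset), and the function names are my own. If writing something like this from memory feels slow, that is a good sign to practice more before day 1.

```python
# Toy records invented for illustration; NOT taken from the real Titanic dataset.
passengers = [
    {"name": "A", "pclass": 1, "age": 38.0, "survived": True},
    {"name": "B", "pclass": 3, "age": 22.0, "survived": False},
    {"name": "C", "pclass": 3, "age": None, "survived": True},   # missing age
    {"name": "D", "pclass": 1, "age": 54.0, "survived": False},
    {"name": "E", "pclass": 2, "age": 27.0, "survived": True},
    {"name": "F", "pclass": 3, "age": 4.0, "survived": False},
]


def survival_rate_by_class(records):
    """Return {pclass: fraction of passengers in that class who survived}."""
    totals = {}     # pclass -> number of passengers
    survivors = {}  # pclass -> number of survivors
    for row in records:                       # loop over the records
        pclass = row["pclass"]
        totals[pclass] = totals.get(pclass, 0) + 1
        if row["survived"]:                   # conditional
            survivors[pclass] = survivors.get(pclass, 0) + 1
    # Dictionary comprehension: divide survivors by totals for each class.
    return {
        pclass: survivors.get(pclass, 0) / count
        for pclass, count in totals.items()
    }


def mean_known_age(records):
    """Average age, skipping records where the age is missing (None)."""
    ages = [row["age"] for row in records if row["age"] is not None]
    return sum(ages) / len(ages)


rates = survival_rate_by_class(passengers)
avg_age = mean_known_age(passengers)
print(rates)
print(avg_age)
```

Grouping rows by a key and summarizing each group, and deciding what to do with missing values, are the same ideas you will later express in a line or two of Pandas, so it helps to have done them by hand at least once.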

Skills that you should have a good understanding of:

  1. Probability/Statistics — The strength of your statistical knowledge will enable you to understand what’s going on under the hood of the Python libraries you use. Without this knowledge, you will not be as effective at making your case, and it will be clear to the customer that you don’t really know what your model is telling you. Statistical distributions: make sure you understand the different types of distributions, how to spot them, what they mean, and how to calculate basic parameters for each. Statistics: understand mean, standard deviation, quartiles, confidence intervals, p-values, and z-scores. Probability: sets, unions, intersections, conditional probability, and Bayes' Theorem. Supplement this by solving actual practice problems using Python or R.
  2. NumPy — NumPy is an extremely useful tool and can really speed up certain tasks. Luckily, it’s a bit more intuitive than Pandas, so a basic understanding of how NumPy works should suffice (perhaps a 1-hour YouTube tutorial). Your knowledge of NumPy will likely advance as you progress in your bootcamp.
  3. SQL — SQL is super useful and present in some form at most companies. Similar to NumPy, SQL is a fairly intuitive language to learn. Most bootcamps should introduce it; however, to really get the most out of it, learn some basic SQL syntax beforehand. In addition, familiarize yourself with how relational databases are structured.
  4. Business Sense/Communication — At the end of the day, you are only as good as what you can communicate to the end user. The cleanliness of your code or the complexity of your model doesn’t make a difference unless you understand how your analysis can benefit the customer. Understanding the industry and knowing your audience are keys to being successful. I would suggest using your MVP to create a mock presentation that you pitch to a potential customer. As a personal example: I presented a few of my projects to my girlfriend (who is an artist). Going through the process of making a business case to a non-technical audience forced me to skip the “nerdy” details and home in on what actually mattered.
  5. Computer Science Topics- As the programs you build become more complex, having an understanding of certain computer science topics will set you apart from the average Joe. Things like Big O notation, recursion, code complexity, list comprehensions, and coding style conventions (PEP 8). You don’t necessarily need to be an expert, but be aware of their existence and let them sit in the back of your mind as you start programming.
  6. GitHub and Command Line — Although most bootcamps will include these in their pre-work, I strongly suggest you don’t take these skills for granted and make sure to have a working knowledge of both before starting the bootcamp.
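
For the Probability/Statistics and SQL items above, this is roughly the level of practice problem I have in mind. It is only a sketch using Python's standard library (`statistics` and `sqlite3`). The sample numbers, the probabilities passed to the Bayes function, and the `customers`/`orders` tables are all made up for illustration, and `bayes_posterior` is a helper name I chose, not a library function.

```python
import sqlite3
import statistics

# --- Statistics: mean, standard deviation, z-score (made-up sample) ---
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean_score = statistics.mean(scores)
pop_std = statistics.pstdev(scores)        # population standard deviation
z_of_9 = (9.0 - mean_score) / pop_std      # how many std devs 9.0 is above the mean


# --- Probability: Bayes' Theorem ---
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A), and P(B | not A)."""
    # P(B) from the law of total probability.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% base rate, 90% detection rate, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)

# --- SQL: two related tables joined on a key, then aggregated ---
conn = sqlite3.connect(":memory:")         # throwaway in-memory database
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "amount REAL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 20.0), (2, 1, 15.0), (3, 2, 5.0)],
)
totals = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
    """
).fetchall()
conn.close()

print(mean_score, pop_std, z_of_9)
print(round(posterior, 4))
print(totals)
```

The Bayes example is worth working through by hand first: even with a 90% detection rate, the posterior comes out far lower than most people expect because the base rate is so small. The SQL part shows the relational idea in miniature, with one table per entity and a foreign key (`customer_id`) linking them.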
