While knowledge of statistics and programming is a must-have for every data scientist, non-technical skills can also help you do the job.
One particularly useful attribute is being business-minded, as highlighted in our article “5 Most In-Demand Skills for Data Scientists.” You may be wondering how to acquire such a broad skill set. That’s why we’ve prepared a list of resources to help you on your way.
The recommendations cover everything from data science to data analysis, programming, and general business. Reading even just a few of these books will give you a better understanding of the mechanisms at play and make you a more effective data scientist.
Ready? Let’s dive in.
Top books for data scientists
1. The Black Swan
Author: Nassim Taleb
Let’s start with one of the least obvious titles. The Black Swan is part of Nassim Nicholas Taleb’s landmark Incerto series, which looks at uncertainty, probability, risk, and decision-making.
A ‘black swan’ event is an occurrence with three principal characteristics:
- It’s unpredictable
- It carries a massive impact
- People often try to find an explanation that makes it appear more predictable than it was
For example, people were once convinced that all swans were white because they had never seen anything to convince them otherwise. But when they came across a black swan in Australia, it smashed their conviction.
Using this story, Taleb points out the various pitfalls in human thinking and how they affect our decision-making. The key takeaway from the book is to be conscious of uncertainty because of the ever-changing environment, especially in the IT industry.
To put this into practice: don’t be afraid to try out different strategies and models because you may just stumble across the right solution.
2. High Output Management
Author: Andrew Grove
In this business-centric book, Intel’s former chairman and CEO shares his perspective on building and running a global company. And if Grove were to break down the skill required to create and maintain a business in a single word, we think he would choose ‘management.’
That’s an appropriate skill not only for CEOs but for technical people, including data scientists, too. Grove writes about techniques for building highly productive teams, motivational methods, navigating real-life business scenarios, and a bit about revolutionising the way we work.
Here are five key takeaways:
- Everything is a process
- Meetings are a medium of work
- Manage short-term objectives based on long-term plans
- Functional teams increase leverage, mission-oriented teams increase speed
- Use performance reviews to improve performance
So if the above captures your imagination, it’s well worth getting stuck in.
3. The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
Author: Ben Horowitz
While many people think starting a new business is a great opportunity, very few appreciate how difficult it is to run one.
Indeed, business schools don’t cover practical wisdom for managing the most challenging problems; managers are usually left to deal with their challenges alone. And that’s why Ben Horowitz wrote this book.
It includes essential advice on building and running a new company, analysing the types of obstacles that confront leaders every day.
You’ll find plenty of tips on how to:
- Develop software
- Manage a business
- Sell a product
- Procure resources
- Invest wisely
- Oversee the day-to-day
If you’re looking for a resource to help you cope with difficult circumstances, here’s the book for you.
4. Obviously Awesome: How to Nail Product Positioning
Author: April Dunford
As a data scientist, you may not think of your work as a product, but it is. And you should be able to present what you do for your clients in a way that captivates the imagination. Because even if you know your product is fantastic, you still have to persuade them.
How do you do that? Follow April Dunford’s advice. Her book runs through how to successfully connect your product with your customers by showing it as a “secret sauce” and making them feel like they have to have it.
You’ll find out how to:
- Choose the best market for your goods
- Instantly connect an audience to your offering’s value
- Use three different styles of positioning to your advantage
- Leverage market trends to help buyers
5. The Mom Test
Author: Rob Fitzpatrick
The Mom Test is all about improving your communication skills. Conversations with clients rarely go as expected. Which is why the book focuses on one of the most crucial rules of communication: ask about specific past actions instead of talking in generalities.
This works the other way round, too. So if a client has suggestions or requests, remember to listen intently and make sure everything is explained and understood by both sides. The book is chock full of valuable guidelines to put into practice when you speak to your clients.
6. Introduction to Machine Learning with Python: A Guide for Data Scientists
Authors: Andreas C. Müller, Sarah Guido
Now, let’s move to more technical titles.
If you’re a data scientist with a degree of Python knowledge and want to get a fundamental insight into machine learning, this is the book for you. The authors walk you through the practical side of using algorithms rather than dwelling on mathematical theory.
Their approach is perfect for developers wanting to learn basic machine learning concepts and understand practical use cases. Beyond Python, the book explores scikit-learn alongside core libraries and tools like NumPy, SciPy, pandas, and Jupyter Notebook.
One important note: if you know about machine learning or have a decent understanding of artificial neural networks, you can skip this one.
7. Python Data Science Handbook: Essential Tools for Working with Data
Author: Jake VanderPlas
Working with data isn’t as simple as you might think. Every action must be well thought out; manipulation, transformation, cleaning, and visualisation of data types must be precise.
For many, one of the best tools for the task is Python. And the Python Data Science Handbook covers everything you’ll need to know. The book explains how to use the most well-known Python libraries and tools, including pandas, NumPy, Matplotlib, scikit-learn, and Jupyter, making for a great resource for anyone just starting out.
If we had one criticism, we’d say the only thing missing is a way to put your learnings into practice.
8. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Author: Wes McKinney
If you pick this title up, expect to learn about Python and its best-known libraries: NumPy, pandas, and IPython. The author walks you through manipulating, processing, cleaning, and analysing datasets in Python using these tools.
The book is also full of practical case studies, making it an excellent resource for anyone new to Python or scientific computing. Once finished, you’ll quickly find solutions to all your web analytics, finance, social science, and economics headaches.
9. Data Science from Scratch
Author: Joel Grus
Here’s a title for all data scientists with basic Python, statistics, maths, and algebra knowledge (alongside a grasp of algorithms and machine learning). Once finished, expect to know all about the core libraries, frameworks, modules, and toolkits used in data science.
The book is best for intermediate programmers interested in getting started with data science and machine learning, as the author walks through all the crucial concepts, giving you the practical skills to write simple code.
That’s not to say you need prior knowledge before reading. The writing style suits every experience level, but having a level of understanding will help you take more on board.
10. Machine Learning Yearning
Author: Andrew Ng
Andrew Ng is one of the most recognised names in machine learning. He co-founded Coursera and is an adjunct professor at Stanford University. This free book teaches you how to structure an ML project, covering the entire project lifecycle, including diagnosing errors in a machine learning system and building in complex settings.
The book not only gives you knowledge; it explains how to put learnings into practice, meaning once finished, you’ll know how to:
- Plot the best path forward for your ML project
- Build software that performs better than people
- Know when and how to use end-to-end, transfer, and multi-task learning
- Diagnose errors in a machine learning system
Ng’s writing is simple to understand. You won’t find any heavy maths theory, just a great way to learn how to make technical decisions during any machine learning project.
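To give a flavour of what diagnosing errors can look like in practice, here is a minimal sketch of comparing training and dev-set error to decide whether bias or variance deserves attention first. The function name `diagnose`, the `target_error` parameter, and the example figures are our own assumptions for illustration, not code from the book.

```python
def diagnose(train_error, dev_error, target_error=0.0):
    """Split dev-set error into avoidable bias and variance.

    All three arguments are error rates between 0 and 1.
    target_error is a hypothetical stand-in for the best error
    rate you think is achievable (for example, human-level).
    """
    # How far the model is from the target on data it has seen.
    avoidable_bias = train_error - target_error
    # How much worse the model does on data it has not seen.
    variance = dev_error - train_error
    # Point at whichever gap is larger.
    focus = "bias" if avoidable_bias >= variance else "variance"
    return {
        "avoidable_bias": avoidable_bias,
        "variance": variance,
        "focus": focus,
    }


# Made-up numbers: 15% training error, 16% dev error, 2% target.
report = diagnose(train_error=0.15, dev_error=0.16, target_error=0.02)
print(report["focus"])
```

With these made-up numbers the training error is far from the target while the dev error is close to the training error, so the sketch points at bias. In a real project you would treat this as a starting point for discussion rather than a verdict.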
11. Deep Learning with PyTorch Step-by-Step
Author: Daniel Voigt Godoy
The last title is also the most recent. The book was first published a few years ago, then updated on 23rd January 2022. It explains deep learning and presents a structured, first-principles approach to learning PyTorch, a deep learning library for Python.
The book has four parts:
- Fundamentals (gradient descent, training linear and logistic regressions in PyTorch)
- Computer Vision (deeper models and activation functions, convolutions, transfer learning, initialisation schemes)
- Sequences (RNN, GRU, LSTM, seq2seq models, attention, self-attention, transformers)
- Natural Language Processing (tokenisation, embeddings, contextual word embeddings, ELMo, BERT, GPT-2)
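If the Fundamentals part sounds abstract, the idea behind gradient descent fits in a few lines. The sketch below is our own illustration in plain Python rather than PyTorch, and is not taken from the book; the function name `fit_linear_regression`, the learning rate, and the toy data are all assumptions.

```python
def fit_linear_regression(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error of the current line on every training point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Take a small step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so w should approach 2 and b should approach 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

A library like PyTorch computes the gradients for you automatically, but the loop of predict, measure the error, and nudge the parameters stays the same.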
We love the book as it’s so easy to read. The author uses simple words and avoids complex mathematical formulas, making the text feel like a conversation between friends.
Is every data scientist a humanist?
As you may have noted, even if you become a technical specialist, there’s no way to avoid human interaction, especially if you work with clients.
That’s why some level of interpersonal skill is always helpful, and we hope these books will help you on that front, too. And one last thing: if you’re looking for a few free reads focused on artificial intelligence…
Our article on Free eBooks on Artificial Intelligence could well be of interest.