A successful data scientist does two things that AI can't be trained to do: listening and selling.

Although I do not strictly work with AI, the idea that data scientists work with AI is widespread enough that it's useful to keep half an eye on what it is, if only so that you can explain to people why that isn't what you do.

Computers have been better at routine calculations since 1944, but that hasn't led to less work for statisticians, when they still existed, or, more recently, data scientists.
Indeed, other than regression, the commonly used machine learning algorithms were explicitly invented to take advantage of computers being better at routine calculations than humans, beginning with the ...

If not routine statistical calculations, though, what on earth do data scientists do? My contention is that data scientists spend only a relatively small amount of time on calculations and coding, or, at least, that the time spent in this arena is not where most of the value created by a data scientist comes from.

Don't be misled into thinking that applying algorithms is only a small proportion of a data scientist's time because 80% is taken up by data preparation, and that this will one day be done by AI.
Data manipulation only takes up the lion's share of time within projects that come pre-packaged with clear objectives. Once the data scientist becomes involved, as they should be, with defining the goal of the project, or once people start to use the model and begin to have questions, both data preparation and modeling are crowded out by other activities.

Instead of modeling and data manipulation, the value is created in translating a client's or a customer's needs (these words have slightly different meanings, but the difference isn't important here) into some kind of model, and implementing that in a useful way.
And usefulness is in the eye of the user, so you need to understand your customer's needs twice over: first, to figure out what you should model, and second, once you have the model, to understand how the customer is going to use it.

The crucial thing here is that the person who talks to the customer has to know what is possible, and has to understand what's involved in implementing it.
This really means that the best person to talk to the customer is always the data scientist, not a BA, a product owner or a salesperson. Worse than the problem of not being able to tell the customer what data science can do, the business representative will lose the important parts of the customer's problem that you need in order to produce the correct work for the customer. There are two obvious reasons. The first is that transmitting messages from human to human is a sure-fire way to lose information and introduce noise.
The second is that non-data scientists are poorly placed to understand which parts are important and which parts aren't.

Now we see as through a glass, darkly

The difficult part of this task is understanding what your customers want. Data scientists ought to be familiar with information theory, which aims to quantify which parts of a message are essential for the message to get through, and is thus the bedrock of things like lossless compression.

Recently at a data science meetup, I saw part of the concept of noisy transmission illustrated by a game where people tried to transmit a dance move along a line of people by having each person perform it to the one standing behind them.
Predictably, by the end of the line, the moves were completely garbled. St Paul was an optimist. Even when you are standing face to face, understanding the message another person is trying to give you is very difficult, unless they understand it well themselves and are an excellent communicator.
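
The relay effect in the dance-move game can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not from the article: the message is a list of bits, every hop flips each bit independently with the same probability, and the names `relay`, `fraction_intact` and `expected_intact` are hypothetical helpers invented for this sketch.

```python
import random


def relay(message, flip_prob, rng):
    """Pass a list of bits through one noisy hop.

    Assumption for this sketch: each bit is flipped independently
    with probability flip_prob.
    """
    return [bit ^ 1 if rng.random() < flip_prob else bit for bit in message]


def fraction_intact(original, received):
    """Share of positions where the received message still matches the original."""
    matches = sum(a == b for a, b in zip(original, received))
    return matches / len(original)


def expected_intact(flip_prob, hops):
    """Probability that one bit is still correct after `hops` independent hops.

    A bit is correct if it was flipped an even number of times, which
    gives 0.5 * (1 + (1 - 2p)^n) for flip probability p and n hops.
    """
    return 0.5 * (1 + (1 - 2 * flip_prob) ** hops)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    original = [rng.randint(0, 1) for _ in range(10_000)]
    message = original
    for hop in range(1, 11):
        message = relay(message, flip_prob=0.1, rng=rng)
        # Columns: hop number, simulated share intact, closed-form expectation
        print(hop, round(fraction_intact(original, message), 3),
              round(expected_intact(0.1, hop), 3))
```

Under these assumptions the share of the message that survives drifts toward one half, which is what pure guessing would achieve, as more people are added to the line. That is the quantitative version of the argument for having the data scientist talk to the customer directly.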
Your client will typically not be a data scientist, so they typically won't understand what to tell you.

Understanding what the customer really wants and needs is therefore simultaneously the most difficult and the most important skill for a data scientist to master. However, as communication is a two-way street, the related skills of explaining back to the user what the model actually does, how it relates to the user's problem and the best way to use it within their business context are nearly as important and just as difficult.
This phase of persuading people that using the model is worthwhile is as much a sales task as the initial phase of getting the opportunity to do the work. This is an area where AI hasn't yet made inroads: AI can sometimes provide an answer, but it can't persuade people to trust it. In some ways IBM's Watson is an example of this disconnect, where the performance of the product seems at least adequate, but the intended users, doctors, aren't seeing a place for it in their workflow.
The problem here can be summed up by a quote from Martin Kohn: "Merely proving that you have powerful technology is not sufficient. Prove to me that it will actually do something useful, that it will make my life better, and my patients' lives better." In fact, to prove to someone that your technology improves their life requires you to explain how they can do it themselves: you need to explain how they can alter the way they work so that your technology actually makes them more productive.
This part is the slipperiest, but potentially the most important, at least in that it has the greatest potential for going wrong, and it deserves the most attention as a result. However, so far AI has given this topic no attention, and not a whole lot of attention is given to it in the kinds of books and training materials most often created for data scientists. Here is a plausible explanation for the misunderstanding.
Most training materials for data scientists focus on the algorithms and data wrangling. Few focus on the soft skills. There could be a few reasons for this.
Possibly authors think there is no demand for those skills, though I don't believe that to be true myself. However, a reason they might think there is no demand is that they think those skills are obvious because they are intrinsically human.

Again, I disagree.
I think that even though the skills in question are only the purview of humans, and some humans who may not be data scientists find them easy to master, that doesn't mean that they are obvious or that they are instantly learned by every human. It just means that humans can learn them, with a plausible amount of effort, while they are out of the reach of AI research. Therefore, any data scientist (at least one who is competent on the technical side) who takes the time to properly develop the right soft skills is in a position where they can't be replaced by artificial intelligence; they are also in a position where they are very likely ahead of other data scientists.
Mastering these skills is both the best protection against job loss to AI, and the best differentiator compared to other humans.

Robert de Graaf is the author of Managing Your Data Science Projects, due for publication by Apress Publishing mid-July 2019.