Some roadblocks to the broad adoption of machine learning and AI

I read two blog posts on AI over the Thanksgiving break. One was a nice post by Luke Oakden-Rayner discussing the challenges for AI in medicine, and the other was a post by Tim Harford on the need for increased focus on basic research in AI, motivated by AlphaGo.

I’ve had a lot of interactions with people lately who want to take advantage of machine learning/AI in their research or business. Despite the excitement around AI and the impressive results we see from sophisticated research teams almost daily, the actual extent of adoption and application is much smaller than the hype suggests. In fact, a lot of what gets billed as AI turns out to be humans doing the work behind the scenes.

While the promise of AI/ML has never been clearer, there are still only a handful of organizations that are using the technology in a major way. Sometimes even apparent success stories turn out to be problematic.

This got me thinking about the lifecycle of developing an AI application. I have previously defined this type of application as having three parts: (i) an interface to humans, (ii) a data set, and (iii) an algorithm for turning the data into interactions. Extending that idea to the full development process, I started thinking about all the steps involved and where the potential barriers are.

To develop an AI application you need a few things:

  1. A group of people who are willing to let you have their data.
  2. A technology for data capture from people (this could be as simple as a website, or an Echo, or as complex as a robot).
  3. A data storage mechanism for collecting the raw data from this input (this could just be a database; see the sketch after this list).
  4. A set of algorithms and scripts for organizing the data for analysis.
  5. A definition of the problem you’d like to solve in quantitative terms - usually generated through exploratory analysis.
  6. An algorithm trained on a massive data set, or at a minimum one trained with a good prior or expert knowledge.
  7. A way to structure the responses and provide feedback either to the original users of your application or to other users (researchers or executives at a company for example).
  8. The pipeline to take those formatted responses and return them to the user in a way that they can take advantage of.

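Even a minimal version of steps 2-4 involves some “glue” code to get captured data into a raw data store. As a rough sketch (my own illustration, not something from this post), here is what appending one captured record to a SQLite database might look like in R using the DBI and RSQLite packages; the file, table, and field names are all hypothetical.

```r
# Hypothetical glue for steps 2-4: store one captured record in a raw-data table.
# Package choice (DBI + RSQLite) and all names here are illustrative assumptions.
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "app_data.sqlite")

# One "event" as it might arrive from a website, an Echo skill, or a robot
event <- data.frame(
  user_id   = "user_001",
  timestamp = as.character(Sys.time()),
  response  = "clicked_yes",
  stringsAsFactors = FALSE
)

# Append the raw record; the organizing scripts in step 4 would read from this table later
dbWriteTable(con, "raw_events", event, append = TRUE)
dbDisconnect(con)
```
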
I think that a lot of attention is focused on step 6 and on how costly the talent for designing AI algorithms is. For the big players, where most of the other steps have already been solved, this is surely the limiting factor, and it is no wonder that the talent war is fierce.

But I think that for 95% of organizations, whether they be researchers, businesses, or individuals, the problem isn’t in developing the algorithm. A random forest can be fit with one line of R code, and while it won’t be as accurate as an expertly trained neural network on a gigantic training data set, it will still be really useful.

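For instance, using the randomForest package and a built-in data set purely for illustration (the post doesn’t name a specific package), that one line might look like:

```r
library(randomForest)
fit <- randomForest(Species ~ ., data = iris)  # one-line random forest fit
```
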
So I think that most of the roadblocks to the democratization of AI are actually in the other steps, and in particular in the “glue” between the steps.

I think a lot of these barriers come down to the fact that, for the most part, we don’t have strict standards for data capture, tidying, organization, and use that are shared across organizations. We also haven’t automated the “glue” steps between each of these components. So while I think the algorithms for AI will continue to develop rapidly in accuracy and range, organizations will need a lot more than a way to fit the latest model if they want to keep up. The reason some organizations are leaping so far ahead is that they have already spent a huge amount of time thinking about every step but the model fitting, so now they can focus their time, energy, and resources on making algorithms do things we didn’t imagine were possible.
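
To make the “glue” point concrete, here is a hedged sketch in R of one such step: reading the hypothetical raw_events table from the earlier sketch and reshaping it into a tidy, one-row-per-user table ready for modeling. None of these names or conventions are standard across organizations, which is exactly the problem.

```r
# Hypothetical glue between steps 3 and 4: raw event records -> tidy analysis table
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "app_data.sqlite")
raw <- dbReadTable(con, "raw_events")
dbDisconnect(con)

# One row per user: total events and how many times they responded "clicked_yes"
n_events <- tapply(raw$response, raw$user_id, length)
n_yes    <- tapply(raw$response == "clicked_yes", raw$user_id, sum)

tidy <- data.frame(
  user_id  = names(n_events),
  n_events = as.vector(n_events),
  n_yes    = as.vector(n_yes[names(n_events)]),
  stringsAsFactors = FALSE
)
```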