Ultimate AI battle - Apple vs. Google

Yesterday, Apple launched its Worldwide Developer’s Conference (WWDC) and had its public keynote address. While many new things were announced, the one thing that caught my eye was the dramatic expansion of Apple’s use of artificial intelligence (AI) tools. I talked a bit about AI with Hilary Parker on the latest Not So Standard Deviations, particularly in the context of Amazon’s Echo/Alexa, and I think it’s definitely going to be an area of intense competition between the major tech companies.

Pretty much every major tech player is involved in AI—Google, Facebook, Amazon, Apple, Microsoft—the list goes on. Recently, a some commentators have suggested that Apple in particular will never catch up with the likes of Google with respect to AI because of Apple’s strict stance on privacy and unwillingness to gather/aggregate data from all its users. However, yesterday at WWDC, Apple revealed a few clues about what it was up to in the AI world.

First, Apple mentioned deep learning more than a few times, including specifically calling out its use of LSTM in its Messages app and facial recognition in its Photos app. Previously, Apple had been rumored to be applying deep learning to its Siri assistant and its fingerprint sensor. At WWDC, Craig Federighi stressed Apple’s continued focus on privacy and how Apple does not need to develop “user profiles” server-side, but rather does most computation on-device (in this case, on the iPhone).

However, it can’t be that Apple does all its deep learning computation on the iPhone. These models tend to be enormous and take advantage of reams of data that can only be reasonablly processed server-side. Unfortunately, because most companies (Apple in particular) release few details about what they do, we may never how this works. But we can definitely speculate!

Apple vs. Google

I think the Apple/Google dichotomy provides an interesting opportunity to talk about how models can be learned using data in different ways. There are two approaches being represented here by Apple and Google:

  • Google way - Collect lots of data from users and store them on a server in the Googleplex somewhere. Then use that data to fit an enormous model that can predict when you’ve taken a picture of a cat. As users generate more data, bring that data back to the Googleplex and update/refine the model.
  • Apple way - Build a “starter model” in the Apple Mothership. As users generate data on their phones, bring the model to the phone and update the model using just their data. Bring the updated model back to the Apple Mothership and leave the user’s data on the phone.

Perhaps the easiest way to understand this difference is with the arithmetic mean, which is perhaps the simplest “model”. Suppose you have a bunch of users out there and you want to compute the average of some attribute that they have on their phones (or whatever device). The first approach would be to get all that data and compute the mean in the usual way.

Google way

Once all the data is in the Googleplex, we can just use the formula

Google mean

I’ll call this the “Google mean” because it requires that you get all the data X1 through Xn, then sum them up and divide by n. Here, each of the Xi’s represents the ith user’s data. The general principle here is to gather all the data and then estimate the model parameters server-side.

What if you didn’t want to gather everyone’s data centrally? Can you still compute the mean?

Apple way

Yes, because there’s a nice recurrence formula for the mean:

Apple mean

We can call this the “Apple mean”. With this strategy, we can send our current estimate of the mean to each user, update our estimate by taking the weighted average of the old value and the new value, and then move on to the next user. Here, you send the model parameters out to the users, update those parameters and then bring the parameters back.

Which method is better? Well, in this case, both give you the same answer. In general, for linear models (like the mean), you can usually rework the formulas to build out either “whole data” (Google) approaches or “streaming” (Apple) approaches and get pretty much the same answer. But for non-linear models, it’s not so simple and you usually cannot achieve this kind of equivalence.

Clients and Servers

Balancing how much work is done on a server and how much is done on the client is an age-old computing problem and, over time, the balance of work between client and server seems to shift back and forth like a pendulum. When I was in grad school, we had so-called “dumb terminals” that were basically a screen that you used to login to the server. Today, I use my laptop for computing/work and that’s it. But I use the cloud for many other tasks.

The Apple approach definitely requires a “fatter” client because the work of integrating current model parameters with new user data has to happen on the phone. With the Google approach, all the phone has to do is be able to collect the data and send it over the network to Google.

The Apple approach is also closely related to what my colleagues Martin Lindquist and Brian Caffo refer to as “fusion science”, whereby Big Data and “Small Data” can be fused together via models to improve inference, but without ever having to actually combine the data. In a Bayesian context, you might think of the Big Data as making up the prior distribution and the Small Data as the likelihood. The Small Data can be used to update the model parameters and produce the posterior distribution, after which the Small Data can be thrown out.

And the Winner is…

It’s not clear to me which approach is better in terms of building a better model for prediction or inference. Sadly, we may never have enough details to find out, and will only be ablle to evaluate which approach is better by the performance of the systems in the marketplace. But perhaps that’s the way things should be evaluated in this case?

Good list of good books

The MultiThreaded blog over at Stitch Fix (hat tip to Hilary Parker) has posted a really nice list of data science books (disclosure: one of my books is on the list).

We’ve queried our data science team for some of their favorite data science books. This list is by no means exhaustive, but should keep any data scientist/engineer new or old learning and entertained for many an evening.


Not So Standard Deviations Episode 17 - Diurnal High Variance

Hilary and I talk about Amazon Echo and Alexa as AI as a service, the COMPAS algorithm, criminal justice forecasts, and whether algorithms can introduce or remove bias (or both).

If you have questions you’d like us to answer, you can send them to nssdeviations @ gmail.com or tweet us at @NSSDeviations.

Subscribe to the podcast on iTunes.

Subscribe to the podcast on Google Play.

Please leave us a review on iTunes!

Support us through our Patreon page.

Show notes:

Download the audio for this episode.

Defining success - Four secrets of a successful data science experiment

Editor’s note: This post is excerpted from the book Executive Data Science: A Guide to Training and Managing the Best Data Scientists, written by myself, Brian Caffo, and Jeff Leek. This particular section was written by Brian Caffo.

Defining success is a crucial part of managing a data science experiment. Of course, success is often context specific. However, some aspects of success are general enough to merit discussion. A list of hallmarks of success includes:

  1. New knowledge is created.
  2. Decisions or policies are made based on the outcome of the experiment.
  3. A report, presentation, or app with impact is created.
  4. It is learned that the data can’t answer the question being asked of it.

Some more negative outcomes include: Decisions being made that disregard clear evidence from the data, equivocal results that do not shed light in one direction or another, uncertainty which prevents new knowledge from being created.

Let’s discuss some of the successful outcomes first.

New knowledge seems ideal in many cases (especially since we are academics), but new knowledge doesn’t necessarily mean that it’s important. If this new knowledge produces actionable decisions or policies, that’s even better. The idea of having evidence-based policy, while perhaps newer than the analogous evidence-based medicine movement that has transformed medical practice, has the potential to similarly transform public policy. Finaly, that our data science products have great (positive) impact on an audience that is much wider than a group of data scientists, is of course ideal. Creating reusable code or apps is great way to increase the impact of a project and to disseminate its findings.

The fourth point is perhaps the most controversial. I view it as a success if we can show that the data can’t answer the questions being asked. I am reminded of a friend who told a story of the company he worked at. They hired many expensive prediction consultants to help use their data to inform pricing. However, the prediction results weren’t helping. They were able to prove that the data couldn’t answer the hypothesis under study. There was too much noise and the measurements just weren’t accurately measuring what was needed. Sure, the result wasn’t optimal, as they still needed to know how to price things, but it did save money on consultants. I have since heard this story repeated nearly identically by friends in different industries.

Sometimes the biggest challenge is applying what we already know

There’s definitely a need to innovate and develop new treatments in the area of asthma, but it’s easy to underestimate the barriers to just doing what we already know, such as making sure that people are following existing, well-established guidelines on how to treat asthma. My colleague, Elizabeth Matsui, has written about the challenges in a study that we are collaborating on:

Our group is currently conducting a study that includes implementation of national guidelines-based medical care for asthma, so that one process that we’ve had to get right is to prescribe an appropriate dose of medication and get it into the family’s hands. [emphasis added]

Seems simple, right?