AI is begging for better DATA

AI is the shiny new toy that differentiates people in the market right now. Everyone pretends they have AI today. In 5 years every company will. Now what? What will we celebrate then? Our data and what we know about it.

Ben Taylor
Predict

--

Big Data Hype: Rise And Fall

I was on stage in Orlando 6 years ago talking to an HR crowd about big data. Everything was big data this, big data that, got Hadoop yet?! Well… you need it?! Maybe? While speaking I realized nobody in the HR space had a big data problem. Over the next couple of years, the promise of a big data lake turned into the reality of a big data swamp. Were you duped by the big data promise?

AI Hype: Fail, Fail, Fail, Win, Fail

Now that we are climbing the hype cycle again with AI, executives and SVPs still remember the IT directors and geeks crying wolf about the need for big data. Despite the hardware manufacturers wanting everyone to open up their checkbooks and buy massive amounts of GPU compute, it just isn’t happening yet. Most companies are advertising AI capabilities, most companies are failing to find value, and a few of them are first to market and transforming their industries. For every 9 failures in the market around AI, I would say there is one massive success. Sounds like VC-backed startup odds.

Get The Patent Attorneys In Here, We Are Talking AI!

Companies are funny. When they decide to take the first step into the AI realm, they think what they are doing is special; they see themselves as pioneers. Often, we will see patent attorneys get involved early on to discuss this unique opportunity with our potential customers. Something that surprises people is when I tell them that in my machine learning patents, when I get to the algorithm piece, I tend to write:

f(x)=y

Here f is an “unknowable” function, x is an input of “anything”, and the output y is the “universe of possibilities”. Then I have a large section of text giving examples of all known options and potential future options. The reason you do this is that you can’t risk having some NAS-NET-1000-C4.5XXL model come out that gets around your use case. So, this brings me to my point: it was never about the model, it was always about the data and the use case.

Data Is Everything, Make Sure It Counts

Sure, being early to market is good, especially with automation. Eventually, and soon (I’m guessing in the next 5–10 years), everyone will have AI for their products and businesses in some form. It will go from being mystical and special to common (think SQL), so why would you flaunt it? Why would you plaster it all over your booth? When we get there, what will make the winners win and the losers lose? It will be data.

The companies that have large datasets, high-quality labels, and, more importantly, the iteration counts under their belts will win. The companies that have to reinvent the wheel with every new idea and struggle with their data interrogation cadence will fail. HireVue, the favorite employer of my career, had 10M video job interviews from the Fortune 1000. 10M! They also started researching them 4 years ago. See the competitive moat? Even if I use a team of 20 Stanford PhDs, I don’t have 10M video interviews in my pocket.

Data Privacy Kills

When I think of medical data I think of “Cancer face”, which reminds me of TaserFace. (Cancer face: I can tell you have cancer by looking at your face, and you don’t know it yet.)

This last point is very controversial. Which is good: I like controversy because I enjoy the conversation and thoughts. Today people are needlessly dying thanks to what I see as excessive data privacy around health care in the US from #HIPAA. The time it takes to catch cancer, acromegaly, CKD, and a variety of fatal or life-changing diseases is much longer and clumsier than it has to be. I imagine a future where your mirror, shower, and toilet at home will know more about your health than any doctor on the planet could today. Great, but how do we get there? We need the data, we need the training. The counterpoint here is “don’t give up your privacy, no matter the cost”, which isn’t an invalid point either. The issue raised by the other side of this argument is all of the bad you can do with this data. Health/life insurance, employment, advertising, or government abuse could all happen without checks and balances. A penny for your thoughts/comments.

--


Ben is a cofounder at Zeff.ai, delivering automated deep learning into production. Ben is a recognized deep-learning expert and keynote speaker.