The Neural Networks of Pez.AI

Hello Daisy, I’m crazy for you

Fears of rogue AI are once again center stage, thanks to Microsoft's infamous, and now ignominious, AI. If HAL ever wanted a robot partner in crime, her name is Tay. So you might be wondering: could Pez.AI ever go rogue? After all, the promise of Pez.AI is to respond appropriately to user and customer requests, so a Tay-style meltdown is thoroughly unacceptable.

They all appear harmless at first

Rest assured, Pez.AI will not disappoint and cannot be manipulated like Tay. The reason is its design. Tay was designed to learn online, with each and every interaction. As we saw, this can quickly yield unintended, and unacceptable, consequences. Pez.AI learns as well, but the learning happens offline, precisely to avoid uncontrollable situations like this. Offline learning also gives our researchers time to evaluate, tune, and tweak models as they evolve, before they are deployed to customer installations. This ensures that we control what Pez.AI learns, so that it always responds appropriately to your customers' requests.

Less human, more reliable

Where is Pez.AI now? The launch of our conversational interface to Google Analytics is imminent! Included in the initial launch are some exciting use cases that highlight the power of the complete Pez.AI computing platform. We are in the final stretch of testing, and registered beta users can expect an announcement soon. If you haven't signed up for the beta, do so here. Once you receive the beta invitation, simply click the Add to Slack button in the email to begin the integration. Don't worry, integration is quick and painless. You only need to authorize Pez.AI for your Slack team and link your Google Analytics account. From there, you're off to the races and the envy of all your friends.

What else is going on with Pez.AI? We started off with a simple idea to provide a conversational interface to analytics. The demand for a conversational AI has exceeded our expectations, and we are working with a number of businesses spanning messaging, customer service, insurance, and financial services to automate customer service with Pez.AI. With an analytics backbone, Pez.AI presents numerous opportunities for sophisticated inline inference and prediction while interacting naturally with users. Expect to hear a few exciting announcements in the coming months.

Brian Lee Yung Rowe is Founder and Chief Pez Head of Pez.AI // Zato Novo, a conversational AI platform for guided data analysis and automated customer service. Learn more at Pez.AI.

Is deep learning a Markov chain in disguise?

Andrej Karpathy's post "The Unreasonable Effectiveness of Recurrent Neural Networks" made a splash last year. The basic premise is that you can train a recurrent neural network to learn language features character by character. But is the resulting model any different from a Markov chain built for the same purpose? I implemented a character-by-character Markov chain in R to find out.

Source: @shakespeare

First, let's play a variation of the Imitation Game with generated text from Karpathy's tinyshakespeare dataset. Which snippets are from the RNN, and which are from the Markov chain? Note that Karpathy's examples are trained on the complete works, whereas my Markov chain is trained on tinyshakespeare (about 1/4 the size) because I'm lazy.

If you can't tell, don't be hard on yourself. The humble Markov chain appears to be just as effective as the state-of-the-art RNN at learning to spell (olde) English words. How can this be? Let's think about how each of these systems works. Both take a sequence of characters and attempt to "predict" the next character in the sequence. The RNN does this by adjusting weight vectors to produce an output vector that fits the specified response, while a hidden layer maintains state over the training set. In the end, there is a confidence value attributed to each possible output character, which is used to predict the next character.

Source: Andrej Karpathy

On the other hand, training a Markov chain simply constructs a probability mass function (pmf) incrementally across the possible next states. This means the resulting pmf is not so different from the RNN's output of confidences. Here's an example of the pmf associated with the string 'walk ':

This tells us that 40% of the time, the letter ‘a’ follows the sequence ‘walk ‘. When producing text, we can either treat this as the predicted value, or use the pmf to dictate the sampling. I chose the latter since it’s more interesting.

But how is state captured in the Markov chain, since by definition a Markov chain is stateless? Simple: we use a character sequence as the input state instead of a single character. For this post I used a sequence of length 5, so the Markov chain picks the next state based on the previous five characters. Is this cheating, or is this what the RNN is doing with its hidden layers?
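My implementation is in R and appears in the book; as a sketch of the idea, here is a minimal character-level Markov chain in Python, with an order-5 context as described above. All names are my own choosing, not the R code's.

```python
import random
from collections import Counter, defaultdict

def train(text, order=5):
    """Tally next-character counts for every length-`order` context."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def generate(model, seed, length=200):
    """Sample from the context-conditional pmf, one character at a time."""
    order = len(seed)
    out = seed
    for _ in range(length):
        counts = model.get(out[-order:])
        if not counts:           # unseen context: nothing to sample
            break
        chars, weights = zip(*counts.items())
        out += random.choices(chars, weights=weights)[0]
    return out
```

Training on the tinyshakespeare text and seeding `generate` with a few characters reproduces the kind of output quoted above; each `Counter` is exactly the pmf of next characters for its context.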

While the mechanics of RNNs differ significantly from those of Markov chains, the underlying concepts are remarkably similar. RNNs and deep learning might be the cool kids on the block, but don't overlook what's simple. You can get a lot of mileage out of simple models, which have generally stood the test of time, are well understood, and are easy to explain.

NB: I didn’t use a package to train and run the Markov chain, since it’s less than 20 LOC overall. A version of this code will appear in a forthcoming chapter of my book.


Making AI More Human: How To Give A Sales Pitch As A Technical Talk

I had the pleasure of seeing Gary Marcus give his talk "Making AI More Human" at the NYC Machine Learning meetup last night. For those unaware, Marcus is a Professor of Cognitive Psychology at NYU who recently founded an AI startup, Geometric Intelligence, based on his research on how children learn. It was an entertaining talk, and I agreed with his assessments of deep learning and AI in general. His approach to solving aspects of learning in AI overlaps with my own AI research for Pez.AI. Of course, I'm speculating on the specifics, since he didn't provide any details. At a high level, it appears the inspiration comes from childhood development, Bayesian reasoning, and probably some symbolic reasoning to boot.

Anyway, what's the point of this post? Many in the audience were unhappy with the talk because it mostly rehashed old arguments and offered near-zero information on his research. As a technical talk, it failed miserably. However, as Marcus said at the end, his goal was to recruit, and with that aim the talk should be treated as a pitch. From this perspective he did rather well. If you are a pre-product startup or have a highly technical product, you can learn a lot from his approach.

Most startup playbooks say that investors focus on three things: market/problem, idea/solution, and team. From a startup maturity perspective, Geometric Intelligence is pre-product and pre-revenue. If you don’t have a demo to show, then the emphasis needs to be on selling the story. Let’s see how Marcus did that.

The Problem

Marcus has spent years honing his problem statement around AI. He's published numerous articles, both academic and popular, on the subject. A good 90% of the talk was spent selling the problem. In a nutshell, current advances in AI are limited to what's known as Narrow or Weak AI: domain-specific problems whose solutions are not readily generalizable. For example, DeepMind's AlphaGo can't play chess. Of course, one could argue that most human Go players can't play chess either and would have to go through a similarly long training process (okay, not millions of games). That said, deep learning has numerous well-known limitations, so the argument is not without merit.

Beware the sky that falls with robots

Marcus also presented an entertaining montage (with Charlie Chaplin-esque music, no less) of anthropomorphic robots falling over. In short, he was effective in bursting the AI bubble. So effective, in fact, that many probably missed the sleight of hand in the presentation: Marcus isn't building robots and therefore isn't fully addressing the Strong AI problem he so meticulously presents.

The Solution

What is the solution that Marcus presents? Since there was no demo or description of actual AI models, Marcus used his two-year-old son as a proxy for the solution. This is clever. Like sex and cute animals, babies always sell. The essential idea is that by mimicking childhood development, you can create an AI system that learns and adapts from a smaller, "sparse" dataset. All good, right? Except that most AI research is bio-inspired, from neural networks, to genetic algorithms, to swarm intelligence. Where techniques are not bio-inspired, they are still inspired by some aspect of nature, like simulated annealing.

A mature wetware computer interacting with the next generation model

Marcus suggested that their approach is based on probabilistic reasoning. This is reasonable on its own, but there is a fair amount of literature showing that humans are innately bad at probability. He gets around this by saying that we should only mimic the useful parts of human cognition. This doesn't sound so different from the various other approaches that layer on statistical methods to improve models.

So what makes this approach better than all the others?


The Team

The team is what investors say is the most important of the three factors. The reasoning is that it takes a while to find product-market fit, so the initial problem and solution are likely impermanent, i.e. wrong. The team is responsible both for finding the correct product-market fit and for executing. The team thus trumps the market and the idea, since ostensibly the team is a permanent fixture of the business. Both Marcus and his co-founder, Zoubin Ghahramani, are academics, so they are unproven as entrepreneurs. So what do you do to counter this risk? First you casually mention how smart you are (PhD at 23) and then downplay it by calling yourself a slacker, since your co-founder was recently inducted into the Royal Society. This establishes your credibility, so that when you say being an academic is like being an entrepreneur, everyone believes you.

Social Proof

At this point it's time to deliver the coup de grâce: social proof. This is a silly invention by otherwise smart, socially awkward people, holding that popularity is a good indicator of success. Others might call it herd mentality and note that entrepreneurs are mavericks who go against the grain of convention, so by the time there's enough social proof, you've probably already missed the boat. Yet this is an important "metric" for many investors, potential employees, and sometimes even potential customers, and it cannot be ignored. Marcus leverages it well by noting that they have investments from a number of prominent CEOs. But what do those CEOs know about AI? Are they a good proxy for due diligence or not?


At the end of the day, it's unclear what exactly Geometric Intelligence has developed. What is clear is that a good sales pitch can be passed off as a technical talk. The real takeaway, to borrow from Peter Norvig, is that Marcus has demonstrated the unreasonable effectiveness of good storytelling.


7 Ways to Perplex a Data Scientist

On the heels of a report showing the inefficacy of government-run cyber security, it's imperative to understand the limitations of your systems and models. As that article shows, in addition to bureaucratic risk, the government also needs to worry about gaming-the-bureaucracy risk! Government snafus aside, data science has enjoyed considerable success in the past few years. Despite this success, models can fail in surprising ways. Last year we saw how deep neural nets for image recognition fail on noisy data.

As these examples show, a lot can be learned by breaking models. Model builders of all stripes must consider the limitations of their models; doing so should be a requisite step in the validation stage. As a fun exercise, below I present some ways to confuse the models at popular web destinations. Can you figure out how a model will fail based on each behavior?

Product Recommendations


Netflix

Netflix is known for using collaborative filtering as well as matrix factorization techniques like SVD.


  1. Choose a genre (e.g. Movies With A Strong Female Lead)
  2. For each movie, alternate ranking between 1 and 5 stars
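Why does this confuse the model? Here is a toy Python sketch (the actual Netflix pipeline is, of course, not public) of what an alternating rater looks like to anything that aggregates per-genre preference:

```python
# A user alternating 1- and 5-star ratings within one genre looks
# "indifferent" on average while every individual rating is extreme.
ratings = [1, 5] * 10   # twenty movies with a strong female lead

mean = sum(ratings) / len(ratings)
variance = sum((r - mean) ** 2 for r in ratings) / len(ratings)

print(mean)      # 3.0: the genre-level signal says "no preference"
print(variance)  # 4.0: the largest variance a 1-5 scale allows
```

A neighborhood method sees a mean-centered vector of pure ±2 noise whose direction depends only on the arbitrary order in which you rated, while a factorization has to burn latent dimensions explaining that noise.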


Amazon

Amazon is known for pioneering item-to-item collaborative filtering.

Make a separate purchase for each item on your list. For each item, do the following:

  1. Choose a dimension or combination of dimensions e.g. gender, age, department
  2. Browse related (i.e. similar) items in the given dimension
  3. Now browse related items in the opposite direction of that dimension (or something unrelated entirely)
  4. Add actual item to purchase to cart
  5. Checkout

Example: Choose baby car seat. View n car seats plus m related items (e.g. strollers). Now view a bunch of scooters for old people, such as the Pride 3 Wheel Celebrity X Scooter. Now add your purchase item and checkout.

Alternative: If you have disposable income, actually buy the car seat and scooter and donate them to a charity afterward.

Social Media


Facebook

The Facebook News Feed is notorious for changing regularly and for being somewhat opaque to outsiders; here is a narrative description of how it "works". The short version is that various scoring models are combined with various rules to deal with outliers.


  1. Choose a set of dimensions (e.g. day of week, time of day, media type)
  2. Choose a behavior (e.g. like, hide, scroll past, stay for long time, comment)
  3. For given set of dimensions, perform same behavior over a fixed period of time (e.g. 15 minutes)
  4. Repeat

Example: Choose Monday + 9 AM as dimensions. Choose “stay for long time + hide” as behavior. Do this for each item in news feed for 30 minutes. Repeat following week.

Bonus: Recruit your friends to follow the same algorithm, ideally in same geographic region.


LinkedIn

One curious feature of LinkedIn is its automated skill endorsement recommendations. I often get endorsed for random things unrelated to what I do. Presumably this works on some sort of frequent itemset mining based on graph distances.


  1. Choose a network of related people
  2. Choose an unrelated skill
  3. Endorse all people in network with same “skill”

Example: For me, I might choose all my financial quant friends and endorse them with the skill “arm wrestling”.

Alternative: Use a brand slogan as the skill, e.g. "Think Different". This can be awkward, so try changing the initial verb to a present participle, e.g. "Thinking Different".

Bonus: Use a brand slogan with a double entendre e.g. “Doubling Your Pleasure”.
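If the endorsement recommender really does mine frequent itemsets, the prank works by inflating the support of a spurious skill pair until it looks like a real one. A toy Python sketch of support counting (the people and skills here are made up, and LinkedIn's actual model is a guess):

```python
from collections import Counter
from itertools import combinations

# Each person's skill set after the prank: every quant friend
# now also carries the endorsed skill "arm wrestling".
people = [
    {"derivatives", "R", "arm wrestling"},
    {"derivatives", "C++", "arm wrestling"},
    {"R", "C++", "arm wrestling"},
    {"derivatives", "R", "C++", "arm wrestling"},
]

# Count how often each skill pair co-occurs across people.
pair_counts = Counter()
for skills in people:
    for pair in combinations(sorted(skills), 2):
        pair_counts[pair] += 1

# Support of the spurious pair: as frequent as any legitimate pair.
support = pair_counts[("arm wrestling", "derivatives")] / len(people)
print(support)  # 0.75
```

Once a fake pair's support clears the mining threshold, the recommender has no way to tell it from a genuine skill association.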

Marketing and Advertising

Google Analytics

While there aren't any models embedded within GA itself, many models are used to analyze web behavior based on the tracking codes attached to a URL.


  1. Choose a URL to link to
  2. Choose a unique identifier
  3. Replace tracking code with custom identifier
  4. Get people to click link

Example: The links in this post use a custom tracking code linking back to my AI SaaS service.
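Swapping a tracking code amounts to rewriting the utm_* query parameters on a URL. A small Python sketch using only the standard library (the function name and parameter values are my own invention):

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

def retag(url, **utm):
    """Replace the utm_* tracking parameters on a URL, keeping the rest."""
    parts = urlparse(url)
    # Drop any existing utm_* parameters, keep everything else.
    params = {k: v[0] for k, v in parse_qs(parts.query).items()
              if not k.startswith("utm_")}
    params.update(utm)
    return urlunparse(parts._replace(query=urlencode(params)))

url = "http://example.com/post?utm_source=newsletter&utm_campaign=spring"
print(retag(url, utm_source="my_custom_id", utm_campaign="perplex"))
```

Anyone clicking the retagged link is then attributed to whatever campaign story you invented, which is exactly the behavior GA's downstream models trust.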


To explore the effects of different behaviors on these sites, these R packages can help you construct your own recommendation models: recommenderlab, arules, and rCUR.

This is a small sampling of how to identify flaws in models. Add your own ideas on how to break models in the comments!
