The Neural Networks of Pez.AI

AI Social Impact #3: The Bias Edition

For this edition, we focus on how the use of AI to make decisions that involve humans chips away at the American Dream. We’ll talk about biases and how they are very much present despite the notion that artificial intelligence is incapable of prejudice.


Part of the American Dream is the belief that anyone can pursue a better life for themselves regardless of their background — a sentiment proclaimed as an inalienable right in the Declaration of Independence. And while we try to uphold this ideal (for all people), we struggle to avoid our natural tendency toward bias (favoritism or discrimination). To some, AI promises a world where decisions are made objectively, without bias. AI solutions are quickly being deployed in a wide range of social contexts: the courts, employment, insurance policies, and even driving.


Unfortunately, AI models are not free of bias, because bias is transferred from the data used to train them. This is how Google image classification mistook a group of black people for gorillas: the researchers didn’t include images of African Americans in the training data. More recently, Joy Buolamwini demonstrated that the gender of black women is identified correctly just 65% of the time, versus 99% for white men. In another study, Rachael Tatman showed how automatic speech recognition (ASR) systems don’t recognize women’s voices as well as men’s. A similar issue affects people with disabilities, arguably a population that could benefit greatly from ASR. With cars becoming voice-activated, there is a danger that only white males will be able to enjoy these luxuries due to the biases in the models.


The AI-is-objective trope is particularly dangerous, since 1) it’s easy to assume there are no biases in a software-based decision-making system, and 2) AI systems have access to unprecedented quantities of personal data, from gender and race to education, income, employment, and politics. Google famously showed different search results for black names than for white names, which perpetuates racial stereotypes. Courts are already using AI to decide prison sentences, despite many documented flaws in this approach. One significant problem is that systematic bias ingrained in past sentences is transferred to new sentences. Instead of the crime driving the sentence, race drives the sentence, because the data show that black people have longer prison sentences. What’s implied is that you as an individual no longer matter — instead your demographics define you.


Employment decisions can suffer from the same types of bias. The volume of applications passing through a recruiter’s desk is so great that over 45 percent of job applicants never hear back from a company; for this reason, companies have tried to use AI to simplify their talent acquisition and make it easier for people to get new jobs. Of course, the data feeding these systems are susceptible to the same structural biases ingrained in other parts of society. This may help explain why some recruiters suggest inserting words like “Oxford” or “Cambridge” into a resume in invisible (white) text to game the AI.


The idea that AI can predict our potential for crime, success, and love is uncannily anticipated in Philip K. Dick’s Minority Report. Before we reach that dystopian version of reality, it pays to address the bias in AI that is steadily eroding the American Dream. And doing so requires more commitment and action than Google showed when it addressed the gorilla debacle by simply removing gorilla and chimpanzee from the suggested image labels. Thankfully, many organizations, such as the Algorithmic Justice League and Data & Society, and conferences like Ethics in NLP and FAT*, are raising awareness of the ethical implications of AI.


At Pez.AI, we teach everyone how to identify bias. We also review our datasets to ensure we have acceptable levels of representation across different demographics where appropriate.




How to avoid systematic bias in startup perks

Here at Pez.AI, we recently introduced a feature bounty program, where we pay a cash bonus for implementing a nice-to-have feature that isn’t prioritized. The idea is that those who are motivated and ambitious will do this in their free time outside of work. While the program is still new, it only took a few days for the first bounty to be paid out!

This program is great for getting lower-priority or non-mission-critical problems solved. It also helps identify motivated employees and potential rising stars. Or does it? The problem with this program is that not everyone can take advantage of it. If we rely on the feature bounty to identify talent, we are introducing selection bias into our process. How can this be when the program offers equal opportunity to everyone in the company? The key issue is access, not opportunity.

Let’s see how this works with some examples. Suppose I offer telecommuting to my employees. Some employees work exclusively off-site, while others work on-site. Let’s pretend that the list of features in the feature bounty program is only visible on a whiteboard in the office. After some time I score my employees based on the number of features they implemented. I assume the top three employees are the most committed and give them more opportunities for growth. What’s wrong with this picture? The program is open to all, but only on-site workers have access to it.
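The selection effect in this example is easy to demonstrate with a toy simulation. This is a hypothetical Python sketch; the employee names, counts, and motivation scores are all invented for illustration. Motivation is spread evenly across locations, yet ranking by bounties shipped surfaces only on-site workers:

```python
import random

random.seed(42)

# Hypothetical workforce: motivation is independent of work location.
employees = [
    {"name": f"emp{i}", "on_site": i % 2 == 0, "motivation": random.random()}
    for i in range(10)
]

# Only on-site employees can see the whiteboard, so only they ship bounties.
for e in employees:
    has_access = e["on_site"]  # access, not opportunity
    e["features"] = int(e["motivation"] * 10) if has_access else 0

# Rank "top performers" by bounty count.
top3 = sorted(employees, key=lambda e: e["features"], reverse=True)[:3]
print([e["name"] for e in top3])        # ['emp6', 'emp4', 'emp0']
print(all(e["on_site"] for e in top3))  # True: selection bias, not merit
```

However motivated an off-site employee is, they can never appear in the top three, so scoring this way measures access rather than commitment.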


In our Manila office we have a different issue. Some employees come from well-to-do families, while others are from more modest backgrounds. The upper class is like an aristocracy and these families typically have maids to do laundry and clean their house. The less fortunate have to do their own housekeeping and chores, which can take the whole day. What’s this got to do with feature bounties? Since this program is extracurricular, we make the assumption that employees have free time outside of work to work on these features. This is certainly true of employees coming from the upper classes. But what about employees coming from poorer families? While opportunity is open to all, the feature bounty program favors wealthier employees, and therefore has unequal access.

It should be easy to see how equal access becomes an issue for many groups of people: the disabled, the poor, single parents, and women in male-dominated work environments. Even if a program starts with good intentions, poor execution can do more harm than good by reinforcing systematic biases.

What can be done to improve access to such programs? As with most solutions, the first step is recognizing systematic bias and unequal access. The second step is to either change the process or introduce complementary programs that address the access issue. In the telecommuting example, making the feature bounty list available online would ensure off-site workers have the same access to the program as on-site workers. In our case, we introduced a new company perk that reimburses weekly housekeeping for employees earning less than a specified amount. This perk increases access to the feature bounty program by removing housekeeping as a time barrier.

In the United States, examples of improving access include providing or reimbursing day care, shuttles for employees with long commutes, and so on. Our program could easily be modified to reimburse babysitting so a single parent could attend an after-work or weekend networking event. We actually allow the reimbursement to be used for travel to and from events related to networking and personal growth. The less well-heeled tend to live farther from the city center. Even if they want to go to an event, a two-hour commute late at night with fewer transportation options can discourage them from attending. Reimbursing a taxi ride can therefore enable poorer employees to spend time on personal development.

It’s not easy to implement programs that are completely free of bias. And sometimes you run the risk of alienating those more privileged. In our case we want to ensure the benefit is fair and equitable. So we offer it to anyone who qualifies regardless of their socioeconomic background. That said, we do ask people to only take it if they need it. This serves two purposes. First, it helps to de-stigmatize being poor since we talk about the perk openly in our employee handbook. Second, it encourages employees to be honest about their privilege and recognize that others aren’t so fortunate. This is one step towards compassion.

Policies like these require a commitment from senior management to get right. As an entrepreneur, it’s all too easy to ignore such issues and just focus on maximizing revenue and profit. But a relentless pursuit of profit at all costs is an ugly culture bereft of compassion and respect. By improving access to opportunity, we create a more just and equitable company. We also increase diversity and foster loyalty, which helps maintain a strong and healthy business.

Pez.AI is a socially responsible AI company. We make enterprise chatbots that streamline business processes and eliminate inefficiency.

AI Social Impact #1

Is The Singularity Imminent?

Welcome to the inaugural AI Social Impact Newsletter, brought to you by Pez.AI. The purpose of this newsletter is to raise awareness around near-term challenges facing the age of AI. We are less concerned with sensational worries about the “singularity” and more concerned with tangible threats to society. These include a widening income gap between the minority who can reap the rewards and the majority who will become economically displaced, and the ways that bias in machine learning algorithms and AI will impact society. AI has the potential to automate the majority of jobs faster than new ones are created. AI will take on ever more important roles in our daily lives. Who will own and control the AI that we will depend on? How do we remove biases from models so that automated decision-making is objective and fair? Who is accountable for the decisions that AI makes?

The pace of progress and adoption is faster than people imagine. Whether it’s accounting, finance, law, logistics, retail, or even data science, all industries are affected. As with all scientific revolutions, we need to discuss these issues now to prepare for this new social and economic reality. This newsletter is one step down that path.

Warm Regards,
Brian Lee Yung Rowe
Chief Pez Head, Pez.AI


To get started, exploring these issues means dissecting the so-called singularity. What exactly is it, and do we really need to be concerned? Like AI, the singularity means different things to different people. Our working definition is this: the singularity occurs when a machine (AI) exceeds human intelligence. A good overview is given by Karen Stollznow, who attended the premier singularity conference, the Singularity Summit. She describes both the history and influences of the singularity idea.

Those who think the singularity is real and near are the “Singularists”. Singularists are an eclectic bunch, whose ranks are filled with STEM luminaries past and present, including Bill Joy, Bill Gates, Elon Musk, and Stephen Hawking. Its prime cheerleader is most likely Ray Kurzweil: lifelong futurist, inventor of numerous OCR and speech synthesis tools, and more recently a Director of Engineering at Google. A prime argument for believing the singularity is imminent is the continual technological progress that transcends Moore’s law. Riding this theme of dizzying progress, SoftBank CEO Son Masayoshi has recently jumped on the Singularity bandwagon, predicting that a microchip will have an IQ of 10,000 by 2047, not to mention shoes smarter than us.

Kurzweil made splashes back in 2006 by saying the singularity would arrive in 2045. More recently, he’s moved that up to 2029! The complete timeline of his predictions for technological advances is here.

While this camp all agrees that the singularity is inevitable, they disagree on whether that is inherently good or bad. There are those that think it’s great for humanity and those that think it’s an existential threat. This divide will be the subject of a future newsletter. We’ll also see how this divide might not be as wide as imagined.


In an interesting twist, another co-founder of Microsoft sits squarely in the other camp. This group doesn’t necessarily think the singularity is impossible, just far enough away that there are more immediate things to worry about. I’d call this group the “Realists”, except that term is loaded with bias. Instead, rather tongue-in-cheek, I’ll call them the “Zenoists”. Paul Allen wrote a still-relevant article from 2014 that picks apart most of the enthusiastic claims of the Singularists. Allen’s argument rests on the Singularists’ method of extrapolation: progress in cognition is different from Moore’s law (and by extension, computing). We still know very little about how the brain works, let alone consciousness. From this perspective, even twenty years seems a short time to decipher what some call the most complex thing in the universe.

Another known Zenoist is the famed creator of Linux, Linus Torvalds. His view is that there will be progress in narrow applications of AI, as we’re seeing today with deep learning, but he is less convinced about human-level cognition. In his words, he doesn’t expect to “see the situation where you suddenly have some existential crisis because your dishwasher is starting to discuss Sartre with you”. That’s about as plausible as Douglas Adams’ existential elevators, which were “imbued with intelligence and precognition [and] became terribly frustrated with the mindless business of going up and down, up and down…”

Erik Larson offers a slightly different take: humans are willingly making ourselves obsolete through technology. Focusing on whether AI becomes superintelligent is a distraction from the bigger issue of what we are doing to ourselves. This argument has a long history. I first heard it in high school, when my calculus instructor wondered whether scientific calculators would make us dumber. Anecdotally, I’d like to think not.

So where do you stand? Are you a Singularist or Zenoist? Should we be afraid of AI or ourselves? Shout out over email or social media with the tag #AISocialImpact.

The AI Social Impact Newsletter is brought to you by Pez.AI, a socially responsible AI startup. We make enterprise bots that automate customer facing workflows and internal business processes. If you like this newsletter, sign up for future issues and please spread the awareness by sharing it with your friends or colleagues.


Hello Daisy, I’m crazy for you

Fears of rogue AI are once again center stage, thanks to Microsoft’s infamous, and now ignominious, AI. If HAL ever wanted a robot partner in crime, her name is Tay. So you might be wondering whether Pez.AI could ever go rogue. After all, the promise of Pez.AI is to respond appropriately to user and customer requests, so a Tay-style meltdown is thoroughly unacceptable.

They all appear harmless at first

Rest assured, Pez.AI will not disappoint and cannot be manipulated like Tay. The reason is the design. Tay was designed to learn online, with each and every interaction. As we saw, this can quickly yield unintended — and unacceptable — consequences. Pez.AI learns as well, but the learning happens offline, to avoid uncontrollable situations like this. It also gives our researchers time to evaluate, tune, and tweak models as they evolve — before they are deployed to customer installations. This ensures that we control what Pez.AI learns, so that it always responds appropriately to your customers’ requests.

Less human, more reliable

Where is Pez.AI now? The launch for our original conversational interface to Google Analytics is imminent! Included in the initial launch are some exciting use cases that highlight the power of the complete Pez.AI computing platform. We are in the final stretch of testing, and registered beta users can expect an announcement soon. If you haven’t signed up for the beta, do so here. Once you receive the beta invitation, simply click the Add to Slack button in the email to begin the integration. Don’t worry, integration is quick and painless. You only need to authorize Pez.AI to your Slack team and link your Google Analytics account. From there, you’re off to the races and the envy of all your friends.

What else is going on with Pez.AI? We started off with a simple idea to provide a conversational interface to analytics. The demand for a conversational AI has exceeded our expectations, and we are working with a number of businesses spanning messaging, customer service, insurance, and financial services to automate customer service with Pez.AI. With an analytics backbone, Pez.AI presents numerous opportunities for sophisticated inline inference and prediction while interacting naturally with users. Expect to hear a few exciting announcements in the coming months.

Brian Lee Yung Rowe is Founder and Chief Pez Head of Pez.AI // Zato Novo, a conversational AI platform for guided data analysis and automated customer service. Learn more at Pez.AI.

Is deep learning a Markov chain in disguise?

Andrej Karpathy’s post “The Unreasonable Effectiveness of Recurrent Neural Networks” made splashes last year. The basic premise is that you can create a recurrent neural network to learn language features character-by-character. But is the resultant model any different from a Markov chain built for the same purpose? I implemented a character-by-character Markov chain in R to find out.

Source: @shakespeare

First, let’s play a variation of the Imitation Game with generated text from Karpathy’s tinyshakespeare dataset. Which snippets are from the RNN and which are from the Markov chain? Note that Karpathy’s examples are from the complete works, whereas my Markov chain is from tinyshakespeare (about 1/4 the size) because I’m lazy.

If you can’t tell, don’t be hard on yourself. The humble Markov chain appears to be just as effective as the state-of-the-art RNN at learning to spell (olde) English words. How can this be? Let’s think about how each of these systems work. Both are taking a sequence of characters and attempting to “predict” the next character in the sequence. The RNN does this by adjusting weight vectors to get an output vector that fits the specified response. The hidden layer maintains state over the training set. In the end, there is a confidence value attributed to each possible output character, which is used to predict the next character.

Source: Andrej Karpathy

On the other hand, training a Markov chain simply constructs a probability mass function incrementally across the possible next states. What this means is that the resulting pmf is not so different from the RNN output of confidences. Here’s an example of the pmf associated with the string ‘walk ‘:

This tells us that 40% of the time, the letter ‘a’ follows the sequence ‘walk ‘. When producing text, we can either treat this as the predicted value, or use the pmf to dictate the sampling. I chose the latter since it’s more interesting.
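The training step is compact enough to sketch. The original implementation was in R; the following Python equivalent (with a made-up toy corpus rather than the tinyshakespeare data) shows how the pmf is accumulated per context:

```python
from collections import Counter, defaultdict

def train_markov(text, order=5):
    """Count the next character for every length-`order` context,
    then normalize the counts into a pmf per context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        counts[text[i:i + order]][text[i + order]] += 1
    return {ctx: {ch: n / sum(c.values()) for ch, n in c.items()}
            for ctx, c in counts.items()}

# Toy corpus: every occurrence of 'walk ' happens to be followed by 'a'.
pmf = train_markov("walk away, walk along, walk alone")
print(pmf["walk "])  # {'a': 1.0}
```

On a real corpus like tinyshakespeare, each context maps to a spread of characters with fractional probabilities rather than a single certainty.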

But how is state captured in the Markov chain since by definition a Markov chain is stateless? Simple: we use a character sequence as the input instead of a single character. For this post, I used a sequence of length 5, so the Markov chain is picking a next state based on the previous five states. Is this cheating or is this what the RNN is doing with hidden layers?
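Generation is just as short: slide a length-5 window along the output and sample the next character from that context’s pmf. Here is an illustrative Python sketch (the corpus and seed text are invented, not from the actual experiment):

```python
import random
from collections import Counter, defaultdict

def generate(text, seed, length=40, order=5):
    """Train an order-`order` character Markov chain on `text`,
    then extend `seed` by sampling from the per-context pmf."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        counts[text[i:i + order]][text[i + order]] += 1

    out = seed
    for _ in range(length):
        ctx = out[-order:]
        if ctx not in counts:   # unseen context: stop rather than guess
            break
        chars, weights = zip(*counts[ctx].items())
        out += random.choices(chars, weights=weights)[0]  # sample the pmf
    return out

random.seed(1)
print(generate("to be or not to be, that is the question", "to be"))
```

Sampling from the pmf (rather than always taking the most likely character) is what makes the output varied instead of looping on the single most common phrase.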

While the mechanics of RNNs differ significantly from Markov chains, the underlying concepts are remarkably similar. RNNs and deep learning might be the cool kids on the block, but don’t overlook what’s simple. You can get a lot of mileage from simple models, which have generally stood the test of time, are well understood, and easy to explain.

NB: I didn’t use a package to train and run the Markov chain, since it’s less than 20 LOC overall. A version of this code will appear in a forthcoming chapter of my book.

Brian Lee Yung Rowe is Founder and Chief Pez Head of Pez.AI // Zato Novo, a conversational AI platform for guided data analysis and automated customer service. Learn more at Pez.AI.

Making AI More Human: How To Give A Sales Pitch As A Technical Talk

I had the pleasure of seeing Gary Marcus give his talk “Making AI More Human” at the NYC Machine Learning meetup last night. For those unaware, Marcus is a Professor of Cognitive Psychology at NYU and recently founded an AI startup, Geometric Intelligence, based on his research on how children learn. It was an entertaining talk, and I agreed with his assessments of deep learning and AI in general. His approach to solving aspects of learning in AI overlaps with my own AI research for Pez.AI. Of course, I’m speculating on the specifics, since he didn’t provide any details. At a high level, it appears the inspiration comes from childhood development, Bayesian reasoning, and probably some symbolic reasoning to boot.

Anyway, what’s the point of this post? Many in the audience were unhappy with the talk because it mostly rehashed old arguments and offered near zero information on his research. As a technical talk, it failed miserably. However, as Marcus said at the end, his goal was to recruit, and with that aim, the talk should be treated as a pitch. From this perspective he did rather well. If you are a pre-product startup or have a highly technical product, you can learn a lot from his approach.

Most startup playbooks say that investors focus on three things: market/problem, idea/solution, and team. From a startup maturity perspective, Geometric Intelligence is pre-product and pre-revenue. If you don’t have a demo to show, then the emphasis needs to be on selling the story. Let’s see how Marcus did that.

The Problem

Marcus has spent years honing his problem statement around AI. He’s published numerous articles, both academic and general, on the subject. A good 90% of the talk was selling the problem. In a nutshell, current advances in AI are limited to what’s known as Narrow or Weak AI: domain-specific problems whose solutions are not readily generalizable. For example, DeepMind’s AlphaGo machine can’t play chess. Of course, one could argue that most human Go players can’t play chess either and would have to go through a similarly long training process (okay, not millions of games). That said, deep learning has numerous well-known limitations, so the argument is not without merit.

Beware the sky that falls with robots

Marcus also presented an entertaining montage (with Charlie Chaplin-esque music no less) of anthropomorphic robots falling over. In short he was effective in bursting the AI bubble. I would consider this so effective that many probably missed the sleight of hand in the presentation: Marcus isn’t building robots and therefore isn’t fully addressing the Strong AI problem he meticulously presents.

The Solution

What is the solution that Marcus presents? Since there was no demo or description of actual AI models, Marcus used his 2-year-old son as a proxy for the solution. This is clever. Like sex and cute animals, babies always sell. The essential idea is that by mimicking childhood development, you can create an AI system that learns and is more adaptive on a smaller, “sparse” dataset. All good, right? Except that most AI research is bio-inspired, from neural networks, to genetic algorithms, to swarm intelligence. Where techniques are not bio-inspired, they are still inspired by some aspect of nature, like simulated annealing.

A mature wetware computer interacting with the next generation model

Marcus suggested that their approach is based on probabilistic reasoning. This is reasonable on its own, but there is a fair amount of literature showing that humans are innately bad at probability. He gets around this by saying that we should only mimic the useful parts of humans. This doesn’t sound so different from other approaches that layer on statistical methods to improve models.

So what makes this approach better than all the others?


The Team

The team is what investors say is the most important of the three factors. The reasoning is that it takes a while to find product-market fit, so the initial problem and solution are likely impermanent, i.e. wrong. The team is responsible for both finding the correct product-market fit and executing on it. The team thus trumps the market and the idea, since ostensibly its members are the permanent fixtures of the business. Both Marcus and his co-founder, Zoubin Ghahramani, are academics, so they are unproven as entrepreneurs. So what do you do to counter this risk? First you casually mention how smart you are (PhD at 23) and then downplay it by calling yourself a slacker because your co-founder was recently inducted into the Royal Society. This establishes your credibility so that when you say being an academic is like being an entrepreneur, everyone believes you.

Social Proof

At this point it’s time to deliver the coup de grâce: social proof. This is a silly invention by otherwise smart, socially awkward people: the idea that popularity is a good indicator of success. Others might call this herd mentality, and also recognize that entrepreneurs are mavericks going against the grain of convention. So by the time there’s enough social proof, you’ve probably already missed the boat. Yet this is an important “metric” for many investors, potential employees, and sometimes even potential customers, and it cannot be ignored. Marcus leverages it well by noting that they have investments from a number of prominent CEOs. But what do those CEOs know about AI? Are they a good proxy for due diligence or not?


At the end of the day, it’s unclear what exactly has been developed. What is clear is that a good sales pitch can be passed off as a technical talk. The real takeaway, to borrow from Peter Norvig, is that Marcus has demonstrated the unreasonable effectiveness of good storytelling.

Brian Lee Yung Rowe is Founder and Chief Pez Head of Pez.AI // Zato Novo, a conversational AI platform for guided data analysis and automated customer service. Learn more at Pez.AI.

7 Ways to Perplex a Data Scientist

On the heels of a report showing the inefficacy of government-run cyber security, it’s imperative to understand the limitations of your system and model. As that article shows, in addition to bureaucratic risk the government also needs to worry about gaming-the-bureaucracy risk! Government snafus aside, data science has enjoyed considerable success in the past few years. Despite this success, models can fail in surprising ways. Last year we saw how deep neural nets for image recognition fail on noisy data.

As these examples show, a lot can be learned by breaking models. Considering a model’s limitations should be a requisite step in the validation stage for model builders of all stripes. As a fun exercise, below I present some ways to confuse the models at popular web destinations. Can you figure out how a model will fail based on each behavior?

Product Recommendations


Netflix is known for using collaborative filtering but also matrix factorization like SVD.


  1. Choose a genre (e.g. Movies With A Strong Female Lead)
  2. For each movie, alternate ranking between 1 and 5 stars
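Why does alternating ratings work as an attack? A rating-based recommender leans on summary statistics of your ratings, and alternating 1s and 5s pushes those statistics to their least informative values. A toy illustration only (Netflix’s production models are of course far more sophisticated):

```python
# Alternate 1-star and 5-star ratings across a genre, per step 2 above.
ratings = [1 if i % 2 == 0 else 5 for i in range(20)]

mean = sum(ratings) / len(ratings)
variance = sum((r - mean) ** 2 for r in ratings) / len(ratings)

print(mean)      # 3.0: you look indifferent to the entire genre
print(variance)  # 4.0: maximal disagreement with your own taste
```

A mean of exactly 3 with maximal variance gives neighborhood- and factorization-based methods almost nothing to latch onto: you resemble every user and no user at once.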


Amazon is known for using user-based collaborative filtering.

Make a separate purchase for each item in a list. For each item do the following:

  1. Choose a dimension or combination of dimensions e.g. gender, age, department
  2. Browse related (i.e. similar) items in the given dimension
  3. Now browse related items in the opposite direction of dimension (or something unrelated)
  4. Add actual item to purchase to cart
  5. Checkout

Example: Choose baby car seat. View n car seats plus m related items (e.g. strollers). Now view a bunch of scooters for old people, such as the Pride 3 Wheel Celebrity X Scooter. Now add your purchase item and checkout.

Alternative: If you have disposable income, actually buy the car seat and scooter and donate them to a charity afterward.

Social Media


The Facebook News Feed is notorious for changing regularly and being somewhat opaque to outsiders; here is a narrative description of how it “works”. The short version is that various scoring models are combined with various rules to deal with outliers.


  1. Choose a set of dimensions (e.g. day of week, time of day, media type)
  2. Choose a behavior (e.g. like, hide, scroll past, stay for long time, comment)
  3. For given set of dimensions, perform same behavior over a fixed period of time (e.g. 15 minutes)
  4. Repeat

Example: Choose Monday + 9 AM as dimensions. Choose “stay for long time + hide” as behavior. Do this for each item in news feed for 30 minutes. Repeat following week.

Bonus: Recruit your friends to follow the same algorithm, ideally in same geographic region.


One curious feature of LinkedIn is automated skill endorsement recommendations. I often get endorsed for random things unrelated to what I do. Presumably this works on some sort of frequent itemset based on graph distances.


  1. Choose a network of related people
  2. Choose an unrelated skill
  3. Endorse all people in network with same “skill”

Example: For me, I might choose all my financial quant friends and endorse them with the skill “arm wrestling”.

Alternative: Use a brand slogan as the skill, e.g. “Think Different”. This can be awkward, so try changing the initial verb to a present participle, e.g. “Thinking Different”.

Bonus: Use a brand slogan with a double entendre e.g. “Doubling Your Pleasure”.

Marketing and Advertising

Google Analytics

While there aren’t any models embedded within GA itself, many models are used to analyze web behavior based on the tracking codes attached to a URL.


  1. Choose a URL to link to
  2. Choose a unique identifier
  3. Replace tracking code with custom identifier
  4. Get people to click link

Example: The links in this post use a custom tracking code, linking to my AI SaaS service.
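The steps above amount to rewriting a URL’s tracking parameters. A hedged sketch of step 3 in Python: the `utm_*` parameter names follow Google’s standard convention, while the function name and campaign identifier are invented for illustration:

```python
from urllib.parse import urlencode, urlparse, parse_qs, urlunparse

def retag(url, campaign_id):
    """Replace any existing utm_* tracking parameters with a custom identifier."""
    parts = urlparse(url)
    params = {k: v[0] for k, v in parse_qs(parts.query).items()
              if not k.startswith("utm_")}  # strip old tracking codes
    params.update({"utm_source": "blog", "utm_campaign": campaign_id})
    return urlunparse(parts._replace(query=urlencode(params)))

print(retag("https://example.com/page?utm_source=ads&ref=x", "my-custom-id"))
```

Any analysis segmented on those tracking codes will now attribute the clicks to whatever campaign you chose, regardless of where the traffic actually came from.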


To explore the effects of different behaviors on these sites, these R packages can help you construct recommendation models: recommenderlab, arules, rCUR.

This is a small sampling of how to identify flaws in models. Add your own ideas on how to break models in the comments!

Brian Lee Yung Rowe is Founder and Chief Pez Head of Pez.AI // Zato Novo, a conversational AI platform for guided data analysis and automated customer service. Learn more at Pez.AI.

Panoptez is now Pez.AI

Panoptez is now Pez.AI. Why the change? We’ve added a conversational AI interface on top of our data analysis platform to improve the experience for non-technical people. That means you can talk to Pez.AI in Slack like you would talk to a business analyst or data scientist. Based on the conversation, Pez.AI identifies what you want and executes code to get your answers. Under the hood, the same Pez language you’ve grown to love powers the analysis and platform, giving you the best of both worlds.

Our first conversational application targets Google Analytics. You’ll be able to ask Pez.AI questions about your web traffic stats and trends in plain English. Pez.AI does all the hard work of constructing GA queries and summarizing the results right in Slack. And since Pez.AI is always there for you, you can get insights whenever you need them.

Everyone on the beta list will have early access to the GA application. We’ll be rolling out instances within the next few weeks. We’ve streamlined the installation to under 30 seconds, so you’ll be able to get insights quickly.

In other news, we also have a few partnerships in the works for more enterprise applications in the finance and customer service verticals. If you are interested in a partnership opportunity or would like custom AI development, give us a shout.

Intro to data structures for Excel users

In this series of posts, we teach programming concepts from the perspective of spreadsheets using pez, Zato Novo’s data analysis language. If you know Excel, then you already have the foundation to start coding!

Data structures form the backbone of any programming language (and software system), and for computer science students they can send a shiver down the spine. But data structures don’t have to be intimidating. By the end of this post, you’ll be able to work with them confidently and efficiently.

So what is a data structure? Simply put, a data structure is a container that holds data. A spreadsheet is actually a massive data structure that represents data as a grid. Spreadsheets are good for displaying all the gory details of a (tabular) dataset but are cumbersome when moving data around or creating custom functions to modify data. Programming languages, on the other hand, provide compact notation for working with data structures, but it can be cumbersome to see all of the data.

Most programming languages come with “batteries included”, meaning once it’s installed you have everything you need to immediately play with it. What’s implied is that all sorts of data structures are provided out of the box, which is great for variety but difficult to pick up and remember. Pez likes to err on the side of simplicity, so there are two primary data structures: lists and data frames. We’ll explore both of these structures using an example of creating financial projections for a startup.

Forecasting MRR

To make the lessons concrete, we’ll use a business forecasting example. In a previous article I showed how to use Panoptez to calculate the MRR of Slack using a basic set of assumptions. For this article, we’ll forecast the MRR of my startup, Zato Novo, based on an even simpler set of assumptions. As with the previous article, we establish a baseline approach using a Google Sheets document. This spreadsheet has a handful of columns, starting with the forecast date, followed by a projected number of paying customers. For pedagogical purposes, I’m assuming a fixed subscriber growth rate of 5% per month, which annualizes to 80%. Then I take that user number and multiply it by the base monthly price of $25/user to get a monthly recurring revenue number. To keep things simple, I’m ignoring tiers, annual prepay, and churn. This spreadsheet will be examined throughout the article as we walk through various concepts.


Working with lists

Okay, now let’s see how to construct the same thing in pez. A list is an ordered collection of items and can contain any type of data. In a spreadsheet, a range of cells is analogous to a list. When we say a list is ordered, we mean its items are guaranteed to stay in the same order you entered them. This is like a spreadsheet where the value in A4 always follows the one in A3. In our revenue forecast example, each column is a list. It’s fine to treat each row as a list as well, although later we’ll see why it’s more convenient to think of lists as columns.

Let’s look at the first column that contains dates. In a spreadsheet we create this column by starting with an initial date. Next we define a formula that adds one month to create the next date (using EDATE in Google Sheets). We then copy and paste this formula for each successive cell to create the whole range. Our final date range lives in the cells A2:A25.


Notice that for each successive date, we are adding one month to the previous date. Hence, the second date adds 1 to the initial date, while the fourth date adds 3, and so on. In pez, we take advantage of this observation to create the dates more compactly. First, we create the initial date, which is simply the literal text 2016-01-01. If you enter dates with this specific format, pez knows that it’s a date, just like in a spreadsheet. (The same is true of timestamps.)

Now let’s create an integer range that represents how many months must be added to the initial date to produce the complete date range. For this we use the range operator, .. (two dots). For example, 0..23 creates 24 integers, from 0 to 23. The final step is to create the dates, which simply requires adding this list of numbers to the initial date.
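The same idea can be sketched in Python (pez syntax differs): start from an initial date and add each of 0..23 months to it. The add_months helper is ours, since the standard library has no month arithmetic:

```python
from datetime import date

start = date(2016, 1, 1)

def add_months(d, n):
    # Shift a first-of-month date forward by n months.
    total = d.month - 1 + n
    return date(d.year + total // 12, total % 12 + 1, 1)

# One expression builds the whole column -- no copy-paste of a formula.
month = [add_months(start, n) for n in range(24)]
```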

See how much simpler this is than copying and pasting a formula into a number of cells? In the spreadsheet, there is one other detail, which is that the column has a header. In pez, we just assign this expression to a variable, which we’ll call month. Here is what it looks like in our Panoptez-enabled Slack.


Literal list creation

We saw how easily integer ranges can be created in the previous section. What if you want to create a list that is not an integer range? In this case, a literal list can be created using bracket notation: [x1, x2, x3, ..., xn]. With this syntax, each element is specified explicitly within square brackets. Using the date range above, the first four elements can be created as [2016-01-01, 2016-02-01, 2016-03-01, 2016-04-01]. This approach is perfectly legal, but for efficiency, it’s often easier to use an expression to generate the appropriate range for you.

Learn more about lists

Element selection

So what can we do with this list? In a spreadsheet we can pull specific elements from a range and reference them in a separate cell using their coordinates. For example, January of 2017 is located at A14. This approach is convenient, but what happens if we move this column somewhere else? Let’s say we add one column to the left of A. Most of the time the spreadsheet automatically updates the cell references to reflect the new location. However, that means if we need to reference it anew, we need to know where it is in the spreadsheet! For complicated spreadsheets it can start to feel like a perverse Where’s Waldo exercise. Wouldn’t it be nice if we could always reference the range the same way, no matter where it lives? In pez, our date range is called month, so any time we access month[13] we get the first day of 2017. That means no more missing references!

The operation using the name of the variable followed by brackets, x[y], is called indexing or subsetting. The number inside the brackets is called the index. In pez, the first element starts at an index of 1, while the last element is at length(x). There are other ways to index a list, but for now we’ll stick to the basics.
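One caveat when translating: pez indexes from 1 while Python indexes from 0, so pez’s month[13] corresponds to month[12] in Python. A minimal sketch (dates as plain strings for brevity):

```python
# Build 24 "YYYY-MM" labels for 2016-2017.
month = [f"{2016 + n // 12}-{n % 12 + 1:02d}" for n in range(24)]

first_of_2017 = month[12]      # pez: month[13]
last = month[len(month) - 1]   # pez: month[length(month)]; Python also allows month[-1]
```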

Compounding growth

Let’s move on to the second column, which contains a hypothetical user growth rate. Starting with an initial value of 100 users (hey, you gotta start somewhere), we assume a monthly growth rate of 5%. So growth is compounding monthly, meaning that each month is 1.05 times greater than the prior month. To model this in a spreadsheet, we again turn to a formula. This time the formula multiplies 1.05 to the previous value instead of adding a value.


In pez, there are a few ways to tackle this. One approach is to use the cumprod function, which takes a list of numbers and computes the cumulative product of all the numbers in the list from the first element to the current element. For example, cumprod 1..4 yields [1, 2, 6, 24], which is equivalent to [1, 1*2, 1*2*3, 1*2*3*4]. For the growth rate, we create a repeated list of 1.05 and apply cumprod to it.
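In Python, itertools.accumulate with multiplication plays the role of cumprod; this is a sketch of the computation, not pez syntax:

```python
from itertools import accumulate
from operator import mul

# Equivalent of "cumprod 1..4": running product of 1, 2, 3, 4.
cumprod_1_to_4 = list(accumulate([1, 2, 3, 4], mul))   # [1, 2, 6, 24]

# Cumulative product of a repeated 1.05 gives the monthly growth factors.
growth = list(accumulate([1.05] * 24, mul))            # starts at 1.05, not 1
```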


Calling functions is similar to calling functions in a spreadsheet, where the name of the function is followed by its arguments wrapped in parentheses. Pez supports a simpler syntax as well, which will be discussed in a future post.

You may have noticed that there’s one problem with this approach: while the spreadsheet starts at 100, our pez list starts at 105. We could modify the list to fix this, but an even simpler approach takes advantage of how compounding works. Since the growth rate is constant, each month’s multiplier is a successive power of 1.05: month one is 1.05^0 = 1, month two is 1.05^1, month three is 1.05^2, and so on. Using what we’ve already learned, we can raise 1.05 to the sequence 0..23, which produces all the powers for us!
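In Python the constant-rate shortcut looks like this (a sketch of the arithmetic, scaled by the 100 initial users so the series starts at 100 as in the spreadsheet):

```python
# Powers 1.05^0 .. 1.05^23 give the monthly multipliers directly.
growth = [1.05 ** n for n in range(24)]   # 1, 1.05, 1.05**2, ...

# Scale by the initial 100 users to get the customer forecast.
customers = [100 * g for g in growth]
```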


Calculating the MRR

The last column to create is the monthly recurring revenue. The current assumption is $25/user/month, so we multiply each value in C2:C25 by 25.


In pez, the range C2:C25 corresponds to the variable customers, so we multiply that by 25 and assign its result to a new variable mrr.


Again, notice how simple it is to describe this operation.

Creating the data frame

The final step is to bring all these variables together into a single table. Data frames are organized by column, which is why we claimed that it’s best to think of lists as columns. Each variable we defined is simply a column in the table.

The output table is just like the spreadsheet. To make the table easier to work with, it’s actually better to assign our dates to the index of the table, which reduces the number of columns and uses the dates as row labels. We use a special @index key at the end of the table definition to specify the index.

This looks pretty good. However, notice that we had to create a whole bunch of variables to create this table. This pollutes your workspace, which makes it harder to find useful stuff in the future. It’s better to use a let expression to define temporary variables instead.

Now only the variable you care about is created in your workspace. All the others are deleted once the let expression is evaluated.
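A Python sketch of the same assembly: locals scoped inside a function behave much like a let expression, since only the finished table escapes. The dict-of-rows shape here is an assumption standing in for a real data frame:

```python
from datetime import date

def build_forecast():
    # Temporary "columns"; they disappear when the function returns.
    month = [date(2016 + n // 12, n % 12 + 1, 1) for n in range(24)]
    customers = [100 * 1.05 ** n for n in range(24)]
    mrr = [25 * c for c in customers]
    # Key rows by date, mimicking the @index role of the month column.
    return {m: {"customers": c, "mrr": r}
            for m, c, r in zip(month, customers, mrr)}

forecast = build_forecast()
```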


As a final goodie, here is a plot of the MRR based on the data we created.



Data structures are an important part of programming. In this article, we took your existing knowledge of Excel and showed how cell ranges are lists and tables are data frames. You also got a taste of let expressions and vectorization, which are two powerful features of pez.

Panoptez is a collaborative data analysis and visualization platform accessible via chat systems, like Slack. Request an invite to the beta or contact us for preferred access.

How to calculate monthly recurring revenue (MRR) in Slack instead of Excel

FastCompany wrote an article about Slack, which cited some subscriber numbers. This got me wondering what their monthly recurring revenue (MRR) is based on these figures. The MRR is a key metric that helps determine if your company is cashflow positive or not. Knowing the MRR also gives you insight into a SaaS company’s P/E ratio. Since we don’t know if Slack is profitable, we can’t compute the P/E. We can, however, use price-to-revenue as a naive proxy. In this article, I show how to use Panoptez within Slack to calculate the MRR and P/R instead of Excel (or other spreadsheet program).

A spreadsheet (e.g. Excel, Google Sheets) is often the go-to tool when you want to make a quick back-of-the-envelope calculation. In isolation this is sufficient, but sharing your calculation with others is more involved. Within a team, it’s also likely that you want to share your methodology or the function you wrote with your colleagues. In a spreadsheet this becomes a bit more challenging, since it usually means writing a function in Visual Basic or something comparable and then figuring out how to distribute it among your colleagues. For this article, we’ll ignore the sharing aspect and focus only on the calculations. Our baseline will be a Google Sheets implementation.

The Data

First, we need the raw data. In this case, it comes from FastCompany, which says Slack has 370,000 paid subscribers. Slack has two tiers of pricing, but FastCompany doesn’t break this out for us. The pricing itself comes from Slack, where they list the price of the standard and plus plans.


To get a single value for the MRR, we need to know how many people pay for the standard versus the plus tier. We also need to know how many pay month-to-month versus annually. Since these numbers aren’t available, we have to make assumptions for the proportion of subscribers in each plan as well as the ratio of subscribers paying month-to-month versus annually. My hand-waving guess is 70% pay for the standard tier and 30% pay for plus. I also assume that 70% of the standard tier pay month-to-month and 30% pay annually. For the plus tier I assume the opposite. If you have better assumptions, please let me know in the comments!

Spreadsheet Calculation

In a spreadsheet, the normal procedure is to populate cells with these values and add some labels for the rows and columns. Next we create a formula to hold some intermediate results. In our case this is the weighted monthly value of a user in the standard and plus tiers. The formula bar shows the computed value for the standard tier.


To get the MRR we tally those up and multiply by the number of paid subscribers. This gives us $3.44 million per month, or $41.3 million per year.
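The tally can be reproduced as straight arithmetic. The $8/$6.67 and $15/$12.50 per-user monthly prices are assumptions based on Slack’s then-published standard and plus plans (month-to-month vs. annual billing); the mix percentages are the post’s assumptions:

```python
subscribers = 370_000

# tier share * (monthly price * pct monthly + annual price * pct annual)
standard = 0.70 * (8.00 * 0.70 + 6.67 * 0.30)
plus     = 0.30 * (15.00 * 0.30 + 12.50 * 0.70)

mrr = subscribers * (standard + plus)   # ~ $3.44M per month
arr = 12 * mrr                          # ~ $41.3M per year
p_to_r = 2.8e9 / arr                    # ~ 68 at a $2.8B valuation
```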


That means with a private valuation of $2.8 billion, the P/R is about 68. Remember, this doesn’t equate to the P/E, since we aren’t accounting for expenses, so the P/E will likely be much higher. This is a detail overlooked in the Business Insider article that you shouldn’t ignore.

Using Panoptez

Now let’s see how to do the same thing in Panoptez. First, we create a nearly identical table. Remember that since this is in Panoptez, once this table is created, any colleague on Slack can access this same table to use as they wish. We’ll create a data frame using { } notation and assign it to the variable slack_stats. In case you’re wondering, a “data frame” is a fancy way of saying “table”.


Here’s a text version so you can copy and paste into your Panoptez-enabled Slack.

Each list within the data frame represents a column of the table. In our spreadsheet, the first column of data represented the standard pricing tier. To reference it, we would create a range from B2:B6. Our data frame holds the same data, except we reference it as slack_stats$standard. The @index at the end of the data frame sets the row names for the table. If we don’t specify this, the rows will simply be numbered sequentially.

To calculate the weighted value of each tier, we’ll create a temporary function. Since Panoptez tracks all variables created in your workspace, it can fill up with a bunch of garbage quickly. To reduce clutter, you can use what’s known as a “let expression” to create temporary variables that will disappear after the expression has been evaluated. The basic structure of a let expression is let x in y. In this example, we create a temporary function f and then apply it to slack_stats$standard. The function itself is doing the same thing as in the spreadsheet formula =B2 * (B3*B5 + B4*B6), except we use the dot product (the ** operator) instead of explicitly summing the two products. The value at x[1] corresponds to B2 in the spreadsheet, since that is the range we are passing to the function. If we had used slack_stats$plus instead, then x[1] would correspond to C2.


Putting it all together, we can take our let expression and use it inside a function! That means we can create a temporary function to simplify the overall calculation. This last step creates a function that accepts the number of paying users and calculates the MRR. Notice that the expression following the in is essentially the same as in the spreadsheet, which was =F2*(B8+C8). The difference is that instead of cell positions, we are using variables and functions. The variable u is equivalent to F2, while f(slack_stats$standard) evaluates to the same value as B8.


This is the code to try in your Panoptez-enabled Slack session.

To get the final result, we simply call this function like !pez slack_mrr 370000. The nice thing about having a function is that as Slack’s user base changes, we can call this function again to get the latest MRR.
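For readers following along in Python, here is a sketch of the same function. The nested helper plays the role of the let-bound f (it never escapes the enclosing scope), and the row order assumed for each tier column is [tier share, monthly price, annual price, pct monthly, pct annual], matching the spreadsheet formula =B2 * (B3*B5 + B4*B6); the prices are assumptions based on Slack’s then-published plans:

```python
slack_stats = {
    "standard": [0.70, 8.00, 6.67, 0.70, 0.30],
    "plus":     [0.30, 15.00, 12.50, 0.30, 0.70],
}

def slack_mrr(users):
    def f(x):
        # Tier share times the dot product of prices and payment-mix weights.
        return x[0] * (x[1] * x[3] + x[2] * x[4])
    return users * (f(slack_stats["standard"]) + f(slack_stats["plus"]))
```

Calling slack_mrr(370000) reproduces the roughly $3.44 million figure, and as the subscriber count changes you just call it again with the new number.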



In this post, I’ve shown how to use Panoptez to calculate an estimate of Slack’s MRR. I’ll leave it to the reader to write an expression that calculates the P/R ratio from this. In a subsequent post, we’ll look at changing the assumptions used in this example.
