Hello Daisy, I’m crazy for you

Fears of rogue AI are once again center stage, thanks to Microsoft's infamous, and now ignominious, AI. If HAL ever wanted a robot partner in crime, her name is Tay. So you might be wondering: could Pez.AI ever go rogue? After all, the promise of Pez.AI is to respond appropriately to user and customer requests, so a Tay-style meltdown is thoroughly unacceptable.

They all appear harmless at first

Rest assured, Pez.AI will not disappoint and cannot be manipulated the way Tay was. The reason is its design. Tay was designed to learn online, from each and every interaction. This can quickly yield unintended, and unacceptable, consequences, as we saw. Pez.AI learns as well, but the learning happens offline, precisely to avoid uncontrollable situations like this. It also gives our researchers time to evaluate, tune, and tweak models as they evolve, before they are deployed to customer installations. This ensures that we control what Pez.AI learns, so that it always responds appropriately to your customers' requests.

Less human, more reliable

Where is Pez.AI now? The launch for our original conversational interface to Google Analytics is imminent! Included in the initial launch are some exciting use cases that highlight the power of the complete Pez.AI computing platform. We are in the final stretch of testing, and registered beta users can expect an announcement soon. If you haven’t signed up for the beta, do so here. Once you receive the beta invitation, simply click the Add to Slack button in the email to begin the integration. Don’t worry, integration is quick and painless. You only need to authorize Pez.AI to your Slack team and link your Google Analytics account. From there, you’re off to the races and the envy of all your friends.

What else is going on with Pez.AI? We started off with a simple idea to provide a conversational interface to analytics. The demand for a conversational AI has exceeded our expectations, and we are working with a number of businesses spanning messaging, customer service, insurance, and financial services to automate customer service with Pez.AI. With an analytics backbone, Pez.AI presents numerous opportunities for sophisticated inline inference and prediction while interacting naturally with users. Expect to hear a few exciting announcements in the coming months.

Brian Lee Yung Rowe is Founder and Chief Pez Head of Pez.AI // Zato Novo, a conversational AI platform for guided data analysis and automated customer service. Learn more at Pez.AI.

Is deep learning a Markov chain in disguise?

Andrej Karpathy's post "The Unreasonable Effectiveness of Recurrent Neural Networks" made a splash last year. The basic premise is that you can train a recurrent neural network to learn language features character by character. But is the resultant model any different from a Markov chain built for the same purpose? I implemented a character-by-character Markov chain in R to find out.

Source: @shakespeare

First, let’s play a variation of the Imitation Game with generated text from Karpathy’s tinyshakespeare dataset. Which snippets are from the RNN and which are from the Markov chain? Note that Karpathy’s examples are from the complete works, whereas my Markov chain is from tinyshakespeare (about 1/4 the size) because I’m lazy.

If you can't tell, don't be too hard on yourself. The humble Markov chain appears to be just as effective as the state-of-the-art RNN at learning to spell (olde) English words. How can this be? Let's think about how each of these systems works. Both take a sequence of characters and attempt to "predict" the next character in the sequence. The RNN does this by adjusting weight vectors to produce an output vector that fits the specified response. The hidden layer maintains state over the training set. In the end, there is a confidence value attributed to each possible output character, which is used to predict the next character.

Source: Andrej Karpathy

On the other hand, training a Markov chain simply constructs a probability mass function (pmf) incrementally across the possible next states. What this means is that the resulting pmf is not so different from the RNN's vector of confidences. Here's an example of the pmf associated with the string 'walk ':

This tells us that 40% of the time, the letter ‘a’ follows the sequence ‘walk ‘. When producing text, we can either treat this as the predicted value, or use the pmf to dictate the sampling. I chose the latter since it’s more interesting.
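To make that concrete, here is a minimal R sketch of such a pmf and the two ways of using it. Only the 40% probability for 'a' comes from the actual model; the other entries are made up for illustration.

  # Hypothetical pmf for the character following the context "walk "
  # (only the 40% for "a" comes from the text; the rest are illustrative)
  pmf <- c("a" = 0.40, "i" = 0.15, "o" = 0.15, "e" = 0.10, "s" = 0.10, " " = 0.10)

  names(which.max(pmf))                     # treat "a" as the predicted character
  sample(names(pmf), size = 1, prob = pmf)  # or sample the next character from the pmf

Sampling keeps the generated text from collapsing into the single most likely continuation every time, which is why it makes for more interesting output.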

But how does the Markov chain capture state, when by definition a Markov chain has no memory beyond its current state? Simple: we use a sequence of characters as the state instead of a single character. For this post, I used a sequence of length 5, so the Markov chain picks the next state based on the previous five characters. Is this cheating, or is this what the RNN is doing with its hidden layers?
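For reference, here is a minimal R sketch of the idea (not the exact code from the post; the function names are illustrative): slide a five-character window over the text, count which character follows each context, and normalize the counts into a pmf per context.

  # Minimal sketch of an order-5 character Markov chain (not the post's exact code)
  train_markov <- function(text, order = 5) {
    chars <- strsplit(text, "")[[1]]
    counts <- list()
    for (i in seq_len(length(chars) - order)) {
      ctx <- paste(chars[i:(i + order - 1)], collapse = "")
      nxt <- chars[i + order]
      tab <- counts[[ctx]]
      if (is.null(tab)) tab <- integer()
      tab[nxt] <- if (nxt %in% names(tab)) tab[nxt] + 1 else 1
      counts[[ctx]] <- tab
    }
    lapply(counts, function(x) x / sum(x))   # counts -> pmf per context
  }

  generate <- function(model, seed, n = 500, order = 5) {
    out <- seed                              # seed must be at least `order` characters
    for (i in seq_len(n)) {
      ctx <- substr(out, nchar(out) - order + 1, nchar(out))
      pmf <- model[[ctx]]
      if (is.null(pmf)) break                # unseen context: stop generating
      out <- paste0(out, sample(names(pmf), size = 1, prob = pmf))
    }
    out
  }

Training this on the tinyshakespeare text and seeding it with any five characters from the corpus produces output in the same vein as the snippets above.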

While the mechanics of RNNs differ significantly from Markov chains, the underlying concepts are remarkably similar. RNNs and deep learning might be the cool kids on the block, but don't overlook what's simple. You can get a lot of mileage from simple models, which have generally stood the test of time, are well understood, and are easy to explain.

NB: I didn’t use a package to train and run the Markov chain, since it’s less than 20 LOC overall. A version of this code will appear in a forthcoming chapter of my book.

Brian Lee Yung Rowe is Founder and Chief Pez Head of Pez.AI // Zato Novo, a conversational AI platform for guided data analysis and automated customer service. Learn more at Pez.AI.

Making AI More Human: How To Give A Sales Pitch As A Technical Talk

I had the pleasure of seeing Gary Marcus give his talk "Making AI More Human" at the NYC Machine Learning meetup last night. For those unaware, Marcus is a Professor of Cognitive Psychology at NYU and recently founded an AI startup, Geometric Intelligence, based on his research on how children learn. It was an entertaining talk, and I agreed with his assessments of deep learning and AI in general. His approach to solving aspects of learning in AI overlaps with my own AI research for Pez.AI. Of course, I'm speculating on the specifics, since he didn't provide any details. At a high level, it appears the inspiration comes from childhood development, Bayesian reasoning, and probably some symbolic reasoning to boot.

Anyway, what's the point of this post? Many in the audience were unhappy with the talk because it mostly rehashed old arguments and offered near zero information on his research. As a technical talk, it failed miserably. However, as Marcus said at the end, his goal was to recruit, and with that aim the talk should be treated as a pitch. From this perspective he did rather well. If you are a pre-product startup or have a highly technical product, you can learn a lot from his approach.

Most startup playbooks say that investors focus on three things: market/problem, idea/solution, and team. From a startup maturity perspective, Geometric Intelligence is pre-product and pre-revenue. If you don’t have a demo to show, then the emphasis needs to be on selling the story. Let’s see how Marcus did that.

The Problem

Marcus has spent years honing his problem statement around AI. He has published numerous articles, both academic and general, on the subject. A good 90% of the talk was selling the problem. In a nutshell, current advances in AI are limited to what's known as Narrow or Weak AI: domain-specific problems whose solutions are not readily generalizable. For example, DeepMind's AlphaGo machine can't play chess. Of course, one could argue that most human go players can't play chess either and would have to go through a similarly long training process (okay, not millions of games). That said, deep learning has numerous well-known limitations, so the argument is not without merit.

Beware the sky that falls with robots

Marcus also presented an entertaining montage (with Charlie Chaplin-esque music no less) of anthropomorphic robots falling over. In short he was effective in bursting the AI bubble. I would consider this so effective that many probably missed the sleight of hand in the presentation: Marcus isn’t building robots and therefore isn’t fully addressing the Strong AI problem he meticulously presents.

The Solution

What is the solution that Marcus presents? Since there was neither a demo nor a description of actual AI models, Marcus used his 2-year-old son as a proxy for the solution. This is clever. Like sex and cute animals, babies always sell. The essential idea is that by mimicking childhood development, you can create an AI system that learns and is more adaptive on a smaller, "sparse" dataset. All good, right? Except that most AI research is bio-inspired, from neural networks, to genetic algorithms, to swarm intelligence. Where techniques are not bio-inspired, they are still inspired by some aspect of nature, like simulated annealing.

A mature wetware computer interacting with the next generation model

Marcus suggested that their approach is based on probabilistic reasoning. This is reasonable on its own, but there is a fair amount of literature showing that humans are innately bad at probability. He gets around this by saying that we should only mimic/model the useful parts of humans. This doesn't sound so different from the many other approaches that layer statistical methods on top of a model to improve it.

So what makes this approach better than all the others?

Team

The team is what investors say is the most important of the three factors. The reasoning is that it takes a while to find product-market fit, so the initial problem and solution are likely impermanent, i.e. wrong. The team is responsible both for finding the correct product-market fit and for executing. The team thus trumps the market and the idea, since ostensibly the team is the one permanent fixture of the business. Both Marcus and his co-founder, Zoubin Ghahramani, are academics, so they are unproven as entrepreneurs. So what do you do to counter this risk? First you casually mention how smart you are (PhD at 23) and then downplay it by calling yourself a slacker, since your co-founder was recently inducted into the Royal Society. This establishes your credibility, so that when you say being an academic is like being an entrepreneur, everyone believes you.

Social Proof

At this point it's time to deliver the coup de grâce: social proof. This is a silly notion, invented by otherwise smart but socially awkward people, that popularity is a good indicator of success. Others might call this herd mentality and also recognize that entrepreneurs are mavericks going against the grain of convention. So by the time there's enough social proof, you've probably already missed the boat. Yet this is an important "metric" for many investors, potential employees, and sometimes even potential customers, and it cannot be ignored. Marcus leveraged this well by saying that they have investments from a number of prominent CEOs. But what do those CEOs know about AI? Are they a good proxy for due diligence or not?

Conclusion

At the end of the day, it's unclear what exactly geometric.ai has developed. What is clear is that a good sales pitch can be passed off as a technical talk. The real takeaway, to borrow from Peter Norvig, is that Marcus has demonstrated the unreasonable effectiveness of good storytelling.

Brian Lee Yung Rowe is Founder and Chief Pez Head of Pez.AI // Zato Novo, a conversational AI platform for guided data analysis and automated customer service. Learn more at Pez.AI.

7 Ways to Perplex a Data Scientist

On the heels of a report showing the inefficacy of government-run cyber security, it’s imperative to understand the limitations of your system and model. As that article shows, in addition to bureaucratic risk the government also needs to worry about gaming-the-bureaucracy risk! Government snafus aside, data science has enjoyed considerable success in the past few years. Despite this success, models can fail in surprising ways. Last year we saw how deep neural nets for image recognition fail on noisy data.

As these examples show, a lot can be learned by breaking models. Model builders of all stripes must consider the limitations of their models; doing so should be a requisite step in the validation stage. As a fun exercise, below I present some ways to confuse the models at popular web destinations. Can you figure out how a model will fail based on this behavior?

Product Recommendations

Netflix

Netflix is known for using collaborative filtering as well as matrix factorization techniques like SVD.

Algorithm

  1. Choose a genre (e.g. Movies With A Strong Female Lead)
  2. For each movie, alternate your rating between 1 and 5 stars

Amazon

Amazon is known for using user-based collaborative filtering.

Algorithm
Make a separate purchase for each item in a list. For each item do the following:

  1. Choose a dimension or combination of dimensions e.g. gender, age, department
  2. Browse related (i.e. similar) items in the given dimension
  3. Now browse related items in the opposite direction of that dimension (or something unrelated)
  4. Add actual item to purchase to cart
  5. Checkout

Example: Choose baby car seat. View n car seats plus m related items (e.g. strollers). Now view a bunch of scooters for old people, such as the Pride 3 Wheel Celebrity X Scooter. Now add your purchase item and checkout.

Alternative: If you have disposable income, actually buy the car seat and scooter and donate them to a charity afterward.

Social Media

Facebook

The Facebook News Feed is notorious for changing regularly and being somewhat opaque to outsiders; here is a narrative description of how it "works". The short version is that various scoring models are combined with various rules to deal with outliers.

Algorithm

  1. Choose a set of dimensions (e.g. day of week, time of day, media type)
  2. Choose a behavior (e.g. like, hide, scroll past, stay for long time, comment)
  3. For given set of dimensions, perform same behavior over a fixed period of time (e.g. 15 minutes)
  4. Repeat

Example: Choose Monday + 9 AM as dimensions. Choose “stay for long time + hide” as behavior. Do this for each item in news feed for 30 minutes. Repeat following week.

Bonus: Recruit your friends to follow the same algorithm, ideally in same geographic region.

LinkedIn

One curious feature of LinkedIn is automated skill endorsement recommendations. I often get endorsed for random skills unrelated to what I do. Presumably this works on some sort of frequent itemset mining based on graph distances.

Algorithm

  1. Choose a network of related people
  2. Choose an unrelated skill
  3. Endorse all people in network with same “skill”

Example: For me, I might choose all my financial quant friends and endorse them with the skill “arm wrestling”.

Alternative: Use a brand slogan as the skill, e.g. "Think Different". This can be awkward, so try changing the initial verb to a present participle, e.g. "Thinking Different".

Bonus: Use a brand slogan with a double entendre e.g. “Doubling Your Pleasure”.

Marketing and Advertising

Google Analytics

While there aren't any models embedded within GA itself, many models are used to analyze web behavior based on the tracking codes attached to a URL.

Algorithm

  1. Choose a URL to link to
  2. Choose a unique identifier
  3. Replace tracking code with custom identifier
  4. Get people to click link

Example: In this post, the links to recode.net and slate.com use a custom tracking code of pez.ai, linking to my AI SaaS service.

Tools

To explore the effects of different behaviors on these sites, these R packages can help you construct recommendation models of your own: recommenderlab, arules, rCUR.
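If you want to see how these perturbations propagate, a small sketch using recommenderlab's bundled MovieLense ratings is a reasonable starting point: train a user-based collaborative filter, perturb a user's ratings (say, alternating 1s and 5s within a genre, as above), and compare the recommendations before and after.

  library(recommenderlab)

  data(MovieLense)                                           # user x movie rating matrix
  rec  <- Recommender(MovieLense[1:500, ], method = "UBCF")  # user-based collaborative filtering
  pred <- predict(rec, MovieLense[501:502, ], n = 5)         # top-5 titles for two held-out users
  as(pred, "list")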

This is a small sampling of how to identify flaws in models. Add your own ideas on how to break models in the comments!

Brian Lee Yung Rowe is Founder and Chief Pez Head of Pez.AI // Zato Novo, a conversational AI platform for guided data analysis and automated customer service. Learn more at Pez.AI.

Panoptez is now Pez.AI

Panoptez is now Pez.AI. Why the change? We’ve added a conversational AI interface on top of our data analysis platform to improve the experience for non-technical people. That means you can talk to Pez.AI in Slack like you would talk to a business analyst or data scientist. Based on the conversation, Pez.AI identifies what you want and executes code to get your answers. Under the hood, the same Pez language you’ve grown to love powers the analysis and platform, giving you the best of both worlds.

Our first conversational application targets Google Analytics. You’ll be able to ask Pez.AI questions about your web traffic stats and trends in plain English. Pez.AI does all the hard work of constructing GA queries and summarizing the results right in Slack. And since Pez.AI is always there for you, you can get insights whenever you need them.

Everyone on the beta list will have early access to the GA application. We’ll be rolling out instances within the next few weeks. We’ve streamlined the installation to under 30 seconds, so you’ll be able to get insights quickly.

In other news, we also have a few partnerships in the works for more enterprise applications in the finance and customer service verticals. If you are interested in a partnership opportunity or would like custom AI development, give us a shout.

Intro to data structures for Excel users

In this series of posts, we teach programming concepts from the perspective of spreadsheets using pez, Zato Novo’s data analysis language. If you know Excel, then you already have the foundation to start coding!

Data structures form the backbone of any programming language (and software system), and they can send a shiver down the spine of many a computer science student. But data structures don't have to be intimidating. By the end of this post, you'll be able to work with them confidently and efficiently.

So what is a data structure? Simply put, data structures are containers that hold data. A spreadsheet is actually one massive data structure that represents data as a grid. Spreadsheets are good for displaying all the gory details of a (tabular) dataset but are cumbersome when moving data around or creating custom functions to modify data. Programming languages, on the other hand, provide compact notation for working with data structures, but it can be cumbersome to see all of the data.

Most programming languages come with “batteries included”, meaning once it’s installed you have everything you need to immediately play with it. What’s implied is that all sorts of data structures are provided out of the box, which is great for variety but difficult to pick up and remember. Pez likes to err on the side of simplicity, so there are two primary data structures: lists and data frames. We’ll explore both of these structures using an example of creating financial projections for a startup.

Forecasting MRR

To make the lessons concrete, we’ll use a business forecasting example. In a previous article I showed how to use Panoptez to calculate the MRR of Slack using a basic set of assumptions. For this article, we’ll forecast the MRR of my startup, Zato Novo, based on an even simpler set of assumptions. As with the previous article, we establish a baseline approach using a Google Sheets document. This spreadsheet has a handful of columns, starting with the forecast date, followed by a projected number of paying customers. For pedagogical purposes, I’m assuming a fixed subscriber growth rate of 5% per month, which annualizes to 80%. Then I take that user number and multiply it by the base monthly price of $25/user to get a monthly recurring revenue number. To keep things simple, I’m ignoring tiers, annual prepay, and churn. This spreadsheet will be examined throughout the article as we walk through various concepts.

zn_ss_mrr
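Just to check the arithmetic on that assumption (here in R, independent of pez), 5% monthly compounding works out like this:

  (1 + 0.05)^12 - 1    # 0.7959, i.e. 5% monthly growth annualizes to roughly 80%
  25 * 100 * 1.05^23   # ~7679, the projected MRR in month 24 under these assumptions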

Working with lists

Okay, now let's see how to construct the same thing in pez. A list is an ordered collection of items and can contain any type of data. In a spreadsheet, a range of cells is analogous to a list. By ordered, we mean that items in the list are guaranteed to stay in the order you entered them. This is like a spreadsheet, where the value in A4 always follows the one in A3. In our revenue forecast example, each column is a list. It's fine to treat each row as a list as well, although later we'll see why it's more convenient to think of lists as columns.

Let’s look at the first column that contains dates. In a spreadsheet we create this column by starting with an initial date. Next we define a formula that adds one month to create the next date (using EDATE in Google Sheets). We then copy and paste this formula for each successive cell to create the whole range. Our final date range lives in the cells A2:A25.

zn_ss_edate

Notice that for each successive date, we are adding one month to the previous date. Hence, the second date adds 1 to the initial date, while the fourth date adds 3, and so on. In pez, we take advantage of this observation to create the dates more compactly. First, we create the initial date, which is simply the literal text 2016-01-01. If you enter dates with this specific format, pez knows that it’s a date, just like in a spreadsheet. (The same is true of timestamps.)

Now let's create an integer range that represents how many months need to be added to the initial date to produce the complete date range. For this we use the range operator, .. (two dots). For example, 0..23 creates 24 integers, from 0 to 23. The final step is to create the dates, which simply requires adding this list of numbers to the initial date.

See how much simpler this is than copying and pasting a formula into a number of cells? In the spreadsheet, there is one other detail: the column has a header. In pez, we just assign this expression to a variable, which we'll call month. Here is what it looks like in our Panoptez-enabled Slack.

zn_ss_month
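For readers who want to follow along outside of pez, a rough R analogue of the month variable (the pez version is in the screenshot above) might look like this:

  # R analogue of the pez month variable shown above
  start <- as.Date("2016-01-01")
  month <- seq(start, by = "month", length.out = 24)   # 2016-01-01 through 2017-12-01
  head(month)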

Literal list creation

We saw in the previous section how easily integer ranges can be created. What if you want to create a list that is not an integer range? In this case, a literal list can be created using bracket notation: [x1, x2, x3, ..., xn]. With this syntax, each element is specified explicitly within square brackets. Using the date range above, the first four elements can be created as [2016-01-01, 2016-02-01, 2016-03-01, 2016-04-01]. This approach is perfectly legal, but it's often easier to use an expression to generate the appropriate range for you.

Learn more about lists

Element selection

So what can we do with this list? In a spreadsheet we can pull specific elements from a range and reference them in a separate cell using their coordinates. For example, January of 2017 is located at A14. This approach is convenient, but what happens if we move this column somewhere else? Let's say we add one column to the left of A. Most of the time the spreadsheet automatically updates the cell references to reflect their new location. However, if we need to reference the range anew, we need to know where it now lives in the spreadsheet! For complicated spreadsheets it can start to feel like a perverse Where's Waldo exercise. Wouldn't it be nice if we could always reference the range using the same location? In pez, our date range is called month, so any time we access month[13] we get the first day of 2017. That means no more missing references!

The operation using the name of the variable followed by brackets, x[y], is called indexing or subsetting. The number inside the brackets is called the index. In pez, the first element starts at an index of 1, while the last element is at length(x). There are other ways to index a list, but for now we’ll stick to the basics.
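Indexing works the same way in the R analogue, which is also 1-based:

  month[13]             # 2017-01-01, the first day of 2017
  month[length(month)]  # 2017-12-01, the last projected month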

Compounding growth

Let's move on to the second column, which contains the projected number of users. Starting with an initial value of 100 users (hey, you gotta start somewhere), we assume a monthly growth rate of 5%. Growth compounds monthly, meaning that each month is 1.05 times greater than the prior month. To model this in a spreadsheet, we again turn to a formula. This time the formula multiplies the previous value by 1.05 instead of adding to it.

zn_ss_customers

In pez, there are a few ways to tackle this. One approach is to use the cumprod function, which takes a list of numbers and computes the cumulative product of all the numbers in the list from the first element to the current element. For example, cumprod 1..4 yields [1, 2, 6, 24], which is equivalent to [1, 1*2, 1*2*3, 1*2*3*4]. For the growth rate, we create a repeated list of 1.05 and apply cumprod to it.

zn_ss_growth_1
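In the R analogue, the same two steps look like this (a sketch, not the pez code in the screenshot):

  cumprod(1:4)                       # 1 2 6 24, i.e. 1, 1*2, 1*2*3, 1*2*3*4
  growth <- cumprod(rep(1.05, 24))   # 24 months of compounding at 5%
  head(100 * growth)                 # starts at 105, not 100 -- addressed below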

Calling functions is similar to calling functions in a spreadsheet, where the name of the function is followed by its arguments wrapped in parentheses. Pez supports a simpler syntax as well, which will be discussed in a future post.

You may have noticed that there's one problem with this approach: while the spreadsheet starts at 100, our pez list starts at 105. We could modify the list to fix this, but an even simpler approach takes advantage of how compounding works. Since the growth rate is constant, each successive month simply raises 1.05 to a higher power. Month one is 1.05^0 = 1, month two is 1.05^1, month three is 1.05^2, and so on. Using what we've already learned, we can raise 1.05 to the sequence 0..23, which produces all the powers for us!

zn_ss_growth_2
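And the R analogue of the power approach:

  customers <- 100 * 1.05^(0:23)   # 100, 105, 110.25, ... for months 1 through 24
  head(customers)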

Calculating the MRR

The last column to create is the monthly recurring revenue. The current assumption is $25/user/month, so we multiply each value in C2:C25 by 25.

zn_ss_mrr_g

In pez, the range C2:C25 corresponds to the variable customers, so we multiply that by 25 and assign its result to a new variable mrr.

zn_ss_mrr_pz

Again, notice how simple it is to describe this operation.

Creating the data frame

The final step is to bring all these variables together into a single table. Data frames are organized by column, which is why we claimed that it’s best to think of lists as columns. Each variable we defined is simply a column in the table.

The output table is just like the spreadsheet. To make the table easier to work with, it’s actually better to assign our dates to the index of the table. This reduces the number of columns and sets the index to the dates. We use a special @index key at the end of the table definition to specify the index.

This looks pretty good. However, notice that we had to create a whole bunch of variables to create this table. This pollutes your workspace, which makes it harder to find useful stuff in the future. It’s better to use a let expression to define temporary variables instead.

Now only the variable you care about is created in your workspace. All the others are deleted once the let expression is evaluated.

zn_ss_rev_forecast
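For comparison, here is the whole forecast assembled in the R analogue, with local() standing in for pez's let expression so the intermediate variables don't linger in the workspace:

  # R analogue of the full forecast; local() plays the role of pez's let expression
  forecast <- local({
    month     <- seq(as.Date("2016-01-01"), by = "month", length.out = 24)
    customers <- 100 * 1.05^(0:23)
    mrr       <- customers * 25
    data.frame(customers, mrr, row.names = format(month))   # dates as the row index
  })
  head(forecast)

Only forecast survives; month, customers, and mrr disappear once the block is evaluated, mirroring the behavior of the let expression described above.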

As a final goodie, here is a plot of the MRR based on the data we created.

zn_ss_rev_plot

Conclusion

Data structures are an important part of programming. In this article, we took your existing knowledge of Excel and showed how cell ranges are lists and tables are data frames. You also got a taste of let expressions and vectorization, which are two powerful features of pez.

Panoptez is a collaborative data analysis and visualization platform accessible via chat systems, like Slack. Request an invite to the beta or contact us for preferred access.

How to calculate monthly recurring revenue (MRR) in Slack instead of Excel

FastCompany wrote an article about Slack, which cited some subscriber numbers. This got me wondering what Slack's monthly recurring revenue (MRR) is based on these figures. The MRR is a key metric that helps determine whether your company is cashflow positive. Knowing the MRR also gives you insight into a SaaS company's P/E ratio. Since we don't know if Slack is profitable, we can't compute the P/E. We can, however, use price-to-revenue (P/R) as a naive proxy. In this article, I show how to use Panoptez within Slack to calculate the MRR and P/R, instead of using Excel (or another spreadsheet program).

A spreadsheet (e.g. Excel, Google Sheets) is often the go-to tool when you want to make a quick back-of-the-envelope calculation. In isolation this is sufficient, but sharing your calculation with others becomes more involved. Within a team, it's also likely that you want to share your methodology, or the function you wrote, with your colleagues. In a spreadsheet this is a bit more challenging, since it usually means writing a function in Visual Basic or something comparable and then figuring out how to distribute it among your colleagues. For this article, we'll ignore the sharing aspect and focus only on the calculations. Our baseline will be implementing the calculation in Google Sheets.

The Data

First, we need the raw data. In this case, it comes from FastCompany, which says Slack has 370,000 paid subscribers. Slack has two tiers of pricing, but FastCompany doesn’t break this out for us. The pricing itself comes from Slack, where they list the price of the standard and plus plans.

slack_pricing

To get a single value for the MRR, we need to know how many people pay for the standard versus the plus tier. We also need to know how many pay month-to-month versus annually. Since these numbers aren’t available, we have to make assumptions for the proportion of subscribers in each plan as well as the ratio of subscribers paying month-to-month versus annually. My hand-waving guess is 70% pay for the standard tier and 30% pay for plus. I also assume that 70% of the standard tier pay month-to-month and 30% pay annually. For the plus tier I assume the opposite. If you have better assumptions, please let me know in the comments!

Spreadsheet Calculation

In a spreadsheet, the normal procedure is to populate cells with these values and add some labels for the rows and columns. Next we create a formula to hold some intermediate results. In our case this is the weighted monthly value of a user in the standard and plus tiers. The formula bar shows the computed value for the standard tier.

slack_mrr_1

To get the MRR we tally those up and multiply by the number of paid subscribers. This gives us $3.44 million per month, or $41.3 million per year.

slack_mrr_2

That means with a private valuation of $2.8 billion, the P/R is about 68. Remember, this doesn’t equate to the P/E, since we aren’t accounting for expenses, so the P/E will likely be much higher. This is a detail overlooked in the Business Insider article that you shouldn’t ignore.
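If you want to check the arithmetic yourself, here it is in R. The per-user prices are my reading of Slack's price list at the time ($8 and $6.67 per user per month for the standard plan billed monthly and annually, $15 and $12.50 for plus); treat them as assumptions, though they do reproduce the $3.44 million figure above.

  subscribers <- 370000
  # Assumed per-user monthly prices: standard $8 (monthly) / $6.67 (annual),
  # plus $15 (monthly) / $12.50 (annual)
  standard <- 0.70 * (0.70 * 8.00  + 0.30 * 6.67)   # weighted monthly value of a standard user
  plus     <- 0.30 * (0.30 * 15.00 + 0.70 * 12.50)  # weighted monthly value of a plus user

  mrr <- subscribers * (standard + plus)
  mrr                   # ~3.44 million per month
  mrr * 12              # ~41.3 million per year
  2.8e9 / (mrr * 12)    # price-to-revenue, roughly 68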

Using Panoptez

Now let’s see how to do the same thing in Panoptez. First, we create a nearly identical table. Remember that since this is in Panoptez, once this table is created, any colleague on Slack can access this same table to use as they wish. We’ll create a data frame using { } notation and assign it to the variable slack_stats. In case you’re wondering, a “data frame” is a fancy way of saying “table”.

slack_stats

Here’s a text version so you can copy and paste into your Panoptez-enabled Slack.

Each list within the data frame represents a column of the table. In our spreadsheet, the first column of data represented the standard pricing tier. To reference it, we would create a range from B2:B6. Our data frame holds the same data, except we reference it as slack_stats$standard. The @index at the end of the data frame sets the row names for the table. If we don’t specify this, the rows will simply be numbered numerically.

To calculate the weighted value of each tier, we’ll create a temporary function. Since Panoptez tracks all variables created in your workspace, it can fill up with a bunch of garbage quickly. To reduce clutter, you can use what’s known as a “let expression” to create temporary variables that will disappear after the expression has been evaluated. The basic structure of a let expression is let x in y. In this example, we create a temporary function f and then apply it to slack_stats$standard. The function itself is doing the same thing as in the spreadsheet formula =B2 * (B3*B5 + B4*B6), except we use the dot product (the ** operator) instead of explicitly summing the two products. The value at x[1] corresponds to B2 in the spreadsheet, since that is the range we are passing to the function. If we had used slack_stats$plus instead, then x[1] would correspond to C2.

slack_wv_fn

Putting it all together, we can take our let expression and use it inside a function! That means we can create a temporary function to simplify the overall calculation. This last step creates a function that accepts the number of paying users and calculates the MRR. Notice that the expression following the in is essentially the same as in the spreadsheet, which was =F2*(B8+C8). The difference is that instead of cell positions, we are using variables and functions. The variable u is equivalent to F2, while f(slack_stats$standard) evaluates to the same value as B8.

slack_mrr_fn

This is the code to try in your Panoptez-enabled Slack session.

To get the final result, we simply call this function like !pez slack_mrr 370000. The nice thing about having a function is that as Slack’s user base changes, we can call this function again to get the latest MRR.

slack_mrr_calc

Conclusion

In this post, I've shown how to use Panoptez to calculate an estimate of Slack's MRR. I'll leave it to the reader to write an expression that calculates the P/R ratio from this. In a subsequent post, we'll look at changing the assumptions used in this example.

Panoptez is a collaborative data analysis and visualization platform accessible via chat systems, like Slack. Request an invite to the beta or contact us for preferred access.

Data-driven collaboration just got a whole lot easier

holy_grail

Today's Holy Grail is the data-driven organization. Like the Grail, nobody knows what it looks like, though many are on the difficult quest to find it. Becoming data-driven is hard, and many obstacles prevent organizations from reaching this goal. Two major obstacles are data inaccessibility and limited collaboration. The SaaS duo of Slack and Panoptez offers a shortcut around these challenges, getting you to data-driven bliss faster.

The Promise of Being Data-Driven

Data-driven organizations offer numerous advantages over traditional businesses. The promise is that data is a strategic asset that can "inform decision-making processes and drive actionable results" (IBM). When everyone has access to organizational data and analytical tools, "data empowers people to make decisions without having to consult managers three levels up" (VentureBeat). Taking advantage of data means that hunches can be replaced by hard data, enabling even "junior employees to make decisions" (VentureBeat). Consequently, employees are empowered to make decisions and react faster to the market.

Competitive edge is often a byproduct of superior information. While all organizations produce reams of data around their business and operational processes, data-driven organizations capture this data and make it usable. By quantifying processes and collecting this data, inefficiencies can be rooted out and new opportunities discovered. This is usually easy within an organizational silo, but it gets increasingly complicated as you span silos. Paradoxically, these cross-silo insights typically have the most power. Examples include:

  • How do social media impressions affect sales inquiries?
  • Does content marketing engagement boost registrations?
  • How does lead velocity compare with expectations?
  • How do meetings affect productivity?
  • How do software releases affect customer support volume?
  • Do managers actually matter? (via Google)

Obstacles Along The Way

Most of the benefits of a data-driven organization are only possible when data is transparent and accessible to everyone in the organization. When data is not accessible, it's nearly impossible to conduct a holistic analysis. Requests can easily be lost in a swamp of bureaucracy, leading to lost opportunities. Furthermore, it creates bottlenecks in the decision-making process when only a handful of people can provide specific datasets.

But why is it so hard to become data-driven? Most IT systems were not designed for interoperability and data sharing, so getting data out of these systems is difficult. According to McKinsey, "existing IT architectures may prevent the integration of siloed information, and managing unstructured data often remains beyond traditional IT capabilities."

The traditional solution to this problem is to embark on a strategic IT project to integrate the data. While strategic initiatives can benefit a company in the long term, you can't hold your breath and wait for these projects to be completed; "fully resolving these issues often takes years" (McKinsey). Protracted timelines are anathema to data-driven organizations: who has time to wait years for an answer?

sagrada_familia

What we really need is the ability to quickly conduct an ad hoc analysis: immediate answers to immediate questions. In finance, desk quants served this purpose, answering complex analytical questions in near real-time. Not everyone needs the firepower of a quant, but many do need immediate answers with the help of analytics.

When IT involvement is not an option, many people resort to spreadsheets as a way to get quick answers. Nowadays it’s fairly easy to collaborate on spreadsheets and get data into them. However, not all data is easily accessible. Operational data is largely buried in legacy systems lacking friendly APIs. Getting data out of spreadsheets for use in other analyses can also be tricky. Dashboards are similarly flawed since data is not easily shared across reports. Interactivity is also typically limited to drill-downs. But what if you want to quickly explore the relationship of one variable with another that might not be in the report? Now you have to ask the analyst that created the dashboard, which again creates bottlenecks!

Slack as a Data Hub

Thanks to Slack service integrations, all sorts of operational data are now appearing in Slack. Spanning all the silos of an organization, this operational data feeds in from sales, marketing, customer service, product development, etc. The key is that all these operational events transform Slack into a de facto data hub. Hence, data accessibility no longer requires a strategic initiative: any Slack user has access to organizational data immediately.

If an organization is using Slack, then a lot of attention is already in Slack, making it a great canvas for collaboration. What if it were possible to conduct an analysis and visualize it directly in Slack? Then your colleagues don't need to switch apps or download anything, because it's right there. Now imagine your colleague has an idea on how to improve the analysis. What if she could modify your analysis straight from Slack?

Panoptez for Collaborative Analytics

Slack can make data accessible to all, but to truly democratize data, it needs to be usable. Enter Panoptez. Panoptez is a collaborative analytics environment that integrates with messaging systems like Slack. As part of the offering, Panoptez automatically parses messages from service bots and transforms them into data structures. Without ceremony or strategic initiatives, you get your operational data in usable form, as it is created.

trello_move

And since it’s all accessible in your Panoptez environment, you can instantly conduct cross-silo analyses. Want to know how social media marketing campaigns affect customer service inquiries? How do customer support requests affect the velocity of development? Which projects are most tightly coupled? Are development priorities aligned with marketing messages? How much do meetings affect development velocity?

Here's an example that compares the size of our Kanban Review queue with the number of commits to Bitbucket. The idea is that high commit activity might indicate hastily written code or bug fixes.

trello_bitbucket

At the beginning and end of this series there appears to be outsized commit activity versus the change in the Review queue. Perhaps something in the Active queue is causing this issue? We can add that series to the plot and render it.

trello_bitbucket_2

The other half of being data-driven is collaboration. Most analytics platforms claim collaboration, but what they mean is presentation. Dashboards, videos, and narratives are for presentation. Collaboration is about working together to arrive at a solution. Panoptez moves beyond comments and annotations on a report. Instead, business users and analysts can share actual data and functions so they can collaboratively conduct an analysis.

In our example, if a product manager wanted to dig deeper into the data, it’s right there in Slack. This statement gets the last few events on our Trello engineering board and has links to the cards in question.

trello_review

In short, not only is your visualization instantly available to any user in your Slack channel, so are the commands and data to create the visualization. Panoptez democratizes your data and your analytics tools, so everyone within your organization is empowered to make decisions. With Panoptez, becoming data-driven doesn’t have to be an epic journey.

Learn More

Watch my webinar to learn more about how Panoptez makes collaboration easy. Ready to give Panoptez a test drive? Request an invite to our free 30 day trial while we’re in beta.