
Software Engineering Daily

Making Data-Driven Decisions with Soumyadeb Mitra

Tue Jul 11 2023
Customer Data Platform, Data Activation, Data Unification, Cloud Data Warehouses, Identity Stitching, Personalization, RudderStack, Data Expertise, Generative AI, Building a Successful Company


RudderStack is a warehouse-native customer data platform (CDP) that helps businesses collect, unify, and activate customer data from all of their different sources. This episode covers the importance of activating all data, the challenges of integrating different sources, building a data-driven culture, leveraging cloud-based solutions like Snowflake, identity stitching and customer 360, the lack of data expertise, personalization with generative AI, RudderStack's offerings and a customer case study, and insights on building a successful company.


RudderStack enables businesses to collect, unify, and activate customer data from various sources.

By providing a warehouse-native CDP, RudderStack addresses the challenge of fragmented customer data and helps companies leverage their data for marketing attribution, churn prediction, and product recommendations.

Cloud-based solutions like Snowflake have revolutionized data collection and unification.

With the ability to store and process high-volume streaming and transactional data in one place, cloud-native warehouses have made it easier for traditional CDPs to run on top of them, solving data unification and analytics problems.

Identity stitching is a crucial step in creating a unified customer record.

By combining activities across multiple IDs, RudderStack enables businesses to compute features like browsing history or revenue, creating a comprehensive customer 360 view.

Personalization with generative AI is now possible with advancements in technology.

Businesses can create personalized messages based on individual user data, allowing for one-to-one personalization at scale.

Data expertise is essential for implementing personalized marketing strategies.

Companies need to understand the value of a customer 360 view and invest in hiring data professionals to derive business value from their existing data assets.

RudderStack offers a comprehensive suite of products for data collection, unification, and activation.

Their open source offering provides the data collection piece, while their SaaS offering provides high availability and scalability. They also offer commercial products for unification and activation.

Building a successful company requires focusing on solving a known problem with a better solution.

Technologists should not shy away from competition, as it can indicate a market need. White spaces and markets without competition can be more lucrative than trying to do something completely new.


  1. Introduction to RudderStack
  2. Evolution of CDPs and Cloud-based Solutions
  3. Data Collection, Unification, and Activation with RudderStack
  4. Data Plane and Unification in RudderStack
  5. Identity Stitching and Customer 360 in RudderStack
  6. Data Expertise and Personalization in RudderStack
  7. RudderStack's Offerings and Case Study
  8. Building a Successful Company

Introduction to RudderStack

00:00 - 07:43

  • RudderStack is a warehouse-native customer data platform that helps businesses collect, unify, and activate customer data from all of their different sources.
  • The importance of activating all of your data and how RudderStack can help you with that.
  • The challenges of integrating different data sources.
  • How to build a data-driven culture in your organization.
  • Companies need to expose customer data for more sophisticated use cases such as marketing attribution, churn prediction, and product recommendations.
  • Soumyadeb Mitra is the founder of RudderStack and has experience in the data, ML, and analytics space.
  • RudderStack is building a warehouse-native CDP (Customer Data Platform).
  • A CDP collects customer data from various channels to provide personalized experiences and insights about customers.
  • One problem CDPs address is having multiple lines of business with fragmented customer data that doesn't provide a clear picture of customers.

Evolution of CDPs and Cloud-based Solutions

07:28 - 14:18

  • CDPs address the problem of personas not fitting together due to incomplete customer data.
  • CDPs have been around for roughly 10 to 15 years, solving the challenge of accessing and collecting customer data.
  • Early CDPs required setting up a Hadoop cluster and a team of data engineers to collect and process large amounts of data.
  • Cloud-based SaaS solutions like Segment emerged to solve the slowness and complexity of traditional CDPs.
  • However, these SaaS solutions have limitations in collecting all customer data from various systems within an enterprise.
  • Incomplete customer data has been a challenge for leveraging rich data in training machine learning models.
  • Building churn models requires complete app activity and support ticketing data, which are often stored separately.
  • The hard problem is bringing together different types of data and training ML algorithms on top of it.
  • Cloud-native warehouses like Snowflake enable storing and processing high-volume streaming and transactional data in one place.

Data Collection, Unification, and Activation with RudderStack

13:48 - 21:14

  • Cloud data warehouses have enabled the collection of high volume streaming data and ETL data into one platform.
  • Traditional CDPs can now be run on top of a cloud data warehouse, solving many data unification and analytics problems.
  • RudderStack follows the collection, unification, activation pipeline for use cases like building a churn model.
  • The collection step involves getting first-party app data and ticketing data through an ETL pipeline.
  • The unification step stitches the collected data together to create a golden customer record with various features.
  • Training an ML algorithm is the simplest part of the puzzle once the features are obtained.
  • The activation step involves pushing the churn score back into the business system for action.
  • Collection, unification, and activation form the core pipeline of RudderStack.
  • The data plane in RudderStack is responsible for moving data between sources and destinations in various pipelines.
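
The collection, unification, activation flow described above can be sketched in a few lines of Python. This is a minimal illustration, not RudderStack's implementation: the function names, the sample events and tickets, and the scoring heuristic (which stands in for a trained churn model) are all assumptions made for the example.

```python
from collections import defaultdict


def collect():
    """Collection: hypothetical first-party app events and support tickets."""
    app_events = [
        {"user_id": "u1", "event": "login"},
        {"user_id": "u1", "event": "login"},
        {"user_id": "u2", "event": "login"},
    ]
    tickets = [
        {"user_id": "u2", "severity": "high"},
        {"user_id": "u2", "severity": "low"},
    ]
    return app_events, tickets


def unify(app_events, tickets):
    """Unification: one record per customer with simple features."""
    records = defaultdict(lambda: {"logins": 0, "tickets": 0})
    for event in app_events:
        if event["event"] == "login":
            records[event["user_id"]]["logins"] += 1
    for ticket in tickets:
        records[ticket["user_id"]]["tickets"] += 1
    return dict(records)


def churn_score(features):
    """Placeholder heuristic standing in for a trained ML model (assumption)."""
    total = features["tickets"] + features["logins"]
    return features["tickets"] / total if total else 0.0


def activate(records):
    """Activation: build the rows that would be pushed back to a business system."""
    return [
        {"user_id": uid, "churn_score": round(churn_score(feats), 2)}
        for uid, feats in sorted(records.items())
    ]


scores = activate(unify(*collect()))
print(scores)
```

In this sketch the scoring step is a single short function, which mirrors the point that training or applying the model is the simplest part once the unified features exist.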

Data Plane and Unification in RudderStack

20:50 - 27:37

  • Data plane should run in the same environment as the data warehouse (e.g., AWS, GCP)
  • Separate gateway and data plane components
  • Data plane is split into multiple components, including a gateway for ingesting data
  • Gateway component ensures high availability and handles scaling and failovers
  • Consumers of data perform transformations on events before delivering them to destinations
  • System guarantees at-least-once delivery with retries
  • RudderStack provides open source code that can be run on a laptop or Kubernetes cluster
  • Next step for RudderStack is unification of data pipelines
  • Unification involves handling different schemas and data definitions for customers
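
At-least-once delivery with retries, as mentioned in the list above, can be illustrated with a small sketch. The names `deliver_at_least_once` and `FlakyDestination` are hypothetical, and the destination deliberately loses its first acknowledgement per event to show why duplicates can occur; none of this reflects RudderStack's actual code.

```python
class FlakyDestination:
    """Hypothetical destination that stores each payload but loses the first ack."""

    def __init__(self):
        self.received = []   # every payload that arrived, duplicates included
        self.attempts = {}   # number of send attempts seen per event id

    def send(self, payload):
        self.received.append(payload)
        count = self.attempts.get(payload["id"], 0) + 1
        self.attempts[payload["id"]] = count
        if count == 1:
            # The write succeeded, but the sender never learns about it.
            raise ConnectionError("acknowledgement lost")


def deliver_at_least_once(events, transform, send, max_attempts=3):
    """Transform each event, then retry delivery until it is acknowledged."""
    delivered, failed = [], []
    for event in events:
        payload = transform(event)
        for _ in range(max_attempts):
            try:
                send(payload)
                delivered.append(payload["id"])
                break
            except ConnectionError:
                continue  # retry the same payload
        else:
            failed.append(payload["id"])
    return delivered, failed


destination = FlakyDestination()
events = [{"id": "e1", "name": "Page Viewed"}, {"id": "e2", "name": "Signed Up"}]
delivered, failed = deliver_at_least_once(
    events,
    transform=lambda e: {**e, "name": e["name"].lower()},
    send=destination.send,
)
unique_ids = {p["id"] for p in destination.received}
print(delivered, failed, len(destination.received), sorted(unique_ids))
```

Because a retry can follow a write that already succeeded, the destination may see the same event more than once. Deduplicating on a stable event id, as `unique_ids` does here, is one common way to cope with that.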

Identity Stitching and Customer 360 in RudderStack

27:09 - 33:50

  • To combine data from different sources with different schemas and definitions, the first step is to stitch together all the identities associated with a user.
  • Identity stitching, or ID resolution, can be challenging to do in SQL and requires careful logic.
  • Features like browsing history or revenue need to be computed by combining activities across multiple IDs.
  • Creating a unified customer record can take months of manual work or hiring a team of data analysts.
  • The process of unification becomes complex and time-consuming when marketing teams require new features for campaigns.
  • Snowflake launched a product to simplify the creation of customer records for both technical and non-technical teams.
  • The output of unification, known as customer 360, allows users to see one record per customer with all relevant features.
  • Having access to this unified data often sparks creativity in adding additional features or sources.
  • The lack of data expertise among non-technical staff is an existential problem for many businesses.
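
One way to picture identity stitching is as a connected-components problem: whenever two identifiers are seen together they are linked, and every group of linked identifiers becomes one customer. The sketch below uses a union-find structure for this and then sums revenue across all IDs of each stitched customer. The identifiers, the link data, and the choice of union-find are assumptions for illustration; the episode does not describe how RudderStack implements ID resolution.

```python
from collections import defaultdict


class UnionFind:
    """Groups identifiers that have been observed together."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Keep the smaller string as root so the canonical ID is deterministic.
            self.parent[max(root_a, root_b)] = min(root_a, root_b)


# Hypothetical pairs of identifiers seen on the same event or record.
id_links = [
    ("anon:a1", "email:ann@example.com"),
    ("email:ann@example.com", "user:42"),
    ("anon:b7", "user:99"),
]

# Hypothetical purchases, each recorded under whichever ID was known at the time.
purchases = [
    ("anon:a1", 20.0),
    ("user:42", 30.0),
    ("anon:b7", 5.0),
    ("user:99", 10.0),
]

stitcher = UnionFind()
for left, right in id_links:
    stitcher.union(left, right)

# Feature computation: revenue per stitched customer, across all of their IDs.
revenue = defaultdict(float)
for identifier, amount in purchases:
    revenue[stitcher.find(identifier)] += amount
revenue = dict(revenue)
print(revenue)
```

The resulting dictionary has one entry per customer, which is the shape of a customer 360 record. Expressing the same transitive linking in SQL typically needs recursive or iterative queries, which is part of why it is described as hard to get right.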

Data Expertise and Personalization in RudderStack

33:25 - 40:05

  • There is a lack of data expertise among people who know what features they need.
  • The majority of people in organizations lack the necessary tools to understand and utilize data.
  • Creating a customer 360 view and demonstrating its value is crucial for securing budget to hire more data professionals.
  • The ability to achieve one-to-one personalization with generative AI is seen as a major driver for businesses.
  • Previously, the technology was not available to scale personalized messaging, resulting in segmentation-based marketing strategies.
  • With advancements in technology, it is now possible to create personalized messages based on individual user data.
  • Collecting comprehensive user data is essential for implementing personalized marketing strategies.
  • The promise of true personalization has been long-awaited by businesses, and cloud data solutions are seen as a step towards achieving this goal.
  • The value proposition of RudderStack is centered around helping customers derive business value from their existing data assets.
  • RudderStack focuses on understanding the specific business use case and ensuring the availability of complete and accurate customer records before activating use cases.
  • The implementation process varies in duration, ranging from one month to six months depending on the complexity of the use case and data collection requirements.
  • Data collection is another product offered by RudderStack, which positions itself as an API-compatible replacement for Segment.

RudderStack's Offerings and Case Study

39:43 - 46:36

  • RudderStack has built a big business around data collection.
  • They offer an API-compatible replacement for Segment.
  • Their open source offering is the data collection piece.
  • The SaaS offering provides high availability and scalability.
  • Unify and activate are their commercial offerings available only in SaaS.
  • They charge based on the event or data moved through their stack.
  • A customer case study with WISE demonstrates increased productivity using RudderStack's unified product.
  • RudderStack prioritizes data privacy and compliance, storing no customer data and allowing deployment in specific regions or within a customer's VPC.

Building a Successful Company

46:14 - 50:45

  • As a technologist, the goal is not to build something new, but to build something people need.
  • Building something people care about often means building something that has been done before.
  • Technologists have a higher chance of success if they go after a known market with a better solution.
  • Don't try to do something new, go after something that has been done before and do it better.
  • Many companies are trying to do generative AI for support, but only one will survive in the market.
  • There are white spaces and markets where there is no competition, which can be more lucrative than trying to do cool stuff.
  • Competition should not be avoided when building a company; it's important to have some competition.
  • The nature of competition is an important topic that deserves further discussion.