Episode 52: Integrating Your CRM With Your Data Warehouse

Chris Strom:

Hey everyone, welcome back to another episode of the RevOps Hero podcast. On today's episode, I have our guest, Ryan Severns is joining us here, and we're going to talk about a topic that I've been really interested to get into for a while now, and that is integrating your CRM system with a data warehouse. We'll talk about why some people do that, how they go about doing it, who should consider doing it, what are the different considerations you need to look into when planning it out, and other aspects of it. So first off, we'll start with meeting Ryan. So Ryan, can you tell us a little bit about yourself and what you're doing these days?

Ryan Severns:

Yeah, absolutely. Thanks for having me on, Chris. So I'm Ryan Severns. I am the founder of Floodlight Growth, a small RevOps agency. We work with clients primarily on a fractional RevOps basis, RevOps as a service, but also do project work. Some of it related to setting up the data warehouse and the integrations with the CRM. Prior to this, I led marketing teams in B-to-B SaaS companies with a co-founder of a B-to-B SaaS company where I built out all of go-to-market.

And really a thread throughout that is that the data and processes, RevOps side of things has always been my favorite Pardot the business to work on. So after leaving my last company, I decided to start this and get to work with many different customers on what I always enjoy working with, which is the RevOps side of the house.

Chris Strom:

So you've been doing RevOps and now you're doing RevOps consulting, and how did you get into specifically the data warehouse integration side of things versus just straight CRM stuff?

Ryan Severns:

Yeah, absolutely. So my background has been largely with developer tools, DevOps products, and there's always been a product-led growth component of it. So oftentimes with either a free trial or free premium aspect to the product being a really core component of the overall go-to-market and client acquisition, customer acquisition process.

And so pulling that data into the CRM has been really core for how sales teams engage, how marketing engages. So really this dates back to, I started doing this in, I believe, 2015 with the company that I was working with at the time, was leading marketing there, and we needed more data in Salesforce at that point to make sure that we're tracking the right metrics and engaging with customers in the right way as part of their trial process, which again was part of the buying process.

That was where I first got started. And then there's been lots of different use cases since then and the technology has evolved quite a bit, and it's something that I always love working on because I find it to be a really powerful way to put information into the CRM, which is where most of the go-to-market team lives on a day-to-day basis.

Chris Strom:

So let's talk about, first off, why a company, say a company's been running Salesforce or HubSpot and they've been running it for a couple of years and everyone's been working out of that. What are some situations where a company might want to look at setting up a data warehouse and connecting it to their CRM system?

Ryan Severns:

Yeah, I think of it in three main areas. The first one is deeper analytics and reporting. If you've been in the RevOps world, or any sort of go-to-market leadership, you've probably encountered a scenario where you can't get the report that you want out of Salesforce or HubSpot. And being able to have that raw data in a data warehouse allows you to do deeper analysis and analytics that you might not be able to do directly within the CRM. So that's the first thing, and often what I find is the first thing that people tend to think of around this, but I actually have found over time that the other two reasons, the way that I categorize it, the other two reasons for doing it actually tend to be a lot more powerful.

The second one is data cleaning. And this is, you said a couple of years if a company had been running Salesforce or HubSpot for a couple of years, even after a couple of years, you might be in this scenario, or it might be a company that's been around for 10 years, has had three different marketing automation platforms over time. Maybe they migrated CRMs at one point, and there's a lot of disparate data. Being able to clean that and really create an accurate record for the CRM can be really powerful.

So one example that I'll use here is, when did a lead first get created? And this is a basic question, but I've worked with companies where you have, based on how previous marketing automation tools were set up, and what was being routed to Salesforce and Zoom Info Auto pushing things into Salesforce, getting a clear answer to when was this record created and when did we first have any engagement with them, is actually a really tough problem and not something that's clear in the CRM. So being able to do that sort of cleanup or clean historical data, that's really the second purpose of why I see a lot of companies doing this.

And then the third one is really operationalizing and activating that data. So there's a few different ways that this can play out. One is just other data sources coming into the CRM. So things that I will see. Billing data is a big one. If we can push billing data from Stripe, for example, into the CRM, it allows customer success teams to engage in the right way and maybe are part of the collections process with finance. Maybe we just need to know, especially if it's a usage-based product, how much are we billing them for? There's a lot of reasons that you might push billing data in. And then product data getting pushed into the CRM for SaaS tools is a really, opens up a lot of opportunities. So again, on the customer success front, you can engage in the right way and customer success reps are able to have the right conversations and maybe get triggered to go reach out to a specific customer if we see usage declining.

You can run marketing campaigns. If you see a customer that's using three of your features and they're not using feature number four, you can target your marketing at them and try to drive that expansion of utilization. And even in the sales process, if a salesperson is engaging on a demo call with somebody who, if it's more of a product-led growth, go to market if you're able to see if somebody has used certain features or how far they've gotten in their trial. So those are some of the reasons that you might put data, first into the data warehouse, and then push it into the CRM.

Chris Strom:

So things like product data, it's like a software company, product usage data from the software product, pushing that into the CRM, billing data from your billing system, pushing that into the CRM, probably including things like subscription data as well.

Ryan Severns:

Yeah, yeah, absolutely. Absolutely. And a lot of product data is something where there's not native integrations just by definition. And then if you look at a lot of your billing tools, and a lot of them have integrations with CRMs. And depending on your tools, I, worked a lot with Stripe, which I'm a huge fan of, but they're a developer product and a lot of it is driven by code and their integrations with a HubSpot or a Salesforce are a bit more thin and offer less customization. And this is the way that you can build some of that customization in a way that doesn't require additional engineering resources from your product development team.

Chris Strom:

Yeah, we'll get into some specifics of customization scenarios like that pretty shortly in this episode here. And then I remember one other thing you had mentioned was using data warehouses for calculating metrics that you just can't calculate in the CRM itself. Would you be able to give any examples of that?

Ryan Severns:

Absolutely. So sometimes, an example of what that might look like is, how many touch points have we had with a customer over the last 30 days or 90 days? And when you think of the sales automation tools, marketing emails going out, maybe somebody attending a webinar, these all live in these different siloed tools, and getting, let's say that you're running sales automation through Apollo and you have marketing emails coming out of HubSpot, maybe you're using Zoom webinars. If you want to aggregate all metrics from those three things to get number of tech points over the last 30 days, you're going to have to do a lot of wizardry with HubSpot workflow. And even then it might not be something that you can calculate, whereas pulling all of these metrics together in a data warehouse is a lot easier.

Chris Strom:

Yeah. Rather than dealing with trying to keep a continuous integration between both systems directly and keeping them both up to date, just keep them in their separate areas, but then bring them into the warehouse and you can join the tables however you want for analysis purposes without changing the underlying data.

Ryan Severns:

Yeah, exactly. Another thing that I'll say on the metrics and reporting front is, the way that I will articulate this is I always want to be able to look at historical company performance based on current definitions. So, if I take something that RevOps folk, who are familiar with HubSpot will know this world life cycle stages in HubSpot, where you might change a definition of your life cycle stages, and now from this point forward you have data of how people are moving through your new life cycle stages, I like to be able to apply those definitions historically as well.

And using the data warehouse, there's ways that you can do this and you can push it back onto custom fields and HubSpot, but I am sure a lot of folks have run into a scenario where you've introduced a new process, but you can't compare against historical data because we didn't have this data jumped the journey being tracked, whereas typically these are triggered by an action or there's something that triggers this new stage of the journey and we can, by this current lens, this current view of our stages to all of our historical data as well and push that back into the CRM.

Chris Strom:

Oh, interesting. I'd love to hear more about that as well. That could be its own whole separate topic possibly.

Ryan Severns:

Yeah, it is one of those things that, it's really powerful. Because I think where RevOps becomes so strategic is in the data and the reporting and it's helping the company have visibility into what's working and what's not, so that you can optimize and ultimately drive revenue growth.

And the way I always approach that is how are we trending relative to our historical metrics because we should be constantly improving and optimizing, or at least seeking to do that. And sometimes that means, hey, we need to know not only if we take the widest view, we have a lead and then we have a deal or an opportunity, and we have a conversion rate there.

But there's a lot of different steps that we likely want to track along that journey. Did a salesperson reach out? Did they get back to us? Did we have any sort of discovery meeting with them? And we introduced some of those new stages. I love to be able to track those stages, which we have the data underlying in the CRM, but it maybe isn't tracked in a life cycle stage. So can we go recreate the historical data and now view the funnel historically and see how we're trending, comparing to those historical metrics?

Chris Strom:

That would be very valuable. I've seen, I mean pretty much every company, something changes over time, deal stages, life cycle stages, adding ones, or removing ones, or changing the definition of ones, would be especially hard to detect scenario in retrospect.

Ryan Severns:

Yeah, yeah, yeah, absolutely. Absolutely.

Chris Strom:

Cool. So tons of value in doing something like this. So I'd like to talk about, if this sounds really good for a company and they want to start doing it, let's talk about how they set up a data warehouse and connect it to their CRM.

Ryan Severns:

Absolutely. So this is, in my mind, this is the golden age to, we've arrived at a point where it becomes a lot more easy these days than it used to be. The way I think of it is, there's that there's a lot of tooling and expertise that is behind that tooling, that replaces the need for a whole data engineering team that if you rewind 10, 15 years ago, you would need a data engineering team to build custom data pipelines and maintain those data pipelines in order to do this. Today we have tooling that really makes it simple for us.

So the first thing is ETL tooling. So this is products like Fivetran or Airbyte that are extracting the data from ETL or ELT. It's a long-standing data engineering term or acronym. ETL stands for extract, transform, load, and ELT is kind of been the repositioning of it as data storage has become cheaper.

We have extract, load, and transform, where the transformation is done in the data warehouse. But this is a tool like Fivetran can dump your Salesforce credentials, your HubSpot credentials, whatever you're using for billing. Maybe if you have your own PostgreSQL products database, you can dump your credentials in for that. And it starts replicating the data into the data warehouse.

And you have here, which is super nice, you can calculate change logs here and all of this data gets replicated into the data warehouse. And really it's as easy as anybody with credentials, the right authorization in Salesforce can go enter their credentials and now the data is being dumped into Snowflake or whatever you use for your data warehouse. So you have your ETL tool, you have the data warehouse itself, this is Snowflake, Redshift, BigQuery.

And then there's two other things that are newer products on the scene. And the first one is DDT, which really it's where you store logic and define tests. So it's taking a lot of data engineering principles and developer principles and putting them into a tool that makes it easy to define your logic. And if you think of aggregating data from multiple tools into one table, this is where you build the queries to do that. And then you can define tests.

A really easy one as is as you're joining data together, if you are looking for data on leads or individuals, let's make sure we only have one record per individual. So you're not joining, if you have leads and you're joining webinar attendance, and maybe they've attended multiple webinars, now you have multiple records for one lead. And if that's not the expected case, it'll create a test failure because you should only have one record per lead, and you need to do something like tell me the most recent time that they attended a webinar. So that's DDT.

And then there's reverse ETL tooling. So a product like Hightouch that takes the data from the data warehouse and pushes it back into your various SaaS tools or even advertising platforms. So this, maybe you've defined a query about the number of times that someone has logged in over the past 30 days, or if they've used a given feature in their trial and that's returning in a database query, you can take that from the data warehouse and Hightouch can map that to the CRM, Salesforce or HubSpot, and you put it on the field, you define which field should this go from or go to, and it's this column from your database query that is mapping there. And then they manage all of the updates and pushes to the CRM.

So I'll recap it real quick. It's ETL tool, like Fivetran data warehouse tool, like Snowflake, DDT, and then the reverse ETL tool, Hightouch.

Chris Strom:

So you have your sources of data, like your CRM is a source, Stripe is a source, or maybe like your ERP, Sage or NetSuite is a source. If you're a software company, maybe your user logs are a source. And you use the ETL tools to take the data from the sources and bring them into the data warehouse.

Ryan Severns:

That's right. That's right.

Chris Strom:

Cool. And then you use tools like DBT to apply different transformations to it within the warehouse.

Ryan Severns:

That's right. And this is where a lot of it, I usually build staging, intermediate and then production and activation data sets. And staging is just bringing that data into the DBT world. And then intermediate would be where you define a lot of that logic. And the most simple level, it might be that ACME Corp company has this Salesforce account ID, this HubSpot company ID, this IDE or customer ID within NetSuite, this org ID from our product database. And you do some of that mapping where you tie together the entities across your different sources.

Chris Strom:

So you said staging, intermediate, and what were the two others?

Ryan Severns:

So staging, intermediate, and then a production or activation table or schema. And sometimes I'm breaking those into two separate ones. Sometimes I keep them the same really depending on how much a company is investing in the AI versus the operationalization of data. And if a company is going heavy on the AI, we might have production tables there for reporting, which are separate from activation tables that are pushing data back out to the CRM and other third-party tools.

Chris Strom:

And you're doing that so that that way you're never change... The source data is always unchanged. That's in the staging table, right, and then-

Ryan Severns:

That's right.

Chris Strom:

... You're piping it to another table and you can do whatever you want with it in that table without worrying about overriding the original staging data.

Ryan Severns:

Yep, you got it.

Chris Strom:

Okay, cool. So the ETL tools bring it into the warehouse, you apply different transformations to it within the warehouse using the DBT tools, and then the reverse ETL tool, if you want to take some of the refined data and then push it back into Salesforce or HubSpot or something, that's where you use a reverse ETL tool?

Ryan Severns:

Yeah, that's right. That's right. So a great example is if you have a free trial, a software company, let's say, that has a free trial as part of the buyer's journey, and people can sign up for the free trial. So somebody signs up for it and that now is logging in the product database or the company, and we have certain fields, like when did they sign up for the trial and have they used a given feature? All of that gets calculated, transformed, joined together in the data warehouse.

And then Hightouch would be where we push out to a field on Salesforce, something like this is when the company signed up for the trial. And I often will do things like, this is the user within the account that signed up for it, and these are the invited users that have gotten invited over time, because your salespeople will likely want to at least start their outreach with the person who signed up. And then if they're not getting any response there, they move on to the people who have been invited on to the trial.

Chris Strom:

Yeah, I can totally see the value in that there. You had mentioned BI, that reminds me of another aspect of this too. Can you tell us what BI means and why you might use it and what sort of tools there are for that, and how it connects to everything else here?

Ryan Severns:

Yeah, so BI is business intelligence, or you'll also hear it called analytics. There's products like Looker, Power BI, Omni. There's a bunch of different data visualization tools, and there's different reasons you might select one over another, but really it's taking the raw data from these different sources, and sometimes they have native connectors, but I really like working from a data warehouse foundation. And it's opening up the opportunity to do deeper analysis than what you can do in native CRM reporting. So it gives a lot more flexibility.

Things that, one of my favorite things to look at is cohort-based data and take, if you have, let's talk about opportunities. What percentage of opportunities have closed within one quarter and two quarters of when they were created? And these give some of the leading indicators of pipeline quality if you have a longer sales cycle for an enterprise B2B SaaS company.

So you're able to track this in a way that you can somewhat manually calculate in Salesforce, but you can see curves of, over time, what percentage of this cohort has been moved to close one based on the amount of time that it's been around. You can get deeper reporting like that out of a BI tool.

Chris Strom:

So that'd be one of the big reasons why people use a BI tool like Tableau or Power BI versus just native CRM reporting?

Ryan Severns:

Yeah, exactly. And I will say, earlier in my career, I did analytics, and so I would go in with this mindset of, setting up. The first thing I would want to do is set up the BI tool. And really what I've seen is, no matter how great we build out BI tools and the data in there, at the end of the day, your go-to-market people, your head of sales, is going to run reporting out of Salesforce or HubSpot. That's what they're going to look at, what's going to go into the board report, and that is where they live.

And so I've really moved to a world of, make sure that we can get good and accurate reporting out of the CRM. And there should be nothing in... There should be nothing in the CRM that differs from the reporting that we get in the BI tool. It just is additive. So we shouldn't have restated metrics in BI. We should make sure that our CRM data is accurate. Some of this comes through that data cleaning and pushing the right data back into the CRM. And then anybody on the team can build the right reports in CRM, in the CRM. But if we want to go deeper, then they can move into the BI tool and get that deeper level of analysis.

Chris Strom:

That makes sense. If the salespeople and marketing people they're living in the CRM and the marketing automation, they're going to build reports in there no matter what. So that's why you're saying it's important to use, probably the reverse ETL tool, not just for the data, but even just for being able to report the data in those systems.

Ryan Severns:

Absolutely, absolutely. I'm sure you've seen it before, Chris, where there's, you have a sales leader and a marketing leader who have wildly different numbers of leads that were created, or the pipeline is driven, because they're working out of their disparate systems. And I, a principal in the work that I do, I believe that we should have consistent data everywhere. So I always think of it as a Venn diagram with a lot of overlap. And there's going to be, in our marketing automation tool, there's going to be data that inherently doesn't make it over to Salesforce.

And same with the sales data in Salesforce. It might not make it over to the marketing automation tool, but anywhere that we do have overlap, we should have consistent data and make sure that we're reporting our metrics based on those same foundations and same consistent data.

Chris Strom:

Yeah. It's a bad look if you immediately start off with, well, my report says this, why does your report say that? Then the conversation's derailed immediately.

Ryan Severns:

Yeah, and it's so common. I mean, I'm on calls all the time where the basis of the call is, we can't trust our data. And I think anybody who's worked in this world runs into that really often.

Chris Strom:

Yeah, I've seen that as well. So we've been talking about considerations like where does reporting actually live? Let's talk about some other considerations when planning out a CRM and data warehouse system setup here. Starting off with, who should manage it within the company.

Ryan Severns:

Yeah, do this is a hard question to answer, and it's something that has, I think in the GenAI world that we live in now, it's become a lot easier to address because ChatGPT is great at writing SQL queries. And the way that I always approach this is I skew towards business person owning this versus engineering owning this. And there has to be some overlap because ultimately we have a data pipeline here, even if it's managed very significantly by third party tooling. There's some technical aspect of this that engineering often wants to be involved in. This stuff should be backed by a GitHub repo.

And so even if the only involvement is engineering, creating a analytics or rev op get repo and having it part of the broader GitHub organization, there's some overlap with engineering. But what I have found is that when engineering purely owns this, is we lack the business context to get the right outputs that RevOps and the go-to-market team really needs.

And so I was skewed towards running this and ownership of this living within RevOps. And really to the GenAI point, one of the beauties of this is once you have it set up in the first place and some of these foundations in place, it's a lot easier to make updates as now we have a new field in our billing system that we want to push over to Salesforce. We already have a lot of that billing data going over. And with some support from ChatGPT, you can figure out how do I add this field into the query and make sure it gets mapped over? And it becomes a lot easier for a RevOps person with a little bit of technical aptitude and curiosity to go update and maintain over time.

Chris Strom:

Yeah. The people working on the business side are the most tuned in to what's the overarching objective and what do we need to know to get there. But of course, not everyone on the business side is going to have the technical aptitude, or should be given the opportunity to start doing the technical stuff as well. How do you identify promising candidates within the RevOps side to do something that is definitely on the more technical side like this?

Ryan Severns:

Yeah, I mean, I always look for someone who has just, I think of it as a data and process mindset, which covers a lot of RevOps folks. And if somebody does a really good job approaching a problem with building a workflow and HubSpot and they're very thoughtful of, okay, in what cases does a contact or a company enroll here and what are the updates that are made, and they're very thoughtful in those sorts of approaches, you can take that raw foundation and apply it to this different skill set of, okay, why would a particular contact end up in this query? Let's make sure that we have the right filters applied in the query, and what outputs do we need? How do we join our data together?

And typically, if somebody had those technical capabilities, even if they're just working in a user interface, they have technical thinking of data and process, you're able to apply it to the CRM, the data pipelines, the data warehouse.

Chris Strom:

So we talked a little bit about who would manage it, and then from there, probably last thing we'll cover here is talking about data governance considerations.

Ryan Severns:

Yeah, this is a... I mean, both the beauty and the reality to understand is that you have a lot of raw data that comes out of these tools into the data warehouse. And so making sure that you have the right permissions set up around, who can view the data, and what data you're putting in there, is really important.

So example might be if you're dumping QuickBooks data in there, because maybe you run invoicing out of QuickBooks, you want to be sure that you or NetSuite, you want to be sure that you are aware of what other data is pushing in there, because if you have P&L data also going in there, you might expose things like payroll. So it is important to have somebody who owns this, who has that data governance mindset, and make sure that the right data is going in and that the right access is applied to who can access that data.

Chris Strom:

Permissions as to who can access the data, who can define what filters or criteria, define what data goes in?

Ryan Severns:

Absolutely.

Chris Strom:

And then you also talk about your staging table versus intermediate versus production. That sounds like that probably falls into one of the key components you need to plan for your data governance too, right?

Ryan Severns:

Yeah, absolutely. I always set up different role types. So there's a role type to load the data in, which is largely just getting to tooling. Then you have, I always call it the transformer role type. So people who have this are able to transform the data, they're able to do the calculations.

And then you can have your analyst type, which just has access to that end data, end state data, the production database tables that they are analyzing, but they aren't going into the raw data and doing the transformations themselves. So you can expose that a lot more because you have curated data sets.

And whether your concern is exposing some sensitive data or just even somebody running the wrong calculations and making a bad business decision because they don't understand the underlying data sets, you're serving up the curated data sets at that production layer, that makes sure that analysts and whoever's using it can make the right decisions based on it.

Chris Strom:

Yeah, that makes sense. So the access levels on different staging. And then another thing that comes to my mind too is planning out the nomenclature and common definitions, and then documenting it in a SOP somewhere. How do you like to go about doing that?

Ryan Severns:

Yeah, absolutely. I mean, there's some of this that can be self-documenting because it's in code and it depends on how much you want to point people towards that. And oftentimes, for speed, I will have that self-documentation in code as well as then the definitions in a Salesforce or HubSpot field that we're pushing the data to. This is what this means.

And then often companies will want to go deeper, and there's both tooling out there or just Google Sheets creating records and logs of what the definitions are, what the source data is. There's also, within DBT, there's some nice lineage graphs. So if you have a database table, you can see all the different tables, draws, graphs of where that data is coming from, and what is resulting in that final data set that you're looking at. So it's another place where the tooling does a certain element of self-documentation.

Chris Strom:

Oh, that's cool. I don't know DBT that well myself, but as I learn it, I'll have to look at the lineage tables in particular.

Ryan Severns:

Yeah, it's super valuable to be able to, both trace back if you're looking at a production table, or if you're looking at an earlier staging or intermediate table, you're looking to make a change, you can look forward and say, what downstream is this going to affect that I need to be aware of? Because maybe it's going to affect a production table or an activation table that you aren't aware of.

Chris Strom:

And then, in addition to that, in the code itself, make sure you're putting comments for documentation in the code itself. In the CRM systems, make sure you're putting in accurate descriptions on their properties and fields, and then additional documentation, even something like a Google Doc or Google sheet for additional documentation from there.

Ryan Severns:

Yep, yep. Exactly.

Chris Strom:

I think this has been a great overview of the whole topic of building in CRMs to data warehouses and planning out how it moves and why. So really appreciate you walking us through all this, Ryan, and taking the time to do it.

Ryan Severns:

Yeah, for sure. For sure. I appreciate you having me on this topic I love chatting about. Happy to do it here and really appreciate you letting me join you here, Chris.