Fast break AI: How Databricks helped the Pacers slash ML costs 12,000X while speeding up insights

MT HANNACH

Stats can be everything in basketball, but for Pacers Sports and Entertainment (PS&E), fan data is just as valuable.

Yet while the parent company of the Indiana Pacers (NBA), the Indiana Fever (WNBA) and the Indiana Mad Ants (NBA G League) was pumping untold quantities of data into a $100,000-per-year machine learning (ML) platform to generate predictive models around factors such as pricing and ticket demand, the insights were not coming fast enough.

Jared Chavez, director of engineering and data strategy, set out to change that, moving to Databricks on Salesforce a year and a half ago.

Now? His team runs the same range of predictive projects with carefully tuned compute configurations to gain critical insights into fan behavior, for just $8 per year. It is a jaw-dropping, seemingly unthinkable decrease that Chavez credits largely to his team’s ability to shrink ML compute to near-infinitesimal amounts.
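For readers who want to check the headline figure, the arithmetic can be sketched in a few lines of Python. The two dollar amounts are the ones quoted above; the variable names are purely illustrative.

```python
# Figures quoted in the article; the variable names are illustrative.
previous_annual_cost = 100_000  # dollars per year for the earlier ML platform
current_annual_cost = 8         # dollars per year quoted for the current setup

# How many times cheaper the current setup is.
reduction_factor = previous_annual_cost / current_annual_cost

# Share of the earlier ML spend that no longer has to be paid.
percent_saved = (1 - current_annual_cost / previous_annual_cost) * 100

print(f"Reduction factor: {reduction_factor:,.0f}x")
print(f"Share of ML spend eliminated: {percent_saved:.3f}%")
```

On these numbers the factor works out to 12,500, which the headline rounds to 12,000X.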

“We are very good at optimizing our compute and figuring out exactly how far we can push the limits down to get our models to run,” he told VentureBeat. “That is really what we have been known for with Databricks.”

Cutting OpEx by 98%

In addition to its three basketball teams, Indianapolis-based PS&E operates a Pacers gaming esports business, hosts March Madness games and runs a busy events business out of the Gainbridge Fieldhouse arena (concerts, comedy shows, rodeos, other sporting events). The company also announced last month that it plans to build a $78 million Indiana Fever Sports Performance Center, which will be connected by skybridge to the arena and a parking garage (and is expected to open in 2027).

All this makes for a staggering amount of data, and data sprawl. From a data infrastructure standpoint, Chavez pointed out that until two years ago, the organization hosted two completely independent warehouses built on Microsoft Azure Synapse Analytics. Different teams across the business all used their own form of analytics, and tooling and skill sets varied widely.

While Azure Synapse did an excellent job of connecting to external platforms, it was cost-prohibitive for an organization of PS&E’s size, he explained. In addition, integrating the company’s ML platform with Microsoft Azure Data Studio led to fragmentation.

To solve these problems, Chavez moved to Databricks AutoML and the Databricks Machine Learning Workspace in August 2023. The initial objective was to configure, train and deploy models around ticket pricing and game demand.

Technical and non-technical users alike immediately found the platforms useful, Chavez noted, and they quickly sped up the ML process (and drove down costs).

“This considerably improves response times for my marketing team because they don’t have to code,” said Chavez. “It’s all buttons for them, and all of that data comes back to Databricks as unified records.”

In addition, his team organized the company’s 60 systems into Salesforce Data Cloud. Now, he reports, they have 440X more data in storage and 8X more data sources in production.

PS&E today operates at just under 2% of its previous annual operating costs. “We saved hundreds of thousands a year just on operations,” said Chavez. “We reinvested it in enriching customer data. We reinvested it in better tools, not only for my team but for the company’s analytics units.”

Continuous refinement, in-depth understanding of data

How did his team get compute so incredibly low? Databricks has continuously refined cluster configurations, improved connectivity options to schemas and integrated model outputs back into PS&E’s data tables, Chavez said. The powerful ML engine is “continuously enriching, refining, merging and predicting” on PS&E’s customer records across every system and revenue stream.

This leads to better-informed predictions with each iteration. In fact, the occasional AutoML model sometimes goes straight to production without any further tweaking from his team, Chavez reported.

“In fact, it comes down to knowing the size of the data going in, but also roughly how long it will take to train,” said Chavez. He added: “It’s on the smallest cluster size you could possibly run. It might be a memory-optimized cluster, but it’s just knowing Apache Spark fairly well and knowing how we can store and read the data in a fairly optimal way.”
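The sizing logic Chavez describes, pairing data size with expected training time to land on the smallest workable cluster, can be sketched roughly as follows. This is a minimal illustration, not PS&E’s actual method: the node names, memory sizes, prices, `overhead_factor` and `usable_fraction` are all made-up assumptions, and none of them correspond to real Databricks instance types or rates.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class NodeOption:
    """A hypothetical cluster node choice (not a real instance type or price)."""
    name: str
    memory_gb: float
    hourly_cost: float


def smallest_fitting_node(
    data_gb: float,
    options: Sequence[NodeOption],
    overhead_factor: float = 3.0,  # assumed working-set multiplier during training
    usable_fraction: float = 0.6,  # assumed share of node memory left for the job
) -> Optional[NodeOption]:
    """Return the cheapest option whose usable memory covers the working set."""
    required_gb = data_gb * overhead_factor
    candidates = [o for o in options if o.memory_gb * usable_fraction >= required_gb]
    return min(candidates, key=lambda o: o.hourly_cost, default=None)


def estimated_run_cost(node: NodeOption, training_minutes: float) -> float:
    """Cost of one training run, assuming pro rata billing by the minute."""
    return node.hourly_cost * training_minutes / 60.0


# Made-up options for illustration only.
OPTIONS = [
    NodeOption("small-general", memory_gb=16, hourly_cost=0.20),
    NodeOption("medium-general", memory_gb=32, hourly_cost=0.40),
    NodeOption("memory-optimized", memory_gb=64, hourly_cost=0.60),
]

choice = smallest_fitting_node(data_gb=2.0, options=OPTIONS)
if choice is not None:
    cost = estimated_run_cost(choice, training_minutes=15)
    print(f"{choice.name}: about ${cost:.2f} per training run")
```

The point of the sketch is the shape of the decision: if the data going in is small and training is short, the cheapest node that fits is enough, and the per-run cost shrinks accordingly.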

Who is most likely to buy season tickets?

One way Chavez’s team uses data, AI and ML is in propensity scoring for season ticket packages. As he put it: “We sell an ungodly number of them.”

The objective is to determine which customer characteristics influence where they choose to sit. Chavez explained that his team is geolocating the addresses they have on file to draw correlations between demographics, income levels and travel distances. They also analyze users’ purchase histories across retail, food and beverage, mobile app engagement and other events they might attend on PS&E’s campus.
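As a rough illustration of the travel-distance feature, the great-circle distance between a geocoded address and the arena can be computed with the haversine formula. This is a sketch under stated assumptions: the arena coordinates are approximate, the fan coordinates are hypothetical, and the article does not say which distance measure PS&E actually uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates for Gainbridge Fieldhouse (an assumption for this sketch).
ARENA = (39.7640, -86.1555)

# Hypothetical geocoded fan addresses, keyed by a made-up fan identifier.
fan_locations = {
    "fan_a": (39.9784, -86.1180),
    "fan_b": (39.1653, -86.5264),
}

# Straight-line travel distance per fan, usable as one feature in a propensity model.
distances = {
    fan_id: haversine_km(lat, lon, *ARENA)
    for fan_id, (lat, lon) in fan_locations.items()
}

for fan_id, km in distances.items():
    print(f"{fan_id}: {km:.1f} km from the arena")
```

Straight-line distance is only a proxy for travel time, but it is cheap to compute for every address on file.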

In addition, they pull data from StubHub, SeatGeek and other vendors outside of Ticketmaster to evaluate price points and determine how well inventory is moving. All of this can be married with everything they know about a given customer to figure out where they are going to sit, said Chavez.

Armed with this data, they could then, for example, upsell a given customer from section 201 to section 101 at center court. “Now I can not only resell his seat in the upper deck, but I can also sell another smaller package on the same seats he bought at mid-season, using the same characteristics for another person,” said Chavez.

Likewise, data can be used to improve sponsorship, which is essential for any sports franchise.

“Of course, they want to align with organizations that overlap with their own,” said Chavez. “So can we better enrich? Can we better predict? Can we do custom segmentation?”

Ideally, the objective is an interface where any user could ask questions like: “Give me a section of the Pacers fan base in their mid-20s with disposable income.” Going even further: “Look for those who earn more than $100,000 per year and who have an interest in luxury vehicles.” The interface could then return a percentage that overlaps with the sponsor’s data.
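Behind such an interface, the query reduces to a filter plus an overlap calculation. The following is a minimal sketch with entirely made-up fan records and sponsor IDs; the field names, the age bounds chosen for “mid-20s” and the ID-based matching are assumptions, not a description of PS&E’s system.

```python
from typing import Dict, Iterable, Set

# Made-up fan records for illustration only.
FANS = [
    {"fan_id": 101, "age": 25, "income": 120_000, "interests": {"luxury vehicles", "golf"}},
    {"fan_id": 102, "age": 26, "income": 85_000, "interests": {"gaming"}},
    {"fan_id": 103, "age": 24, "income": 150_000, "interests": {"luxury vehicles"}},
    {"fan_id": 104, "age": 41, "income": 200_000, "interests": {"luxury vehicles"}},
    {"fan_id": 105, "age": 25, "income": 110_000, "interests": {"travel"}},
]

# Hypothetical IDs a sponsor already counts as its own customers.
SPONSOR_CUSTOMER_IDS = {103, 104, 999}


def build_segment(fans: Iterable[Dict], min_age: int, max_age: int,
                  min_income: int, interest: str) -> Set[int]:
    """Return IDs of fans in the age range who earn more than min_income and share the interest."""
    return {
        fan["fan_id"]
        for fan in fans
        if min_age <= fan["age"] <= max_age
        and fan["income"] > min_income
        and interest in fan["interests"]
    }


def overlap_percent(segment_ids: Set[int], sponsor_ids: Set[int]) -> float:
    """Percentage of the segment that also appears in the sponsor's customer set."""
    if not segment_ids:
        return 0.0
    return 100.0 * len(segment_ids & sponsor_ids) / len(segment_ids)


segment = build_segment(FANS, min_age=24, max_age=26,
                        min_income=100_000, interest="luxury vehicles")
print(f"Segment size: {len(segment)}")
print(f"Overlap with sponsor data: {overlap_percent(segment, SPONSOR_CUSTOMER_IDS):.0f}%")
```

In practice the matching would likely happen on protected identifiers inside a secure environment rather than on raw IDs, but the calculation itself stays this simple.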

“When our partnership teams try to close these deals, they can simply pull the information without having to rely on an analytics team to do it for them,” said Chavez.

To further support this objective, his team is looking to build a data clean room, a secure environment that allows sensitive data to be shared. This can be particularly useful with sponsors, as well as in collaborations with other teams and the NCAA (which is headquartered in Indianapolis).

“The name of the game for us right now is response time, whether customer-facing or internal,” said Chavez. “Can we considerably reduce the knowledge required to slice information and sort through it using AI?”

Data collection and AI to understand traffic patterns, improve signage

Another area of focus for Chavez’s team is examining where people are at any given moment on PS&E’s campus (which includes a three-level arena with an outdoor plaza). Chavez explained that data capture capabilities are in place throughout its network infrastructure via WiFi access points.

“When you enter the arena, you are pinging off all of them, even if you don’t connect, because your phone is checking for WiFi,” he said. “I can see where you are moving. I don’t know who you are, but I can see where you are moving.”

This could eventually help guide people around the arena (say, if someone wants to buy a pretzel and is looking for a concession stand) and help his team determine where to position food and merchandise kiosks.

Likewise, location data can help determine the best spots for signage, said Chavez. One interesting way to measure signage impressions is to place vision gradients at spots equal to average fan height.

“Then let’s calculate how well someone would have seen it while walking by, given the number of people around them,” said Chavez. “So I can tell my sponsor that you got 5,000 impressions on this, and 1,200 of them were pretty good.”

Likewise, when fans are in their seats, they are surrounded by digital signs and displays. Location data can help determine the quality (and quantity) of impressions based on the angle of their seating position. As Chavez noted: “If this ad was only on screen for 10 seconds in the third quarter, who would have seen it?”

Once PS&E has adequate location data to help answer these types of questions, his team plans to work with Indiana University’s VR lab to model the entire campus. “Then we are just going to have a very fun sandbox to run around in and answer all these 3D spatial questions that have been bugging me for two years,” said Chavez.
