Halloween AI Tricks, Treats, & Stats, Visualized
Andrew Huang
For Halloween this year, we decided to look into what consumer economic data can tell us about how Americans celebrate the holiday. Using a combination of multimodal AI tools and open-source Python packages, we demonstrate how you can streamline the data analysis workflow.
To examine all of the code that went into this project, check out this Anaconda Notebook.
Using Perplexity to search for data
But first… we need data! Let’s use LLMs to help, starting with Perplexity, a free AI search engine.
I asked it, “Where can I find a Halloween dataset?” Between Perplexity’s answers and a bit of digging through its results, I found these links:
- https://nrf.com/research-insights/holiday-data-and-trends/halloween/halloween-data-center
- https://fred.stlouisfed.org/series/PCU31133113
- https://fred.stlouisfed.org/series/IPG3113N
- https://github.com/fivethirtyeight/data/tree/master/candy-power-ranking
The first link is from the NRF (National Retail Federation), which has been conducting its annual Halloween survey with Prosper Insights & Analytics for over a decade to see how Americans celebrate Halloween.
The second and third links are from FRED (Federal Reserve Economic Data, maintained by the Federal Reserve Bank of St. Louis). They track, respectively, the Producer Price Index by Industry: Sugar and Confectionery Product Manufacturing, and Industrial Production: Manufacturing: Nondurable Goods: Sugar and Confectionery Product.
The final link is from FiveThirtyEight, which ran a candy popularity contest in which thousands of participants voted on 269,000 randomly generated candy matchups.
Using multimodal models to scrape data
We now have data sources, but unfortunately, not all of them are machine-readable. Sure, there are pictures and graphs that humans can easily interpret, but machines can’t really use them… or is that no longer true with the advent of multimodal LLMs? Let’s see!
I started by taking a screenshot of a plot from the NRF data center.
Then I pasted the screenshot into ChatGPT and asked it to “Extract the x & y values from the graph.” To my pleasant surprise, it did!
Encouraged by this success, I moved on to a screenshot of another graph.
This time, the same prompt produced only a summary of the plot, so I followed up by asking it to “Create a pandas DataFrame from the plot.”
While it mostly interpreted the plot correctly, the accuracy wasn’t perfect. I asked several more times, and every response contained some mistake, which made the outputs untrustworthy.
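One way to make that unreliability visible is to ask for the same extraction several times and compare the runs side by side. Here’s a minimal pandas sketch. It assumes each run has already been turned into a DataFrame with hypothetical `year` and `value` columns, one row per year. The function name and the tolerance are illustrative choices, not part of the original workflow.

```python
import pandas as pd


def compare_extractions(runs: list[pd.DataFrame], tol: float = 0.05) -> pd.DataFrame:
    """Line up several model-extracted tables by year and flag disagreements.

    Each DataFrame is assumed to have `year` and `value` columns with one row
    per year. A year is flagged when the runs differ by more than `tol` or when
    at least one run left it out.
    """
    # One column per run, indexed by year; a year missing from a run becomes NaN.
    wide = pd.concat(
        [run.set_index("year")["value"].rename(f"run_{i}") for i, run in enumerate(runs)],
        axis=1,
    ).sort_index()
    spread = wide.max(axis=1) - wide.min(axis=1)  # NaNs are skipped here
    missing = wide.isna().any(axis=1)
    return wide.assign(spread=spread, disagree=(spread > tol) | missing)
```

Any row where `disagree` is `True` holds a value that needs checking against the original chart before it can be trusted.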
I also experimented with Pixtral, Mistral’s multimodal model, but again, the accuracy wasn’t there.
Okay, so what about older technology like `pytesseract`, a Python wrapper for the Tesseract optical character recognition (OCR) engine? I asked ChatGPT to help me get started.
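ChatGPT’s exact response isn’t reproduced here, but a basic `pytesseract` call looks roughly like the sketch below. It assumes the Tesseract binary and the `pytesseract` and `Pillow` packages are installed. The function name and the optional grayscale step are illustrative additions, not necessarily what ChatGPT suggested.

```python
def ocr_screenshot(path: str, grayscale: bool = False) -> str:
    """Run Tesseract OCR on an image file and return the raw text it finds.

    Assumes the Tesseract binary, `pytesseract`, and `Pillow` are installed.
    """
    # Imported inside the function so the sketch can be loaded (and the
    # function defined) even where the OCR stack isn't installed yet.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        if grayscale:
            img = img.convert("L")  # optional preprocessing; off by default
        return pytesseract.image_to_string(img)
```

Calling `print(ocr_screenshot("nrf_chart.png"))` (a hypothetical filename) prints whatever text Tesseract recognizes. The output is a flat string with no notion of which number belongs to which year.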
It looked easy to use, but when I ran it on the screenshot without any fine-tuning, the results were unusable:
NRF 2025: Retail's Big Show Jan 12-14, 2025 > NRF Foundation NRF Job Board
National . ; _
Retail NRF Membership Login Search Q. Explore —
j» Federation
in
f
aa Halloween
~~
Overview:
Total Spending (in Billions) Per Person Spending Percent Celebrating
$12.2B
° 11.6B
~a
$10.6,
$10.1B
—
$9.1B $oB $8.88
$8.48-—°——~*——,
$8B ° \ 888
° $7.4B °
$6.9 $7B s0.99/
$5.8B $5.8B
$5B ne Ne °
$4.7B
o——* 78 /
2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
@ Total Spending
Source: NRF's 2024 Halloween Spending Survey, conducted by Prosper Insights & Analytics NRE Rot
etail
Federation
“
Related content
While the OCR picked up bits and pieces of the data, its output wasn’t structured enough to parse reliably with code.
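A quick way to see the problem is to run a naive regular expression over a few lines copied from that output. This is only an illustrative sketch, and the pattern is an assumption rather than code from the project. It pulls out every well-formed `$<number>B` value.

```python
import re

# A few lines copied from the OCR output above.
OCR_SNIPPET = """\
$12.2B
° 11.6B
$10.6,
$10.1B
$9.1B $oB $8.88
$5.8B $5.8B
"""

# Naive pattern: a dollar sign, a number, then "B" for billions.
DOLLARS_IN_BILLIONS = re.compile(r"\$(\d+(?:\.\d+)?)B")


def extract_billions(text: str) -> list[float]:
    """Return every "$<number>B" value in the order it appears in the text."""
    return [float(m.group(1)) for m in DOLLARS_IN_BILLIONS.finditer(text)]


print(extract_billions(OCR_SNIPPET))  # [12.2, 10.1, 9.1, 5.8, 5.8]
```

Several values are silently dropped: `11.6B` lost its dollar sign, while `$10.6,` and `$8.88` are misreads. Even the values that survive carry no year, so there’s nothing to pair them with.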
Overall, while the multimodal models weren’t able to complete every task, they performed much better than older technologies and are getting better with every new release.
For now, let’s return to a reliable method: web scraping with Selenium and BeautifulSoup!
Using Selenium and BeautifulSoup to scrape data
While most people (myself included) are not web scraping experts, just about anyone can write a web scraping snippet with ChatGPT’s help.
For this project, a combination of ChatGPT prompts and GitHub Copilot (which helped write some regular expressions) let us build a DataFrame with all the NRF data.
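The exact prompts and regexes aren’t shown here, so the following is a hedged sketch of the general pattern:

- Selenium renders the page, which helps when charts are drawn by JavaScript.
- BeautifulSoup extracts the visible text.
- A regular expression turns `year ... $X.XB` pairs into a DataFrame.

The function names and the regex are hypothetical. So is the assumption that each year sits next to its dollar figure in the page text; the real NRF page may be structured differently.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical pattern: a four-digit year, some non-digit text, then a dollar
# figure in billions such as "$11.6B".
ROW_PATTERN = re.compile(r"(?P<year>(?:19|20)\d{2})\D*?\$(?P<value>\d+(?:\.\d+)?)B")


def fetch_rendered_html(url: str) -> str:
    """Load a JavaScript-rendered page in headless Chrome and return its HTML."""
    from selenium import webdriver  # imported lazily; needs Chrome installed

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails


def parse_spending(html: str) -> pd.DataFrame:
    """Extract (year, total spending in billions) pairs from the page text."""
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    rows = [
        {"year": int(m["year"]), "total_spending_billions": float(m["value"])}
        for m in ROW_PATTERN.finditer(text)
    ]
    columns = ["year", "total_spending_billions"]
    return pd.DataFrame(rows, columns=columns).sort_values("year", ignore_index=True)
```

Put together, the call would be `parse_spending(fetch_rendered_html(url))` with `url` set to the NRF link above. In practice, the regex is the part that needs adjusting to whatever the page actually contains.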
The other two sources, FRED and FiveThirtyEight, offer their data as downloadable CSV files, so those were easy: just press download!
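Once downloaded, both sources load straight into pandas. This sketch assumes FRED’s `fredgraph.csv?id=<series>` export endpoint and the raw GitHub path to FiveThirtyEight’s `candy-data.csv`. Either URL could change, and a locally downloaded file path works just as well. The helper name is hypothetical.

```python
import pandas as pd

# Download URLs (assumed; a local file path works the same way).
FRED_CSV_URL = "https://fred.stlouisfed.org/graph/fredgraph.csv?id={series_id}"
CANDY_CSV_URL = (
    "https://raw.githubusercontent.com/fivethirtyeight/data/master/"
    "candy-power-ranking/candy-data.csv"
)


def load_fred_series(source, series_id: str) -> pd.Series:
    """Read a FRED CSV export (path, URL, or file-like) into a date-indexed Series.

    The date column's header has varied between exports, so it is taken by
    position; non-numeric placeholders for missing values become NaN.
    """
    df = pd.read_csv(source)
    dates = pd.to_datetime(df.iloc[:, 0])
    values = pd.to_numeric(df[series_id], errors="coerce")
    # .to_numpy() avoids aligning on the old RangeIndex when re-indexing by date.
    return pd.Series(values.to_numpy(), index=pd.DatetimeIndex(dates), name=series_id)
```

For example, `load_fred_series(FRED_CSV_URL.format(series_id="IPG3113N"), "IPG3113N")` would fetch the production index, and `pd.read_csv(CANDY_CSV_URL)` the candy rankings.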
Armed with all the data, we can now visualize it.
Using hvPlot to plot and analyze data
Percent Celebrating
To start, I created a bar plot of the percentage of Americans celebrating Halloween over the years.
Several noticeable dips stood out and warranted further investigation.
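Spotting those dips can also be done programmatically. Here’s a minimal sketch that assumes the percent-celebrating figures sit in a pandas Series indexed by year. The helper names are hypothetical. The hvPlot call is one way to draw the bar chart, not necessarily the exact code from the notebook.

```python
import pandas as pd


def find_dips(pct: pd.Series) -> pd.Series:
    """Return the years where the value fell versus the previous year.

    The result holds the year-over-year change in percentage points.
    """
    change = pct.sort_index().diff()  # the first year has no predecessor -> NaN
    return change[change < 0]


def plot_percent_celebrating(pct: pd.Series):
    """Bar plot of the series with hvPlot (requires the hvplot package)."""
    import hvplot.pandas  # noqa: F401  registers the .hvplot accessor

    return pct.rename("percent_celebrating").rename_axis("year").hvplot.bar(
        title="Percent celebrating Halloween", ylabel="%"
    )
```

Note that the first year in the series can never be flagged, because there is no earlier year to compare it against.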
The drop in 2020 aligns with the peak of the COVID-19 pandemic and widespread quarantines, which understandably limited celebrations.
In 2007, the data revealed an early indicator of an economic downturn. According to the Federal Reserve, October 2007 marked the onset of financial instability, and the official recession began just two months later, in December. It’s likely that economic uncertainty affected consumer behavior even before the recession officially began.
However, the low figure for 2005, the first year in the dataset, was harder to explain. To explore further, I asked ChatGPT for context, and it offered a few possible explanations:
Economic Factors: In 2005, the U.S. economy was still recovering from the early 2000s recession. Economic uncertainty or rising costs could have led some households to reduce spending on non-essential holidays like Halloween.
Natural Disasters: In 2005, several natural disasters had widespread impacts, most notably Hurricane Katrina, which devastated the Gulf Coast, particularly New Orleans, in late August. The widespread destruction, displacement of people, and overall disruption to normal life may have affected Halloween celebrations in many areas.
Percent Buying
Next, the analysis explored what Americans who celebrate Halloween typically buy.
Violin plots, similar to box plots, were used to visualize the distribution of purchases across various categories. Individual years were represented by overlaid points, with darker blue indicating earlier years and lighter shades representing more recent ones. Some years, like 2020 and 2021, were highlighted in distinct colors to mark significant events.
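Violin plots like these are easiest to build from “long” data, with one row per year and category. The sketch below assumes a hypothetical wide table with a `year` column plus one column per purchase category. It melts the table and overlays per-year points on the violins with hvPlot. The reversed `Blues_r` colormap maps earlier years to darker blue and later years to lighter blue. The overlay details may need tweaking for a particular hvPlot/HoloViews version.

```python
import pandas as pd


def to_long(wide: pd.DataFrame, value_name: str = "percent_buying") -> pd.DataFrame:
    """Reshape a year-by-category table into one row per (year, category)."""
    return wide.melt(id_vars="year", var_name="category", value_name=value_name)


def violin_with_years(long: pd.DataFrame, value_name: str = "percent_buying"):
    """Violin per category with each year overlaid as a point (needs hvplot)."""
    import hvplot.pandas  # noqa: F401  registers the .hvplot accessor

    violins = long.hvplot.violin(y=value_name, by="category", alpha=0.3)
    points = long.hvplot.scatter(
        x="category", y=value_name, c="year", cmap="Blues_r", hover_cols=["year"]
    )
    return violins * points  # "*" overlays the two plots on shared axes
```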
The data shows that nearly everyone celebrating Halloween (95 to 100%) buys candy. Greeting cards, on the other hand, are far less popular, with only about 35% of celebrants buying them. Greeting card purchases in 2020 and 2021 stood out as exceptions, likely influenced by the COVID-19 pandemic.
Costumes and decorations fall in the middle, with 60% to 80% of celebrants buying them. Interestingly, purchase rates for costumes, decorations, and greeting cards were lower in 2005, 2008, and 2009, possibly reflecting the economic challenges of those years.
Per Person Spending
The analysis also explored how much Americans spend per person during Halloween celebrations. A second violin plot was added below the original, focusing on dollar amounts spent across different categories.
As with previous findings, greeting cards stand out as a category with minimal spending, while spending on other categories shows more consistency. Notably, the years 2005, 2008, and 2009 exhibit lower per-person spending, likely influenced by economic challenges during that period.
The violin plot also reveals a trend of increasing spending in more recent years. Lighter-colored points, representing later years, generally sit higher than darker ones, consistent with rising costs over time. This upward trend, especially noticeable after 2020, aligns with inflationary pressures following the COVID-19 pandemic.
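To put a number on “lighter points sit higher,” one option is a rank (Spearman) correlation between year and spending within each category. This is an illustrative check rather than part of the original analysis. It assumes long-format data with hypothetical `year`, `category`, and `dollars_per_person` columns.

```python
import pandas as pd


def year_trend(long: pd.DataFrame, value_name: str = "dollars_per_person") -> pd.Series:
    """Spearman rank correlation between year and value, per category."""

    def spearman(group: pd.DataFrame) -> float:
        # Spearman = Pearson correlation of the ranks (no SciPy needed).
        return group["year"].rank().corr(group[value_name].rank())

    return long.groupby("category")[["year", value_name]].apply(spearman)
```

Values near +1 mean spending rose almost every year in that category. Values near 0 mean there is no consistent trend.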
Sugar Manufacturing and Price Index
This section presents a time series analysis of the sugar and confectionery production and price indices, with recession periods highlighted as shaded spans and each October marked with a dotted line.
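Shaded recession spans can be derived from a monthly 0/1 recession indicator. One assumed source is FRED’s NBER-based `USREC` series, which is 1 during recession months. The sketch below does three things:

- turns such a series into start/end pairs;
- lists the October observations for the dotted lines;
- overlays both with HoloViews’ `VSpan` and `VLine` elements.

The helper names are hypothetical. The plotting function assumes a backend such as Bokeh is already loaded, for example via `hv.extension("bokeh")`.

```python
import pandas as pd


def recession_spans(flag: pd.Series) -> list[tuple[pd.Timestamp, pd.Timestamp]]:
    """Turn a date-indexed 0/1 recession indicator into (first, last) month pairs."""
    flag = flag.sort_index().astype(int)
    run_id = flag.diff().ne(0).cumsum()  # a new run id wherever the value changes
    in_rec = flag == 1
    return [(run.index[0], run.index[-1]) for _, run in flag[in_rec].groupby(run_id[in_rec])]


def october_dates(index: pd.DatetimeIndex) -> list[pd.Timestamp]:
    """Every October observation in a monthly DatetimeIndex."""
    return list(index[index.month == 10])


def annotate(curve, spans, octobers):
    """Overlay shaded recession spans and dotted October lines on a HoloViews plot."""
    import holoviews as hv

    shades = [hv.VSpan(start, end).opts(color="gray", alpha=0.2) for start, end in spans]
    lines = [hv.VLine(day).opts(color="orange", line_dash="dotted") for day in octobers]
    return hv.Overlay([curve, *shades, *lines])
```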
The graphs display indices, which provide a way to compare values relative to a baseline year. In this case, the 2017 index is set at 100. Values above 100 indicate higher production or prices compared to 2017, while values below 100 reflect a decline. For example, an index of 120 means production was 20% higher than in 2017, while an index of 80 indicates a 20% drop.
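In code, reading an index value relative to its base period is a one-liner, and rebasing to a different base year is nearly as short. Here’s a minimal sketch with hypothetical helper names, assuming a date-indexed pandas Series.

```python
import pandas as pd


def pct_vs_base(index_value: float, base: float = 100.0) -> float:
    """Percent difference from the base period (e.g. 120 -> +20%, 80 -> -20%)."""
    return (index_value - base) * 100 / base


def rebase(series: pd.Series, base_year: int) -> pd.Series:
    """Rescale a date-indexed index so that the base year's average equals 100."""
    base = series[series.index.year == base_year].mean()
    return series * 100 / base


print(pct_vs_base(120), pct_vs_base(80))  # 20.0 -20.0
```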
The production index reveals seasonal cycles, with output peaking in October, aligning with Halloween’s demand for sugar, and hitting lows in June. Recessions are evident as production dips within the shaded spans, reflecting the economic slowdown’s impact on sugar manufacturing.
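The seasonal pattern can be checked by averaging the index by calendar month. This only makes sense for a series that is not seasonally adjusted, which the trailing `N` in `IPG3113N` suggests. The helper name is hypothetical.

```python
import pandas as pd


def monthly_profile(series: pd.Series) -> pd.Series:
    """Average index value per calendar month (1 = January ... 12 = December)."""
    profile = series.groupby(series.index.month).mean()
    return profile.rename_axis("month")
```

On the result, `profile.idxmax()` gives the peak month and `profile.idxmin()` the low point.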
The price index shows a significant surge in sugar prices following the COVID-19 pandemic. Between 2005 and 2020, the index rose by about 50 points, but between 2020 and 2024, it jumped by nearly 100 points, underscoring the inflationary pressures that emerged post-pandemic.
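Point changes and percent changes can tell different stories. The same number of points is a smaller percentage when the index starts from a higher level. A small helper (hypothetical name) reports both for any two exact dates in a date-indexed series.

```python
import pandas as pd


def change_between(series: pd.Series, start: str, end: str) -> tuple[float, float]:
    """Return (point change, percent change) of an index between two exact dates."""
    a = float(series.loc[pd.Timestamp(start)])
    b = float(series.loc[pd.Timestamp(end)])
    return b - a, (b - a) * 100 / a
```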
Candy Brand Names
Shifting focus from sugar economics to candy preferences, what are America’s favorite Halloween treats?
Reese’s Peanut Butter Cups rank at the top, reflecting the preferences of the many Americans who voted in the matchups. Interestingly, several candies with win percentages below 50% are less familiar, suggesting regional or niche appeal.
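With FiveThirtyEight’s `candy-data.csv` loaded, the ranking itself is a sort on the `winpercent` column. Here’s a minimal sketch; the helper names are hypothetical.

```python
import pandas as pd


def top_candies(candy: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """The n candies with the highest win percentage, best first."""
    return candy.nlargest(n, "winpercent")[["competitorname", "winpercent"]]


def below_half(candy: pd.DataFrame) -> pd.DataFrame:
    """Candies that won fewer than half of their matchups, lowest first."""
    losers = candy.loc[candy["winpercent"] < 50, ["competitorname", "winpercent"]]
    return losers.sort_values("winpercent")
```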
Does this align with your favorites?
Candy Categories
What makes a candy popular? The data from FiveThirtyEight offers some insights:
The data was grouped by candy attribute and the average win percentage calculated for each group. The resulting bar plot overlays each candy’s individual win percentage as a translucent point.
The results suggest that crisped rice wafer and peanut/almond chocolate nougat bars are the most favored. In contrast, fruity hard candies rank lower in popularity; perhaps people craving fruit would rather have the real thing, like an apple!
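In `candy-data.csv`, each attribute is a 0/1 column, so “grouping by attribute” amounts to averaging `winpercent` over the candies where that column is 1. In this hedged sketch, a candy with several attributes counts toward each of them. The helper name is hypothetical.

```python
import pandas as pd

# Attribute flags in candy-data.csv (1 = the candy has the attribute).
ATTRIBUTES = [
    "chocolate", "fruity", "caramel", "peanutyalmondy", "nougat",
    "crispedricewafer", "hard", "bar", "pluribus",
]


def win_by_attribute(candy: pd.DataFrame) -> pd.Series:
    """Mean win percentage of the candies that have each attribute, best first."""
    means = {
        attr: candy.loc[candy[attr] == 1, "winpercent"].mean()
        for attr in ATTRIBUTES
        if attr in candy.columns
    }
    return pd.Series(means, name="mean_winpercent").sort_values(ascending=False)
```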
Candy Categories Count
Does having multiple attributes make a candy more popular? To explore this, violin plots were generated, with dots indicating specific features like chocolate, fruity, or caramel components.
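Counting attributes per candy is a row-wise sum over the 0/1 columns. Which columns count toward “complexity” is a judgment call. The sketch below assumes the six ingredient-style flags and leaves out shape and packaging flags like `bar` and `pluribus`. The helper name is hypothetical.

```python
import pandas as pd

# Ingredient-style flags in candy-data.csv assumed to count toward "complexity".
COMPONENTS = ["chocolate", "fruity", "caramel", "peanutyalmondy", "nougat", "crispedricewafer"]


def win_by_component_count(candy: pd.DataFrame) -> pd.DataFrame:
    """Number of candies and mean win percentage for each component count."""
    n_components = candy[COMPONENTS].sum(axis=1).rename("n_components")
    return candy.groupby(n_components)["winpercent"].agg(["count", "mean"])
```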
The data suggests that the more complex a candy, the more popular it tends to be: think Twix, Snickers, and Kit Kat. Additionally, the orange dots reveal that chocolate and peanut or almond components often correlate with higher win percentages.
Conclusion
This exploration of Halloween trends highlights how AI and data visualization can uncover meaningful insights into spending patterns, consumer behavior, and candy preferences. Using tools like multimodal models, web scraping, and libraries such as hvPlot, we’ve delved into diverse datasets to gain a deeper understanding of holiday festivities.
Beyond seasonal insights, this project demonstrates the broader potential of advanced data tools. AI-driven analysis empowers businesses to identify trends, predict customer behavior, and fine-tune strategies across various functions, providing a competitive edge in today’s data-centric landscape.
Thanks for joining in, and happy Halloween!