The following guest post comes from Daniel Teachey, an expert in
data management, analytics, and cloud computing and a member of
the SAS External Communications team.
Is it Tupac Shakur or 2pac Shakur?
Notorious BIG, Notorious B.I.G. (note the periods), or Biggie
Smalls? Ted Leo and the Pharmacists, Ted Leo & the
Pharmacists or Ted Leo/Pharmacists? And, famously,
how about Prince, the Artist Formerly Known as Prince, the
Artist, or that little symbol deal? It’s enough
to make “Big Poppa” break down and cry.
These are all valid ways to spell the names of just a few
recording artists. Luckily, these variations
don’t cause a problem when you’re speaking to another
music lover. If you tell me that you were listening to
a song by Prince, I can recall most of his catalog,
regardless of the moniker.
But what happens when you want to find the entire music catalog
of your favorite artist in iTunes, Spotify, Pandora and
the myriad other online sites? What if I wanted to listen
to (or purchase) the entire Prince catalog?
That can be a little tricky, and it all comes down to how that
data is managed. The effects reach well beyond consumer
frustration: data problems can affect how an artist collects
royalties when their music is purchased or streamed.
Unfortunately, there is no agreed-upon standard for how to
write the name of an artist, an album or a song.
Here’s another example: Daryl Hall and John Oates.
For decades, they have been regaling fans with their
Philly blue-eyed soul. But check any music site, and how
is their name listed? “Daryl Hall and John Oates,”
“Hall & Oates,” and “Hall and Oates” are all options.
You would assume that high-powered music sites wouldn’t get
confused by something as small as an ampersand.
However, if the software running these sites doesn’t have
a way to reconcile these things, then “Hall and Oates” and
“Hall & Oates” are essentially separate bands. Can’t find
“Sara Smile” or “She’s Gone”? Maybe they’re with that
other Hall & Oates.
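To make the ampersand problem concrete, here is a minimal Python sketch of one common data-quality technique: reducing each name to a normalized “match key” before comparing names. The function `match_key` and its rules (dropping accents, ignoring case, reading “&” as “and”, stripping punctuation) are illustrative assumptions, not how any particular music service or SAS product works.

```python
import re
import unicodedata


def match_key(name: str) -> str:
    """Reduce a name to a loose comparison key (illustrative rules only)."""
    # Split accented letters into base letter + accent, then drop the accent.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Case-insensitive comparison.
    text = text.casefold()
    # Treat "&" and "and" as the same word.
    text = text.replace("&", " and ")
    # Drop punctuation so "B.I.G." and "BIG" line up.
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse runs of whitespace to single spaces.
    return " ".join(text.split())


catalog = ["Hall & Oates", "Hall and Oates", "Daryl Hall and John Oates",
           "Notorious B.I.G.", "Notorious BIG"]

# Group catalog entries whose keys match.
groups = {}
for name in catalog:
    groups.setdefault(match_key(name), []).append(name)

for key, names in groups.items():
    print(key, "->", names)
# hall and oates -> ['Hall & Oates', 'Hall and Oates']
# daryl hall and john oates -> ['Daryl Hall and John Oates']
# notorious big -> ['Notorious B.I.G.', 'Notorious BIG']
```

Note that “Daryl Hall and John Oates” still lands in a group of its own: normalization alone can’t tell that a full-name credit and a short band name refer to the same act. Catching that takes an explicit rule.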
Behind the Music: Data
We like to think of digital music as a collection of audio
files. What’s less understood is that each file
also carries data about itself, known as metadata, that helps
systems classify songs, albums or artists. These problems
may be new to the average music lover, but they have flummoxed
data professionals since the days of punch cards.
Computers love order. They thrive on certainty. Humans?
We have no such need, and as things get added to computer
systems, there are invariably inconsistencies in how the
data is entered. (Anybody with thousands of songs in
their iTunes library is probably nodding sadly at this
point, after another frustrating effort at data quality.)
In the digital age, those problems can become magnified.
Digital music comes from both publishers
and independent artists, and there are no set standards
for how to classify names across systems. Unfortunately,
data hiccups can lead to frustrating issues for the online
listener. If you want to find all the songs by a
single artist, you may have to think about all the permutations
of that name. The question is: how do you
introduce some order and certainty to millions of songs,
artists, albums and other music elements?
The key is to understand that the data about songs is an asset
that you can explore, rationalize and, ultimately, manage.
Managing Your Data and Playing the Hits
Remember the days when you got four copies of the same mailing
at one address? That was because a database
marketer had your address in their system four separate times.
Data quality technology allows marketers to find
these duplicates and reconcile them. Nowadays, duplicate
mailings rarely happen, because companies got smarter
about their data. Along the way, they realized
significant cost savings (no more extra mailings) while at
the same time improving customer satisfaction (no
more multiple catalogs clogging your mailbox).
Most organizations have a group focused on the
health of data like this. Sometimes, there is even
an executive, often called the Chief Data Officer, assigned to
managing this data as an asset. For a bank or a
retailer, this group will work with the organization’s
technology and business leaders to codify the rules for
their company – and how to apply them within their systems.
The same principles apply to digital music. By creating rules
within the databases behind online music services, you can
start to rationalize all the permutations of an artist’s name.
These rules can be “always on” to catch potentially
non-standard data as it gets added to the catalog.
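An “always on” rule of this kind might look something like the following minimal Python sketch. It assumes a hypothetical alias table that maps known name variants (stored as normalized keys) to one canonical artist name and checks every incoming entry against it. The names `ALIASES`, `normalize` and `ingest`, and the canonical spellings chosen, are illustrative only, not the schema of any real music service.

```python
import re

# Hypothetical rule table maintained by a data steward: each known variant
# (stored as a normalized key) points to one canonical artist name.
ALIASES = {
    "tupac shakur": "Tupac Shakur",
    "2pac": "Tupac Shakur",
    "2pac shakur": "Tupac Shakur",
    "notorious big": "Notorious B.I.G.",
    "biggie smalls": "Notorious B.I.G.",
    "hall and oates": "Daryl Hall & John Oates",
    "daryl hall and john oates": "Daryl Hall & John Oates",
}


def normalize(name: str) -> str:
    """Loose comparison key: case-folded, '&' read as 'and', punctuation dropped."""
    text = name.casefold().replace("&", " and ")
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def ingest(artist: str) -> tuple[str, bool]:
    """Apply the rules to an incoming entry; return (name to store, needs_review)."""
    canonical = ALIASES.get(normalize(artist))
    if canonical is None:
        # No rule matched: keep the name as entered, but flag it for review
        # instead of silently creating what may be a duplicate artist.
        return artist, True
    return canonical, False


for entry in ["Hall & Oates", "2pac Shakur", "Notorious B.I.G.", "Prince"]:
    print(entry, "->", ingest(entry))
# Hall & Oates -> ('Daryl Hall & John Oates', False)
# 2pac Shakur -> ('Tupac Shakur', False)
# Notorious B.I.G. -> ('Notorious B.I.G.', False)
# Prince -> ('Prince', True)
```

Unmatched names such as “Prince” are stored but flagged rather than rejected. Variants that no normalization can catch, like the Artist Formerly Known as Prince, only get resolved when someone adds another rule to the table, which is why the rules need an owner.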
The data problems in the music industry are not a new story,
especially for groups that collect and distribute royalty
fees. Just imagine the data problems that a royalty
company faces when trying to track down recording artists,
authors and publishers who are often on the move – and possibly
using different stage names.
Ain’t No Party Like a Crowdsource Party
While the royalty world has a business need to get data right,
there is less of an imperative for online music sites.
They are wildly popular already, and the occasional bit of
dirty data may not seem to be a huge problem. But,
as Taylor Swift showed us in 2014 in her tiff with Spotify,
artists are paying close attention to the royalties they earn from
online services. As payments per stream face increased scrutiny,
these services will be under more pressure to get payments right
in order to prove their value to artists. That will mean getting
the data right.
Perhaps a fix will spring from the collaborative nature of
online music. These sites create – and thrive on – a
community. There are audiophiles and music lovers
everywhere consuming and interacting with music in ways
that seemed ludicrous just two decades ago. Music, in many
ways, is a social mechanism.
There have been efforts to “crowdsource” the categorization of
music beyond the fixed genres in iTunes: is the song laid-back,
mellow, acoustic? Sites like Pandora, Spotify and
Google Play are getting smarter at making recommendations
based on these categorizations, and users can easily adjust
the suggestions with a simple thumbs up or thumbs down.
Maybe it’s time to crowdsource the data behind the
music that we listen to every day. If a
listener sees a song title or artist’s name that doesn’t
square with other entries, they can just “Shake It Off,” flag
it for review, and the data geeks can fix it.
For both new and legacy artists, digital music is the main
point of exposure to rabid fans and new listeners alike.
Now, more than ever, the data behind their music matters: for
listeners, for artists and for the industry.
Daniel Teachey is a member of the SAS External
Communications team, and in his current role, he
works closely with global marketing groups to generate
content about data management, analytics and
cloud computing. Prior to this, he managed marketing
efforts for DataFlux, helping the former SAS subsidiary go
from a niche data quality software provider to a world leader
in data management solutions.
Image by Joanna Poe, adapted under a Creative
Commons Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0) license.