
Native Instruments Shouldn’t Be the Only Company Creating the Stems Format





The following post comes from Matt Aimonetti,
Co-Founder and CTO of Splice.  Reposted with permission
from Splice’s blog.

The launch of Stems from Native Instruments at the
Winter Music Conference in Miami is a major step forward in how
artists will create and share music together. As reported by
DJ TechTools, NI will support stems through
an open multitrack audio format. Using a container file
format, a song can be distributed with five tracks: a mixdown
compatible with most audio players and four discrete tracks
(stems). What this means is that we will begin to see music
producers split their tracks into four musical sub-elements
such as a drum stem, a bassline stem, a synth stem, and a vocal
stem.


At Splice we’ve been looking at stems and various formats for a
while now. We are strong open source advocates, so we’re
thrilled to see NI has used an existing, well documented, open
container format called MP4, which is based on Apple’s
QuickTime file format. Our engineering team
frequently deals with file formats, and what we do know is that
the audio and music industry has a hard time with standards,
even though we’ve seen on numerous occasions that standards are
in fact beneficial. In 1996, Steinberg changed the industry by
releasing their VST plugin specification and SDK. By opening up
their format, Steinberg made VST a standard used by dozens of
DAWs, and most plugins are now
written for VST and wrapped to support AudioUnit and RTAS/AAX
formats. Modern web features are also based on long discussions
between the main browser manufacturers (Chrome, Safari, IE,
Opera). The end result is that a website looks the same on all
browsers and developers don’t have to write different code for
Chrome or Safari.

While we are delighted to see NI use open source technology and
formats, we also think that if we want to have a long-lived,
unified format supported by today’s and tomorrow’s main
players, we should have an open discussion about the
specifications of this format.

How is NI planning on supporting a multitrack file format? The
details will not become available until its full release in
June, but based on the information NI provided and our
knowledge of the MP4 container, it’s pretty easy to
make an educated guess.
What we know:

  • NI Stems use the standard MP4 container
    for backward compatibility.
  • NI Stems files will play on players supporting MP4.
  • NI will require Stems to use a .stem.mp4 file extension.
  • NI Stems can only contain a maximum of 4 stem tracks per
    file.
  • NI Stems will use ID3 tags + maybe another kind of tagging
    for stem naming.
  • There is another similar format out there called .MOGG.
    It’s used by the Rock Band game, supported by Audacity using
    Ogg Vorbis, and popular with users looking for illegal
    stems.
The MP4 container format:

MP4, short for MPEG-4 Part 14, is a container format, meaning
that it’s a well defined way to package information. It’s usually
used to store video, audio, subtitles and images. The format is
directly based on Apple’s QuickTime format and was
standardised as an ISO format in 2001, then updated in 2003. An
MP4 file is basically some metadata and one or more data
streams. MP4 is used for both audio and video, and the underlying
media streams can be encoded using a wide range of codecs, H264
and AAC being the most common ones. The metadata support is
much richer than MP3’s and was designed by Adobe.
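
To make the "metadata plus data streams" layout concrete, here is a minimal Python sketch that lists the top-level boxes of an MP4 file. It relies only on the box header defined by the ISO base media file format (a 4-byte big-endian size followed by a 4-byte type). The helper names `list_top_level_boxes` and `make_box` and the synthetic sample bytes are made up for illustration; they are not part of any NI tooling.

```python
import struct


def list_top_level_boxes(data: bytes):
    """Return (type, size) for each top-level box in an MP4 byte string."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        # Every box starts with a 32-bit big-endian size and a 4-char type.
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = len(data) - offset
        if size < header:
            break  # malformed box: stop instead of looping forever
        boxes.append((box_type.decode("latin-1"), size))
        offset += size
    return boxes


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size header (illustration helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic stand-in for a real file: file type, metadata, media data.
sample = (
    make_box(b"ftyp", b"isom\x00\x00\x02\x00isom")
    + make_box(b"moov", b"")
    + make_box(b"mdat", b"\x00" * 32)
)
print(list_top_level_boxes(sample))
```

In a real file, the `moov` box carries the track descriptions and metadata, while `mdat` carries the encoded media data.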

Putting stems into an MP4 container has been fully
supported since at least 2003, and I will show you how you can
create your own stem files today.
Note that NI didn’t
mention the stream formats their tools will support, but based
on the demos shown online, a stem file generated by their editor
is somewhere between 30 and 80MB, which suggests
compression. Technically speaking, if we are just looking at
the container format, there is no reason why they would not
also allow raw PCM stems. There is also no format
limitation that enforces only 5 streams (1 mixdown + 4 stems).
That’s an artificial limitation added by NI, probably because
they don’t expect to use more tracks for now.
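
The size argument above can be checked with quick arithmetic. The sketch below assumes a 5-minute track stored as CD-quality PCM (44.1 kHz, 16-bit, stereo); those numbers are assumptions chosen for illustration, not figures published by NI.

```python
def pcm_bytes(seconds, sample_rate=44_100, channels=2, bits_per_sample=16):
    """Size in bytes of uncompressed PCM audio of the given duration."""
    return seconds * sample_rate * channels * bits_per_sample // 8


track_seconds = 5 * 60  # assumed track length
streams = 5             # 1 mixdown + 4 stems

total = streams * pcm_bytes(track_seconds)
print(f"{total / 1_000_000:.1f} MB uncompressed")
```

Under these assumptions the five uncompressed streams come to roughly 265MB, several times larger than the 30 to 80MB files seen in the demos.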

So how does it work or, how should it work?

Everything starts with a normal MP4 container and an audio
“stream”. To make things easier and spare you from reading
code, I’ll use FFmpeg, a very popular, free, cross platform and
open source collection of libraries and tools to inspect,
record, convert and stream audio and video. I don’t have an NI
Stem file, but because we know that they followed the
MP4 specs, we can safely assume that the mixdown/bounce
track is a normal MP4 audio file, more than likely using
AAC since iTunes doesn’t support MP3 streams in an
MP4 container.

Here is how I convert a mixdown I exported from Ableton into an
MP4 file using FFmpeg:

$ ffmpeg -i ~/Desktop/bounce.wav -c:a libfaac ~/Desktop/bounce.mp4

FFmpeg nicely converted my input WAV file into an
MP4 container file with an AAC-encoded stream. The file
plays perfectly in iTunes and all players supporting
MP4 audio files.

As previously mentioned, MP4 is a container format, so we can
put a lot of things into the file, and there is room for extra
storage such as metadata and data (MIDI, samples, etc.). Often
MP4 video files contain a video stream + a few audio
streams. The main audio stream is the main language and the
others are dubbed versions of the main track. All streams are
synchronized, allowing the audience to change audio stream while
the video keeps on playing.

Let’s add some stems to our MP4 container. I have
extracted two stems from my Ableton session: drums.wav and
synth.wav. I’d like to inject them into my container, but I still
want the mixdown to continue playing properly in iTunes.

$ ffmpeg -i ~/Desktop/bounce.wav -i ~/Desktop/drums.wav -i
~/Desktop/synth.wav -map 0 -map 1 -map 2 -c:a libfaac
~/Desktop/bounce.stem.mp4

This long command is telling FFmpeg that we have
three audio input files and that we want them transcoded
to AAC and multiplexed into one file, with the first file
being the first stream.

FFmpeg obliges and gives us a bigger file than our original
bounce.mp4. The first output was 3.6MB, the second one 11.1MB,
which is normal since my stem file contains 3 streams
instead of one.

The file plays perfectly in iTunes, so we know we didn’t mess
things up. How about checking that the stems were properly
added to the container? Good news: since we are using
the MP4 container format, we can open the file in a media
player that lists audio tracks and check the various streams.

It looks good. How about listening to the streams to make sure
they work? Not a problem: click on the Audio menu > Audio
Track and pick another track.
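
The same check can be scripted instead of clicking through a player. FFmpeg ships with `ffprobe`, and `ffprobe -v error -show_streams -of json <file>` prints the streams as JSON. The Python sketch below only parses such output; the function name `audio_streams` is hypothetical, and the sample JSON is hand-written and trimmed to the fields the code reads, not captured from a real run.

```python
import json


def audio_streams(ffprobe_json: str):
    """Return (index, codec_name, title) for each audio stream."""
    info = json.loads(ffprobe_json)
    result = []
    for stream in info.get("streams", []):
        if stream.get("codec_type") != "audio":
            continue  # skip video, subtitle and data streams
        # Stream titles, when present, live under the "tags" object.
        title = stream.get("tags", {}).get("title")
        result.append((stream["index"], stream.get("codec_name"), title))
    return result


# Hand-written sample in the shape of ffprobe's JSON output (assumption).
sample = """
{"streams": [
  {"index": 0, "codec_name": "aac", "codec_type": "audio"},
  {"index": 1, "codec_name": "aac", "codec_type": "audio"},
  {"index": 2, "codec_name": "aac", "codec_type": "audio"}
]}
"""

for index, codec, title in audio_streams(sample):
    print(index, codec, title)
```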


Awesome, everything works as expected! The only issue remaining
is how would a DJ/producer know the content of our stems?
Pretty easy: the MP4 container has good metadata support
and we can use FFmpeg to set the title of our tracks:

$ ffmpeg -i ~/Desktop/bounce.wav -i ~/Desktop/drums.wav -i
~/Desktop/synth.wav -map 0 -map 1 -map 2 -c:a libfaac
-metadata:s:0 title=mix -metadata:s:1 title=drums -metadata:s:2
title=synth ~/Desktop/bounce.stem.mp4
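
For more than a couple of stems, typing the command above by hand gets tedious. Here is a minimal Python sketch that assembles the same argument list for any number of tracks. The function name `build_stem_command` is hypothetical, the `libfaac` default simply mirrors the command above (the AAC encoders available depend on how your FFmpeg was built), and the example uses plain file names because `~` is expanded by the shell, not by FFmpeg.

```python
import shlex


def build_stem_command(output, tracks, encoder="libfaac"):
    """Build an ffmpeg argument list.

    tracks is a list of (title, path) pairs; the first pair is the mixdown.
    """
    cmd = ["ffmpeg"]
    for _title, path in tracks:
        cmd += ["-i", path]            # one input per track
    for i in range(len(tracks)):
        cmd += ["-map", str(i)]        # keep every input as its own stream
    cmd += ["-c:a", encoder]           # encode all audio streams
    for i, (title, _path) in enumerate(tracks):
        cmd += [f"-metadata:s:{i}", f"title={title}"]
    cmd.append(output)
    return cmd


cmd = build_stem_command(
    "bounce.stem.mp4",
    [("mix", "bounce.wav"), ("drums", "drums.wav"), ("synth", "synth.wav")],
)
print(shlex.join(cmd))
```

The list can be passed as-is to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.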


Based on this quick demo, I think it’s fair to say Native
Instruments products will read multiple streams at once to
allow you to mix them live. There is technically no reason to
limit the stems to 4 the way NI is doing it. Our guess is that
they have their own reasons (memory limitations, user
experience based on their 4-track hardware). Our hope is that
Traktor Pro and the existing and upcoming controllers will
accept MP4 files with more stems, while letting NI users access
only the first 4 stems. This will allow other products
(software/hardware) to go beyond NI’s current limit.
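
The behaviour we are hoping for is simple to describe in code. The Python sketch below is purely hypothetical: `split_for_player` is an invented name, and it does not reflect how Traktor or any NI product actually works. It only shows a player taking the first stream as the mixdown, using as many stems as it supports, and ignoring the rest without rejecting the file.

```python
def split_for_player(stream_titles, max_stems=4):
    """Split stream titles into (mixdown, usable stems, ignored stems)."""
    if not stream_titles:
        raise ValueError("a stem file needs at least a mixdown stream")
    mixdown, stems = stream_titles[0], stream_titles[1:]
    # A player limited to max_stems uses the first ones and skips the rest.
    return mixdown, stems[:max_stems], stems[max_stems:]


titles = ["mix", "drums", "bass", "synth", "vocals", "fx", "pads"]
mixdown, usable, ignored = split_for_player(titles)
print(mixdown, usable, ignored)
```

A product without the 4-stem limit could call the same function with a larger `max_stems` and read the same file.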


There is also a lot more we can do. At Splice we work at the
source code level of music, and we can inject all kinds of
data/metadata into a container format, from cues, markers and
loops to automation, visualization, samples, MIDI, and
presets. While we praise NI for choosing a great existing ISO
format, the devil is in the details. If we truly want a
unified stem format, the details of this format can’t be solely
crafted by one company if it is to factor in everyone’s
needs. Apple realized that quickly and knew that if
they wanted their format to be adopted by all, they had to have
the community manage it. That step forward is what allowed
Native Instruments to support multitrack files using MP4, the
ISO version of the QuickTime format.

We would like to begin leading an initiative of open format
development for Stems, but also for cross DAW format exchange.
In the near term we will begin reaching out to our software
partners to drive this forward. If you’re interested in joining
the conversation, email us at open@splice.com.





