<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[OliNox AI]]></title><description><![CDATA[AI × Sound experiments]]></description><link>https://blog.olinox.ai</link><image><url>https://cdn.hashnode.com/uploads/logos/6802c3406275c65d6ecc73df/c8df081b-abd9-499b-91e0-64ee27020d9f.jpg</url><title>OliNox AI</title><link>https://blog.olinox.ai</link></image><generator>RSS for Node</generator><lastBuildDate>Sat, 16 May 2026 22:45:05 GMT</lastBuildDate><atom:link href="https://blog.olinox.ai/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Music GenAI Detection]]></title><description><![CDATA[I explore Deezer’s GenAI detection system through a series of hands-on experiments. This work follows the release of a music album created using a combination of instruments and generative AI within a]]></description><link>https://blog.olinox.ai/music-genai-detection</link><guid isPermaLink="true">https://blog.olinox.ai/music-genai-detection</guid><category><![CDATA[music]]></category><category><![CDATA[genai]]></category><category><![CDATA[suno]]></category><category><![CDATA[ai detection tool]]></category><dc:creator><![CDATA[Olivier]]></dc:creator><pubDate>Tue, 12 May 2026 09:14:36 GMT</pubDate><content:encoded><![CDATA[<p><em>I explore</em> <em><strong>Deezer’s GenAI detection</strong></em> <em>system through a series of hands-on experiments. This work follows the release of a music album created using a combination of instruments and generative AI within a DAW (Digital Audio Workstation). It looks at the motivations behind this approach, the current challenges of GenAI in music, possible detection algorithms, and the broader implications for how music is produced and evaluated today.</em></p>
<hr />
<p>As in many other domains, foundational models for music emerged between 2022 and 2024, several of them open-source, including AudioCraft from Meta, MusicLM from Google, Stable Audio, and others. These models marked an important step forward, yet none established a sustainable business model on their own.</p>
<p>Some companies seized the opportunity and developed proprietary systems, likely trained on large-scale music catalogs gathered from multiple sources, including streaming platforms. Udio is reported to build on Stable Audio by Stability AI, while Suno grew out of Bark, its own earlier open-source audio model inspired by Google research such as MuLan and AudioLM. Beyond the ongoing debates around ethics, both for artists and researchers, the quality of the generated music has reached a remarkably high level, often creating a real “wow” effect.</p>
<p>Recently, a new generation of GenAI plugins has started to emerge within DAWs. Tools such as ACE or Synthesizer V Studio illustrate this shift. These solutions remain recent, yet they already show strong potential for producers working directly inside their production environments.</p>
<h2><strong>The problem: artificial streaming</strong></h2>
<p>Streaming platforms face a growing challenge with artificial streaming. Bot farms generate large volumes of plays to capture royalties, creating a system that can be exploited at scale. The rise of GenAI further lowers the barrier to entry, making it easier to produce large quantities of music tailored for this purpose.</p>
<p>GenAI music generation itself is not the core issue. The real challenge lies in how it can be used at scale within these systems, as well as in the question of copyright and the use of artists’ data to train these models. This is why streaming platforms are developing different approaches to address the problem.</p>
<h2><strong>The problem for the artist: the revenue</strong></h2>
<p>For artists, revenue distribution becomes increasingly diluted, and the value of genuine listening is harder to preserve within an ecosystem shaped by automated activity. This pressure extends an existing shift that began with the rise of streaming platforms more than a decade ago, pushing artists to adapt their models toward merchandising, public support, and live performances.</p>
<p>Copyright protection has always been a challenge, with gaps between countries. The way foundational models are trained, often to reproduce similar outputs from large datasets, also raises important questions.</p>
<h2><strong>Why I did this experiment</strong></h2>
<p>As an engineer, I am naturally drawn to experimentation. I recently released a music album, mainly for fun, on streaming platforms, where I combined generative AI, instruments, and traditional production tools. GenAI was mainly used for vocals to sing my lyrics, along with contributions on other stems, resulting in a hybrid workflow that even included some Python-based mathematical generation for the bass (see the sketch below).</p>
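<p>For readers curious about that last point, here is a minimal sketch of the idea behind formula-driven bass generation. It is a simplified illustration, with an assumed note pool and tempo, not the actual script used on the album.</p>
<pre><code class="lang-python"># Minimal sketch of formula-driven bass generation (illustrative only).
# Assumes an A natural minor note pool and a fixed 120 BPM grid.
import numpy as np
from scipy.io import wavfile

SR = 44100
BPM = 120
STEP = 60 / BPM / 2          # eighth-note grid, in seconds
POOL = [45, 48, 52, 43]      # MIDI notes: A1, C2, E2, G1

def midi_to_hz(n):
    return 440.0 * 2 ** ((n - 69) / 12)

samples = []
for i in range(64):                            # 64 eighth notes
    note = POOL[(i * i + 3 * i) % len(POOL)]   # arbitrary arithmetic pattern
    t = np.arange(int(SR * STEP)) / SR
    env = np.exp(-4 * t)                       # simple percussive decay
    samples.append(np.sin(2 * np.pi * midi_to_hz(note) * t) * env)

bass = np.concatenate(samples)
wavfile.write("bass_sketch.wav", SR, (bass * 0.5 * 32767).astype(np.int16))
</code></pre>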
<p>After the release, Deezer flagged the album with the label “AI-generated content: This album includes tracks detected as generated with the use of AI.” Some tracks indeed incorporated Suno v5.0 before further work in the DAW. This triggered a deeper curiosity about how such detection is performed. Other streaming platforms, such as Spotify or Apple Music, did not flag the album as AI-generated, although they may also use detection systems.</p>
<h2><strong>Initial theory about Deezer’s AI detection</strong></h2>
<p>I initially suspected a clustering approach based on musical features extracted from each stem. Another hypothesis pointed toward a fingerprinting mechanism, similar to Shazam, applied to individual stems, such as vocal signatures, spectral artifacts, or structural patterns. In both cases, the first logical step would be to separate the stems before analysis.</p>
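<p>To make the first hypothesis concrete, a per-stem feature clustering pass could be sketched as follows. The file names and cluster count are placeholders, and this is only a guess at the approach, not Deezer’s actual pipeline.</p>
<pre><code class="lang-python"># Rough sketch of the clustering hypothesis: extract per-stem features,
# then cluster tracks in feature space. Paths and k are placeholders.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def stem_features(path):
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    # Summarize the time axis with mean and std per coefficient
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

stems = ["vocals.wav", "drums.wav", "bass.wav", "other.wav"]  # hypothetical
X = np.stack([stem_features(p) for p in stems])

# With a large catalog, GenAI stems might form their own clusters
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
</code></pre>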
<h2><strong>Test method and limits</strong></h2>
<p>The approach consisted of releasing four singles, one every four days, and observing whether each track was flagged on Deezer. Each track used GenAI as a starting point, then underwent different levels of rearrangement and restructuring within the DAW. The objective was to identify which transformations would trigger Deezer’s GenAI detection.</p>
<p>Minimalist electronic music is particularly well suited for computational processing. I created four singles with a similar artistic direction while applying different production approaches to each. The lyrics and musical ideas were written by me, kept intentionally simple, with a focus on a recognizable style, clear scales/chords, and coherent, distinctive melodies. The primary goal of these tracks is experimentation.</p>
<p><strong>Test cases:</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/6802c3406275c65d6ecc73df/2e837782-0377-4dfb-8af6-b4a44a160f80.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Material:</strong></p>
<ul>
<li><p><strong>Generation</strong>: Suno v5.5 with stem separation</p>
</li>
<li><p><strong>Digital Audio Workstation (DAW)</strong>: Ableton Live with plugins</p>
</li>
<li><p><strong>Controller / sound library</strong>: Arturia &amp; PML</p>
</li>
<li><p><strong>Hardware</strong>: Mac, Focusrite audio interface, Pioneer/Shure monitors</p>
</li>
</ul>
<p><strong>Limits:</strong></p>
<ul>
<li><p>No use of real instruments, only sound libraries and MIDI</p>
</li>
<li><p>Limited hardware and software setup</p>
</li>
<li><p>No access to Deezer’s internal detection systems</p>
</li>
</ul>
<h2><strong>T2: Be a Better Man - A Minor</strong></h2>
<p>The objective of T2 was to apply <strong>minimal transformation</strong> to the Suno v5.5 output after stem separation. The work focused on standard DAW processing, including EQ, dynamic EQ, compression, limiting, and saturation, with no use of GenAI plugins. Mastering followed a traditional approach.</p>
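<p>For reference, this kind of chain can be approximated in Python with Spotify’s pedalboard library. The sketch below uses arbitrary parameter values, not the settings actually applied to the track.</p>
<pre><code class="lang-python"># Sketch of a standard post-processing chain similar to the one used in T2.
# Parameter values are arbitrary placeholders, not my actual settings.
from pedalboard import Pedalboard, HighpassFilter, Compressor, Distortion, Limiter
from pedalboard.io import AudioFile

board = Pedalboard([
    HighpassFilter(cutoff_frequency_hz=30),    # clean the low end (EQ stand-in)
    Compressor(threshold_db=-18, ratio=3),     # tame dynamics
    Distortion(drive_db=3),                    # light saturation
    Limiter(threshold_db=-1),                  # final ceiling
])

with AudioFile("stem_in.wav") as f:            # hypothetical input file
    audio = f.read(f.frames)
    sr = f.samplerate

processed = board(audio, sr)

with AudioFile("stem_out.wav", "w", sr, processed.shape[0]) as f:
    f.write(processed)
</code></pre>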
<p>The expected outcome was detection as AI-generated content.</p>
<p>The result confirmed this expectation, as the <strong>track was flagged by Deezer.</strong> This suggests that basic post-processing is not sufficient to alter the underlying signal used for detection. The system appears to rely on deeper audio characteristics rather than simple clustering or surface-level features.</p>
<p><a class="embed-card" href="https://open.spotify.com/album/2GhkDN9RMtmCzPGWVFdT3D">https://open.spotify.com/album/2GhkDN9RMtmCzPGWVFdT3D</a></p>

<p><em>Also available on all other streaming platforms or</em> <a href="https://drive.google.com/file/d/1YnlHUdhA7_Xq0TBeh2U2fQkiCh4bhu-Z/view?usp=sharing"><em>here</em></a></p>
<h2><strong>T3: AI Rising - A Minor</strong></h2>
<p>The objective of T3 was to go further than T2 by <strong>restructuring</strong> the track after stem separation. This included rearranging sections, introducing cuts and offsets, and modifying the overall flow of the song within the DAW. I also added a layer of white noise and additional effects (FX) to further alter the texture. The same set of standard tools was used, including EQ, dynamic EQ, compression, limiting, and saturation, with no GenAI plugins. Mastering followed a traditional approach.</p>
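<p>The white-noise layer is simple to reproduce. The sketch below mixes low-level noise into a track with numpy; the -40 dB level is an assumption, not the exact setting used.</p>
<pre><code class="lang-python"># Minimal sketch of the white-noise layer added in T3.
# The -40 dB mix level is an assumption, not my exact setting.
import numpy as np
from scipy.io import wavfile

sr, mix = wavfile.read("track.wav")            # hypothetical input
mix = mix.astype(np.float32) / 32768.0

noise = np.random.default_rng(0).standard_normal(mix.shape).astype(np.float32)
noise *= 10 ** (-40 / 20)                      # scale noise to about -40 dBFS

out = np.clip(mix + noise, -1.0, 1.0)
wavfile.write("track_noisy.wav", sr, (out * 32767).astype(np.int16))
</code></pre>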
<p>The expected outcome was less certain. The hypothesis was that structural changes, combined with added noise and effects, might weaken the detection signal or alter the classification.</p>
<p>The results show that the track was still <strong>flagged by Deezer</strong> (possibly more slowly than T2). This indicates that modifying the structure and adding surface-level perturbations are not sufficient to remove the underlying characteristics identified by the detection system. The key takeaway is that the algorithm likely relies on deeper audio signatures that persist despite rearrangement and added noise, rather than on temporal structure or superficial texture alone.</p>
<p><a class="embed-card" href="https://open.spotify.com/album/7LAekp9yIkQU1mEnMVMvzu?si=s0S70PgOR3m42DM9-6VUlA">https://open.spotify.com/album/7LAekp9yIkQU1mEnMVMvzu?si=s0S70PgOR3m42DM9-6VUlA</a></p>

<p><em>Also available on all other streaming platforms or</em> <a href="https://drive.google.com/file/d/1HzHqg-mWIPBZo4ZBltNbXjl8inDFdsT8/view?usp=sharing"><em>here</em></a></p>
<h2><strong>T4a: Think Twice - G Minor</strong></h2>
<p>The objective of T4a was to go further by <strong>replacing a core musical component</strong>. After stem separation and restructuring in the DAW, I created a completely new drum track using a Drum Rack in Ableton, while keeping the other elements from the original workflow. Standard processing was applied, including EQ, dynamic EQ, compression, limiting, and saturation, with no GenAI plugins. Mastering followed a traditional approach.</p>
<p>The expected outcome remained uncertain. Replacing the rhythmic foundation could alter the overall signature enough to impact detection.</p>
<p>The results show that the track was <strong>not flagged by Deezer.</strong> This suggests that modifying a key component such as the drums can significantly affect the detection outcome. The main takeaway is that the system appears sensitive to the internal consistency of the mix. Altering one major element may reduce the coherence of the original GenAI signal and bring it below the detection threshold.</p>
<p><a class="embed-card" href="https://open.spotify.com/album/4uWEhGSjdDaHokvy03BTlO?si=XjMCy1NoT1240eAYrz2sgg">https://open.spotify.com/album/4uWEhGSjdDaHokvy03BTlO?si=XjMCy1NoT1240eAYrz2sgg</a></p>

<p><em>Also available on all other streaming platforms or</em> <a href="https://drive.google.com/file/d/10WtSLGnAbSfdFQsLL0nkiDW7xwmgl3hj/view?usp=sharing"><em>here</em></a></p>
<h2><strong>T4b: Cities - G Minor</strong></h2>
<p>The objective of T4b was to push the transformation further by <strong>rebuilding the track almost entirely</strong>. After stem separation and restructuring in the DAW, all musical elements were replaced with new stems created manually in Ableton, while keeping only the original vocal. The arrangement, instrumentation, and overall texture were redesigned within Ableton. Standard processing was applied, including EQ, dynamic EQ, compression, limiting, and saturation, with no GenAI plugins. Mastering followed a traditional approach.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6802c3406275c65d6ecc73df/8f8b1cc9-b1d7-448c-9d04-49257d7c19d7.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>What the results seem to suggest so far</h2>
<img src="https://cdn.hashnode.com/uploads/covers/6802c3406275c65d6ecc73df/8f1302c2-d490-4d53-89cb-94acb658970a.png" alt="" />

<p>Based on the experiments, a realistic <strong>assumption</strong> is that Deezer relies on a compact set of well-known audio ML building blocks rather than a single complex system. The pipeline can be simplified into a few core stages, each associated with concrete methods.</p>
<p><strong>Step 1: Representation learning (feature extraction)</strong><br />The audio is transformed into a format suitable for machine learning. This is most likely done using <strong>log-mel spectrograms</strong> or pretrained embedding models such as <strong>OpenL3</strong> or <strong>VGGish</strong>. A more advanced setup could rely directly on transformer-based encoders like <strong>AST (Audio Spectrogram Transformer)</strong>. The goal is to map raw audio into a feature space where generative signatures become separable.</p>
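<p>A minimal sketch of this first stage, using librosa with typical parameter values (not Deezer’s actual configuration), could look like this:</p>
<pre><code class="lang-python"># Sketch of Step 1: turn raw audio into a log-mel spectrogram.
# Parameters are typical defaults, not Deezer's actual configuration.
import librosa
import numpy as np

y, sr = librosa.load("track.wav", sr=22050, mono=True)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                     hop_length=512, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)
print(log_mel.shape)  # (n_mels, n_frames), ready for a CNN or transformer
</code></pre>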
<p><strong>Step 2: Core classification model</strong><br />A supervised model is trained to distinguish between human-produced and GenAI-generated tracks. Likely candidates include:</p>
<ul>
<li><p><strong>CNN-based models</strong> such as <strong>PANNs (CNN14)</strong> for robustness and efficiency</p>
</li>
<li><p><strong>Transformer-based models</strong> like <strong>AST</strong> or <strong>HTS-AT</strong> for capturing long-range temporal patterns</p>
</li>
<li><p>Or a hybrid approach combining embeddings with a lighter classifier such as <strong>XGBoost</strong> or a small <strong>MLP</strong></p>
</li>
</ul>
<p>This model learns the global “texture” of GenAI audio, including spectral smoothness, transient behavior, and harmonic consistency.</p>
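<p>As an illustration of the hybrid option, a light classifier over precomputed embeddings could look like the sketch below. The embedding matrix and labels are assumed to exist, and this is a guess at the approach rather than Deezer’s actual model.</p>
<pre><code class="lang-python"># Sketch of Step 2 in its lightest form: a small MLP over precomputed
# audio embeddings. X and y are assumed to exist (e.g. OpenL3 vectors
# with human=0 / GenAI=1 labels); this is not Deezer's actual model.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X = np.load("embeddings.npy")   # hypothetical (n_tracks, dim) matrix
y = np.load("labels.npy")       # hypothetical 0/1 labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
</code></pre>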
<p><strong>Step 3: Component-aware analysis</strong><br />To go beyond the full mix, the system likely incorporates some form of component-level reasoning. This can be done in two ways:</p>
<ul>
<li><p>explicit separation using models such as <strong>Demucs</strong> (developed at Meta AI) or <strong>Spleeter</strong> (created by Deezer), followed by per-stem analysis</p>
</li>
<li><p>or implicit decomposition within the model using <strong>attention mechanisms</strong> that specialize in different regions of the spectrum</p>
</li>
</ul>
<p>Each component can then be processed with the same type of classifier as in Step 2.</p>
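<p>Since Spleeter is Deezer’s own tool, the explicit-separation variant is easy to sketch. The <code>embed</code> function and <code>clf</code> classifier below are hypothetical stand-ins for the Step 1 and Step 2 components.</p>
<pre><code class="lang-python"># Sketch of Step 3 with explicit separation: split into four stems with
# Spleeter (Deezer's own separator), then score each stem independently.
# `embed` and `clf` stand in for the Step 1/2 components above.
from spleeter.separator import Separator

separator = Separator("spleeter:4stems")       # vocals, drums, bass, other
separator.separate_to_file("track.wav", "stems/")

stem_scores = {}
for name in ["vocals", "drums", "bass", "other"]:
    emb = embed(f"stems/track/{name}.wav")     # hypothetical embedding fn
    stem_scores[name] = clf.predict_proba([emb])[0, 1]
print(stem_scores)
</code></pre>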
<p><strong>Step 4: Cross-component consistency modeling</strong><br />A key assumption is that the system evaluates how different parts of the track relate to each other. This can be implemented with:</p>
<ul>
<li><p>similarity measures such as <strong>cosine similarity</strong> between embedding vectors</p>
</li>
<li><p>or learned fusion layers using <strong>multi-head attention</strong> or a small <strong>fusion MLP</strong></p>
</li>
</ul>
<p>This stage captures whether vocals, drums, and harmonic layers share a coherent generative signature.</p>
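<p>A minimal version of this consistency check, using cosine similarity over the per-stem embeddings from the previous step (<code>stem_embs</code> is assumed to hold one vector per stem):</p>
<pre><code class="lang-python"># Sketch of Step 4: measure how coherent the stems are in embedding
# space. `stem_embs` is assumed to hold one vector per stem (see Step 3).
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

names = list(stem_embs)
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
consistency = np.mean([cosine(stem_embs[a], stem_embs[b]) for a, b in pairs])
print("cross-stem consistency:", consistency)
</code></pre>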
<p><strong>Step 5: Decision layer and thresholding</strong><br />The outputs from global and component-level models are aggregated. This can be done with:</p>
<ul>
<li><p>a <strong>weighted average</strong> (simple but efficient)</p>
</li>
<li><p>or a learned <strong>ensemble model</strong> such as <strong>logistic regression</strong> or <strong>gradient boosting</strong></p>
</li>
</ul>
<p>A <strong>confidence threshold</strong> determines whether the track is labeled as AI-generated. Multiple thresholds or passes may exist, which would explain delayed detection in some cases. If one track in the album is flagged, the entire album is flagged, probably to save computing resources.</p>
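<p>Putting the decision layer together, the simplest aggregation could look like the sketch below, where the weights, the threshold, and the album rule are pure guesses consistent with what I observed:</p>
<pre><code class="lang-python"># Sketch of Step 5: aggregate global and per-stem scores, then apply a
# confidence threshold. Weights and threshold values are pure guesses.
def track_score(global_score, stem_scores, w_global=0.5):
    stem_avg = sum(stem_scores.values()) / len(stem_scores)
    return w_global * global_score + (1 - w_global) * stem_avg

THRESHOLD = 0.8  # assumed confidence cutoff

def flag_album(tracks):
    # tracks: list of (global_score, stem_scores) tuples, one per track.
    # One flagged track flags the whole album, as observed in practice.
    return any(track_score(g, s) >= THRESHOLD for g, s in tracks)
</code></pre>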
<hr />
<p>In this setup, the system combines a <strong>deep audio classifier (AST or PANNs)</strong> with <strong>optional stem-aware analysis</strong> and a <strong>fusion mechanism that evaluates consistency across components</strong>. This aligns well with the experiments, where structural edits alone have limited impact, while modifications to a key element of the mix can influence the final outcome.</p>
<hr />
<h3>Limits of the experiment</h3>
<p>Producing a track, even with the support of GenAI, still requires time, inspiration, and sustained effort. This naturally limited the scope of the experiment to four singles, each representing a distinct transformation approach.</p>
<p>The results should also be understood within a changing context. Additional variations between T3 and T4a could provide more data points, yet the detection system itself is likely evolving over time. Each observation reflects a specific moment, and future outcomes may differ as the platform updates its models and thresholds.</p>
<h2>Conclusion</h2>
<p>The experiments indicate that Deezer’s detection system is <strong>highly effective</strong> at identifying tracks generated with tools such as Suno when post-processing remains limited. The model appears robust to standard DAW transformations and captures deeper characteristics of the audio.</p>
<p>For Deezer users, there may be some interest in knowing how music is produced and whether GenAI is involved.</p>
<p>For GenAI platforms such as Suno and Udio, this type of detection is a win. It creates a feedback loop that may encourage better supervision of training data and help maintain clearer boundaries over time, keeping AI-generated output from polluting future training sets.</p>
<p>However, this approach <strong>does not address the broader issue</strong> of artificial streaming. The underlying economic pressure on artists remains, with revenue distribution still impacted by large-scale automated activity. Other platforms, such as Spotify, appear to focus more directly on detecting artificial streaming patterns, whether driven by GenAI or by large volumes of low-cost production music. Copyright issues also remain and extend beyond music.</p>
<p>In my opinion, the current business model of platforms such as Suno and Udio, based on classical multimodal approaches, may show its limits over time, constrained by data availability and tending to be reactive rather than anticipatory. New approaches such as I-JEPA seem promising for supporting artists in their music creation. Detection will likely evolve in the same direction. For both, the million-dollar question remains: what is the purpose?</p>
<h3>Final words</h3>
<p>I truly enjoyed conducting this experiment. It opens the door to ongoing discussions about the future of music and reflects broader questions shared across many industries today. At its core, the issue remains simple: how we choose to use technology will shape what comes next.</p>
<hr />
<p><em>Next experiment: Lyria RealTime from Google.</em></p>
]]></content:encoded></item></channel></rss>