WWDC always has the best collection of music to groove to. During WWDC ’19, my friend used his Android phone to recognise the songs being played and made a playlist out of them. My iPhone lacked such a capability back then (or I wasn’t aware of one). Now, with ShazamKit to the rescue, let’s create an app called Musadora for this purpose!

In this article, we’ll go over the following -

  1. What is ShazamKit?
  2. How It Works
  3. Getting Started
  4. ShazamKit
  5. Music Recognition
  6. Add Music to Shazam’s Library
  7. Designing the User Interface using SwiftUI
  8. Background Recognition

Note - This is beta software and can change at any time. To follow this tutorial, you’ll need Xcode 13.0 and iOS 15.0. When writing this post, I used Xcode 13.0 Beta 2 and iOS 15.0 Beta 2.

What is ShazamKit?

ShazamKit is a framework by Apple that helps you, as a developer, integrate music recognition into your app. It can recognise either songs in Shazam’s catalogue or your own custom audio. You can also add recognised songs to Shazam’s music recognition history. Surprisingly, Apple has made it available for Android as well.

How It Works

I had assumed that Shazam matches the raw audio of a song against its vast library. But according to the documentation,

ShazamKit uses the unique acoustic signature of an audio recording to find a match. This signature captures the time-frequency distribution of the audio signal energy and is much smaller than the original audio. It’s also a one-way conversion, so it’s not possible to convert the signature back to the recording.

In simpler terms, Shazam creates a time-based frequency graph, called a spectrogram, for each audio track in the catalogue. It then creates a signature based on that graph. A reference signature pairs this unique signature with metadata describing the song.

The music catalogue of Shazam is a collection of such reference signatures. ShazamKit creates the same kind of signature from the recording we provide and matches it against the catalogue. If a sufficient portion matches, it gives us back information such as the title and artist of the song. The algorithm is powerful enough to recognise music even against a noisy background.

As the audio representation cannot be converted back to the original recording, the content remains secure and private.
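To get a feel for why a signature is both much smaller than the audio and impossible to turn back into it, here is a toy sketch. It is not Shazam’s algorithm: the function name toySignature, the frame size, and the "strongest frequency bin per frame" rule are all assumptions made purely for illustration.

```swift
import Foundation

/// Toy illustration only (NOT Shazam's algorithm): for each fixed-size
/// frame of `samples`, keep just the index of the strongest frequency bin.
func toySignature(of samples: [Double], frameSize: Int) -> [Int] {
  precondition(frameSize > 1, "frameSize must be at least 2")
  var peaks: [Int] = []
  var start = 0

  while start + frameSize <= samples.count {
    let frame = Array(samples[start..<(start + frameSize)])
    var bestBin = 0
    var bestEnergy = -1.0

    // Naive DFT over the positive-frequency bins, skipping the DC bin.
    for bin in 1..<(frameSize / 2) {
      var real = 0.0
      var imaginary = 0.0
      for (n, sample) in frame.enumerated() {
        let angle = 2.0 * Double.pi * Double(bin) * Double(n) / Double(frameSize)
        real += sample * cos(angle)
        imaginary -= sample * sin(angle)
      }
      let energy = real * real + imaginary * imaginary
      if energy > bestEnergy {
        bestEnergy = energy
        bestBin = bin
      }
    }

    peaks.append(bestBin)
    start += frameSize
  }
  return peaks
}

// A pure tone that completes five cycles in every 64-sample frame.
let tone = (0..<128).map { sin(2.0 * Double.pi * 5.0 * Double($0) / 64.0) }
print(toySignature(of: tone, frameSize: 64)) // prints [5, 5]
```

The 128 samples shrink to two integers, and nothing in those two integers lets you rebuild the waveform. A real signature keeps far richer time-frequency detail, but the same two properties hold.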

There’s a whole research paper on their algorithm if you’re curious to know more.

Getting Started

To start working with ShazamKit and communicate with its services, we need to enable it for our app identifier. Head over to the developer portal. Under Certificates, Identifiers, and Profiles, select the Identifiers tab from the sidebar and click the Add icon to create a new App ID.

Click Continue. Name the Bundle ID according to your preference. Under App Services, check ShazamKit to add the capability.

ShazamKit

It is divided into three components -

  • Shazam catalogue recognition
  • Custom catalogue recognition
  • Library management

We’ll use Shazam catalogue recognition to identify songs in our app. Then, we’ll add them to Shazam’s library.

Music Recognition

Open Xcode and create a new SwiftUI iOS project with the name Musadora. Create a new Swift file named HomeViewModel.swift, and add the following class to it:

import Combine
import ShazamKit
import AVKit

@MainActor
class HomeViewModel: NSObject, ObservableObject {
  // 1
  @Published private(set) var mediaItems: [SHMediaItem] = []
  // 2
  @Published private(set) var isRecognizingSong = false

  // 3
  private let session = SHSession()

  // 4
  private let engine = AVAudioEngine()

  private let feedback = UINotificationFeedbackGenerator()

  override init() {
    super.init()
    // 5
    session.delegate = self
  }
}

Here’s the breakdown of this code:

  1. An array of SHMediaItem, each of which represents the metadata for a reference signature. Every time Shazam finds a match, we’ll add the item to this array.
  2. A @Published variable, isRecognizingSong, which helps us manage the state of song recognition.
  3. The session object manages matching the recorded audio against the catalogue.
  4. We use AVAudioEngine so that audio can be converted to a signature while it is still being recorded.
  5. The results from the SHSession are communicated via its delegate. By setting the session’s delegate to self, the view model receives the delegate callbacks.

First, we start with the code to record the audio. Add the following in HomeViewModel.swift -

// MARK: Audio Recognition
extension HomeViewModel {
  // 1
  private func prepareAudioRecording() throws {
    let audioSession = AVAudioSession.sharedInstance()

    try audioSession.setCategory(.record)
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
  }

  // 2
  private func generateSignature() {
    let inputNode = engine.inputNode
    let recordingFormat = inputNode.outputFormat(forBus: .zero)

    inputNode.installTap(onBus: .zero, bufferSize: 1024,
                         format: recordingFormat) { [weak session] buffer, _ in
      session?.matchStreamingBuffer(buffer, at: nil)
    }
  }

  // 3
  private func startAudioRecording() throws {
    try engine.start()

    isRecognizingSong = true
  }
}

The recognition forms the core of the app, so here’s the detailed explanation:

  1. Start with a shared instance of AVAudioSession and configure it for recording audio from the microphone. Then, we activate the audio session.
  2. Create a variable inputNode to store the current audio input path, and read its output format on bus zero into recordingFormat so that the tap uses a format the microphone input supports. Then, we install an audio tap on the bus. In the block, we pass each audio buffer to the session, which converts it to a signature and searches the reference signatures in the session catalogue.
  3. We start the audio engine and set isRecognizingSong to true.

To streamline the audio recognition process, we create two public methods to call in our views -

public func startRecognition() {
  feedback.prepare()

  // 1
  do {
    if engine.isRunning {
      stopRecognition()
      return
    }

    // 2
    try prepareAudioRecording()

    generateSignature()

    try startAudioRecording()

    feedback.notificationOccurred(.success)
  } catch {
    // Handle errors here
    print(error)
    feedback.notificationOccurred(.error)
  }
}

public func stopRecognition() {
  isRecognizingSong = false
  engine.stop()
  engine.inputNode.removeTap(onBus: .zero)
}

Going over the code -

  1. If the audio engine is already running, stop it and remove the tap from the input node.
  2. Prepare the audio for recording, generate the signature and start the audio recording.

The SHSessionDelegate provides us with two methods: session(_:didFind:), called when a match is found, and session(_:didNotFindMatchFor:error:), called when there is no match. Here we implement only the first. When we find a match, it returns an SHMatch object that contains the matching catalogue media items. We access the first item in its mediaItems. If our own mediaItems array already contains the song, we ignore it. If the recognised song is new, we add it to the array.

// MARK: - SHSessionDelegate
extension HomeViewModel: SHSessionDelegate {
  func session(_ session: SHSession, didFind match: SHMatch) {

    guard let mediaItem = match.mediaItems.first else { return }

    Task {
      if mediaItems.contains(where: { $0.shazamID == mediaItem.shazamID }) {
        // Song already identified and in the list. Do nothing.
      } else {
        self.mediaItems.append(mediaItem)
      }
    }
  }
}
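One subtlety in the duplicate check is that shazamID is optional, so two items that both lack an ID compare as equal and the second one is silently dropped. That probably never happens for Shazam catalogue matches, but it could for custom audio. The sketch below shows one way to guard against it. Track and appendIfNew are hypothetical stand-ins, used so the example runs without ShazamKit.

```swift
/// Hypothetical stand-in for SHMediaItem so the sketch runs on its own.
struct Track: Equatable {
  let shazamID: String?
  let title: String
}

/// Appends `track` unless a stored track already has the same Shazam ID.
/// Tracks without an ID are always appended, because comparing nil to nil
/// would otherwise make every ID-less match look like a duplicate.
func appendIfNew(_ track: Track, to tracks: inout [Track]) {
  if let id = track.shazamID, tracks.contains(where: { $0.shazamID == id }) {
    return
  }
  tracks.append(track)
}

var playlist: [Track] = []
appendIfNew(Track(shazamID: "1", title: "First"), to: &playlist)
appendIfNew(Track(shazamID: "1", title: "First again"), to: &playlist)
appendIfNew(Track(shazamID: nil, title: "Custom clip A"), to: &playlist)
appendIfNew(Track(shazamID: nil, title: "Custom clip B"), to: &playlist)
print(playlist.map(\.title)) // prints ["First", "Custom clip A", "Custom clip B"]
```

Whether ID-less items should be kept or dropped is a design choice. Keeping them just seemed the less surprising default for a playlist.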

Add Music to Shazam’s Library

You can also add the recognised music in your app to Shazam’s library. For this, add the following code in HomeViewModel -

public func addToShazamLibrary() {
  SHMediaLibrary.default.add(mediaItems) { error in
    if let error = error {
      print(error)
    } else {
      let generator = UINotificationFeedbackGenerator()
      generator.notificationOccurred(.success)
    }
  }
}

We access the user’s default Shazam library and add the list of mediaItems to it.

Note - As of beta 3, Media items added to the default instance of SHMediaLibrary don’t appear in Shazam. (77785557)

Workaround: Touch and hold the Music Recognition Control Center module to view SHMediaLibrary contents.

With this, we’ve written the functionality required to recognise music with ShazamKit. Let’s design a UI to use it!

Designing the User Interface using SwiftUI

You can design the screen according to your liking. I’m using the album artwork and displaying the name of the song and the artist alongside it.

struct ShazamMusicCard: View {
  var item: SHMediaItem

  var body: some View {
    HStack {
      ArtworkImage(url: url) { image in
        image
          .scaledToFit()
          .transition(.opacity.combined(with: .scale))
      }
      .cornerRadius(12.0)
      .frame(width: 100.0, height: 100.0)

      VStack(alignment: .leading, spacing: 4.0) {
        Text(name)
          .fontWeight(.bold)
          .font(.callout)

        Text(artistName)
          .fontWeight(.light)
          .font(.caption)
      }
      .foregroundColor(.white)
      .frame(maxWidth: .infinity, alignment: .leading)
      .multilineTextAlignment(.leading)
    }
  }

  private var name: String {
    item.title ?? ""
  }

  private var artistName: String {
    item.artist ?? ""
  }

  private var url: URL? {
    item.artworkURL
  }
}

With a blurred background artwork image, I add the card on top of it -

struct ShazamMusicRow: View {
  var item: SHMediaItem

  var body: some View {
    ZStack {
      ArtworkImage(url: item.artworkURL) { image in
        image
          .scaledToFill()
          .layoutPriority(-1)
          .overlay(Color.black.opacity(0.8))
          .transition(.opacity.combined(with: .scale))
      }

      ShazamMusicCard(item: item)
        .background(.ultraThinMaterial)
    }
    .cornerRadius(12.0)
    .padding(4.0)
  }
}

ArtworkImage is a handy wrapper around the new AsyncImage in SwiftUI 3.0, with some default views to handle the different loading phases -

struct ArtworkImage<Content>: View where Content: View {
  private let url: URL?
  private var content: (_ image: Image) -> Content

  public init(url: URL?, @ViewBuilder content: @escaping (_ image: Image) -> Content) {
    self.url = url
    self.content = content
  }

  var body: some View {
    if let url = url {
      AsyncImage(url: url, transaction: .init(animation: .spring())) { phase in
        switch phase {
        case .empty: progressView()
        case .success(let image): content(image.resizable())
        case .failure(let error as NSError): errorView(with: error)
        @unknown default: unknownView()
        }
      }
    } else {
      Text("Wrong URL")
    }
  }

  private func progressView() -> some View {
    ProgressView().transition(.opacity.combined(with: .scale))
  }

  private func errorView(with error: NSError) -> some View {
    ZStack {
      Color.red.transition(.opacity.combined(with: .scale))

      Text(error.localizedDescription).foregroundColor(.white)
    }
    .transition(.opacity.combined(with: .scale))
  }

  private func unknownView() -> some View {
    Color.gray.transition(.opacity.combined(with: .scale))
  }
}

Create another SwiftUI file called HomeButtonsView.swift and add:

extension Image {
  func imageButton(with size: CGFloat, color: Color) -> some View {
    self
      .resizable()
      .scaledToFit()
      .frame(width: size, height: size)
      .foregroundColor(color)
  }
}

struct ButtonImageType {
    static let addToLibrary = "square.and.arrow.down.fill"
    static let startRecognition = "shazamIcon"
    static let stopRecognition = "stop.circle.fill"
}

struct HomeButtonsView: View {
  @ObservedObject var viewModel: HomeViewModel

  private let size = 50.0

  private var isRecognizingSong: Bool {
    viewModel.isRecognizingSong
  }

  var body: some View {
    HStack {
      // 1
      Button(action: { viewModel.addToShazamLibrary() }) {
          Image(systemName: ButtonImageType.addToLibrary)
          .imageButton(with: size, color: .green)
      }

      Spacer()

      // 2      
      Button(action: { viewModel.startRecognition() }) {
          Image(ButtonImageType.startRecognition)
          .imageButton(with: size * 2, color: .clear)
      }
      .disabled(isRecognizingSong)
      .scaleEffect(isRecognizingSong ? 0.8 : 1)
      .animation(songRecognitionAnimation(), value: isRecognizingSong)

      Spacer()

      // 3
      Button(action: { viewModel.stopRecognition() }) {
          Image(systemName: ButtonImageType.stopRecognition)
          .imageButton(with: size, color: .red)
      }
    }
    .padding(.horizontal, 24)
  }

  // 4
  private func songRecognitionAnimation() -> Animation {
    isRecognizingSong ? .easeInOut(duration: 1.5).repeatForever() : .default
  }
}

Here is what the code is doing:

  1. Adds the list of media items to Shazam’s library. If you open Control Center and long press on Shazam’s icon, you’ll find the recognised songs with the app name mentioned.
  2. Start the audio recognition by tapping the Shazam button.
  3. Stop the audio recognition by tapping the stop button.
  4. Animate the button with an ease-in-out animation while it recognises songs.

Wrapping up the design process, we add the ShazamMusicRow in a list and HomeButtonsView below it -

struct ContentView: View {
  @StateObject private var viewModel = HomeViewModel()
  @Environment(\.openURL) var openURL

  var body: some View {
    NavigationView {
      VStack {
        List {
          ForEach(viewModel.mediaItems, id: \.shazamID) { item in
            Button(action: { openAppleMusic(with: item.appleMusicURL) }) {
              ShazamMusicRow(item: item)
            }
            .buttonStyle(.plain)
            .listRowSeparator(.hidden)
          }
        }
        .listStyle(.plain)

        HomeButtonsView(viewModel: viewModel)
      }
      .navigationTitle("Musadora")
    }
    .navigationViewStyle(.stack)
  }

  private func openAppleMusic(with url: URL?) {
    if let url = url {
      openURL(url)
    }
  }
}

We loop over the media items, and tapping individual rows opens that particular song in Apple Music.

With this, we’re ready to shazam everything! Run the app, and enjoy your own Shazam playlist app!

Background Recognition

The app currently recognises music only when it’s in the foreground. Imagine you’re at a party and want to create a playlist of the good songs played. To make our recognition app more convenient, we’ll enable background mode so you can keep your phone in your pocket and enjoy the party!

Select the target Musadora, select Signing & Capabilities and click the Capability button. Search for Background Modes and add it. Under Background Modes, check Audio, AirPlay, and Picture in Picture. This lets us record and recognise audio in the background.

Run the app, tap the Shazam button and lock the iPhone. Play some music, and recheck the app. You’ll find the recognised song on the list!
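If nothing is recognised, the app’s Info.plist is worth a quick check. Checking Audio, AirPlay, and Picture in Picture adds the value audio under the UIBackgroundModes key, and iOS also requires an NSMicrophoneUsageDescription string before an app may use the microphone. Here is a minimal sketch of such a check. The helper name missingRecognitionKeys and the sample dictionary are assumptions for illustration.

```swift
import Foundation

/// Returns human-readable problems found in an Info.plist dictionary.
/// Hypothetical helper: it only inspects the two keys discussed here.
func missingRecognitionKeys(in info: [String: Any]) -> [String] {
  var problems: [String] = []

  // iOS needs a usage description before granting microphone access.
  let usage = info["NSMicrophoneUsageDescription"] as? String ?? ""
  if usage.isEmpty {
    problems.append("NSMicrophoneUsageDescription is missing")
  }

  // The Audio background mode is stored as the string "audio".
  let modes = info["UIBackgroundModes"] as? [String] ?? []
  if !modes.contains("audio") {
    problems.append("UIBackgroundModes does not contain audio")
  }

  return problems
}

// In the app you would pass Bundle.main.infoDictionary ?? [:] instead.
let sampleInfo: [String: Any] = [
  "NSMicrophoneUsageDescription": "Musadora listens to identify songs.",
  "UIBackgroundModes": ["audio"]
]
print(missingRecognitionKeys(in: sampleInfo)) // prints []
```

Calling this once at launch in a debug build is usually enough to catch a missing key.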

Conclusion

You can find the whole source code here - Source Code for Experimenting with ShazamKit

Shazam is a compelling application to almost instantly recognise music. With Apple giving us this power to experiment with, we can make creative apps out of it or integrate them into existing ones!

I hope you loved this post. If you did, don’t forget to share it with everyone! Let’s Shazam everything!