How do AI interview assistants capture system audio?

By Aaron Cao · Updated

AI interview assistants capture system audio locally on your computer using the operating system's audio APIs — tapping the output stream from Zoom, Google Meet, or Teams — while a separate microphone stream captures your voice. No meeting bot joins the call.


What "system audio" means in an interview context

In a video interview, there are two distinct audio streams on your machine:

  • Microphone input — your own voice, captured by the mic.
  • System audio output — everything your computer is playing through the speakers, including the interviewer's voice coming from Zoom, Google Meet, or Microsoft Teams.

An AI interview assistant needs both streams to follow the conversation: the interviewer's questions (system audio) and your answers (microphone). Capturing only one side produces a partial transcript and weaker suggestions.

How system audio is captured on macOS and Windows

System audio capture relies on operating-system audio APIs rather than on the meeting app itself. The exact mechanism differs by platform:

  • macOS — modern versions expose process and system audio taps through Core Audio. Older approaches used virtual audio devices (loopback drivers) that route the system output back in as an input.
  • Windows — the Windows Audio Session API (WASAPI) supports loopback capture, which lets an application record whatever is being played out of a chosen output device.

Either way, the capture happens locally on your device. The assistant does not need to be "inside" Zoom or Teams; it reads the audio after the meeting app has already decoded it for playback. You can read more about the overall pipeline on the SubcueAI homepage or the tutorial.
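As a rough illustration of the platform split described above, the Python sketch below picks a capture mechanism by operating system. The function name `pick_system_audio_backend` and the returned labels are hypothetical. The sketch only names the mechanism and does not call Core Audio or WASAPI itself.

```python
import platform

# Hypothetical labels for the OS-level mechanisms described above.
# These strings are illustrative, not identifiers from any real SDK.
BACKENDS = {
    "Darwin": "Core Audio tap (or a virtual loopback device on older macOS)",
    "Windows": "WASAPI loopback capture on the chosen output device",
}


def pick_system_audio_backend(system_name=None):
    """Return a description of the system-audio capture mechanism for an OS.

    `system_name` uses the values returned by platform.system():
    "Darwin" for macOS and "Windows" for Windows.
    """
    if system_name is None:
        system_name = platform.system()
    try:
        return BACKENDS[system_name]
    except KeyError:
        # Any other platform is outside what this article covers.
        raise NotImplementedError(
            f"No system-audio backend sketched for {system_name!r}"
        ) from None


print(pick_system_audio_backend("Windows"))
```

A real application would replace each label with native code that opens the corresponding audio stream. The point here is only that the choice depends on the operating system, not on the meeting app.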

How SubcueAI approaches dual audio capture

SubcueAI is a native desktop app for macOS and Windows. It uses dual audio capture: one stream for your microphone and one stream for system audio coming from the meeting app. Both streams are transcribed so the assistant can tell who said what.
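To make "who said what" concrete, here is a minimal Python sketch of how two separately transcribed streams could be combined into one speaker-labeled timeline. It is an assumption about the general technique, not SubcueAI's actual implementation. The `Segment` type, the function names, and the sample sentences are all hypothetical.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Segment:
    start: float   # seconds since capture began
    speaker: str   # "interviewer" (system audio) or "you" (microphone)
    text: str


def label(stream, speaker):
    """Attach a speaker label to (start_seconds, text) pairs from one stream."""
    return [Segment(start, speaker, text) for start, text in stream]


def merge_transcripts(system_audio, microphone):
    """Interleave both streams by start time into one labeled timeline.

    Each input must already be sorted by start time, which is what
    heapq.merge requires.
    """
    return list(
        heapq.merge(
            label(system_audio, "interviewer"),
            label(microphone, "you"),
            key=lambda seg: seg.start,
        )
    )


# Made-up transcript fragments, one list per captured stream.
system_audio = [
    (0.0, "Tell me about a recent project."),
    (9.5, "What was the hardest part?"),
]
microphone = [(3.2, "I rebuilt our billing pipeline.")]

for seg in merge_transcripts(system_audio, microphone):
    print(f"[{seg.start:5.1f}s] {seg.speaker}: {seg.text}")
```

Because each stream comes from a different source, the speaker label follows from the stream itself, so no voice-based speaker identification is needed in this sketch.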

  • No meeting bot joins the call as a participant.
  • No browser plugin or extension is installed in Zoom, Google Meet, or Teams.
  • Suggestions appear in a floating local overlay on your own screen.

Because the overlay is rendered locally, it is not part of the video stream you send to the interviewer. For more on the design choices behind this, see About SubcueAI or how it compares to alternatives.

Honest limits of system-audio capture

System-audio capture runs locally on a computer you control, and it only affects how audio is read. It does not change what an interviewer can observe in these situations:

  • Screen sharing — if you share your entire screen, any local overlay window is visible to the interviewer.
  • Screen recording or proctored exams — recording tools and proctoring software can capture overlays and running processes regardless of how audio is tapped.
  • Company-managed or locked-down devices — IT policies may block third-party apps from installing or from accessing audio APIs.
  • Headphones-only setups — if the meeting app routes audio to a Bluetooth headset in a way the OS does not expose, loopback capture can be inconsistent.

For more context on what is and is not observable, see Security.

FAQ

Does an AI interview assistant need a bot in the meeting to hear the interviewer?

No. System audio is captured locally on your computer through OS-level audio APIs (Core Audio on macOS, WASAPI loopback on Windows). The meeting app itself does not need to be modified, and no bot has to join as a participant.

Can Zoom, Google Meet, or Teams detect that system audio is being captured?

Meeting apps generally cannot tell that another local application is reading the system audio output, because that happens outside their process. They can, however, see anything you choose to share via screen share or that a recording or proctoring tool captures.

What permissions does SubcueAI need to capture audio?

On macOS, SubcueAI needs microphone access and the system-audio permission introduced in recent macOS versions. On Windows, it needs microphone access and permission to use loopback capture on your output device. The tutorial at /tutorial walks through granting these.

Does dual audio capture work with Bluetooth headphones?

Usually yes, but it depends on how the OS exposes the output device. Wired headphones and the default system output are the most reliable. If audio routing is unusual, switching the meeting app's speaker to the default device typically resolves capture issues.
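One way to notice a routing problem is to check whether the system-audio stream is effectively silent while the interviewer is talking. The Python sketch below computes a signal level for a chunk of captured audio. It assumes the capture layer delivers 16-bit little-endian mono PCM, and the names `rms_level` and `looks_silent` and the 0.001 threshold are illustrative choices, not values taken from SubcueAI.

```python
import math
import struct


def rms_level(pcm_bytes):
    """Return the RMS amplitude (0.0 to 1.0) of 16-bit little-endian PCM."""
    count = len(pcm_bytes) // 2          # two bytes per sample
    if count == 0:
        return 0.0
    # Ignore a trailing odd byte, if any, so unpack sees whole samples.
    samples = struct.unpack(f"<{count}h", pcm_bytes[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    return math.sqrt(mean_square) / 32768.0


def looks_silent(pcm_bytes, threshold=0.001):
    """True when the chunk is quieter than the (assumed) threshold."""
    return rms_level(pcm_bytes) < threshold


# Synthetic test data: 10 ms of silence and 10 ms of a 440 Hz tone at 48 kHz.
silence = bytes(2 * 480)
tone = struct.pack(
    "<480h",
    *[int(8000 * math.sin(2 * math.pi * 440 * n / 48000)) for n in range(480)],
)

print(looks_silent(silence), looks_silent(tone))
```

If the system-audio stream stays silent while the microphone stream does not, the output device is a likely culprit, and switching the meeting app's speaker to the default device is the first thing to try.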

Is the captured audio uploaded somewhere?

SubcueAI processes audio to produce real-time transcripts and suggestions. Details about data handling and retention are described on the /security page; review it before deciding whether the tool fits your situation.
