How do AI interview assistants capture system audio?
By Aaron Cao · Updated

AI interview assistants capture system audio locally on your computer through the operating system's audio APIs. They tap the output that Zoom, Google Meet, or Teams plays through your device, while a separate microphone stream captures your voice. No meeting bot joins the call.
What "system audio" means in an interview context
In a video interview, there are two distinct audio streams on your machine:
- Microphone input — your own voice, captured by the mic.
- System audio output — everything your computer is playing through the speakers, including the interviewer's voice coming from Zoom, Google Meet, or Microsoft Teams.
An AI interview assistant needs both streams to follow the conversation: the interviewer's questions (system audio) and your answers (microphone). Capturing only one side produces a partial transcript and weaker suggestions.
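To make the two-stream idea concrete, here is a minimal sketch of how transcribed segments from each stream could be combined into one labeled conversation. It assumes each stream has already been transcribed into (start time, text) pairs in time order. The names `Segment`, `label`, and `interleave` are hypothetical and chosen for illustration; they are not taken from any particular product.

```python
from dataclasses import dataclass
from heapq import merge
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Segment:
    start: float   # seconds since capture began
    speaker: str   # "interviewer" (system audio) or "candidate" (microphone)
    text: str


def label(source: Iterable[Tuple[float, str]], speaker: str) -> List[Segment]:
    """Attach a speaker label to (start_seconds, text) pairs from one stream."""
    return [Segment(start, speaker, text) for start, text in source]


def interleave(system_audio: List[Segment], microphone: List[Segment]) -> List[Segment]:
    """Merge two time-ordered streams into one conversation ordered by start time."""
    return list(merge(system_audio, microphone, key=lambda s: s.start))


# Illustrative data only: the speaker is known from which stream produced the text.
system_audio = label(
    [(0.0, "Tell me about a recent project."), (9.5, "What was the hardest part?")],
    "interviewer",
)
microphone = label([(3.2, "I rebuilt our billing pipeline.")], "candidate")

for seg in interleave(system_audio, microphone):
    print(f"[{seg.start:5.1f}s] {seg.speaker}: {seg.text}")
```

If only the microphone list were supplied, the merged transcript would contain the candidate's answers without the questions that prompted them, which is the partial transcript described above.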
How system audio is captured on macOS and Windows
System audio capture relies on operating-system audio APIs rather than on the meeting app itself. The exact mechanism differs by platform:
- macOS — modern versions expose process and system audio taps through Core Audio. Older approaches used virtual audio devices (loopback drivers) that route the system output back in as an input.
- Windows — the Windows Audio Session API (WASAPI) supports loopback capture, which lets an application record whatever is being played out of a chosen output device.
Either way, the capture happens locally on your device. The assistant does not need to be "inside" Zoom or Teams; it reads the audio after the meeting app has already decoded it for playback. You can read more about the overall pipeline on the SubcueAI homepage or the tutorial.
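The platform split above can be summarized as a small lookup keyed on Python's `sys.platform` value. This is only a descriptive sketch: `CaptureRoute`, `ROUTES`, and `route_for` are hypothetical names, and the code does not open any audio device. It simply records which OS mechanism an application would reach for on each platform.

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaptureRoute:
    os_name: str
    api: str                  # primary OS mechanism named in the text
    fallback: Optional[str]   # older approach, if the text mentions one


# Keys are the values sys.platform reports on macOS and Windows.
ROUTES = {
    "darwin": CaptureRoute("macOS", "Core Audio tap", "virtual audio device (loopback driver)"),
    "win32": CaptureRoute("Windows", "WASAPI loopback capture", None),
}


def route_for(platform_key: str) -> Optional[CaptureRoute]:
    """Return the capture route for a sys.platform value, or None if not covered here."""
    return ROUTES.get(platform_key)


route = route_for(sys.platform)
print(route.api if route else f"no route described for {sys.platform}")
```

Note that nothing in the table refers to Zoom, Google Meet, or Teams: the choice depends on the operating system, not on the meeting app.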
How SubcueAI approaches dual audio capture
SubcueAI is a native desktop app for macOS and Windows. It uses dual audio capture: one stream for your microphone and one stream for system audio coming from the meeting app. Both streams are transcribed so the assistant can tell who said what.
- No meeting bot joins the call as a participant.
- No browser plugin or extension is installed in Zoom, Google Meet, or Teams.
- Suggestions appear in a floating local overlay on your own screen.
Because the overlay is rendered locally, it is not part of the video stream you send to the interviewer. For more on the design choices behind this, see About SubcueAI or how it compares to alternatives.
Honest limits of system-audio capture
System-audio capture runs on your personal computer. It does not change what an interviewer can observe, and it can be blocked or unreliable in some setups:
- Screen sharing — if you share your entire screen, any local overlay window is visible to the interviewer.
- Screen recording or proctored exams — recording tools and proctoring software can capture overlays and running processes regardless of how audio is tapped.
- Company-managed or locked-down devices — IT policies may block third-party apps from installing or from accessing audio APIs.
- Headphones-only setups — if the meeting app routes audio to a Bluetooth headset in a way the OS does not expose, loopback capture can be inconsistent.
For more context on what is and is not observable, see Security.
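One way an application might notice the inconsistent capture described in the Bluetooth case is to check whether the loopback stream is delivering only silence. The sketch below assumes the captured audio arrives as chunks of little-endian 16-bit PCM; the function names and the `0.001` threshold are assumptions for illustration, not values from any specific tool.

```python
import math
import struct
from typing import Iterable


def rms_16bit(chunk: bytes) -> float:
    """Root-mean-square level of little-endian 16-bit PCM, scaled to 0.0..1.0."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count) / 32768.0


def looks_silent(chunks: Iterable[bytes], threshold: float = 0.001) -> bool:
    """True when every chunk stays below the assumed silence threshold."""
    return all(rms_16bit(c) < threshold for c in chunks)


# Synthetic chunks standing in for captured audio.
silence = struct.pack("<4h", 0, 0, 0, 0)
tone = struct.pack("<4h", 8000, -8000, 8000, -8000)

print(looks_silent([silence, silence]))  # True
print(looks_silent([silence, tone]))     # False
```

A stream that stays silent while the meeting is clearly audible suggests the output is being routed somewhere the capture path cannot see.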
FAQ
Does an AI interview assistant need a bot in the meeting to hear the interviewer?
Can Zoom, Google Meet, or Teams detect that system audio is being captured?
What permissions does SubcueAI need to capture audio?
Does dual audio capture work with Bluetooth headphones?
Is the captured audio uploaded somewhere?