AI Chat
Status: draft. Concrete reference content + structural placeholders; flesh out the prose + screenshots when ready.
Purpose
A conversational interface to your VL3X and your library. Ask the AI to:
- create presets from natural-language descriptions ("warm pop ballad lead with subtle plate reverb"),
- describe what an existing preset sounds like in musician terms,
- suggest and rank presets that fit a song,
- toggle effects on the device live,
- adjust mixer levels, key/scale, or specific parameters on the active preset,
- build songs and setlists with cue-by-cue automation,
- and undo what it just made.
VL Studio uses your own AI API keys — no embedded keys, no telemetry, no usage that doesn't cross your own provider account.
VL Studio difference — AI is entirely new
The VL3X has no AI features. Nothing on the device creates presets from a description, describes what a preset sounds like, suggests presets for a song, or builds a setlist from a paragraph of natural language. Standalone, the only ways to make a new preset are to start from a similar preset and tweak it knob by knob, or to copy a preset to a new slot and re-edit it.
VL Studio's AI Chat adds a layer that doesn't exist on the device at all — a conversational partner that knows the VL3X's effect blocks, can read and write your active preset live, generate new presets, and undo what it just made. Because it uses your own API keys, there's no subscription on top of what your provider already charges.
Walk-through
First-time setup
- Settings → AI Configuration — choose a provider:
- Claude — primary recommended provider. Add your Anthropic API key.
- Gemini — a cheap alternative for describe / enrich / suggest tasks. No native tool-calling, so chat is text-only with Gemini.
- OpenAI-compatible — works with any endpoint that speaks the OpenAI format. Useful for Grok, OpenAI, Mistral, or a local model running on your own machine.
- (Optional) Per-task routing — for each task (generate, describe, suggest, chat) you can pick which provider handles it. Use a cheap fast model for describe, a smart model for chat. Providers without tool-calling support are eligible for chat as text-only — the dropdown labels them as such.
- (Optional) Per-task model override — within Claude or Gemini, pick a specific model per task. Useful for "cheap Haiku for describe, Sonnet for chat."
First-launch — no API keys
Open AI Chat without configuring a provider first, and the page renders normally but every send returns the error: "No AI provider configured for chat." Open Settings → AI Configuration and add at least one provider's API key.
Sending a message
Type and press Enter, or click Send. The Send button changes to a red Stop while the AI is working — click Stop to cancel a request that's taking too long or heading the wrong direction. Cancellation takes effect at the next round boundary (typically under 2 seconds, depending on network latency and any in-flight tool call). Partial responses, including tool calls already made, are preserved in the session.
If the page sits waiting on the model with no activity, a 90-second watchdog resets the UI so you can try again.
Empty-chat suggested prompts
When a chat session is empty (new chat, no messages yet), the page shows a panel of 12 suggested prompts to get you started — things like "Make me a warm ballad lead," "What's on the device right now?", or "Suggest presets for a 90s grunge cover band." Click one to fill the input field, edit it if you like, then Send.
Voice input
Click the Mic icon in the input row and speak. Click again to stop. The transcript lands in the input field for review before sending, so you can fix anything Whisper misheard. Voice mode also nudges the AI toward terse, action-focused replies (good for "turn off the delay"); for longer brainstorming, type.
The nav-bar Mic button is different — it runs voice commands in the background without switching pages, and the AI's reply appears as a toast. Useful during performance or when you don't want the chat window to take focus.
Tool calls render inline
When the AI calls a tool, you see it inline in the conversation as the model is working:
- Amber pulse while the tool call is in flight.
- Emerald dot when it completes.
The display shows the tool name and a short result summary (e.g., Created 'Tone' (42 params) → preset #18). It's a summary, not a raw JSON dump; the AI's follow-up response usually explains the full result in plain language.
Sessions
Every chat auto-saves about 100 ms after each turn completes. The Sessions drawer (top of the page) lists your history with a session count; click any past session to reload the full conversation — messages plus the AI's tool-call history. New Chat starts a fresh session. Each row has Rename and Delete controls. Sessions persist across app restarts.
Switching providers mid-conversation
Use the provider dropdown at the top of the chat to switch which AI you're talking to. The new provider sees a flattened text summary of what was said and the tool calls that happened, so the conversation continues smoothly even when switching from a tool-calling provider (Claude) to a text-only one (Gemini). Useful if you want a second opinion from a different AI partway through a chat.
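Conceptually, the flattening step works like the sketch below. This is illustrative only — `flatten_history` and the message fields are hypothetical names, not VL Studio's actual internals:

```python
def flatten_history(messages):
    """Collapse a mixed message/tool-call history into plain text
    that a text-only provider (e.g. Gemini) can consume as context."""
    lines = []
    for m in messages:
        if m["type"] == "tool_call":
            # Tool calls become bracketed one-line summaries.
            lines.append(f"[tool {m['name']} -> {m['summary']}]")
        else:
            lines.append(f"{m['role']}: {m['text']}")
    return "\n".join(lines)

history = [
    {"type": "text", "role": "user", "text": "Make the harmony louder"},
    {"type": "tool_call", "name": "set_mixer", "summary": "Harmony level raised"},
    {"type": "text", "role": "assistant", "text": "Done - harmony raised."},
]
flat = flatten_history(history)
```

The new provider receives `flat` as plain context, which is why the conversation stays coherent even though the tool calls themselves can't continue.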
Context usage bar
A thin teal bar in the header shows how much of the AI's memory window is filled. Color bands:
- Green — up to 50% full.
- Yellow — between 50% and 80% full.
- Red — above 80% full.
When the window is near full, the oldest messages get trimmed automatically and a [N earlier messages trimmed] marker appears in the chat. You can adjust how aggressive the trimming is from Settings → Inference Parameters (Token Budget %).
The yellow "Custom prompt active" banner
If you see a yellow banner above the conversation, the AI's built-in system instructions have been overridden in Settings. That changes how the AI behaves — most importantly, it may stop being careful about telling you when it's unsure of a result. Visit Settings → System Prompts to reset to defaults if you didn't mean to change anything.
The 23 tools
The AI has access to a set of tools grouped by purpose. You don't address tools directly — the AI picks them based on what you ask.
Sound Design (6 tools)
| Tool | What it does |
|---|---|
| generate_preset | Creates a new preset from a natural-language description. |
| describe_preset | Writes a short musician-facing description of an existing preset. |
| suggest_presets | Recommends presets from your library that fit a query. |
| list_presets | Lists presets in your library with filters (name, source, genre, slot). |
| inspect_preset | Reads back the parameter set of a specific preset for the AI to reason about. |
| push_preset | Sends a preset to its device slot. |
Device Control (8 tools)
| Tool | What it does |
|---|---|
| get_device_status | Reads the live device state — current preset slot, key/scale, effect block on/off. |
| select_preset | Switches the device's active preset. |
| toggle_effect | Turns an effect block on or off on the active preset. |
| set_key_scale | Sets the device's global key and scale. |
| get_mixer | Reads mixer-block levels (harmony, doubling, wet sends). |
| set_mixer | Writes mixer-block levels — the canonical lever for "make X louder." |
| list_preset_params | Looks up parameter names so the AI can write surgically. The AI calls this when it's not sure of a parameter's exact name. |
| set_preset_param | Writes a specific parameter on the active preset by name. |
Songs & Setlists (7 tools)
| Tool | What it does |
|---|---|
| create_song | Builds a new song with cues from a natural-language description. |
| list_songs | Lists songs in your library. |
| get_song | Reads back a specific song's cue structure. |
| create_setlist | Builds a setlist from a description. |
| list_setlists | Lists setlists in your library. |
| get_setlist | Reads back a specific setlist's entries. |
| enrich_lyrics | Fills in or refines per-cue brief lyrics for songs. |
Ranking (1 tool)
| Tool | What it does |
|---|---|
| rank_presets | Ranks a list of preset candidates by fit for a query — used internally to refine suggestion results. |
Undo (1 tool)
| Tool | What it does |
|---|---|
| undo_last_action | Pops the single most-recent thing the AI created (preset, song, or setlist) off the stack and deletes it. The stack resets when you restart the app. |
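The undo behaviour is a plain last-in-first-out stack. A minimal sketch of that semantics (illustrative, not VL Studio's actual implementation):

```python
class UndoStack:
    """Each AI-created artifact (preset, song, setlist) is pushed;
    undo_last_action pops exactly the most recent one."""
    def __init__(self):
        self._items = []          # cleared on app restart

    def push(self, kind, name):
        self._items.append((kind, name))

    def undo_last_action(self):
        if not self._items:
            return None           # nothing left to undo
        return self._items.pop()  # most recent creation only

stack = UndoStack()
stack.push("preset", "Warm Ballad Lead")
stack.push("song", "Midnight Train")
undone = stack.undo_last_action()   # pops the song, not the preset
```

Calling undo repeatedly walks back through creations newest-first until the stack is empty.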
AI Skills
User-editable markdown files that give the AI extra domain knowledge — the same idea as system-prompt "expertise notes," but per-topic.
How they work
- Trigger matching — case-insensitive substring match. The AI loads a skill into context when your chat message contains any of the skill's trigger keywords.
- Priority — higher priority skills are injected first when multiple match.
- Token budget — up to 2000 tokens total can be injected per turn across all matching skills. Anything beyond that budget is dropped.
- Disabled vs deleted — the Settings UI's per-skill toggle flips the `enabled: false` frontmatter flag, keeping the file on disk for later re-enabling. Delete removes the file entirely.
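The matching rules above can be sketched in a few lines. This is a hypothetical model of the behaviour — the field names (`enabled`, `triggers`, `priority`, `tokens`) are assumptions, not the real schema:

```python
def select_skills(message, skills, budget_tokens=2000):
    """Case-insensitive substring trigger match, higher priority
    injected first, hard token budget across all matches."""
    text = message.lower()
    matched = [s for s in skills
               if s["enabled"]
               and any(t.lower() in text for t in s["triggers"])]
    matched.sort(key=lambda s: s["priority"], reverse=True)

    picked, used = [], 0
    for s in matched:
        if used + s["tokens"] > budget_tokens:
            continue              # over budget: this skill is dropped
        picked.append(s["name"])
        used += s["tokens"]
    return picked

skills = [
    {"name": "harmony", "enabled": True, "priority": 5,
     "triggers": ["harmony"], "tokens": 1500},
    {"name": "reverb", "enabled": True, "priority": 3,
     "triggers": ["reverb", "plate"], "tokens": 800},
]
result = select_skills("Add a plate reverb and thicker Harmony", skills)
```

Here both skills match, but injecting the higher-priority harmony skill (1500 tokens) leaves too little budget for the reverb skill, so the latter is dropped for that turn.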
The 25 starter skills
On first launch, VL Studio seeds your skills directory with 25 bundled skills covering the vocal blocks (harmony, doubling, delay, reverb, hardtune, synth, transducer, micromod, choir, rhythmic, stutter), the guitar blocks (amp, comp, wah, micromod, delay, reverb, rhythmic, octaver), and core concepts (preset basics, HIT function, NaturalPlay & key/scale, tempo & clocks, mix routing, global effects).
Each shipped skill tells the AI things like "when to use HardTune Pop style," "what a typical reverb decay for a ballad is," or "where the user-facing harmony loudness lever actually lives" — context that goes beyond what the AI can derive from a preset's parameters alone.
For the full file format spec (YAML frontmatter + body), see AI Skills (.md).
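For orientation only, a skill file might look like the sketch below. The keys shown are assumptions inferred from the behaviour described above (only `enabled` is confirmed by this page) — treat AI Skills (.md) as the authoritative spec:

```markdown
---
# Hypothetical frontmatter - field names are illustrative
enabled: true
priority: 5
triggers: [reverb, plate, hall]
---
Reverb guidance: for slow ballads, a plate reverb with a longer
decay sits well behind the lead vocal without washing it out.
```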
Voice input details
VL Studio uses Whisper for offline speech recognition with three model tiers, each downloadable from Hugging Face:
| Tier | Model | Approximate size |
|---|---|---|
| Fast | ggml-tiny.en.bin | ~75 MB |
| Balanced | ggml-base.en.bin | ~142 MB |
| Accurate | ggml-small.en-q5.bin | ~181 MB |
The Fast model is bundled in the application and works without internet. The Balanced and Accurate models are downloaded on demand the first time you switch to them (Settings → Voice → download).
The voice mic in the chat input field captures audio, sends it to Whisper, and drops the transcript into the input field for you to review — no auto-send. Edit anything Whisper misheard, then click Send.
The nav-bar Mic is a separate path: it runs the voice command, fires the chat round in the background, and shows the AI's reply as a toast so the current page stays put. Use it during performance or any time you don't want chat to take focus.
Provider configuration
Claude
Native multi-round tool calling. Recommended default for chat.
- Max tokens defaults — 8192 for tool-using tasks (chat, generate, suggest); 4096 for text-only describe.
- Models — configured per provider plus per-task overrides.
Gemini
No native tool-calling. Eligible for chat as text-only (the chat dropdown labels it as such). Cheap alternative for describe / enrich / suggest tasks.
OpenAI-compatible
Works with any endpoint that speaks the OpenAI function-calling format. Per-provider fields:
- `base_url`, `api_key`, `model`, a `supports_tools` toggle, and `timeout_secs`.
- Inference overrides: `temperature`, `max_tokens`, `context_window` (defaults to 32,768 if unset).
Configure multiple OpenAI-compatible providers side-by-side and route per task.
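A hypothetical provider entry, just to show how the fields fit together (the values, and the storage format itself, are illustrative — you enter these in the Settings UI):

```json
{
  "name": "local-llama",
  "base_url": "http://localhost:8080/v1",
  "api_key": "sk-...",
  "model": "llama-3.1-8b-instruct",
  "supports_tools": true,
  "timeout_secs": 120,
  "temperature": 0.7,
  "max_tokens": 4096,
  "context_window": 32768
}
```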
Custom instructions
Two text areas append to the system prompt without replacing it:
- Global custom instructions — apply across every provider.
- Per-provider custom instructions (Claude and Gemini) — apply only when that provider is the active one.
Use these for "always speak in metric units," "prefer terse responses," or any other persistent direction without unlocking the per-task system prompts.
Reference
Sessions — what gets saved
Each session stores:
- The full message history (user, assistant, and tool-call entries).
- The undo stack at the time of the last save.
- The session name (auto-generated from the first user message, editable via Rename).
Auto-save fires about 100 ms after each turn completes — short enough that you won't lose more than a single message even on an unexpected app exit. Sessions persist across app restarts; the Sessions drawer shows a count of stored sessions.
Per-task system prompts (advanced)
Each AI task (generate / describe / suggest / chat) has a built-in system prompt that's been tuned for honest device interaction and musician-friendly language. They're previewable read-only by default; unlocking for edit requires a confirmation modal that warns the prompts have been tuned for good behaviour and changes can degrade it. Reset to default is always available.
Disabled tools
In Settings you can toggle individual tools off — the AI never sees them in the tool list. Use this if a specific tool keeps making bad decisions for your style of use.
Token budget
The Token Budget % slider (Settings → Inference Parameters, range 50–95, default 75) controls how much of the active provider's memory window is reserved for chat history vs. headroom for the AI's reply. Lower = more headroom for big tool results in long sessions; higher = remember more conversation turns. Default 75 is fine for most use.
When trimming happens, the chat shows a [N earlier messages trimmed...] marker so you know context was dropped.
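The budget-based trimming can be pictured as the sketch below — keep the newest messages that fit within the budgeted share of the context window, drop the oldest first. Illustrative only, not the real algorithm:

```python
def trim_history(messages, context_window, budget_pct=75):
    """Walk the history newest-first, keeping messages until the
    token budget (budget_pct of the context window) is exhausted."""
    budget = context_window * budget_pct // 100
    kept, used = [], 0
    for msg in reversed(messages):        # newest first
        if used + msg["tokens"] > budget:
            break                         # everything older is trimmed
        kept.append(msg)
        used += msg["tokens"]
    kept.reverse()                        # restore chronological order
    trimmed = len(messages) - len(kept)
    marker = f"[{trimmed} earlier messages trimmed...]" if trimmed else None
    return kept, marker

msgs = [{"id": i, "tokens": 3000} for i in range(10)]   # 30k tokens total
kept, marker = trim_history(msgs, context_window=32768)
```

With a 32,768-token window at the default 75% budget (24,576 tokens), the two oldest 3,000-token messages are dropped and the marker reads "[2 earlier messages trimmed...]".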
What happens on errors
- Tool returns an error — rendered inline with a red error indicator. The AI sees the error in its turn and usually explains what went wrong or tries a different approach.
- Provider returns an error mid-response — the partial response is saved into history, and the error is surfaced as a status message.
- Network failure during streaming — the 90-second watchdog resets the UI. Partial response stays in the session.
- Empty voice clip — Whisper returns an empty transcript; nothing fills the input field.
Troubleshooting
| Symptom | Fix |
|---|---|
| Send returns "No AI provider configured for chat" | Open Settings → AI Configuration and add at least one provider's API key. |
| AI says it adjusted a parameter but nothing happened on the device | Ask explicitly: "what's the parameter name for X?" The AI should call list_preset_params before writing. If you suspect a regression, file a bug. |
| AI calls "toggle effect" but the effect doesn't seem to engage | The effect's Control parameter is HIT-gated. Toggling the on/off bit alone doesn't make sound until HIT is pressed. Either engage HIT on the device, or change the effect's Control to plain On/Off. |
| "Lower the harmony level" doesn't seem to actually lower it | The user-facing loudness control for harmonies is the mixer-block Harmony level in VocalShaper's EQ/Mix tab, not the per-voice levels inside the Harmony block. Ask the AI to "use set_mixer to change the Harmony mixer level." |
| Voice mic does nothing when I click | OS-level microphone permission may be missing for the VL Studio app, or the Whisper voice model hasn't loaded. Check Settings → AI → Voice and confirm a model is selected. |
| Chat reply gets cut off mid-sentence | The provider's max-tokens setting is too low or the context budget is squeezing the response headroom. Raise max_tokens in Settings → AI → Inference Parameters, or lower the Token Budget %. |
| Provider errors with 401 / authentication | API key is wrong, expired, or doesn't have access to the model. Re-enter the key in Settings → AI. |
| Stop button doesn't actually stop | Stop signals at the next round boundary — generally under 2 seconds. If it really doesn't stop, refresh the page; partial responses are preserved in the session. |
| Page seems frozen waiting for the AI | A 90-second watchdog will reset the UI. If you don't want to wait, refresh the page; the session's partial state is auto-saved. |
| "Custom prompt active" banner won't go away | A per-task system prompt has been overridden. Settings → System Prompts → click Reset to Default on the overridden task. |
| Context bar is red and the AI seems to forget things | The window is over 80% full and trimming is dropping older messages. Either lower the Token Budget %, start a new chat session, or summarize the relevant context yourself in a fresh message. |
| Switched to Gemini and now the AI seems to forget tool calls happened | Gemini has no native tool calling — past tool calls are flattened to text for context continuity, but the AI can't continue calling tools in this session. Switch back to Claude or an OpenAI-compatible provider with tool support. |
| Voice transcript came out wrong | Whisper's small model can mishear technical terms. Edit the transcript in the input field before sending, or switch to the Balanced or Accurate tier in Settings → Voice. |
| Empty-chat suggested prompts disappeared | They only appear when a chat has zero messages. Start a new chat to see them again. |
| Sessions drawer shows the wrong count | Refresh the page; the count is read once on mount. |
See also
- Quickstart — first-launch setup, Pull All from device.
- Library — the preset library the AI reads from and writes into (via `list_presets`, `generate_preset`, `push_preset`, and others).
- Songs / Setlists — the AI can build these from natural-language descriptions and enrich existing lyrics.
- Settings — AI provider configuration, voice settings, system prompts, AI Skills editor.
- Sharing & Files — exporting AI Skills as standalone `.md` files for sharing.
- AI Skills (.md) — the markdown file format used for skills.
