AI Chat
Status: draft. Concrete reference content + structural placeholders; flesh out the prose + screenshots when ready.
Purpose
A conversational interface to your VL3X and your library. Ask the AI to:
- create presets from natural-language descriptions ("warm pop ballad lead with subtle plate reverb"),
- describe what an existing preset sounds like in musician terms,
- suggest and rank presets that fit a song,
- toggle effects on the device live,
- adjust mixer levels, key/scale, or specific parameters on the active preset,
- build songs and setlists with cue-by-cue automation,
- and undo what it just made.
VL Studio uses your own AI API keys — no embedded keys, no telemetry, no usage that doesn't cross your own provider account.
VL Studio difference — AI is entirely new
The VL3X has no AI features. Nothing on the device creates presets from a description, describes what a preset sounds like, suggests presets for a song, or builds a setlist from a paragraph of natural language. Standalone, the only ways to make a new preset are to start from a similar preset and tweak it knob by knob, or to copy a preset to a new slot and re-edit it.
VL Studio's AI Chat adds a layer that doesn't exist on the device at all — a conversational partner that knows the VL3X's effect blocks, can read and write your active preset live, generate new presets, and undo what it just made. Because it uses your own API keys, there's no subscription on top of what your provider already charges.
Walk-through
First-time setup
- Settings → AI Configuration — choose a provider:
- Claude — primary recommended provider. Add your Anthropic API key.
- Gemini — a cheap alternative for describe / enrich / suggest tasks. No native tool-calling, so chat is text-only with Gemini.
- OpenAI-compatible — works with any endpoint that speaks the OpenAI format. Useful for Grok, OpenAI, Mistral, or a local model running on your own machine.
- (Optional) Per-task routing — for each task (generate, describe, suggest, chat) you can pick which provider handles it. Use a cheap fast model for describe, a smart model for chat. Providers without tool-calling support are eligible for chat as text-only — the dropdown labels them as such.
- (Optional) Per-task model override — within Claude or Gemini, pick a specific model per task. Useful for "cheap Haiku for describe, Sonnet for chat."
First-launch — no API keys
Open AI Chat without configuring a provider first, and the page renders normally but every send returns the error: "No AI provider configured for chat." Open Settings → AI Configuration and add at least one provider's API key.
Sending a message
Type and press Enter, or click Send. The Send button changes to a red Stop while the AI is working — click Stop to cancel a request that's taking too long or heading the wrong direction. Cancellation takes effect at the next round boundary (typically under 2 seconds, depending on network latency and any in-flight tool call). Partial responses, including tool calls already made, are preserved in the session.
If the page sits waiting on the model with no activity, a 90-second watchdog resets the UI so you can try again.
Empty-chat suggested prompts
When a chat session is empty (new chat, no messages yet), the page shows a panel of 12 suggested prompts to get you started — things like "Make me a warm ballad lead," "What's on the device right now?", or "Suggest presets for a 90s grunge cover band." Click one to fill the input field, edit it if you like, then Send.
Voice input
Click the Mic icon in the input row and speak. Click again to stop. The transcript lands in the input field for review before sending, so you can fix anything Whisper misheard. Voice mode also nudges the AI toward terse, action-focused replies (good for "turn off the delay"); for longer brainstorming, type.
The nav-bar Mic button is different — it runs voice commands in the background without switching pages, and the AI's reply appears as a toast. Useful during performance or when you don't want the chat window to take focus.
Tool calls render inline
When the AI calls a tool, you see it inline in the conversation as the model is working:
- Amber pulse while the tool call is in flight.
- Emerald dot when it completes.
The display shows the tool name and a short result summary (e.g., Created 'Tone' (42 params) → preset #18). It's a summary, not a raw JSON dump; the AI's follow-up response usually explains the full result in plain language.
Sessions
Every chat auto-saves about 100 ms after each turn completes. The Sessions drawer (top of the page) lists your history with a session count; click any past session to reload the full conversation — messages plus the AI's tool-call history. New Chat starts a fresh session. Each row has Rename and Delete controls. Sessions persist across app restarts.
Switching providers mid-conversation
Use the provider dropdown at the top of the chat to switch which AI you're talking to. The new provider sees a flattened text summary of what was said and the tool calls that happened, so the conversation continues smoothly even when switching from a tool-calling provider (Claude) to a text-only one (Gemini). Useful if you want a second opinion from a different AI partway through a chat.
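Conceptually, the flattening step works like the sketch below. This is illustrative only — `flatten_history` and the message fields are hypothetical names, not VL Studio's actual internals:

```python
def flatten_history(messages):
    """Collapse a mixed message/tool-call history into plain text
    that a text-only provider (e.g. Gemini) can consume as context."""
    lines = []
    for m in messages:
        if m["type"] == "tool_call":
            # Tool calls become bracketed one-line summaries.
            lines.append(f"[tool {m['name']} -> {m['summary']}]")
        else:
            lines.append(f"{m['role']}: {m['text']}")
    return "\n".join(lines)

history = [
    {"type": "text", "role": "user", "text": "Make the harmony louder"},
    {"type": "tool_call", "name": "set_mixer", "summary": "Harmony level raised"},
    {"type": "text", "role": "assistant", "text": "Done - harmony raised."},
]
flat = flatten_history(history)
```

The new provider receives `flat` as plain context, which is why the conversation stays coherent even though the tool calls themselves can't continue.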
Context usage bar
A thin teal bar in the header shows how much of the AI's memory window is filled. Color bands:
- Green — up to 50% full.
- Yellow — between 50% and 80% full.
- Red — above 80% full.
When the window is near full, the oldest messages get trimmed automatically and a [N earlier messages trimmed] marker appears in the chat. You can adjust how aggressive the trimming is from Settings → Inference Parameters (Token Budget %).
The yellow "Custom prompt active" banner
If you see a yellow banner above the conversation, the AI's built-in system instructions have been overridden in Settings. That changes how the AI behaves — most importantly, it may stop being careful about telling you when it's unsure of a result. Visit Settings → System Prompts to reset to defaults if you didn't mean to change anything.
The 23 tools
The AI has access to a set of tools grouped by purpose. You don't address tools directly — the AI picks them based on what you ask.
Sound Design (6 tools)
| Tool | What it does |
|---|---|
| generate_preset | Creates a new preset from a natural-language description. |
| describe_preset | Writes a short musician-facing description of an existing preset. |
| suggest_presets | Recommends presets from your library that fit a query. |
| list_presets | Lists presets in your library with filters (name, source, genre, slot). |
| inspect_preset | Reads back the parameter set of a specific preset for the AI to reason about. |
| push_preset | Sends a preset to its device slot. |
Device Control (8 tools)
| Tool | What it does |
|---|---|
| get_device_status | Reads the live device state — current preset slot, key/scale, effect block on/off. |
| select_preset | Switches the device's active preset. |
| toggle_effect | Turns an effect block on or off on the active preset. |
| set_key_scale | Sets the device's global key and scale. |
| get_mixer | Reads mixer-block levels (harmony, doubling, wet sends). |
| set_mixer | Writes mixer-block levels — the canonical lever for "make X louder." |
| list_preset_params | Looks up parameter names so the AI can write surgically. The AI calls this when it's not sure of a parameter's exact name. |
| set_preset_param | Writes a specific parameter on the active preset by name. |
Songs & Setlists (7 tools)
| Tool | What it does |
|---|---|
| create_song | Builds a new song with cues from a natural-language description. |
| list_songs | Lists songs in your library. |
| get_song | Reads back a specific song's cue structure. |
| create_setlist | Builds a setlist from a description. |
| list_setlists | Lists setlists in your library. |
| get_setlist | Reads back a specific setlist's entries. |
| enrich_lyrics | Fills in or refines per-cue brief lyrics for songs. |
Ranking (1 tool)
| Tool | What it does |
|---|---|
| rank_presets | Ranks a list of preset candidates by fit for a query — used internally to refine suggestion results. |
Undo (1 tool)
| Tool | What it does |
|---|---|
| undo_last_action | Pops the single most-recent thing the AI created (preset, song, or setlist) off the stack and deletes it. The stack resets when you restart the app. |
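The undo behaviour is a plain last-in-first-out stack. A minimal sketch of that semantics (illustrative, not VL Studio's actual implementation):

```python
class UndoStack:
    """Each AI-created artifact (preset, song, setlist) is pushed;
    undo_last_action pops exactly the most recent one."""
    def __init__(self):
        self._items = []          # cleared on app restart

    def push(self, kind, name):
        self._items.append((kind, name))

    def undo_last_action(self):
        if not self._items:
            return None           # nothing left to undo
        return self._items.pop()  # most recent creation only

stack = UndoStack()
stack.push("preset", "Warm Ballad Lead")
stack.push("song", "Midnight Train")
undone = stack.undo_last_action()   # pops the song, not the preset
```

Calling undo repeatedly walks back through creations newest-first until the stack is empty.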
AI Skills
User-editable markdown files that give the AI extra domain knowledge — the same idea as system-prompt "expertise notes," but per-topic.
How they work
- Trigger matching — case-insensitive substring match. The AI loads a skill into context when your chat message contains any of the skill's trigger keywords.
- Priority — higher priority skills are injected first when multiple match.
- Token budget — up to 2000 tokens total can be injected per turn across all matching skills. Anything beyond that budget is dropped.
- Disabled vs deleted — the Settings UI's per-skill toggle flips the `enabled: false` frontmatter flag, keeping the file on disk for later re-enabling. Delete removes the file entirely.
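The matching rules above can be sketched in a few lines. This is a hypothetical model of the behaviour — the field names (`enabled`, `triggers`, `priority`, `tokens`) are assumptions, not the real schema:

```python
def select_skills(message, skills, budget_tokens=2000):
    """Case-insensitive substring trigger match, higher priority
    injected first, hard token budget across all matches."""
    text = message.lower()
    matched = [s for s in skills
               if s["enabled"]
               and any(t.lower() in text for t in s["triggers"])]
    matched.sort(key=lambda s: s["priority"], reverse=True)

    picked, used = [], 0
    for s in matched:
        if used + s["tokens"] > budget_tokens:
            continue              # over budget: this skill is dropped
        picked.append(s["name"])
        used += s["tokens"]
    return picked

skills = [
    {"name": "harmony", "enabled": True, "priority": 5,
     "triggers": ["harmony"], "tokens": 1500},
    {"name": "reverb", "enabled": True, "priority": 3,
     "triggers": ["reverb", "plate"], "tokens": 800},
]
result = select_skills("Add a plate reverb and thicker Harmony", skills)
```

Here both skills match, but injecting the higher-priority harmony skill (1500 tokens) leaves too little budget for the reverb skill, so the latter is dropped for that turn.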
The 25 starter skills
On first launch, VL Studio seeds your skills directory with 25 bundled skills covering the vocal blocks (harmony, doubling, delay, reverb, hardtune, synth, transducer, micromod, choir, rhythmic, stutter), the guitar blocks (amp, comp, wah, micromod, delay, reverb, rhythmic, octaver), and core concepts (preset basics, HIT function, NaturalPlay & key/scale, tempo & clocks, mix routing, global effects).
Each shipped skill tells the AI things like "when to use HardTune Pop style," "what a typical reverb decay for a ballad is," or "where the user-facing harmony loudness lever actually lives" — context that goes beyond what the AI can derive from a preset's parameters alone.
For the full file format spec (YAML frontmatter + body), see AI Skills (.md).
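For orientation only, a skill file might look like the sketch below. The keys shown are assumptions inferred from the behaviour described above (only `enabled` is confirmed by this page) — treat AI Skills (.md) as the authoritative spec:

```markdown
---
# Hypothetical frontmatter - field names are illustrative
enabled: true
priority: 5
triggers: [reverb, plate, hall]
---
Reverb guidance: for slow ballads, a plate reverb with a longer
decay sits well behind the lead vocal without washing it out.
```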
Voice input details
VL Studio uses Whisper for offline speech recognition with three model tiers, each downloadable from Hugging Face:
| Tier | Model | Approximate size |
|---|---|---|
| Fast | ggml-tiny.en.bin | ~75 MB |
| Balanced | ggml-base.en.bin | ~142 MB |
| Accurate | ggml-small.en-q5.bin | ~181 MB |
The Fast model is bundled in the application and works without internet. The Balanced and Accurate models are downloaded on demand the first time you switch to them (Settings → Voice → download).
The voice mic in the chat input field captures audio, sends it to Whisper, and drops the transcript into the input field for you to review — no auto-send. Edit anything Whisper misheard, then click Send.
The nav-bar Mic is a separate path: it runs the voice command, fires the chat round in the background, and shows the AI's reply as a toast so the current page stays put. Use it during performance or any time you don't want chat to take focus.
Provider configuration
Claude
Native multi-round tool calling. Recommended default for chat.
- Max tokens defaults — 8192 for tool-using tasks (chat, generate, suggest); 4096 for text-only describe.
- Models — configured per provider plus per-task overrides.
Gemini
No native tool-calling. Eligible for chat as text-only (the chat dropdown labels it as such). Cheap alternative for describe / enrich / suggest tasks.
OpenAI-compatible
Works with any endpoint that speaks the OpenAI function-calling format. Per-provider fields:
- `base_url`, `api_key`, `model`, a `supports_tools` toggle, and `timeout_secs`.
- Inference overrides: `temperature`, `max_tokens`, `context_window` (defaults to 32,768 if unset).
Configure multiple OpenAI-compatible providers side-by-side and route per task.
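A hypothetical provider entry, just to show how the fields fit together (the values, and the storage format itself, are illustrative — you enter these in the Settings UI):

```json
{
  "name": "local-llama",
  "base_url": "http://localhost:8080/v1",
  "api_key": "sk-...",
  "model": "llama-3.1-8b-instruct",
  "supports_tools": true,
  "timeout_secs": 120,
  "temperature": 0.7,
  "max_tokens": 4096,
  "context_window": 32768
}
```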
Custom instructions
Two text areas append to the system prompt without replacing it:
- Global custom instructions — apply across every provider.
- Per-provider custom instructions (Claude and Gemini) — apply only when that provider is the active one.
Use these for "always speak in metric units," "prefer terse responses," or any other persistent direction without unlocking the per-task system prompts.
Reference
Sessions — what gets saved
Each session stores:
- The full message history (user, assistant, and tool-call entries).
- The undo stack at the time of the last save.
- The session name (auto-generated from the first user message, editable via Rename).
Auto-save fires about 100 ms after each turn completes — short enough that you won't lose more than a single message even on an unexpected app exit. Sessions persist across app restarts; the Sessions drawer shows a count of stored sessions.
Per-task system prompts (advanced)
Each AI task (generate / describe / suggest / chat) has a built-in system prompt that's been tuned for honest device interaction and musician-friendly language. They're previewable read-only by default; unlocking for edit requires a confirmation modal that warns the prompts have been tuned for good behaviour and changes can degrade it. Reset to default is always available.
Disabled tools
In Settings you can toggle individual tools off — the AI never sees them in the tool list. Use this if a specific tool keeps making bad decisions for your style of use.
Token budget
The Token Budget % slider (Settings → Inference Parameters, range 50–95, default 75) controls how much of the active provider's memory window is reserved for chat history vs. headroom for the AI's reply. Lower = more headroom for big tool results in long sessions; higher = remember more conversation turns. Default 75 is fine for most use.
When trimming happens, the chat shows a [N earlier messages trimmed...] marker so you know context was dropped.
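The budget-based trimming can be pictured as the sketch below — keep the newest messages that fit within the budgeted share of the context window, drop the oldest first. Illustrative only, not the real algorithm:

```python
def trim_history(messages, context_window, budget_pct=75):
    """Walk the history newest-first, keeping messages until the
    token budget (budget_pct of the context window) is exhausted."""
    budget = context_window * budget_pct // 100
    kept, used = [], 0
    for msg in reversed(messages):        # newest first
        if used + msg["tokens"] > budget:
            break                         # everything older is trimmed
        kept.append(msg)
        used += msg["tokens"]
    kept.reverse()                        # restore chronological order
    trimmed = len(messages) - len(kept)
    marker = f"[{trimmed} earlier messages trimmed...]" if trimmed else None
    return kept, marker

msgs = [{"id": i, "tokens": 3000} for i in range(10)]   # 30k tokens total
kept, marker = trim_history(msgs, context_window=32768)
```

With a 32,768-token window at the default 75% budget (24,576 tokens), the two oldest 3,000-token messages are dropped and the marker reads "[2 earlier messages trimmed...]".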
What happens on errors
- Tool returns an error — rendered inline with a red error indicator. The AI sees the error in its turn and usually explains what went wrong or tries a different approach.
- Provider returns an error mid-response — the partial response is saved into history, and the error is surfaced as a status message.
- Network failure during streaming — the 90-second watchdog resets the UI. Partial response stays in the session.
- Empty voice clip — Whisper returns an empty transcript; nothing fills the input field.
Troubleshooting
| Symptom | Fix |
|---|---|
| Send returns "No AI provider configured for chat" | Open Settings → AI Configuration and add at least one provider's API key. |
| AI says it adjusted a parameter but nothing happened on the device | Ask explicitly: "what's the parameter name for X?" The AI should call list_preset_params before writing. If you suspect a regression, file a bug. |
| AI calls "toggle effect" but the effect doesn't seem to engage | The effect's Control parameter is HIT-gated. Toggling the on/off bit alone doesn't make sound until HIT is pressed. Either engage HIT on the device, or change the effect's Control to plain On/Off. |
| "Lower the harmony level" doesn't seem to actually lower it | The user-facing loudness control for harmonies is the mixer-block Harmony level in VocalShaper's EQ/Mix tab, not the per-voice levels inside the Harmony block. Ask the AI to "use set_mixer to change the Harmony mixer level." |
| Voice mic does nothing when I click | OS-level microphone permission may be missing for the VL Studio app, or the Whisper voice model hasn't loaded. Check Settings → AI → Voice and confirm a model is selected. |
| Chat reply gets cut off mid-sentence | The provider's max-tokens setting is too low or the context budget is squeezing the response headroom. Raise max_tokens in Settings → AI → Inference Parameters, or lower the Token Budget %. |
| Provider errors with 401 / authentication | API key is wrong, expired, or doesn't have access to the model. Re-enter the key in Settings → AI. |
| Stop button doesn't actually stop | Stop signals at the next round boundary — generally under 2 seconds. If it really doesn't stop, refresh the page; partial responses are preserved in the session. |
| Page seems frozen waiting for the AI | A 90-second watchdog will reset the UI. If you don't want to wait, refresh the page; the session's partial state is auto-saved. |
| "Custom prompt active" banner won't go away | A per-task system prompt has been overridden. Settings → System Prompts → click Reset to Default on the overridden task. |
| Context bar is red and the AI seems to forget things | The window is over 80% full and trimming is dropping older messages. Either lower the Token Budget %, start a new chat session, or summarize the relevant context yourself in a fresh message. |
| Switched to Gemini and now the AI seems to forget tool calls happened | Gemini has no native tool calling — past tool calls are flattened to text for context continuity, but the AI can't continue calling tools in this session. Switch back to Claude or an OpenAI-compatible provider with tool support. |
| Voice transcript came out wrong | Whisper's small model can mishear technical terms. Edit the transcript in the input field before sending, or switch to the Balanced or Accurate tier in Settings → Voice. |
| Empty-chat suggested prompts disappeared | They only appear when a chat has zero messages. Start a new chat to see them again. |
| Sessions drawer shows the wrong count | Refresh the page; the count is read once on mount. |
See also
- Quickstart — first-launch setup, Pull All from device.
- Library — the preset library the AI reads from and writes into (via `list_presets`, `generate_preset`, `push_preset`, and others).
- Songs / Setlists — the AI can build these from natural-language descriptions and enrich existing lyrics.
- Settings — AI provider configuration, voice settings, system prompts, AI Skills editor.
- Sharing & Files — exporting AI Skills as standalone `.md` files for sharing.
- AI Skills (.md) — the markdown file format used for skills.
