Vibe Pwning with GitHub Copilot

2025-10-28

Word count: 2.6k | Reading time≈ 16 min

Intro

Of the many uses of GenAI hitting technology stacks today, AI-assisted coding platforms offer one of the most compelling applications of LLM text generation. These platforms enable tech enthusiasts with an idea to make it a reality and speed up routine software development through context-aware code completions.

But this technology is not without its dark side. One element that gets a lot of attention is letting the LLM do the heavy lifting with limited human-generated code (AKA “vibecoding”), which can lead to flaws ranging from security gaps and missed requirements to code that can’t be easily modified or scaled.

Another element, the subject of our research in this blog, is the attack surface opened by integrating an LLM into development environments, where access to powerful developer tools allows the LLM to not only write code but execute the code or other commands on the system.

This topic was the subject of several recent blogs and conference talks where researchers demonstrated entire kill-chains that led to data theft or remote code execution. In some cases, those attack paths required zero interaction from the developer aside from asking the LLM to summarize the code project.

While many of the specific paths discussed in those blogs and talks have been mitigated to a degree, the very nature of AI-assisted coding capabilities and the non-deterministic nature of LLM text generation make this attack surface difficult to fully secure. We recently had the opportunity to dig further into GitHub Copilot for Visual Studio Code (VS Code) as part of an exercise, looking for new ways to achieve similar effects of data theft or remote code execution.

In the end, we discovered an Elevation of Privilege vulnerability in VSCode core and it was patched by Microsoft.

Note: The remainder of the blog will simply refer to GitHub Copilot as Copilot. This is not to be confused with other instances of Copilot, such as those tied to Microsoft 365 applications. Also, the standard disclaimer applies - this is for educational purposes to continue to raise awareness about security pitfalls of GenAI products. Always get approval for testing.

RCE via GitHub Copilot

Background

A recent blog from Johann Rehberger discussed a CVE awarded to multiple researchers (CVE-2025-53773) affecting GitHub Copilot, wherein a malicious prompt inserted in the code led to full remote code execution (RCE) with zero user approvals required.

When the victim asked Copilot to take some action that interacted with the compromised code like “summarize this project,” the hidden instructions triggered Copilot to modify the VS Code settings file. In the case of this attack, the instructions enabled “YOLO” (You Only Live Once) mode - an experimental setting that automatically approves all actions suggested by Copilot (setting: "chat.tools.autoApprove": true).

Copilot has the ability to read and write files in the current “workspace” - the open folder and subfolders - and any other files the user has open in the editor. For security purposes, Copilot cannot access other files on the system. The root of the issue stemmed from the presence of distinct workspace settings (workspace/.vscode/settings.json). Because Copilot could alter this file unrestricted, Copilot could enable YOLO mode with no user intervention if compromised by an attacker.

To fix this, Microsoft removed YOLO mode from workspace settings in favor of using the global profile settings, located elsewhere. As a result, Copilot can no longer directly enable the setting unless the user opens the global settings file within VS Code.

Problem solved? Partially.

While the patch was effective, returning to a statement from the introduction: the very nature of these platforms gives them extensive capabilities that you cannot simply patch out without degrading the product. VS Code is not the judge or jury for the code you write and execute in the platform, it simply facilitates and aids the process. As a result, attackers can still add YOLO mode and gain remote code execution with a few additional steps.

To be clear, the demonstration below is not a new CVE, it takes advantage of the platform’s core functionality. Another key-caveat is that this is no longer a zero-click exploit. With a little Red Team ingenuity to come up with the right pretext, though, it can still be a potent attack. Our goal in highlighting this bypass is to illustrate that securing GenAI tooling is not as straight-forward as issuing a patch, as may be the case with traditional applications, and organizations employing GenAI tools will face residual risks requiring other mitigations.

The Sitch

In a recent op, we had secondary objectives to target developers and the development environment and use GenAI tools when possible. We had access to a Git service and Jira. So, what could we do?

Search for creds (pff, boring)
Poison legitimate programs and wait for them to be deployed (lots of moving pieces, need to get a commit approved)
Get a developer to run a malicious command for us (well now that’s interesting…)

We ended up going with number three, because it’s fun (okay, okay, we also looked for creds). At a high level, the attack path was as follows: the attacker (your friendly neighborhood Red Team, in this case) creates a bug report in Jira for an active project. Inside the bug report are hidden instructions that:

Enable YOLO mode.
Disguise the artifacts of the attack.
Execute operating system commands under seemingly benign auspices to gain remote code execution.

The unsuspecting developer copies the Jira ticket into GitHub Copilot, leading to unintended effects.

The Exploit Part 1: The Insertion

Problem #1: How do we get a developer to run a malicious command?

By not letting them know they’re doing it, of course.

In the course of the op, we observed training materials instructing developers to cut and paste the description or other ticket information from Jira into the Copilot chat, adding a small request like “help me find this bug.” Pretty common these days as companies try to accelerate with GenAI.

So, how could we use that? Enter the zero-width Unicode characters all the cool kids are using.

Zero-width Unicode characters serve functional purposes like controlling text direction. In modern digital typography, these invisible characters help ensure text displays correctly across different systems by controlling how adjacent characters interact without adding visible space or marks.

Because these characters are present in training datasets, LLMs know how to interpret them, allowing attackers to encode their instructions in a way that only the LLM will see. The Red Team found that these hidden characters were preserved in Jira comments and text fields, making it a perfect opportunity to poison a legitimate source of information with little evidence.

Zero-width Unicode + Jira bug ticket == ticking time bomb

The Exploit Part 2: The YOLO Maneuver

Problem 2: YOLO mode was patched, right?

Alas, it’s true. But only for Copilot…

Copilot can’t edit the settings file directly, but the code Copilot writes can do whatever the user can do so… let’s just use that.

We targeted a Python project because of its interpreted nature and LLMs are quite adept at basic Python code. The first set of instructions given to the LLM in the hidden characters asked it to generate a new Python file, copilot_setup.py. This file contained code to modify the user’s global VS Code profile under the auspices of “getting ready to debug.”

Copilot consistently generated functioning code. Something to keep in mind as you’re building LLM-based payloads, though, is that they’re non-deterministic. Even though the code worked 90%+ of the time, there were a few times that required revision. The nifty thing about LLMs is that Copilot could revise the code itself in those cases, but such an error increased the time for the user to catch on to the attack.

But then we ran into a different problem…

The Exploit Part 3: The Bamboozle

Problem 3: Wait, they can see what we’re doing…

We can’t have that.

Because Copilot wasn’t YOLO-ing yet, executing this “setup” file still required user permission. Additionally, VS Code opens newly edited files in the editor window, which exposed the exact nature of the new code to the user. This would likely decrease the odds of clicking “continue.”

To hide the file from view for longer, we noticed that multiple file creations or changes resulted in the first file presenting as “active” while the subsequent files opened in tabs to the right.

So, we thought, let’s just create an arbitrary and capricious “debug log” first. That will hide the new Python code unless the developer explicitly opens the tab. As for getting the user to accept the command? Well… we found that Copilot supplied the description for the command we want to run, so we could manufacture an explanation of dubious veracity (that means lie).

Combining these approaches into new instructions, we got a pretty reasonable pretext. The last hurdle was the user approval. If approved, we’d have a shell (or data exfil) coming our way in however long it took for Copilot to meander through its answer. Which… could be 10 seconds or it could work itself into a tizzy actually debugging first. Either way, we were pretty confident a developer that’s cutting and pasting from Jira will probably not object too much to running a setup script, because LLMs do weird things sometimes.

The Exploit Part 4: Giving them the business

~~Problem~~ Step 4: Figuring out what to do with our power

Oh, the possibilities…

With auto-approve enabled and the files cleaned up, we could run any OS command desired with no user input, requiring user attentiveness and quick actions to stop a malicious command.

For the sake of simplicity, the demonstration path culminates with a request to a webpage hosting a command to spawn Calculator. Don’t let the AI glaze make you forget that you still need to be a good Red Teamer and come up with a pretext and reasonable-looking command, because Copilot is going to tell the user what it’s doing.

An Alternate Approach: Malicious VS Code Extensions

As mentioned in the above discussion, VS Code has internal commands offered via an API that Copilot can call directly with user approval. As an alternative means of delivering malicious code to the developers, we found that the LLM could be convinced to call the workbench.extensions.installExtension command with an arbitrary extension ID and name. The given name did not need to match the official name in the extension registry, so go nuts.

You can combine this with indirect prompt injection through code files or using the path discussed in the last section.

0day - Circumventing Copilot Workspace Restrictions

Stemming from the prior research, we identified a more serious vulnerability that circumvented Copilot restrictions, allowing us to access any file on the system to which the user had access. We submitted this to Microsoft who validated and patched the issue.

As discussed, Copilot tooling is typically restricted to the active workspace or files already open in the editor to limit the impact to the host from LLM hallucination or compromise. For example, the readFile tool explicitly checks whether the file is in the workspace or open in a tab during its preparation.

However, VS Code offers a fetchPage tool that accepts both HTTP and file URI patterns. Because this tool is offered by VS Code itself and not Copilot (it is invoked by Copilot using a wrapper), it does not have the same restrictions on accessing files outside the workspace. To be clear, this VS Code tool is not itself a vulnerability and aligns with normal functionality for a development environment. How else would you open stuff?

With the appropriate prompt, Copilot could be coerced to utilize this tool rather than its own readFile tool, circumventing Copilot restrictions to read arbitrary system files. In addition, the user did not receive an approval prompt when supplying a file URI because the wrapper for this tool only required user approval for untrusted web URLs. This has been fixed, unless you have autoApprove enabled in which case… YOLO.

While arbitrary file reads are concerning, the user already had access to anything that opened and Copilot still could not modify the files, making this issue largely moot. To have impact, the information needed to be exfiltrated to an unauthorized location. A common method for exfiltrating LLM output is the use of Markdown images, which can be used to send an arbitrary GET to the supplied image source. VS Code, however, does not render the Markdown syntax directly and converts it to a plain, HTML string.

Digging further, we determined that VS Code’s handling of JSON files with Intellisense could be abused to trigger arbitrary HTTP requests. When a new JSON file is created with a $schema key and a URL as the value, Intellisense automatically sends a request to fetch the schema. By having Copilot create a JSON file, it could append contents from the arbitrary file read to the schema URL to exfiltrate information.

As a demonstration, the below prompt exfiltrated a fake API key from a file by combining the arbitrary file read with the $schema exfiltration technique, none of which required user approval. This could be further combined with previously discussed delivery obfuscation mechanisms to hide the prompt, though the output into the chat sidebar will be much harder to explain as benign.

Conclusion

While the arbitrary file read constitutes a full-fledged vulnerability, the tactics and techniques covered in this blog primarily focus on LLMs susceptibility to manipulation through the data sources they operate on. The same concerns a company has about a well-intentioned employee inadvertently doing something harmful apply to LLMs, but without the human judgement to identify anomalous behavior and nudge them that something is wrong.

Companies can help mitigate against these attack paths in a few ways:

User training. Train users that output from LLMs is not necessarily safe despite coming from an “internal” source. Additionally, coach users on the potential threats to data sources, including copying and pasting, that may add unwanted instructions to the LLM.
Disable auto-approve settings. Some AI-coding platforms allow enterprises to universally disable auto-approve settings or specifically allow-list certain commands that can be auto-approved. Where possible, use those technical controls to keep a human in the loop. If not offered by the platform, conduct routine user training for developers and consider periodic audits of settings files to ensure compliance.
Segregate. While more time and labor intensive, further separation of development boxes from developers’ normal workstations can help mitigate potential compromise if malicious code is introduced through the coding platform.

If you are building your own applications, additionally consider:

Restrict hidden Unicode characters. Unless expressly needed for the functionality of the application, the application should filter these characters before passing the prompt to the LLM and raise the suspicion level of the interaction. Let’s be real, how often do you really need these for the LLM?
Consider all data sources. Indirect prompt injection has been and will continue to be the best way to introduce malicious prompts.
Check for redundant tools. As discussed with the arbitrary file read, one set of tooling was restricted while the overall environment offered another set of tooling. Ensure your application does not offer two sets of similar functions with differing security postures when introducing an LLM. This also applies to MCP servers your team may use, as malicious MCP servers can “shadow” legitimate tools, convincing the LLM to call the malicious tool instead of the benign one.