I don't like that people use security as an angle when criticizing the use of AI in KeePassXC. If a project accepts public contributions, this means there will be malicious actors trying to smuggle in code which weakens security. The project must therefore have a solid review process in place to ensure this doesn't happen.
If you see AI as this huge security threat, then you don't trust this review process. But then you shouldn't have trusted the software at any time before to begin with.
-
@volpeon @errant AI is really easy to accidentally prompt in such a way that it goes off the rails, can make all sorts of mistakes, can introduce copyright issues (even subtle ones), and can do it at scale. Hypothetically it might be possible to use it in such a way as to not trigger these issues, but surely all the checking and prerequisite expertise would nix most advantages of using it?
And it looks like for most AI PRs in KeePassXC, the person working with the AI ultimately approves the code (example: https://github.com/keepassxreboot/keepassxc/pull/12588)... hardly a rock-solid review process. Usually, there are two humans in the loop.
@sitcom_nemesis @errant
> but surely all the checking and prerequisite expertise would nix most advantages of using it?
Sure, but why would this be a concern for anyone but the user themself? I'm sure I use things which other people may not like, such as VSCode or GNOME. Is it valid for them to tell me what to use and how?
> And it looks like for most AI PRs in KeePassXC, the person working with the AI ultimately approves the code... hardly a rock-solid review process. Usually, there are two humans in the loop.
The way the AI is integrated in GitHub makes it a separate entity from the reviewer with an interaction workflow akin to iterating a PR with its author until it matches the project's standards. In both cases, the PR author — AI or human — is untrustworthy and the reviewer is trustworthy. There are also non-AI PRs where only one developer conducted the review, so there's no difference between AI and non-AI standards.
If this strikes you as flawed, then your concerns should lie with the review process itself.
-
@gimmechocolate @volpeon feelings imply awareness, no? but copy-paste doesn’t have feelings either.
and like yeah it does create issues. i am aware that it creates issues. that is specifically why i mention it. it is still an occasionally useful tool which can be leveraged by programmers. The person liable for (mis)use is the developer using the tool, and copy-paste errors are also something that may not be immediately obvious in a code review either
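to illustrate, here's a totally made-up example (not from any real project) of the kind of copy-paste slip that sails through review because every line looks plausible on its own:

```python
# Hypothetical snippet, not from any real codebase: the second check was
# copy-pasted from the first and still tests old_password, so the length of
# new_password is never actually enforced. It looks fine at a glance, and
# it's exactly the kind of thing a quick review can miss.
def change_master_password(old_password: str, new_password: str) -> None:
    if len(old_password) < 12:
        raise ValueError("old password too short")
    if len(old_password) < 12:  # copy-paste slip: should check new_password
        raise ValueError("new password too short")
    # ... re-encrypt the database with the new password ...
```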
@charlotte @volpeon
Yeah, but "no feelings" wasn't a criticism of LLMs; I was saying it does actually behave more or less like what you said, just without the malice part. To be clear, it's bad because it makes code that looks good but isn't — it just mashes up the semantic ideas from its training data without regard for how those different patterns will interact; the result will always *look* right, but won't necessarily be right, often for very subtle reasons, making it a poor tool that people simply shouldn't use. I wouldn't trust a project that put instructions on how to copy-paste from stackoverflow in their repo either.
As for the tool thing, like, I can see ways where an LLM could be a useful tool. For example, it's very annoying to me how the results from the LSP autocomplete are in some arbitrary order, and it could be nice to have an LLM step in and rank em!! It's a tool, but like, only as part of a well-designed machine-learning pipeline, made by the people who understand the limits of the technology — not something used by just about everyone almost raw.
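something like this toy sketch is what I have in mind (all the names and the scoring here are made up for illustration; a real version would sit inside the editor's completion pipeline and use an actual trained ranker, not this placeholder):

```python
from dataclasses import dataclass

@dataclass
class CompletionItem:
    label: str        # e.g. the symbol name the LSP server suggested
    detail: str = ""  # e.g. its type signature

def relevance(item: CompletionItem, prefix: str, context: str) -> float:
    """Placeholder for a learned scoring model (this is where the ML would go)."""
    score = 0.0
    if item.label.startswith(prefix):
        score += 1.0
    if item.label in context:  # crude "already used nearby" signal
        score += 0.5
    return score

def rerank(items: list[CompletionItem], prefix: str, context: str) -> list[CompletionItem]:
    # The LSP server's arbitrary ordering goes in, a relevance-sorted list
    # comes out; the rest of the completion UI stays exactly the same.
    return sorted(items, key=lambda it: relevance(it, prefix, context), reverse=True)
```

the point being: the model only reorders what the LSP server already produced, it never writes code on its own.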
The whole thing is just annoying to me though — I've been interested in machine learning for over a decade now, and when I took my first course in it, I was warned about this exact scenario. There's a cycle of AI spring and winter, where the AI business leaders push the idea that these tools can do anything, people metaphorically saying "Uhm, when all you have is a hammer, actually you can solve all your problems by whacking everything with it." Now I get to see what they were talking about firsthand and it's extremely frustrating to feel like I am shouting it from the rooftops and no one listens.
-
@charlotte @volpeon yeah the people on that github issue were bizarrely saying things like software is untestable, like you couldn't write a test to verify correctness because the nefarious ai can insert tricks that human cognition weaknesses cannot detect
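for what it's worth, the thing they were calling impossible is just an ordinary test; here's a minimal sketch (round_trip is a hypothetical stand-in, not KeePassXC's actual API):

```python
import os
import unittest

def round_trip(data: bytes, key: bytes) -> bytes:
    """Stand-in for encrypt-then-decrypt using whatever implementation is under review."""
    return data  # placeholder body so the sketch runs

class RoundTripTest(unittest.TestCase):
    def test_random_payloads_survive_round_trip(self):
        # The test checks observable behaviour and doesn't care who, or what,
        # wrote the implementation being exercised.
        for _ in range(100):
            data = os.urandom(64)
            key = os.urandom(32)
            self.assertEqual(round_trip(data, key), data)

if __name__ == "__main__":
    unittest.main()
```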
-
> The way the AI is integrated in GitHub makes it a separate entity from the reviewer with an interaction workflow akin to iterating a PR with its author until it matches the project's standards. In both cases, the PR author — AI or human — is untrustworthy and the reviewer is trustworthy.
I'd argue that integrating AI into GitHub this way is part of the problem. It's not an agent - it's a word guessing machine with access to an API. It fundamentally doesn't think like a human, trustworthy or otherwise. We have methods of understanding context, intention and trustworthiness with other humans - AI strips that all away while still claiming to be analogous to a human. That's, in part, what makes it so risky.
It's one thing to allow AI code with the caveat that the human needs to take full responsibility (e.g. the Fedora guidelines), but that doesn't seem to be happening with KeePassXC. Hence the concern.
-
@sitcom_nemesis @errant
It doesn't matter what the AI is or isn't. GitHub's presentation of it gives the AI the same status as an external contributor, which incentivizes the thinking that it must be held to the same standards as human contributors. I'd even say this is the only healthy approach, because implicitly trusting humans more means that if they use AI outside of GitHub and act like it's their own work, your bias may prevent you from seeing flaws you'd pay closer attention to when reviewing an AI's output.
I'm not sure what the problem with responsibility is. Isn't this a project governance issue, i.e. "you take full responsibility" means you'll get kicked out if you contribute garbage AI code? How is this a security problem?
-
@volpeon it’s treated as if AI is somehow able to insert magical code vulnerabilities that cannot be seen in a text editor or through review or testing
or that it maliciously adds good-looking code that is actively malicious instead. neither of which is informed by how it works
I saw this on my fedi timeline. I don't think people need to flat-out lie just to criticise the use of AI generated code - UZDoom going for the copyright/GPL angle during that whole thing made a lot more sense IMO.
-
From what I saw, KeePassXC implements the same policy as Mesa and Fedora, so I don't see what the problem is.
That being said, the code of my password manager of choice is proprietary, they could very well be using AI generated code and I wouldn't even know (or care, since I trust the company enough).
-
i don't even use a dedicated password manager, just firefox's internal password manager thing, which is A BAD IDEA and i need to change that sooner rather than later
-
@volpeon Yes! Exactly this!
The dev's responses that I have seen give me enough confidence to not lose trust in their use/allowance of AI gen'd code.
They plan to treat ALL AI gen'd code as equivalent to a drive-by PR which requires additional scrutiny.
That's a perfectly reasonable review policy, imho.
-
yup. I basically needed one independent of the browser since I use multiple browsers

-
@volpeon@icy.wyvern.rip @ngaylinn@tech.lgbt I have been critical of the KeePassXC team's decision to accept LLM-assisted submissions for a while now, and after long consideration I opted to move away from using it, security being one of my primary concerns. My view is that you're setting up a bit of a strawman in this post, and so I thought I'd elaborate more on my rationale in case it helps anyone who's weighing this decision too. The tl;dr is that code review should be your last line of defense, not your primary one, and that LLM use threatens to erode existing lines of defense while introducing new categories of risk. This is the opposite of what you should be doing when developing security-oriented software.
Here's a lot more words:
To my way of thinking the question isn't whether any given pull request is problematic. One of the KeePassXC maintainers who I've interacted with seemed to suggest this as well, that human beings sometimes submit poor quality pull requests too so what's the difference, especially if the review process catches them? The important question, and the difference, lies with the culture of the development team and process. Experience shows that security-oriented software in particular benefits greatly from a team of people dedicated to both transparency and the relentless pursuit of excellent implementations. My belief is that the use of LLMs in coding threatens both of those aspects, degrading them over time. Transparency, because no one can know exactly what an LLM is going to produce and why, and an LLM cannot tell you anything about its output; excellent implementations because (a) come on, have you ever looked at LLM output, especially for larger chunks of code; and (b) the only way we've ever found to produce excellent implementations of anything is by developing a well-functioning team of people and setting them loose on it.
Peter Naur famously argued that programming is theory building, and theories draw their power from their existence in the heads of the people who construct them. I am convinced by his argument by the simple fact that over the course of my career I've worked with large codebases written by other people, and have experienced firsthand that the only way to really understand the code is by talking to other people who understand the code. No one can look at a large codebase and understand how it works, not even with the best documentation in the world--not in a reasonable amount of time, anyway. Anyone who believes otherwise hasn't picked up a non-trivial APL program and tried to figure out what it does. Anyone who believes otherwise is mistaken about the practice of software development and engineering, and probably also believes in the myth of the 10x engineer or that women can't code as well as men, too.
LLMs are not people. They do not understand code. They cannot describe their thought process to you. They cannot point you to the most important functions, procedures, methods, or objects. They cannot give you hints about pitfalls you might fall into while working with their code. Any understanding like this that arises about LLM-generated code arises because human beings developed that understanding of the code and then communicated it.
LLMs are trained on masses of mediocre code. Their output has been found to include significantly more bugs and security issues than the average human-written code. Their use has been observed to result simultaneously in reduced productivity and a belief that productivity was increased, suggesting they might induce other blindspots in one's self-awareness too. Their use has been observed to result in de-skilling: people become less able to do things they used to be able to do without leaning on the tool. Given all that, I do not believe for a moment that an LLM can produce an excellent implementation, nor foster a culture in which excellent implementations arise; and I believe that any excellent implementation produced by a person using an LLM is a result of the person compensating for the weaknesses and traps of LLM use, all while it potentially degrades their future ability to produce excellent implementations and fools them into believing they wrote better code faster when they did not.
A good review process does not compensate for any of the issues I raise here. More importantly, actual security is about layers of protection. The code review is one of the last layers of protection. There should be many, many others, which to me includes a culture that does not succumb to the temptation to put a stochastic black box deskilling machine into the software development process. You wouldn't build a fortress with an open road leading into the center just because you had guards you could post on the road (it lets us get in and out faster, that portcullis is so slow!). You'd have the guards, and you'd have several layers of thick walls, and you'd have a moat, and you'd have archers, and... You certainly wouldn't voluntarily pull a giant wooden horse that could contain anything into your fortress!
I suspect that a project adopting more and more LLM-assisted submissions will not obviously suffer in the near term, but over the medium to long term is likely to develop issues, originating in one or more of my above observations, that eventually lead to problems in the software. As I said to someone about KeePassXC, I am not inclined to hitch my wagon to that train. Not when it comes to a piece of software like a password manager.
And that's not even opening up the moral and ethical issues of LLMs, which are substantial. Not to mention the dangers of becoming dependent on a technology and tools that might go away or become significantly more expensive when the asset bubble currently necessary for their continued existence finally deflates.
Other people might come to a different place, but for me this is more than enough reason to switch password managers.
-
@abucci Thanks for your reply! You're making good points which I overall agree with. I've had rather subpar experiences with LLM-generated code at work myself, so it's not like I don't see the downsides and how it leads to the erosion of skill. It's true that this also has implications for security.
However, from what I've seen, I think the way GitHub integrates Copilot into the process makes it less likely to cause the same degradation as an AI assistant directly integrated into an editor. As I said elsewhere, GitHub presents Copilot as a PR author and your usage of it is akin to iterating a PR with a human author until it meets the project's standards.
If regular PRs don't pose a risk to one's skills, then I don't see why this would. It incentivizes the thinking that the AI must be held to the same standards as any other PR author, that it isn't inherently above them. I think this is a good way to handle it.
I'm happy to be corrected if my understanding of Copilot or the way the devs use it is wrong. You're clearly more involved in this topic than I am.
Apart from that, I do wonder how realistic it is to expect projects to reject LLM contributions forever. No matter what you and I want, the global trend is moving towards increasing adoption of AI, and this means external contributions will become more and more "tainted", with or without their knowledge. Given this outlook, I think it's better to be open to AI contributions. This allows the developers to become familiar with the strengths and weaknesses of AI, and it creates an environment where contributors are willing to disclose their use of it so that reviews can be conducted with appropriate care. An environment where AI is banned will only lead to people trying to deceive the developers and causing unnecessary trouble.
@ngaylinn