Key Takeaway
The training data problem in AI code generation stems from models learning from outdated or low-quality software, which can resurface known vulnerabilities and introduce new ones, according to Alex Zenla, CTO of Edera. Unlike traditional open-source software, where developers can audit and inspect the code they use, AI-generated code lacks transparency and accountability. Dan Fernandez, Head of AI Products at Edera, emphasizes that AI-generated code offers no equivalent of the traceability found on platforms like GitHub, where pull requests and commit messages document who contributed what. This absence of oversight raises concerns about the security and reliability of AI-generated code.
The Training Data Challenge
The core issue revolves around how these models initially learn to write code.
“If AI is trained on outdated, vulnerable, or low-quality software that exists out there, then all the vulnerabilities that have been present can resurface and new issues can be introduced,” explains Alex Zenla, CTO of the cloud security firm Edera.
This highlights a significant distinction from traditional open-source practices, where developers can at least review and audit the code they are using.
“AI-generated code lacks transparency,” states Dan Fernandez, Head of AI Products at Edera.
“In repositories like GitHub, you can at least view elements such as pull requests and commit messages to understand who made changes to the code, allowing contributions to be traced,” he adds.
“However, with AI-generated code, there is no equivalent accountability regarding what has been included and whether it has been reviewed by a human.”