Knowing what good looks like in agentic security
The peak is higher, but the climb is faster
I’ve had this nagging desire to write about my personal thoughts on agentic workflows and security operations for several months. I’ve expertly procrastinated on getting these thoughts on paper. Two reasons: I wanted to understand AI in security operations more deeply first, and, frankly, you’re probably exhausted by the marketing hype around agentic security takes.
The issue with point two is that this level of AI hype detracts from the pragmatism of using these technologies in our day-to-day work. The hype tires everyone out in security, even as leaders (including me) are literally asking all of our organizations what they’re doing with coding agents and other LLM technologies. It also creates a state of “AI poverty”: people who yearn to try these technologies but can’t, because the individual cost is high and the barrier to entry favors the firms that can afford them.
So, when I think of current security experts who can’t use a technology because it’s cost-prohibitive, or aspiring security experts who already have a pile of non-AI knowledge they need to demonstrate, I start to feel conflicted. I am privileged to be in a position where I can use this technology and stay at the cutting edge. I am also privileged because I Know What Good Looks Like when it comes to the expected outcomes of this technology. But how is someone supposed to break into an industry whose peak is higher than ever before?
I’ve seen this exact industry circumstance before, and I hope this piece serves as a reminder of the risks that agentic coding and LLMs pose to the experts in our field, and of why I think this technology will ultimately save it.
Knowing What Good Looks Like
2015 was a special time in my career, especially at Hacker Summer Camp:
I got to fly out on the company dime to BlackHat to do booth duty, talk to security people, attend talks, and find parties that would give me free food and alcohol
I gave my first mainstage DEFCON talk
This was also the year that one of my favorite security vendors, OpenDNS, was swallowed by the monstrous Cisco machine, and one of my favorite tools, booths, and T-Shirts of all time withered away. Rest in peace
When walking the floor at BlackHat, I could see the last three years of cyber marketing peeing in the “ML Security” pool. This was the height of the hype around endpoint startups like Cylance & Endgame, which were pushing the idea that Machine Learning & Statistics could find attacks that rules could not, and that their tools could save you hours of security operations work.
The general reaction of most security professionals to this marketing-speak was to scoff. And I felt like we were all justified in doing so, because we are all professionally paranoid. We knew what separated a good alert from a bad one, so even if we didn’t have the ML expertise, we still had that going for us.
But what those companies did was lay the groundwork for making machine learning and AI knowledge available to the masses. Their moat was expertise, but that expertise rippled through the rest of the industry, and we all began using it in our daily lives.
Resistance is futile: You can’t stop the spread of security expertise
Anomaly detection, linear and logistic regression, binary classification, and clustering were all advanced concepts for a typical security engineer. Within a handful of years, the concepts became accessible through open-source libraries, which led to open-source SIEM and SOAR technologies, and the moat dissipated. We started to understand what good looked like with this tech. It had a lot of sharp edges, it sometimes created more work when it didn’t work, and it certainly sucked at most things besides very specific implementations.
Does this sound familiar?
Eleven years ago, I went through this cycle, and I think this is happening again. At the time, I justified using “ML Security” with this thought experiment.
You have 5 alerts that take 4 hours each = 20 hours of manual work
You run these alerts through an ML pipeline and
2 alerts succeed → you spend 5 minutes on each (8 hours saved)
3 alerts fail → still 4 hours each (12 hours remaining)
Result: roughly 12 hours of work, with 2 alerts the ML handled and 3 it didn’t. 12 < 20, so isn’t this a net benefit?
And that 8-hour gain? It compounds across every security engineer. Our industry has more time to work on harder problems. We all became better.
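If you want to sanity-check that back-of-the-napkin math, here it is as a few lines of Python. The numbers are just the ones from the thought experiment above, not real measurements:

```python
# The thought experiment above, spelled out. All numbers are illustrative.
ALERTS = 5
HOURS_PER_MANUAL_ALERT = 4
REVIEW_MINUTES_PER_ML_ALERT = 5

handled_by_ml = 2   # alerts where the ML pipeline's answer held up
failed_by_ml = 3    # alerts you still had to work by hand

hours_without_ml = ALERTS * HOURS_PER_MANUAL_ALERT                   # 20 hours
hours_with_ml = (handled_by_ml * REVIEW_MINUTES_PER_ML_ALERT / 60    # ~0.2 hours of spot checks
                 + failed_by_ml * HOURS_PER_MANUAL_ALERT)            # + 12 hours of manual work

print(f"without ML: {hours_without_ml}h, with ML: {hours_with_ml:.1f}h, "
      f"saved: {hours_without_ml - hours_with_ml:.1f}h")
```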
There was one problem: you can’t easily verify correctness in “ML Security”. These techniques were essentially black boxes. Linear algebra drew lines through a multi-dimensional feature space, calculus assigned a weight to each feature, error-correcting algorithms smoothed those weights out, and all you saw was the result rolled up into some notion of a confidence score from 0-100%.
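To make the black-box point concrete, here’s a toy sketch of the kind of thing we were handed: a logistic regression that spits out a confidence number. The feature names and training data are invented for illustration; the point is that the score is the whole conversation.

```python
# Toy illustration of black-box "ML Security" scoring. Features and data are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per alert: [logons_last_hour, rare_process, off_hours, new_geo]
X_train = np.array([
    [3, 0, 0, 0],
    [45, 1, 1, 1],
    [6, 0, 1, 0],
    [50, 1, 1, 1],
])
y_train = np.array([0, 1, 0, 1])  # 0 = benign, 1 = malicious

model = LogisticRegression().fit(X_train, y_train)

new_alert = np.array([[30, 1, 0, 1]])
confidence = model.predict_proba(new_alert)[0][1]

# This number is all the analyst saw. You could dump model.coef_ as a "weight file",
# but you could not ask the model a follow-up question about its reasoning.
print(f"Confidence this alert is malicious: {confidence:.0%}")
```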
This is what I think is happening right now with LLMs in our field. The problems are different, the solution is WAY different, but the fundamentals for the most part remain. There are risks, and this is why I’m so obsessed with the space right now.
The Starry Night Problem with Agentic Security: Lossy Compression
LLMs and AI are a very lossy kind of compression. Several blogs written in 2024, during the explosive growth of ChatGPT, compare it to a compressed JPEG reduced to pixel art. Take a look at the photo below (this one is pulled from Aboard’s newsletter):
It’s van Gogh’s The Starry Night reduced to a few dozen kilobytes. Humans can see this and know it’s The Starry Night, but also understand that it’s not the high-resolution, accurate version. LLMs take a massive corpus of training data, the equivalent of a super high-resolution image of The Starry Night, and compress it like the above picture. You query the LLM; it performs a bunch of math on the context of your conversation and your prompt, and it tries to reverse-engineer the high-resolution image to give you something that resembles the image above.
Now take this in the context of cybersecurity and my thought experiment above. An expert will know what good looks like: you can ask an LLM to investigate an alert, and when it shows its response and the reasoning behind it, you can quickly verify whether it’s B.S. or not. But if the LLM hallucinates and hands a non-expert, or worse, another LLM, a crappy, low-res picture, will they know it’s low-res?
The Expertise Gap
This is what worries me about the expertise gap in security investigations and engineering. If we stick to this old model of “you must learn how we learned and painstakingly execute a runbook until you get a decision”, then yes, this will eventually create security experts. But we’ll also set up a new generation of experts for failure, since painstaking tasks are what LLMs are really good at solving.
But when do you become an expert? How many hours? And weren’t we all wildly inefficient in that learning process?
I banged my head against my keyboard for hours just to get efficient at vim. That was 50% building expertise and 50% struggling-by-doing. Separately, it certainly wasn’t efficient for me to review an alert generated by Logistic Regression with a feature vector weight file attached. I couldn’t ask the regression model questions. I couldn’t interrogate the model’s reasoning. I just had to accept the score or reject it.
But with LLMs, I can ask it a shit ton of questions, such as: “What does MITRE ATT&CK’s Detection Strategy say about this rule?”, or “Can you check that this field name actually exists?” The feedback loop is immediate, iterative, and bidirectional. It matters less that it hallucinated, because you can keep reverse-engineering the van Gogh picture with human prompts rather than reading a statistics book.
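That loop is easy to put into code, too. Here is a rough sketch of what “asking it a shit ton of questions” looks like using the Anthropic Python SDK; the model name, the alert placeholder, and the specific follow-ups are assumptions for illustration, not a prescription:

```python
# A sketch of the interrogation loop: keep the conversation history and keep pushing
# back until the reasoning holds up (or falls apart). Assumes the Anthropic Python SDK
# and an ANTHROPIC_API_KEY in the environment; the model name is only an example.
import anthropic

client = anthropic.Anthropic()
messages = [{"role": "user",
             "content": "Investigate this alert and explain your reasoning step by step:\n<alert JSON here>"}]

follow_ups = [
    "What does MITRE ATT&CK say about the technique you just named?",
    "Check that every field name you referenced actually exists in the alert above.",
    "Take your time: what evidence would change your conclusion?",
]

for question in [None] + follow_ups:
    if question is not None:
        messages.append({"role": "user", "content": question})
    resp = client.messages.create(model="claude-sonnet-4-20250514",
                                  max_tokens=1024, messages=messages)
    answer = resp.content[0].text
    messages.append({"role": "assistant", "content": answer})
    print(answer, "\n" + "-" * 40)
```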
This trust and expertise calibration in the industry will take years, but I think it’ll be much less than with previous technologies.
Learning to see the Pixels to get more, not fewer, security experts
Here’s where I land: this technology (unlike DeFi/Blockchain and the Metaverse, lol) is here to stay and will make a material impact on how we do security. I know this because it’s fundamentally changing how I work and how my organization works. Knowledge gaps are closing fast, and when they close, productivity skyrockets.
This is all excellent for those breaking into our field, because the things that differentiated us (time in seat) aren’t gatekeeping others as much as before.
You can eventually reverse-engineer The Starry Night if you ask Claude/ChatGPT enough times. The image will suck the first few times, but after 10 or 20 tries, your human brain can piece together the original. Learning isn’t about cramming the TLS 1.2 RFC into your brain to remember the pseudo-random function for deriving secrets that is literally seeded with the label “master secret”. It’s asking Claude to tell you about the RFC and pull out random facts that you can spend 15 minutes reading about and laughing at, like I just did.
The Mental Model for Learning Security Needs to Change
If we assume that LLM use is here to stay, and people need to use LLMs in their day-to-day security work, then the mental model for learning and operations needs to change. For the sake of this exercise, I propose three non-negotiables to follow:
Store and trust human artifacts outside the LLM boundary
We will need full-resolution pictures of architecture diagrams, runbooks, code, policies, and incident timelines. These artifacts should augment your LLM use rather than be thrown at it
Technologies like RAG are helpful here, as is asking the LLM to cite its references so you can check its work against the original artifact (there’s a small sketch of this after the list below)
Make the LLM defend itself
If you don’t understand a decision the LLM makes, keep asking it questions. Make it explain its reasoning. Tell it to take its time. You’d rather spend a few minutes doing this than lose several hours later
There are all kinds of agentic architectures for this. Anthropic’s Building Effective Agents has some fantastic examples
Learn the Fundamentals, then accelerate
It’s probably good for you to learn to write Sigma rules or perform investigations manually before you can be the expert for an LLM. Remember, you want to know what good looks like
Once you get the fundamentals down, you can learn from mistakes faster than ever, which makes you more of an expert. The 10,000-hour bar to becoming an expert drops
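To show what non-negotiable #1 can look like in practice, here’s a minimal sketch: the runbook stays outside the LLM as the full-resolution artifact, gets passed in verbatim as context, and the model is told to cite the headings it relied on so a human (or a dumb script) can check them against the original. The file path and the ask_llm() / extract_cited_headings() helpers are hypothetical placeholders, not a real library; the call itself would look like the interrogation loop sketched earlier.

```python
# Non-negotiable #1, sketched: the human artifact is the source of truth, the LLM
# augments it, and every claim must point back to a verifiable spot in the artifact.
# The path and the commented-out helper functions are hypothetical placeholders.
from pathlib import Path

runbook = Path("runbooks/suspicious-oauth-consent.md").read_text()

prompt = f"""Here is our runbook, verbatim. Treat it as the source of truth.

<runbook>
{runbook}
</runbook>

Triage the alert below. For every step you take, cite the exact runbook heading
you followed so I can check your work against the original document.

<alert>
(alert JSON goes here)
</alert>"""

# answer = ask_llm(prompt)                        # hypothetical: same call pattern as earlier
# for heading in extract_cited_headings(answer):  # hypothetical parser for the citations
#     assert heading in runbook, f"cited a heading that isn't in the runbook: {heading}"
```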
Why I’m Optimistic
Look, the peak is higher, but the climb is faster. I have very little time between my personal life, work, and this newsletter, and coding agents have brought back a joy of coding that I haven’t felt in years, because doing that work well takes time and dedication. I have this joy because I remember how hard it was to balance pushing code to production services, building rules, and performing operational work.
I think this transforms how we work, not who we are. So let’s move forward with a healthy skepticism, because we all know what good looks like.



