DEW #159 - Synthetic Logs for Detection, Device Code Phishing Detection & Mythos 5 release
Welcome to Issue #159 of Detection Engineering Weekly!
✍️ Musings from the life of Zack:
I just got back from a work trip to NY and got to hang out with people in my org who traveled from all over the world (Europe, Asia, and around the US!) Sorry for not putting out an issue last week, but I definitely needed to maximize my time to spend with folks in the office :D
I hosted an internal security fireside chat with Ayoub Fandi, who runs the GRC Engineer Newsletter, and he crushed it. It was great to hear his perspective on modernizing GRC functions and on the emergence of amazing tech applied to age-old security problems.
It’s almost summer here in New England! I know it’s close because every week, I walk outside with my kids and pick wildflowers. There is almost always a new type of flower blooming that we get to bring back home.
Sponsor: runZero
Are You Writing Detections for an Attack Surface You Can’t See?
Your detections are only as good as your asset coverage. If runZero isn't in your stack, you're writing rules for an attack surface you can't fully see… and AI-generated exploits are built to find exactly those gaps.
Know every asset, every exposure, every attack path—across IT, OT, IoT, cloud, and mobile. Validate your segmentation before the exploit drops. No agents. No authentication. No appliances.
💎 Detection Engineering Gem 💎
Introducing EvidenceForge: Synthetic security logs that don’t look (as) fake by David Bianco
I’ve worked in security product startups for years and have procured vendors and software related to security. The single biggest problem with the procurement process, IMHO, is evaluating the theoretical scenarios in a demo environment to see how accurate they would be in a real scenario. In the ideal state, you turn a product on, and it starts finding the bad stuff, but in my experience, it rarely happens.
The same thing happens with detection rules and capturing a truly malicious, compelling event to guarantee the alert will fire. As a community, we helped alleviate this pain with excellent Purple Teaming tools such as Stratus Red Team and Atomic Red Team, as well as attack-range data, including the BOTSv3 dataset. The problem at its core is realism: isolated attack chains work well for sequenced rules, but they don’t account for the legitimate traffic you need to pressure-test a rule’s necessary-versus-sufficient assumptions before it hits prod.
This is the niche that Talos’ EvidenceForge project fills. It’s an LLM-backed synthetic telemetry generation platform that preserves the causal relationships that detection engineers should expect to see during an attack chain. Through realistic scenario generation via prompting, it tries to build scenarios and telemetry that you’d expect to see in a live environment.
EvidenceForge first creates a shared event model to guide synthetic log generation without forcing the user to reason about every downstream format up front. Once generation occurs, it ties scenarios together across host and network telemetry, introduces benign traffic and prerequisite events for the attack, and adds a timing layer. Detection engineers can then use the resulting dataset to validate rule efficacy, and Bianco notes that it can also support threat hunting, analyst training, and incident response exercises.
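To make the "shared event model" idea concrete, here is a minimal Python sketch of one format-neutral event rendered into two log formats. This is not EvidenceForge's actual schema or code: the `SharedEvent` class, its field names, and both render functions are hypothetical and exist only to illustrate the pattern.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SharedEvent:
    """Hypothetical format-neutral event (field names are illustrative)."""
    timestamp: datetime
    host: str
    user: str
    process: str
    dest_ip: str
    dest_port: int
    malicious: bool  # ground-truth label, deliberately kept out of rendered logs


def render_host_log(event: SharedEvent) -> str:
    """Render the event as a JSON process-style record."""
    return json.dumps(
        {
            "ts": event.timestamp.isoformat(),
            "host": event.host,
            "user": event.user,
            "image": event.process,
        },
        sort_keys=True,
    )


def render_network_log(event: SharedEvent) -> str:
    """Render the same event as a tab-separated connection record."""
    return "\t".join(
        [event.timestamp.isoformat(), event.host, event.dest_ip, str(event.dest_port)]
    )


event = SharedEvent(
    timestamp=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
    host="ws01",
    user="alice",
    process="powershell.exe",
    dest_ip="203.0.113.10",
    dest_port=443,
    malicious=True,
)
print(render_host_log(event))
print(render_network_log(event))
```

Because both records derive from the same object, the host and network views share a timestamp and hostname, which is the kind of cross-source consistency a correlation rule depends on.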
Projects like this are where I am bullish on agentic technology because they focus on harnessing and acknowledging what does and doesn’t work in model prompting at specific steps. Adding guardrails around the model, such as the Shared Event Model and the realism layer, helps scope the task for the agents, ensuring they don’t make assumptions beyond the prompt's bounds.
🔬 State of the Art
The pressure by Daniel Stenberg
I’ve included a ton of Stenberg’s blog posts on how the cURL team has dealt with surges of vulnerability reports due to LLM use. Many of these posts have shed light for me on how tiring open-source software maintenance work is, just to fix vulnerabilities. The problem in cURL’s case, and I’m sure many others, is the tension between popularity, attention and capacity. The project is already popular and gets lots of attention from software engineers and vulnerability researchers alike, so the volume of bug fixes and vulnerability reports puts a lot of strain on their day-to-day. Add LLMs on top of this, and it reads as if it’s unbearable.
I am jealous of those projects that shipped a horrible bug at some point in the past that made the world burn for a while. They got attention and some of them then got funding and financial muscles to get them staff and hire multiple full time engineers. I sometimes think we would be better off if we also had one of those.
This is a sobering and, in my opinion, sad quote from Stenberg. It highlights how underwater he feels while maintaining the cURL project. If I had to relate this to something, it’s very similar to what we deal with in the security industry. Don’t get it twisted: I think building and maintaining cURL is way more important than my day-to-day in many ways, but the pressure of delivering security initiatives is so much higher than it’s ever been in my career.
Be careful of burnout and take care of yourselves and your loved ones. I think OSS development and security are altruistic in so many ways, but it shouldn’t interfere with your health. I’ll put the cURL donation page link below if you all want to help out a great project.
Detecting Tycoon 2FA AiTM attacks across Entra ID and Google Workspace by Samir Bousseaden & Terrance DeJesus
The evolution of phishing has been fascinating to stay on top of since my days of first catching phishing threat actors in 2014. Typically, I’ve associated phishing kits with targeting users to steal passwords and credit cards for fraud. I’m sure most of your friends and family have asked you about spam texts for unpaid tolls, missing packages or a crypto company “threatening” to turn off their accounts. These target the everyday person.
Advanced “attacker-in-the-middle” kits, like Tycoon, target enterprise and cloud accounts. In this post, DeJesus and Bousseaden showcase opportunities for post-phish detection in Tycoon 2FA. Tycoon 2FA is a prolific phishing framework that targets cloud accounts, especially Entra ID and Google Workspace, to provide operators with post-compromise access to sensitive corporate environments. It moves the question from “how do we detect phishing attacks” to “how do we find identity compromise in cloud environments?” Check out the flow below:
It looks eerily similar to how legitimate authentication flows work, both for OAuth apps and individual users, because the attackers understand the identity model. Hunting for phishing pages and infrastructure is a strong indicator, but when paired with control plane logs from Entra ID and Google Workspace, you can generate high-fidelity alerts.
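As a rough illustration of pairing infrastructure indicators with control plane logs, the Python sketch below joins successful sign-ins against a set of IPs gathered from phishing-infrastructure hunting. The record shape (`user`, `ip`, `result`) is a hypothetical normalized form, not the Entra ID or Google Workspace schema, and this is not the detection logic from the Elastic post.

```python
def high_fidelity_alerts(signins, infra_ips):
    """Return successful sign-ins whose source IP matches known phishing infrastructure.

    signins: list of dicts with hypothetical keys "user", "ip", "result".
    infra_ips: set of IP strings from your own infrastructure hunting.
    """
    return [
        s for s in signins
        if s["result"] == "success" and s["ip"] in infra_ips
    ]


signins = [
    {"user": "alice", "ip": "203.0.113.10", "result": "success"},
    {"user": "bob", "ip": "198.51.100.7", "result": "success"},
    {"user": "carol", "ip": "203.0.113.10", "result": "failure"},
]
infra_ips = {"203.0.113.10"}

for alert in high_fidelity_alerts(signins, infra_ips):
    print(alert["user"], alert["ip"])
```

Neither signal is strong alone: the indicator list goes stale quickly, and successful sign-ins are mostly benign. The join is what makes the result worth paging on.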
Device Code Lab (DCL) — Deep Dive into a Device Code Phishing Toolkit by Paul Newton
Continuing the phishing kit theme from Elastic’s post above, Newton dissects the Device Code Lab (DCL) phishing kit to demonstrate its functionality. This is a great piece for comparing and contrasting with other phishing kits. DCL, for example, has unique properties that make it more of a full-service offering, including integration with residential proxy networks, expired domain hunting, and an operator/post-operator model similar to a Cobalt Strike Team Server.
For the most part, the phishing workflow is the same, but uses some Cloudflare Worker shenanigans instead of Socket.io:
There are three interesting features that I listed above:
Residential proxy integration lets operators route device-code traffic through consumer ISPs rather than obvious cloud ASNs, making sign‑in patterns look much more like real users.
The expired domain hunting feature continuously discovers and recycles aged domains, so links land on infrastructure with an existing reputational history rather than freshly registered junk.
The operator / post‑operator model separates the initial phish from the later tenant abuse, giving you two distinct activity phases to hunt for in identity logs, similar to how you’d reason about beacons and operators in Cobalt Strike.
Look for post-exploitation activity similar to what Elastic recommended above. These seem to be noisy when they gain access to a tenant and perform enumeration. Look for rogue registered devices as well, because invalidating refresh or persistent tokens can only stop immediate access.
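One way to hunt for the rogue-device pattern is to look for a device registration shortly after a device-code sign-in by the same user. The Python sketch below assumes logs have already been normalized into simple dicts; the keys (`user`, `time`, `protocol`, `activity`), the `"deviceCode"` and `"Register device"` values, and the one-hour window are all assumptions you would need to map to your own identity provider's fields.

```python
from datetime import datetime, timedelta


def find_device_code_followups(signins, audits, window=timedelta(hours=1)):
    """Pair device-code sign-ins with later device registrations by the same user.

    Returns a list of (user, signin_time, registration_time) tuples.
    """
    hits = []
    for s in signins:
        if s["protocol"] != "deviceCode":
            continue
        for a in audits:
            if a["user"] != s["user"] or a["activity"] != "Register device":
                continue
            delta = a["time"] - s["time"]
            # Only count registrations that follow the sign-in within the window.
            if timedelta(0) <= delta <= window:
                hits.append((s["user"], s["time"], a["time"]))
    return hits


t0 = datetime(2025, 1, 1, 12, 0)
signins = [
    {"user": "alice", "time": t0, "protocol": "deviceCode"},
    {"user": "bob", "time": t0, "protocol": "password"},
]
audits = [
    {"user": "alice", "time": t0 + timedelta(minutes=10), "activity": "Register device"},
    {"user": "bob", "time": t0 + timedelta(minutes=5), "activity": "Register device"},
]
print(find_device_code_followups(signins, audits))
```

The nested loop is fine for a hunt over a small export; for production volumes you would push the same join into your SIEM's query language.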
After The GitHub VSCode Supply Chain Attack: Is Your Extension Block Actually Working? by Aman Khurana
This post-incident analysis of the VSCode incident in GitHub by TeamPCP is a clear example of why we need to understand how supply chain tools actually install.
Khurana shows that VS Code uses marketplace.visualstudio.com only to resolve extension metadata, then pulls the real VSIX from gallerycdn.vsassets.io, Microsoft’s CDN, from which the poisoned Nx Console payload originated. Blocking the Marketplace API felt like an immediate win, but it left the CDN “warehouse” untouched, so anyone who knows the deterministic URL pattern can still fetch and install the extension directly. This is the same pattern for PyPI (https://pythonhosted.org) and npm (registry.npmjs.org and popular 3rd parties like jsdelivr).
This is the pattern you should expect everywhere supply chain software meets cloud: app frontends, APIs, and CDNs can all be different domains, which can mess up detection opportunities if attackers start using legitimate cloud hosting services to blend in.
I’ve joked about this before, but detection engineers are now becoming supply chain security experts. We now need to understand how to map the full software installation flow: which domains resolve metadata, which domains deliver binaries, and which local directories auto‑load content without a visible installer. Otherwise, you are tuning detections against the storefront rather than the delivery path that attackers actually use.
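A small Python sketch of that mapping exercise: describe each stage of an install flow by the domain it talks to, then check which stages your egress blocklist actually covers. The two VS Code domains come from Khurana's post; the stage names, the function names, and the suffix-matching rule are illustrative assumptions, not how any particular proxy or firewall evaluates its rules.

```python
def is_blocked(domain, blocklist):
    """True if the domain, or any parent domain of it, is on the blocklist."""
    domain = domain.lower().rstrip(".")
    return any(domain == b or domain.endswith("." + b) for b in blocklist)


def uncovered_stages(install_flow, blocklist):
    """Return the install stages whose domain is NOT covered by the blocklist."""
    return {
        stage: domain
        for stage, domain in install_flow.items()
        if not is_blocked(domain, blocklist)
    }


vscode_flow = {
    "metadata": "marketplace.visualstudio.com",
    "delivery": "gallerycdn.vsassets.io",
}
blocklist = {"marketplace.visualstudio.com"}

print(uncovered_stages(vscode_flow, blocklist))
```

With only the Marketplace domain blocked, the delivery stage is still reported as uncovered, which is exactly the storefront-versus-warehouse gap described above.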
☣️ Threat Landscape
Claude Fable 5 and Claude Mythos 5 by Anthropic
Anthropic released a Mythos-class frontier model dubbed Fable 5. It has extensive capabilities across its usual benchmarks and enhanced capabilities in cybersecurity tasks. The cool thing here is that it’ll route certain prompts to Opus if it deems them unsafe for use. The most random (but most important) benchmark is that it beat Pokémon FireRed only using vision. It’s an 18-second video, so I’m unsure if they just accelerated the playthrough or if it actually went so fast pressing buttons that it did everything in 18 seconds.
Cool down before you install: give new gems a few days to be vetted by Hiroshi Shibata
The RubyGems ecosystem just released a “cooldown” feature for its bundle tool. Effective supply chain attacks, such as those carried out by TeamPCP or DPRK, rely on publishing a malicious package after a CI/CD pipeline or a user has been compromised. Once that package is pushed out, people start installing it, and both the open-source and security communities begin flagging it as malicious.
Cooldowns help combat this type of attack by allowing users to specify a cooldown period during which the package version must remain in the ecosystem before it can be installed. The premise here is that, for example, if you set a cooldown timer for 24 hours, you won’t install new compromised packages.
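The core check is simple enough to sketch in a few lines of Python. This is not Bundler's implementation or configuration syntax; the function name, the `releases` mapping, and the version numbers are hypothetical and only show the age comparison that a cooldown performs.

```python
from datetime import datetime, timedelta, timezone


def eligible_versions(releases, cooldown, now):
    """Return versions published at least `cooldown` ago, oldest first.

    releases: dict mapping version string -> publish time (timezone-aware datetime).
    """
    by_publish_time = sorted(releases.items(), key=lambda item: item[1])
    return [
        version
        for version, published in by_publish_time
        if now - published >= cooldown
    ]


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
releases = {
    "1.0.0": now - timedelta(days=30),
    "1.0.1": now - timedelta(hours=2),  # freshly published, possibly malicious
}

print(eligible_versions(releases, timedelta(hours=24), now))
```

With a 24-hour cooldown, the two-hour-old release is skipped and the resolver would fall back to the older version, buying time for the community to flag a bad publish.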
It’s a great governance control for engineering teams, but given the hyper-configurability that open-source software tools offer teams, it will be harder to pull off in practice. This, IMHO, will work in controlled CI/CD environments, but will likely be bypassed on developer laptops.
21 Zero-Days in FFmpeg by Zhenpeng (Leo) Lin
This blog by the Depthfirst research team shows how much latent risk lies within “plumbing” style open source software. Kind of like what I called out in Stenberg’s cURL blog above, the thankless work that goes into these pieces of software that literally touch our everyday lives is astronomical. So, helping the ecosystem become more secure by performing vulnerability research on these projects is overall helpful, but it still raises questions about the toil on maintainers.
That being said, this blog is a great demonstration of the scale and velocity of vulnerability analysis using agents. Their security agent rescanned a heavily audited FFmpeg tree and still found 21 new zero-days, including heap and stack overflows that had been quietly reachable for 10–20 years in demuxers, depacketizers, and RTP paths.
There’s a great explanation of a vulnerable code path in RTSP processing that potentially leads to remote code execution. All the Depthfirst team needs to do is set up a rogue stream of a period piece movie and/or series, and they can totally pwn me.
Incident Notification to the Maine Attorney General by Meta AI
Meta suffered an AI-related incident in which its “high-touch support” chat agent was abused to grant attackers unauthorized access to social media accounts via password resets. But my analysis isn’t just about this incident; it’s about how awesome Maine (the state I live in) is, and why it should be your early warning system.
Basically, when a security breach occurs at companies like Meta AI, they go through a notification process to make sure affected users are aware of the incident. This is especially important when trying to comply with federal and state laws. Maine has some of the strictest requirements for breach notification if any Mainer (yep, that's what we call ourselves) is affected by a breach.
Here’s what Perplexity cooked up as it did some research on this for me:
Companies must notify affected Maine residents and the Attorney General “as expediently as possible and without unreasonable delay,” which in practice means they cannot sit on a breach for months while they polish a narrative.
If more than 1,000 people are notified, they also have to notify consumer reporting agencies and file a detailed notice with the AG that becomes part of a public record, even if only a handful of those people actually live in Maine.
Notice has to go out as soon as law enforcement says it will not compromise an investigation, with guidance converging on roughly a 30‑day outer bound from discovery, so Maine filings often appear before a company’s national comms and investor disclosures are fully baked.
If you want a live view of the feed, here’s the “Breach Portal” for Maine:
https://www.maine.gov/agviewer/content/ag/985235c7-cb95-4be2-8792-a1252b4f8318/list.html
🔗 Open Source
GitHub link to Bianco’s EvidenceForge project listed in the Gem above. Check out the AGENTS.md file since this is run as a skill within Claude / Codex.
Open-source, Python-based asset discovery framework. It handles host discovery, port scanning, service fingerprinting, and enrichment in a single path, then organizes the results into a single web app. Good fit for attack surface inventory, lab mapping, and quick scoping when you need a clean picture of a subnet or environment.
Handy is essentially a push-to-talk assistant for transcribing your speech into text. It uses local models for the transcription itself. Good for note-taking, but might be super interesting for prompting your coding agent so you can be totally hands-off keyboard.
CarterPerez-dev/exs-cyberjob-scraper
Elixir-based scraper that pulls cybersecurity job postings, filters out the generic “IT” noise, and surfaces which certs show up most often in real roles. If you mentor juniors or run hiring, this is a handy way to sanity‑check which certification signals actually appear in current job ads instead of guessing based on vendor marketing.



