DEW #156 - Putting Detection Coverage on Blast, SOC Metrics Stink, and the Decline of Cool Windows Malware
Long car rides make me question everything in security
Welcome to Issue #156 of Detection Engineering Weekly!
✍️ Musings from the life of Zack:
Another Allen first! My daughter had her first T-Ball game, and after some trepidation in practice, she surprised all of us with some very competitive plays and dogpiled on the baseball with the other 4- and 5-year-olds.
I booked my hacker summer camp Vegas travel so I’ll see you all at BlackHat & DEFCON. I’ll be trying to host a Detect & Response Happy Hour again, so stay tuned. If any current or future sponsors want to do an event, or sponsor the Happy Hour, feel free to reach out to me! techy@detectionengineering.net
I’m starting to kick the tires on Codex and have been impressed with it so far. I’ve also had an itch to resurrect my OpenClaw instance for more malware research so wish me luck with my wallet :)
The 3rd edition of Datadog Detect is next week! Datadog Detect is a FREE, fully online, security-operations-focused event hosted by the Security Research team here at Datadog. We’ll have three presentations from practitioners and a short panel discussion afterward. The first two editions were excellent, and I hope to see you all at this one. Check out the schedule and the registration link below!
💎 Detection Engineering Gem 💎
Coverage Is Not Detection by Gabriel Abdelgawa
I read a ton of blogs and thought pieces about detection coverage. MITRE ATT&CK is the industry standard for explaining this concept, and it has its uses. It answers the question “am I covered?” and is a useful tool for communicating to others that you don’t have a visibility or detection gap. It has also been abused by product marketing, especially when vendors claim 100% coverage or more coverage than their competitors. Detection engineers know this, but they don’t necessarily know why it’s a double-edged sword, beyond the nuances of rules and how they can be bypassed.
Abdelgawa’s blog post is the cleanest and most concise description I’ve read of the inherent problem with detection coverage as a metric: numbers feel defensible simply because they are numbers. The post aptly references Theodore Porter’s concept of mechanical objectivity, the idea that quantifying a problem, such as with a percentage, makes it harder to challenge than a qualitative assessment. Presenting 90% coverage on an ATT&CK heatmap biases the audience because they assume the measurement that yielded that 90% used sound methodology.
We all know that’s not true :).
Abdelgawa cites several examples within detection scenarios where coverage starts to fail. The examples are rooted in three categories:
Depth: Coverage is binary, so if you have exactly one rule that detects obfuscated PowerShell usage, are you really covered?
Relevance: What is relevant to your environment? Should a Windows technique carry the same weight as everything else if you run a full macOS shop?
Independence: Focusing a detection opportunity on a single telemetry source can fail you if that source, such as EDR, doesn’t generate the correct telemetry or doesn’t find the activity at all.
I’ve argued ad nauseam with folks about coverage, and it’s never the detection engineers who disagree with me. It’s mostly leaders and vendors. And TBH, I don’t blame the leaders. We need a better way to measure readiness, both of detection rules and for security incidents in general. So, use a number as an anchor point, but make sure you provide the context and methodology behind it so it can be challenged.
🔬 State of the Art
ATT&CK Coverage Ratio Evaluation (ACRE) by Jordan Anderson
The timing of this blog release pairs excellently with Abdelgawa’s Gem post above! The gem diagnoses why coverage maps fail as a measurement device, and Anderson’s blog is essentially the constructive counterpart. Anderson’s ACRE measurement addresses two of the coverage problems, relevance and independence, fairly well; depth, self-admittedly, is much harder to solve.
The “ratio” in ACRE helps teams derive a denominator of techniques more likely to apply to your environment, forming a subset of the full MITRE ATT&CK map. They leverage MITRE’s CAR catalog, a set of rules compiled across several vendors, to build a realistic roadmap of detections you should implement to achieve coverage. This can fail, IMHO, if the catalog is not kept up to date, but building the coverage roadmap alone makes CAR worth using.
ACRE also solves the “everything carries the same weight” problem by adding a weighting mechanism. Techniques that live in the middle of ATT&CK, such as persistence, should have more telemetry and detection opportunities than those in the beginning and end stages. So, when ACRE computes the full score, it assigns greater weight to coverage of these middle-stage techniques and then splits them out by OS. You end up with a much cleaner numerical coverage score, weighted appropriately within each environment.
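To make the weighting idea concrete, here’s a minimal sketch of how an ACRE-style score could be computed. The technique subset, stage weights, and coverage flags below are illustrative assumptions on my part, not ACRE’s actual data or formula:

```python
# A minimal, illustrative sketch of an ACRE-style weighted coverage
# score. Weights, techniques, and coverage flags are made up.

# Relevance: score against a subset of ATT&CK scoped to your
# environment (here, a toy macOS-focused subset), not the full matrix.
relevant_techniques = {
    # technique_id: (tactic_stage, covered_by_at_least_one_detection)
    "T1059.004": ("execution", True),           # Unix Shell
    "T1543.001": ("persistence", False),        # Launch Agent
    "T1555.001": ("credential-access", False),  # Keychain
    "T1041":     ("exfiltration", True),        # Exfil over C2
}

# Weighting: mid-kill-chain tactics offer more telemetry and detection
# opportunities, so they count for more in the final ratio.
stage_weights = {
    "execution": 1.0,
    "persistence": 2.0,
    "credential-access": 2.0,
    "exfiltration": 1.0,
}

def acre_style_score(techniques, weights):
    total = sum(weights[stage] for stage, _ in techniques.values())
    hit = sum(weights[stage] for stage, covered in techniques.values() if covered)
    return hit / total

print(f"weighted coverage: {acre_style_score(relevant_techniques, stage_weights):.0%}")
# -> weighted coverage: 33%. A naive count would say 2 of 4 techniques
# (50%), but the uncovered middle-stage techniques drag the score down.
```

The key design choice is the denominator: you divide by the weighted total of relevant techniques, not the full ATT&CK matrix, so the score reflects your environment rather than the universe of all possible attacks.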
Could your choice of metrics be harming your SOC? by Dave Chismon
As we move downstream from detection metrics to SOC metrics as a whole, we return to the question of how we measure success in security and the pitfall of mechanical objectivity. Chismon rightly calls out that how we measure efficacy in security operations is rooted in an IT-centric productivity culture. This culture focuses on units-of-work measurements, such as the number of tickets processed (alerts triaged), meeting KPIs or SLAs (mean time to touch an alert), and other human-based tasks.
Chismon calls out four metrics that fall into the mechanical objectivity trap: the number of tickets processed, the time to close a ticket, the number of detection rules, and the volume of logs. We tend to take metrics like this and optimize around them because we assume they measure SOC performance. But, much like the pieces above by Anderson and Abdelgawa, they don’t answer the question of “how ready are we for a security incident?”
Chismon suggests seven metrics at the end of the post, and what I love about these is that they are a mix of metrics we know and love, and human-based metrics. A successful SOC, for example, is one full of people who are satisfied with their jobs and engaged, so why not measure that? The same thing applies to engagement beyond the security team and to making sure the analysts know the environment they are trying to protect.
Appearing Productive in The Workplace by No One’s Happy
This blog isn’t necessarily detection or even security-related, but it’s especially relevant for all of us operating in a coding-agent world where everyone seems to be building code. It offers a sober look at what coding agents can do well for anyone, but also what they don’t do for those who understand the underlying technologies, idiomatic techniques, and gotchas of software engineering. It plays nicely into the security-relevant piece on the cognitive rust belt by St-Meyers in a previous issue.
These risks are real, and people are right to be concerned about untrained people writing software. Putting the security piece aside, the questionable reliability of these systems, and the likelihood of strange architectures that don’t fit practices built and honed over the last 30 years, mean this code risks being dumped on the software teams who did not build it.
That said, the post does throw the baby out with the bathwater, in the sense that I don’t see anything about how this technology can be used for good to help these folks. I tried writing about this in my piece about agentic coding x security below:
Knowing what good looks like in agentic security
I’ve had this nagging desire to write about my personal thoughts on agentic workflows and security operations for several months. I’ve expertly procrastinated on getting these thoughts on paper. Two reasons: I wanted to understand AI in security operations more deeply first, and, frankly, you’re probably exhausted by the marketing hype around agentic se…
Inadvertent Injections by sud0woodo
In this post, sud0woodo hunted for PoisonedRefresh samples first discovered and posted by ESET on Twitter. PoisonedRefresh is a piece of malware that targets F5 devices and plants an implant within the Apache process to execute a webshell. The x64 variant they found on VirusTotal is more portable than the original sample and includes a peculiar design decision that could lead researchers to accidentally trigger the implant on a device they are probing.
The implant hooks read() and runs a Knuth-Morris-Pratt (KMP) string search against every GET and POST request. The upside of KMP is that the magic trigger can live anywhere in the request, such as in a cookie value, a referrer, or a JWT payload, which makes it much harder to fingerprint the implant from the outside.
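If KMP is unfamiliar: it finds a pattern anywhere in a buffer in linear time with no backtracking over the input, which is exactly what you want when scanning every request inline. A quick sketch of the idea (the trigger string here is made up, not the implant’s real one):

```python
def kmp_search(haystack: bytes, needle: bytes) -> int:
    """Return the index of needle in haystack, or -1 (classic KMP)."""
    if not needle:
        return 0
    # Failure table: for each prefix of needle, the length of the
    # longest proper prefix that is also a suffix.
    fail = [0] * len(needle)
    k = 0
    for i in range(1, len(needle)):
        while k and needle[i] != needle[k]:
            k = fail[k - 1]
        if needle[i] == needle[k]:
            k += 1
        fail[i] = k
    # Single pass over the haystack, never re-reading input bytes.
    k = 0
    for i, byte in enumerate(haystack):
        while k and byte != needle[k]:
            k = fail[k - 1]
        if byte == needle[k]:
            k += 1
        if k == len(needle):
            return i - k + 1
    return -1

# The trigger can live anywhere in the raw request -- a cookie, a
# referrer, a JWT payload -- because the whole buffer gets scanned.
request = b"GET / HTTP/1.1\r\nCookie: session=abc; x=MAGIC_TRIGGER\r\n\r\n"
print(kmp_search(request, b"MAGIC_TRIGGER"))  # hypothetical trigger
```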
According to sud0woodo, if security companies or researchers are actively scanning for this variant and their probes carry the magic string, they can trigger the webshell injection on a box that the operator has not yet activated. The author calls this an inadvertent injection, and it can mess up forensic investigations and create false alarms for those hunting for these payloads in their own environments.
☣️ Threat Landscape
Where Have All the Complex Windows Malware and Their Analyses Gone? by R136a1
I remember when Mandiant first released its APT1 report and how blown away I was by the technical aptitude of both the researchers and the threat actors themselves. I was just out of college, and after reading it, I knew I wanted to pursue threat research as my career. Windows was the operating system used to build malware because of Microsoft’s market share, the lack of cloud computing at the time, and the lack of commoditization of attack tools and security knowledge.
Fast forward 14 years (!), and the industry and landscape have changed so much that reports like this no longer come out. It’s not for lack of effort, as R136a1 puts it, but more a function of several factors: access to intelligence, the advent of GitHub and open-source post-exploitation toolsets, and the shift in threat actor focus toward cloud and SaaS apps. Social media, IMHO, plays a huge part here too, because we can all build independent followings without having to throw time and money into marketing our research.
dirtyfrag by V4bel
Dirtyfrag is a Linux kernel LPE vulnerability that draws inspiration from DirtyPipe. It’s a reliable exploit in the sense that it doesn’t cause kernel panics, and you don’t need to rely on any timing windows or race conditions. It works by exploiting the Linux kernel’s page cache: privileged kernel memory that optimizes reads and writes by avoiding repeated disk access.
In DirtyPipe, you achieve LPE by abusing a splice() syscall to gain an arbitrary write. Dirtyfrag achieves the same outcome by corrupting a pointer in sk_buff, the Linux kernel’s core network packet structure. The frag member of sk_buff holds a reference to a page-cache page used to build packet fragments.
By corrupting that reference to point to a read-only page backing a privileged file, you get an arbitrary 4-byte write through the network path, allowing you to overwrite setuid binaries, tamper with entries in a password file, or pull off some fun container-escape shenanigans.
Mini Shai-Hulud Is Back: npm Worm Hits over 160 Packages, including Mistral and Tanstack by Raphael Silva
The Aikido Research team uncovered a new Shai-Hulud campaign targeting several npm packages, with TanStack’s and Mistral’s npm packages being the most popular ones hit. TanStack’s post-mortem describes a PwnRequest-style initial access vector, which then pivoted into their CI/CD pipeline. The attackers abused the lack of trusted publishing in the environment and began propagating 80+ malicious versions into the ecosystem, performing the same worm-like attacks that appear to have eventually hit Mistral.
When prompts become shells: RCE vulnerabilities in AI agent frameworks by Uri Oren, Amit Eliahu and Dor Edry
This is an interesting vulnerability research piece that examines how AI agent applications leveraging orchestration layers, such as LangChain, can be vulnerable to traditional command injection and remote code execution. Much like the well-understood attacks against web applications that exploit code paths leading to unsafe deserialization or risky eval() calls, popular AI agent libraries can have the same vulnerable code paths.
Microsoft researchers Oren, Eliahu, and Edry demonstrate this vulnerable code path within Microsoft’s Semantic Kernel project. In CVE-2026-26030, the In-Memory Vector Store feature passes AI model output directly into a Python eval() call to filter search results. They circumvented some filtering logic in the path to break out and achieve a shell.
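Reduced to its essence, the vulnerable pattern looks something like this sketch. To be clear, this is my illustration of the anti-pattern, not Semantic Kernel’s actual code:

```python
# Illustrative sketch of the eval()-on-model-output anti-pattern.
# NOT Semantic Kernel's actual code; names are made up.
records = [{"title": "quarterly report", "owner": "alice"}]

def filter_results(records, filter_expr: str):
    # filter_expr comes from LLM output. Whoever influences the prompt
    # influences this string, e.g.:
    #   "__import__('os').system('id') or True"
    return [r for r in records if eval(filter_expr, {}, {"r": r})]

# What the developer expects the model to emit:
print(filter_results(records, "r['owner'] == 'alice'"))
```

Denylist-style filtering around eval() is notoriously leaky, which is exactly what the researchers bypassed here; the safer design is to parse the model’s filter expression into a small set of explicitly allowed operations.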
The second vulnerability, CVE-2026-25592, abuses a helper function that was accidentally exposed to the model as a callable tool. The localFilePath parameter, which controls where the file is written on the host filesystem, lacked validation. The attack was unique here: they generated the payload in the sandbox, then made a tool call to achieve host compromise.
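The unvalidated-path half of the bug class is just as classic. Here’s a hedged sketch of the pattern and one way to confine it (function and parameter names are made up, not the real Semantic Kernel API):

```python
# Illustrative sketch only -- names are hypothetical, not the real API.
from pathlib import Path

SANDBOX_ROOT = Path("/sandbox/output")

def save_file_unsafe(local_file_path: str, content: bytes) -> None:
    # The model controls local_file_path, so it can point anywhere on
    # the host filesystem, e.g. "~/.ssh/authorized_keys".
    Path(local_file_path).expanduser().write_bytes(content)

def save_file_safer(local_file_path: str, content: bytes) -> None:
    # Resolve the destination and confine it to an allow-listed root.
    dest = (SANDBOX_ROOT / local_file_path).resolve()
    if not dest.is_relative_to(SANDBOX_ROOT):  # Python 3.9+
        raise ValueError(f"path escapes sandbox: {dest}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(content)
```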
Student’s hack prompts THSRC review by Shelley Shan (Taipei Times)
A student in Taiwan used a software-defined radio to analyze, reverse engineer, and replay signals on Taiwan High Speed Rail’s radio communication system. He sent a spoofed general alarm to the control center that halted trains for 20 minutes. Police found nearly a dozen two-way radios at his residence and confirmed he had also cracked the New Taipei City Fire Department and Taoyuan Airport MRT frequencies.
I’m unsure what the motivation for the hacks was, other than morbid curiosity and the hacker mindset. The student did surface several security flaws in Taiwan’s rail system, so he could have done some good by reporting them responsibly. Tensions between Taiwan and the PRC are higher than ever, and a “hack” with physical consequences makes the government particularly sensitive to these types of crimes.
🔗 Open Source
ddamenova/IRQL - Incident Response Query Language
IRQL is a domain-specific language that sits on top of KQL, aiming both to lower the barrier to entry of writing KQL and to reduce the complexity of large KQL queries. Analysts build queries from 5 primary functions: selectors, extractors, enrichers, graph-listed variants, and external enrichments. These look to be reusable components and idiomatic queries packaged as KQL functions, and the example KQL <> IRQL comparison shows an impressive transformation that is much cleaner and easier to read.
microsoft/EventLogExpert
A Windows Event Log viewer built by Microsoft themselves! It looks like Wireshark, but for Event Log analysis, which is nice because parsing these logs requires some expertise, not only with the structure of Event Logs but also some PowerShell kung-fu to parse them correctly.
EventHawk’s latest release is well-timed, with Microsoft’s EventLogExpert listed right above this entry. EventHawk is also a Windows Event Log parsing tool, but it has a lot more bells and whistles and a much cleaner user interface for analysis. I love the ATT&CK integration module, which provides a much more enriched view into Event Logs, both for quick analysis and for learning.
A self-contained AI SOC you can run locally or in a hosted cloud environment such as Fly, AWS, or K8s. It ships an MCP server and exposes tools to call the models and handle the translation layer. These repositories generally look like lab environments or someone’s research project, but this is the first one I’ve found with a seriously impressive set of features, toolsets, and integrations, plus a healthy development community.

