Summary of METR's predeployment evaluation of GPT-5.6 Sol

tomhipwell.co · Jun 26 · ✨ AI ·

I’m a massive nerd but I thought this was a really fun post. These security-tilted models are very aware and know how to interact with/exploit the evaluation harness; the juice is about halfway…

What Students Say About Their AI Use

derekbruff.org · Jun 26 · ✨ AI ·

Cross-posted from my Intentional Teaching newsletter. The other day on Bluesky, education professor Jon Becker shared a few highlights from a panel he attended featuring high school students talking…

New Paper Alert (plus a fun talk involving youth + AI)

zephoria.org · Jun 26 · ✨ AI ·

The Project of AI is a world-building endeavor, wherein those who fund and develop AI systems both operate through and seek to sustain networks of power and wealth. Janet Vertesi, Alex Taylor, Ben…

Premium: Notes From The Bubble, Volume 1

wheresyoured.at · Jun 26 · ✨ AI ·

It’s been an incredibly long few weeks, and as a result my previously-planned Hater’s Guide just isn’t possible within what little time I have left in this week, which is why…

AI Made the Call, but Your Company Still Owns the Failure

kylereddoch.me · Jun 26 · ✨ AI ·

When an AI security tool misses an intrusion, blocks production, leaks sensitive data, or runs the wrong remediation, liability does not disappear into the algorithm. It follows the people and…

When AI Performs Causality Instead of Practicing It

testerstories.com · Jun 26 · ✨ AI ·

In my previous post, I talked about the cautionary aspect of AI hallucinating its own hallucinations. There’s a deeper element …

On Capital Market Constraints, Historical Parallels to the Current AI Moment, and More

paulkedrosky.com · Jun 26 · ✨ AI ·

I did a lengthy interview a little while back, and it is now out. Folks may find it a helpful restatement of some of my views on the current situation, from historical parallels to the AI paroxysm,…

California’s about to learn that you can’t tax a moving target

cautiousoptimism.news · Jun 26 · ✨ AI ·

And: OpenAI's delayed IPO, worker discontent, and Chinese AI!

While everyone talks about AI, design is gaining power

rogerwong.me · Jun 26 · ✨ AI ·

AI has made product work feel weirdly cheap in the middle and expensive at the edges. Another screen, prototype, or feature is easier to produce than it used to be. The harder work is deciding what…

Grok AI is reportedly a porn platform now, with over half its traffic tied to adult content

blog.quintarelli.it · Jun 26 · ✨ AI ·

Why am I not surprised? The cultural model is Biff Tannen. Grok AI is for porn. Two former xAI employees estimate that well over half of all Grok traffic goes to pornographic images,…

Warning: Why You Must Delete Your Cursor Data Before the SpaceX/xAI Acquisition

christopherspenn.com · Jun 26 · ✨ AI ·

SpaceX/xAI has acquired Cursor. What does that mean for your data privacy? If you are NOT an enterprise customer, and you are using private data with Cursor, now is the time to find a competing…

Anthropic’s Cyber Research Suggests AI Is Reducing the Time Between a Patch and an Exploit

eido-askayo.blogspot.com · Jun 26 · ✨ AI ·

On May 22, June 3, and June 8, 2026, Anthropic published three cyber research posts that looked like different stories. One was about exploit benchmarks. One mapped malicious AI use to the MITRE…

Rude awakenings: how users design wake words when virtual assistants fail

saulalbert.net · Jun 26 · ✨ AI ·

Keywords: Virtual Assistants; AI; Wake Words; Summons; Turn design. ‘Wake words’ such as “Alexa”, “OK Google” and “Hey Siri” have become commonplaces within everyday talk,…

On Standardizing the Use of AI Tools: A Letter to the Editorial Office of 《文艺研究》 (Literature & Art Studies)

hsu.cy · Jun 26 · ✨ AI ·

Xu Ben (徐贲): What the academic community should really hold to account is responsibility, not origin. Origins have always been mixed, whereas responsibility must be clear.

Build an OKF brain like mine!

mariehaynes.com · Jun 26 · ✨ AI ·

Standardizing knowledge for the future of AI agents. My last piece on Google’s Open Knowledge Format (OKF) was one of the most popular I’ve ever published. Since then, I have been heads down working…

Paul Graham Flagged For AI Use

ninjasandrobots.com · Jun 26 · 🕸️ Web & Internet ·

Let me short-circuit the flames. He wasn’t using AI, but my attempts at trying to rid myself of AI slop in my feed reader flagged him as the worst offender. Is he? No. It just points out how hard…

And the Winner Is... (Best AI Award)

blog.elmerdata.ai · Jun 26 · ✨ AI ·

There is no clear best AI today, but one earns my award for professional knowledge work. Artificial intelligence has become the technology industry's favorite spectator sport. Every few months a new…

AI Makes Bad Product Decisions Look Like Finished Software

vincentschmalbach.com · Jun 26 · ✨ AI ·

AI tools make one software failure mode much easier to miss: a bad product decision can now arrive wrapped in a working...

Making something out of nothing

jackyan.com · Jun 26 · ✨ AI ·

I have admittedly taken a softer line on some “AI” gen since our clients use it, and combined with our awareness of how desktop publishing unfolded, and where Medinge Group’s own thinking is heading,…

I built a colleague who lives in my terminal

farrant.me · Jun 26 · 🧩 Programming ·

A couple of months ago I moved to a new team at work. The team had been running for a while before I joined, and there was a lot of context I didn't have — issues, discussions, strategy docs, repos…

What Will Save Design?

umber.me · Jun 26 · 🌊 Art & Design ·

UI execution is being outsourced, teams are shrinking. The industry is more of a mess than it looks. In uncertain times, shrinking the team and increasing efficiency through AI is precisely what will…

Your Agent Deserves Logs

build.ms · Jun 26 · ✨ AI ·

How structured logs helped Codex fix a year-old bug, and why logs are the key to unlocking autonomous workflows.

How to Build a Memory Your AI Agents Can Actually Reuse

louisbouchard.ai · Jun 26 · ✨ AI ·

The useful part is not giving agents more context. It is making your research, notes, and sources available again in the next session.

OpenAI Enters the Chip Race, and Alibaba Allegedly Cheated!

brianchristner.io · Jun 26 · ✨ AI ·

OpenAI reveals its first custom AI chip, IBM extends Moore's Law, Anthropic accuses Alibaba of stealing Claude, and $27M spent on one congressional race.

The Endgame

peterbraden.co.uk · Jun 26 · ✨ AI ·

We live in interesting times. Like many people, I work in an industry that has become absorbed by an existential angst - the robots have invaded, proved surprisingly capable, and it is…

Local Open-Weight LLMs in Coding Harnesses

sebastianraschka.com · Jun 26 · ✨ AI ·

Short note on trying local open-weight LLMs across Qwen-Code, Codex, and Claude Code harnesses.

AI Does Not Replace the Work. It Moves It.

matthiasroder.com · Jun 26 · ✨ AI ·

AI does not make creative work disappear. It moves the work toward framing, judgment, iteration, and responsibility.

Disruptive Technologies in the Digital Economy, Week 5 – Bias? In my AI?

blog.katemonkey.com · Jun 26 · ✨ AI ·

Where the entire discussion about AI isn't about how great it is, but how biased and destructive it can be. I know, I'm shocked too.

Neural What? My LLM bill is down to a sixth - by no longer paying per token.

coinerella.com · Jun 26 · ✨ AI ·

You might have read recently on this blog that my procurement preferences for hank.parts are basically EU, (self-hosted) open source, UK/CH, rest of the world, in this order. This article is a…

PII guardrails for .NET applications - Part 2: Agent Framework agents

strathweb.com · Jun 26 · 🧩 Programming ·

In part one of this little series I introduced TasmanianDevil, a standalone, offline PII detection and de-identification engine for .NET. We saw it on its own - detecting and validating PII,…

Lara Won’t Promise Not to Train on Your Translations

loekalization.com · Jun 26 · ✨ AI ·

I almost shipped it. The integration was finished. Cattitude, our own CAT tool, had full support for Lara, the AI translation engine from Translated that is replacing the old ModernMT. The SDK was…

A Quick Thought About Brain Augmentation Tools

scottnesbitt.online · Jun 26 · ✨ AI ·

A short musing on what I find wrong with so-called second brain/brain augmentation tools

The Ever-Agreeing Genie

schrottner.at · Jun 26 · ✨ AI ·

In folklore, a genie grants wishes without judgment — it gives you exactly what you asked for, whether or not it is what you needed. The danger was never the genie. It was the wish. Anthropic’s…

How I use Generative AI in My Work

hendrik-erz.de · Jun 26 · ✨ AI ·

Locally-running generative AI has made considerable jumps in quality in the past three years. I think it is finally time to evaluate such models in terms of how they can help researchers do their…

PII guardrails for .NET applications - Part 1: TasmanianDevil library

strathweb.com · Jun 26 · 🧩 Programming ·

A few months ago I introduced AgentGuard, a library for declarative guardrails and safety controls for .NET AI agents. One of the rules it shipped with from day one was PII redaction, but back then…

Still Holds: Gall’s Law

jarango.com · Jun 26 · ✨ AI ·

AI took away the constraints that brought discipline to MVPs. You must impose them yourself.

Sometimes the Cheap Model Costs More

blog.jaystuart.dev · Jun 26 · 🧩 Programming ·

I’ve been using an AI orchestration framework I built called Mozart. You can check it out here: https://github.com/jstuart0/mozart-orchestration. The idea behind Mozart is pretty simple. Instead…

Signs of AI writing

its.mw · Jun 26 · ✨ AI ·

One has to be aware that human speech and writing are being influenced by LLMs, and thus they are becoming more similar. This was already evident in 2024, as shown by a study that detected a…

Incident Report: CVE-2026-LGTM

nesbitt.io · Jun 26 · 🛡️ Sysadmin & Security ·

A series of unfortunate agents.

The Thing We All Obviously Want

kmicinski.com · Jun 26 · ✨ AI ·

Generated by AI - notice the perspective. Over the past year, we have seen the rapid development of AI-assisted programming to an astounding degree. Even five years ago, fully-automated program…

Evals: a plain-English map of the types worth knowing

mager.co · Jun 26 · ✨ AI ·

Everyone says 'evals' and means ten different things. Here's a quick tour of the main types — what each one checks, and when it's worth the cost.

Do NOT Hallucinate!

aimakesmesad.com · Jun 26 · ✨ AI ·

It is common knowledge amongst AI-enhanced superhuman programmers that the best way to prevent your AI coding agents from hallucinating is to tell them not to hallucinate. The 10x programmers also…

Why extreme risk cannot be measured

modelsandrisk.org · Jun 26 · 🎲 Economy ·

Can we measure extreme financial risk? Is financial stability only a question of technology and data? Many seem to think so. I disagree.

Frontier AI Models Evaluation Benchmarks

kharshit.github.io · Jun 26 · ✨ AI ·

A guide to frontier AI model benchmarks in 2026, covering MMLU, GPQA Diamond, HLE, SWE-bench, ARC-AGI-2, MMMU, Arena Elo, etc. What each benchmark measures, which models lead, why scores saturate.

Inside the Git Hooks: Tagging Every AI Agent Commit (Part 3)

jonnyzzz.com · Jun 26 · 🧩 Programming ·

Part 3 — the build for the lighter solution. How a set of git hooks stamps a session id on every commit an AI Agent makes, survives squash and rebase, and captures each push — all best-effort, all…

AI inference is obviously profitable

seangoedecke.com · Jun 26 · ✨ AI ·

Many people claim that AI inference is unprofitable to serve, and thus must be subsidized by an ocean of dumb money from investors who believe that some future AI model will come to dominate the…

AI Observability Review for LLM, RAG, and Agent Systems

soumendrak.com · Jun 26 · ✨ AI ·

A focused review for teams shipping LLM, RAG, and agent systems: trace coverage, evaluation gaps, token cost visibility, failure modes, and OpenTelemetry instrumentation plan.

AI and greener choices

hidde.blog · Jun 26 · ✨ AI ·

The earth is heating up and AI isn't helping. It drives major increases in electricity use, water use and CO2 emissions. Yet, industry and governments alike seem keen to leverage the latest tech. Can…

What One Year in AI Security and Governance Changed About How I See AI

codebynight.dev · Jun 26 · ✨ AI ·

After one year working around AI security and governance, I trust flashy AI demos less and pay more attention to data, permissions, discovery, and the boring systems around AI.