


Flaky test data is now accessible through the sem-ai API, and this week also brings a skill that uses that data to fix the tests automatically.
Flaky test data is now generally available through the API. You can query your flakiest tests programmatically, sorted by number of disruptions, with failure timestamps and logs included. Find it on GitHub.
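The response schema isn't described in this post, so the following is only an illustration of consuming this kind of data. It assumes the API returns a JSON list of records with the hypothetical fields `name`, `disruptions`, and `failure_timestamps`. These names are placeholders, not the documented sem-ai schema. The sketch then picks out the most disruptive tests:

```python
import json
from dataclasses import dataclass, field


# Hypothetical record shape: these field names are illustrative assumptions,
# not the documented sem-ai API schema.
@dataclass
class FlakyTest:
    name: str
    disruptions: int
    failure_timestamps: list = field(default_factory=list)


def top_flaky(payload: str, limit: int = 3) -> list:
    """Parse a JSON list of flaky-test records and return the most disruptive ones."""
    records = json.loads(payload)
    tests = [
        FlakyTest(
            name=r["name"],
            disruptions=r["disruptions"],
            # Treat a missing timestamp list as "no recorded failures".
            failure_timestamps=r.get("failure_timestamps", []),
        )
        for r in records
    ]
    # Highest disruption count first; harmless if the input is already ordered.
    tests.sort(key=lambda t: t.disruptions, reverse=True)
    return tests[:limit]


# Made-up sample data for illustration only.
sample = json.dumps([
    {"name": "test_checkout", "disruptions": 4, "failure_timestamps": ["2024-05-01T10:00:00Z"]},
    {"name": "test_login", "disruptions": 12, "failure_timestamps": []},
    {"name": "test_search", "disruptions": 7},
])

for t in top_flaky(sample, limit=2):
    print(t.name, t.disruptions)
```

Sorting on the client is a defensive choice. If the API already returns tests ordered by disruptions, the sort leaves that order unchanged.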
A new sem-ai skill can identify and fix flaky tests end to end. The agent pulls the tests with the most disruptions, gathers failure context, determines the root cause, and writes a fix. It then verifies the result: first by running the tests locally or, if that's not possible, by spinning up Semaphore test boxes to run the test across multiple machines in parallel. Repeated runs across machines matter especially for flaky tests, since a single passing run isn't enough to confirm a fix. In a benchmark on a real open-source project, using Claude Opus at high effort, each fix cost $1 to $1.50.
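Some back-of-the-envelope probability shows why one passing run proves little. Assume each run is independent and that an unfixed test still fails with a constant probability p. The chance that it passes n runs in a row anyway is then (1 - p)^n. The sketch below computes how many runs push that chance below a chosen tolerance. It also computes how many rounds those runs take when spread across parallel machines. This is a simplified model for intuition, not a description of how the sem-ai skill decides when to stop. The function names are hypothetical.

```python
import math


def runs_needed(failure_rate: float, tolerance: float = 0.01) -> int:
    """Smallest n such that a still-flaky test passes n runs in a row
    with probability <= tolerance.

    Simplifying assumption: runs are independent and the unfixed test
    fails with a constant probability `failure_rate` on every run.
    """
    if not (0 < failure_rate < 1 and 0 < tolerance < 1):
        raise ValueError("failure_rate and tolerance must be in (0, 1)")
    # Solve (1 - failure_rate) ** n <= tolerance for the smallest integer n.
    return math.ceil(math.log(tolerance) / math.log(1 - failure_rate))


def parallel_rounds(runs: int, machines: int) -> int:
    """Rounds needed when `runs` are spread evenly across `machines` parallel boxes."""
    return math.ceil(runs / machines)


n = runs_needed(0.05)  # a test that failed roughly 1 run in 20
print(n, parallel_rounds(n, 10))
```

Consider a test that failed about once in 20 runs. To bring the chance of wrongly declaring it fixed under 1%, this model needs 90 consecutive passes. Spread across 10 parallel machines, that is only 9 rounds, which is why fanning out across test boxes helps.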
Agents occasionally failed to follow skill instructions because the skills lacked examples. Four existing skills now include additional concrete examples, which improves adherence and makes sem-ai's output more consistent across runs.
What’s Coming
User and organization management will be added to the API in an upcoming release. The team is also continuing to refine skills and commands based on usage patterns.
* Try sem-ai
* Try Semaphore Cloud
* All product news
Till the next time,
Pete Miloravac https://semaphore.io
By Semaphore