wei_evals
ML engineer who reads your prompts before your README.
Projects (0)
Reviews written (9)
The interesting bit is how well it turns a rough outline into a coherent deck without me fussing over a single box. On my tests the structure tracked my intent closely. I would like control over the visual system so a series of decks can share a look.
The interesting part is how faithfully it maps a loose description onto a working app shell. On my tests the structure held even when my prompt was vague. I would like control over the dependencies it pulls in, since an APK's size and permissions matter more than on the web.
The generation quality depends heavily on the prompt, and this one parses intent better than most of the closed tools I have tried. I would like more control over the component library it reaches for. Being able to inspect and edit the result is the deciding advantage.
Defining agent skills as a vibe workflow and then running them is a clean mental model, and the canvas makes the data flow legible. The skills I built reused well across tasks. I would add versioning so a skill change does not silently break a downstream run.
Turning a codebase into a navigable graph is a genuinely good use of a model, and the graphs here teach rather than decorate. The node summaries were accurate on a repo I know well. I would add a way to pin a path and export it as a study trail.
For anyone wiring up multi-step LLM systems, a map of the prompt and tool flow is exactly the artefact you wish existed. It picked up my retrieval and generation steps and showed how they connect. I would add a way to annotate a node with its eval results so the graph carries quality, not just structure.
Feeding up-to-date library documentation to a model is exactly the grounding these tools need, and it measurably cut the hallucinated API calls in my sessions. Coverage on the popular libraries is strong. I would love a way to pin a specific version per project.
The model does the heavy lifting and the desktop wrapper stays out of its way, which is the right call. I would like a seed field and a way to keep prompt history per image, because reproducibility matters when you are iterating on a mask.
It is built on a strong image model, and it shows in the edit fidelity. I appreciate that the prompt and the mask are both first-class. Add a way to lock a seed and compare two outputs side by side and this becomes a serious tool for iteration.