Independent research into the efficiency of reasoning.
Studying the efficiency gap between human and machine reasoning. We build benchmarks, tools, and architectures to close it.
Research Log
March 2, 2026: Pencil Puzzle Benchmark
62,231 puzzles. 94 types. 51 models evaluated. Deterministic step-level verification.
Lab History (2022–2024)
TabLib
Dataset: The world's largest open-source dataset of tabular data, with 627M tables extracted for training Large Data Models.
Available on HuggingFace.
Sketch
Tool: An AI code-writing assistant for pandas that understands data context through approximate summarization algorithms.
Available on GitHub.
Julyp
Product: A data-focused AI assistant (formerly Tabby Chat and Julyp) backed by on-demand JupyterLab environments with dynamic package installs, cached data pipelines, dashboards, and full iframe/canvas rendering.
TableGen
Product: An experimental smart spreadsheet in which each cell runs an agent that has its row and column as context. The agents execute in parallel to fill tables and enable fast data manipulation.