Common Lisp for Data Scientists: A Field Report From My “Tidyverse Weekend” (Powered by Antigravity)
Rebuilding Tidyverse with Common Lisp
There’s a very specific kind of feeling you get when you open a dataset and your brain immediately wants to do the same ten moves it always does:
- peek at the columns
- clean up a few names
- filter down to the interesting subset
- reshape wide ↔ long
- join in some reference table
- summarize
- export something a colleague can open without installing a toolchain from 2009
If you’re a data person, that muscle memory usually spells R + Tidyverse (or Python + pandas, depending on how much pain you enjoy per line of code).
And yet… I keep wanting to do this work in Common Lisp.
Not because I’m nostalgic. Not because I enjoy suffering.
But because Lisp has a superpower that data tooling rarely gets to exploit properly:
The language itself is the extension mechanism.
Macros. DSLs. Real composability. No fake metaprogramming. No ritualistic “eval(parse(text=…))” energy. Just the actual thing.
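To make that concrete: the -> pipeline operator you'll see later in this article is just an ordinary macro. Here's a minimal threading-macro sketch (simplified for illustration; not the actual cl-dplyr implementation):

;; Simplified sketch, not the shipped cl-dplyr macro.
(defmacro -> (initial &rest forms)
  "Thread INITIAL as the first argument through each form in FORMS."
  (reduce (lambda (acc form)
            (if (listp form)
                (list* (first form) acc (rest form))
                (list form acc)))
          forms
          :initial-value initial))

;; (-> 5 (+ 1) (* 2))  macroexpands to  (* (+ 5 1) 2)  => 12

A dozen lines, no preprocessor, no parser hacks. That's the extension mechanism working as designed.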
So I ran an experiment:
What if we rebuild “enough Tidyverse” in Common Lisp to be useful for daily data work — and we do it fast, openly, and iteratively?
I used antigravity for vibecoding, and this article is a field report: what I built, how it fits together, how to try it, and how you can jump in without having to decode my brain first.
The premise: we don’t need 100% Tidyverse — we need “daily useful”
Tidyverse is huge. And it’s not just APIs; it’s culture. It’s a workflow that feels like a fluent conversation with your data.
But here’s the truth most re-implementations miss:
You don’t need full coverage.
You need:
- the 20% of verbs you use 80% of the time
- a consistent data model
- pipelines that don’t feel like punishment
- IO that handles real-world files
- extensibility so the community can grow it naturally
So my goal is not “Tidyverse, but in Common Lisp, perfect and complete.”
My goal is: Common Lisp that a data scientist can actually use.
If a function is rarely used, we skip it.
If nobody misses it, it stays omitted.
If somebody misses it, we add it.
That’s the whole strategy.
What I built in a few days
Here’s the current set of packages (all under my GitHub account):
- cl-excel — read/write Excel tables
- cl-vctrs-lite — a small core inspired by vctrs
- cl-tibble — “tibbles” (pleasant data frames)
- cl-dplyr — data manipulation verbs
- cl-tidyr — reshaping / preprocessing
- cl-readr — CSV/TSV read/write
- cl-forcats — categorical helpers
- cl-lubridate — date/time convenience
- cl-stringr — smoother string work
And each repo includes SPEC.md (what the package tries to be) and AGENTS.md (how to vibecode/contribute in a structured way).
So if you want to improve something, you don’t have to reverse-engineer intent from a pile of commits. You get the “why” and the “how” up front.
The vibe: a “data stack” that feels like Lisp, not a ported museum exhibit
A port can easily become uncanny: it looks like Tidyverse, but it behaves like a translation.
I’m aiming for something else:
- Keep the mental model (verbs, pipelines, tidy data ergonomics)
- Use Lisp strengths (macros, extensible generic functions, real DSLs)
- Stay practical (IO, table printing, simple entry points)
Also: I’m not translating rlang.
In R, rlang exists because R needs a lot of scaffolding to simulate what Lisp can do natively.
In Common Lisp, we already have the big guns. If we need non-standard evaluation patterns, we can build them directly and cleanly.
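To give a flavor of what "directly and cleanly" means, here's a toy macro (purely illustrative, not how cl-dplyr is actually implemented) that rewrites keywords into column lookups, which is the essence of expressions like (dplyr:filter (>= :revenue 1000)):

(defmacro with-columns ((row) expr)
  "Toy sketch: rewrite keywords in EXPR into column lookups on ROW.
COLUMN-VALUE is a hypothetical accessor, not a real exported function."
  (labels ((walk (form)
             (cond ((keywordp form) `(column-value ,row ,form))
                   ((consp form) (mapcar #'walk form))
                   (t form))))
    (walk expr)))

;; (with-columns (row) (>= :revenue 1000))
;; expands to: (>= (column-value row :revenue) 1000)

No quosures, no quoting framework bolted onto the language: backquote and macroexpansion already are that framework.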
A quick tour: what “daily data work” looks like
Let’s do the most boring, realistic thing imaginable:
- read a CSV
- clean a column or two
- filter
- group & summarize
- export to Excel
That boring workflow is exactly what decides whether a toolchain lives or dies.
The workflow (conceptually)
- cl-readr gets data in and out of delimited files
- cl-tibble gives you a pleasant tabular object
- cl-dplyr manipulates it with verbs
- cl-stringr / cl-lubridate / cl-forcats provide the “small sharp tools”
- cl-excel writes the result into something your collaborator will actually open
A “shape” of code you can expect
I’m keeping this hands-on, so here’s a representative sketch of the style I’m building toward. The exact function names may evolve as the repos settle, but the idea is stable:
;; Pseudo-realistic example of the intended workflow style
(defparameter *df*
  (readr:read-csv "sales.csv"))

(defparameter *clean*
  (-> *df*
    (dplyr:mutate :region (stringr:str-to-upper :region))
    (dplyr:filter (>= :revenue 1000))
    (dplyr:group-by :region)
    (dplyr:summarise :n (dplyr:n)
                     :total (dplyr:sum :revenue))
    (dplyr:arrange (dplyr:desc :total))))

That -> pipeline style (and placeholder-like ergonomics) is intentional: you should be able to read it top-to-bottom like a recipe, not like a nested-parentheses archaeological dig.
Then exporting:
(excel:write-table "report.xlsx" *clean* :sheet "Summary")

This is the “data scientist reality test.”
If this feels smooth, people will use it. If it feels like homework, they won’t.
Why antigravity made this feasible
Let’s be honest: building a family of coherent libraries is usually a slow grind because you’re constantly context-switching:
- design decisions
- naming
- docs
- tests
- edge cases
- “what should the idiomatic API be?”
- “how do I keep it consistent across packages?”
This is where vibecoding (with antigravity) shines if you constrain it correctly.
My rule was simple:
- Write SPEC.md first (what the package is, what it is not)
- Write AGENTS.md (how contributions should happen, conventions, tests, style)
- Generate code iteratively in small steps
- Keep APIs boring and composable
- Prefer “add” over “replace” (backwards compatibility is a feature)
The result is not “AI wrote a library.”
The result is: I could move at the speed of design, not the speed of boilerplate.
And that is the only reason this many repos exist after a couple of days.
The hidden engineering decision: a small core first (vctrs-lite → tibble → verbs)
If you’ve ever looked at Tidyverse internals, you know the magic isn’t only in dplyr.
A lot of stability comes from foundational ideas:
- vectors that behave consistently
- predictable recycling rules
- well-defined missingness
- robust type conversions
- a data frame structure that prints nicely and doesn’t surprise you
That’s why cl-vctrs-lite exists.
Then cl-tibble builds on that, and the rest can treat “a table” as a stable thing.
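As a flavor of what “predictable recycling rules” means in practice, here's a toy version of the tidyverse rule (length-1 values stretch, everything else must match exactly). This is a sketch of the idea, not the actual cl-vctrs-lite API:

(defun recycle-to (vec n)
  "Toy sketch of tidyverse-style recycling: length-1 vectors stretch
to N, matching lengths pass through, anything else errors early."
  (let ((len (length vec)))
    (cond ((= len n) vec)
          ((= len 1) (make-array n :initial-element (aref vec 0)))
          (t (error "Can't recycle length ~a to length ~a." len n)))))

Erroring early on ambiguous lengths is exactly the kind of boring decision that keeps the verbs above it predictable.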
This is the difference between:
- “a bunch of functions that kind of work”, and
- “a stack that can grow without collapsing under its own edge cases”
How you can try it quickly
Everything is on GitHub under this account:
https://github.com/gwangjinkim/
And the repos include installation notes (Quicklisp/local projects, ASDF load paths, etc.). The intent is: clone → load system → run examples → file an issue when something feels off.
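If you prefer plain Quicklisp over the Roswell route in the appendix, a typical local-projects setup looks roughly like this (the clone directory is an assumption; adjust to your machine):

;; after cloning the repos somewhere, e.g. under ~/code/
(pushnew #p"~/code/" ql:*local-project-directories* :test #'equal)
(ql:register-local-projects)
(ql:quickload :cl-tibble)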
If you’re a Common Lisp person, you already know the fastest way to help isn’t heroic refactors.
It’s:
- run it on your machine
- try a realistic dataset
- hit a rough edge
- open an issue with a tiny reproduction
- optionally add a test + PR
Which is why I’m also adding an Issues template, because “comment bait” is not just marketing — it’s how you surface real use cases early.
What I want from the community (and what I don’t)
I want:
- people to try it on real data
- complaints about missing verbs you genuinely miss
- suggestions for a more lispy DSL (as additions, not replacements)
- small PRs that improve consistency, tests, docs, or edge cases
- package authors to tell me where I’m reinventing something that already exists
I don’t want:
- perfection paralysis
- ideology wars about parentheses
- “this should be rewritten completely” drive-by comments
- a purity contest that makes the project unshippable
The point is to get a usable toolkit into people’s hands, then evolve it in public.
The pitch (without hype): why this could actually matter
Common Lisp already has the ingredients:
- a serious, highly optimized compiler (SBCL)
- interactive development
- macros that can create truly ergonomic data DSLs
- generic functions that make extensibility feel natural
- a community that understands “build small, compose, iterate”
What it’s missing is a default daily-driver story for data work.
A “Tidyverse-ish” stack for Common Lisp doesn’t have to be perfect.
It just has to be good enough that a data scientist can say:
“Yes. I can do my normal workflow here. And it feels… surprisingly nice.”
That’s the bar.
And honestly, it’s a very reachable bar.
If you want to join: pick one sharp improvement
If you’re curious and want to contribute, here are high-leverage targets:
- one real dataset test (CSV → transform → export)
- printing / formatting improvements (tibbles live or die by ergonomics)
- edge cases around missing values
- joins (always joins)
- a small set of “most-missed” helpers you personally use weekly
Or just file issues. Seriously. Issues are how this becomes real.
Repos
Here’s the hub again: https://github.com/gwangjinkim/
If you want a starting point, the simplest on-ramp is usually:
- cl-readr + cl-tibble + cl-dplyr, and then cl-excel for the “export to humans” step.
Appendix: How to try it in 5 minutes (Roswell-only, fastest path)
The fastest way from zero to hero (running SBCL with all the packages installed) is via Roswell, a Common Lisp version and package manager.
1) Install Roswell
# macos
brew install roswell
# linux (ubuntu/debian); Windows users: use WSL2 (ubuntu)
sudo apt-get update
sudo apt-get install -y roswell
verify:
ros --version

2) Install the newest SBCL (via Roswell)
Roswell can install the newest SBCL in two common ways:
# fast binary (recommended):
ros install sbcl-bin
# build from source:
ros install sbcl

Then select the one you installed:

ros use sbcl-bin
# or: ros use sbcl

Roswell will also bootstrap Quicklisp automatically the first time it needs it.
3) Install the packages
Installing packages directly from GitHub with Roswell is super easy:
ros install gwangjinkim/cl-readr
ros install gwangjinkim/cl-tibble
ros install gwangjinkim/cl-dplyr
ros install gwangjinkim/cl-excel
ros install gwangjinkim/cl-forcats
ros install gwangjinkim/cl-stringr
ros install gwangjinkim/cl-tidyr
ros install gwangjinkim/cl-lubridate
Create a tiny CSV in your shell:
cat > /tmp/mini.csv <<'CSV'
region,revenue
eu,1200
eu,50
eu,1000
us,2000
us,700
us,1000
CSV

4) Start an SBCL session in Roswell
For convenience, install rlwrap:
# macos
brew install rlwrap
# linux
sudo apt install rlwrap

Then start your SBCL session via Roswell:
rlwrap ros run
ros run starts SBCL inside Roswell. Bare SBCL lacks some terminal niceties (arrow-key navigation, command history); rlwrap wraps the session and provides that nicer interface.
A much better experience is running the SBCL session inside an IDE like Emacs; I've described that setup in a separate article.
Try it in 2 minutes: from CSV to Excel, with a tiny “tidy” pipeline
Here’s the quickest way to see what the stack feels like. We’ll:
- read a CSV
- upper-case the region column
- keep only rows with revenue >= 1000
- group by region
- compute n (row count) and total (sum of revenue)
- sort by total descending
- write the result to an .xlsx
1) Load the packages
(ql:quickload '(:cl-dplyr :cl-readr :cl-stringr :cl-tibble :cl-excel))
2) Run the pipeline
(defparameter *df*
  (readr:read-csv "/tmp/mini.csv"))

(defparameter *clean*
  (dplyr:-> *df*
    (dplyr:mutate :region (stringr:str-to-upper :region))
    (dplyr:filter (>= :revenue 1000))
    (dplyr:group-by :region)
    (dplyr:summarise :n (dplyr:n)
                     :total (dplyr:sum :revenue))
    ;; sorting helper may evolve; if desc isn't available yet,
    ;; arrange also accepts a spec like '(:total :desc)
    (dplyr:arrange '(:total :desc))))

3) Export to Excel

(excel:write-xlsx *clean* #p"~/Downloads/report.xlsx" :sheet "Summary")

At this point you should have a report.xlsx you can open immediately. It contains exactly what the pipeline says: rows filtered by revenue >= 1000, aggregated by region, with total being the sum and n counting how many rows contributed, then ordered by total descending.
(Yes, I'm including a screenshot because nothing builds trust like an actual file that opens.)

[screenshot: report.xlsx opened in a spreadsheet app]
“Can I make it look even more like Tidyverse?”
You can, but with the usual Lisp tradeoff: convenience vs. namespace hygiene.
If you really want the ultra-compact style, you can import symbols into your current package:
(ql:quickload '(:cl-dplyr :cl-readr :cl-stringr :cl-tibble :cl-excel))
(use-package '(:cl-dplyr :cl-stringr :cl-excel))

Then the same pipeline can look like this:
(defparameter *df* (readr:read-csv "/tmp/mini.csv"))

(defparameter *clean*
  (-> *df*
    (mutate :region (str-to-upper :region))
    (filter (>= :revenue 1000))
    (group-by :region)
    (summarise :n (n)
               :total (sum :revenue))
    (arrange '(:total :desc))))

(write-xlsx *clean* #p"~/Downloads/report1.xlsx" :sheet "Summary")

That's the “I'm doing data work like in R” feeling.
But: importing lots of common names can collide with other packages (or even implementation-provided symbols). Real examples include cl-readr:read-file vs. cl-excel:read-file, or cl-dplyr:rename colliding with implementation symbols (sb-ext:rename).
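The standard Common Lisp escape hatches work here, of course. A sketch, assuming the exported names mentioned above:

;; Option 1: use both packages, but say explicitly which READ-FILE wins.
(defpackage #:my-analysis
  (:use #:cl #:cl-readr #:cl-excel)
  (:shadowing-import-from #:cl-readr #:read-file))

;; Option 2: don't :use the clashing library at all; give it a short
;; local nickname instead (supported by SBCL and most modern CLs).
(defpackage #:my-analysis-2
  (:use #:cl #:cl-readr)
  (:local-nicknames (#:xl #:cl-excel)))
;; ... then (xl:write-xlsx ...) can't collide with anything.

But you shouldn't need to know these incantations just to run a pipeline, which leads to the next step.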
The direction from here: one integration package, clean namespaces, no package name conflicts
The next step is an integration system: cl-tidyverse.
The job of cl-tidyverse won’t be “yet another layer.” It will be the boring, valuable thing:
- load the whole stack in one shot
- provide a curated user-facing package that resolves name conflicts intentionally
- keep the underlying libraries honest (small, composable, and safe to load together)
Inside cl-dplyr, I’m already steering toward a conflict-proof approach by keeping the truly safe public API in dotted verb forms (.mutate, .filter, .group-by, etc.) and dotted helpers (.sum, .min, .max, …). The pipeline macro can then rewrite “pretty” verbs into the dotted ones internally. That lets you write clean code while keeping the exported surface conservative.
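A sketch of that rewrite idea (illustrative only; the shipped macro will differ): the pipeline macro looks up a dotted sibling for each verb and threads the value through it.

(defmacro -> (initial &rest forms)
  "Thread INITIAL through FORMS, rewriting each verb to its dotted
sibling in CL-DPLYR when one exists (e.g. MUTATE -> .MUTATE).
Illustration only, not the shipped macro."
  (flet ((dotted (sym)
           (or (find-symbol (concatenate 'string "." (symbol-name sym))
                            '#:cl-dplyr)
               sym)))
    (reduce (lambda (acc form)
              ;; assumes each pipeline step is a list like (mutate ...)
              (list* (dotted (first form)) acc (rest form)))
            forms
            :initial-value initial)))

Users write mutate; the macro emits .mutate; the export list stays small and collision-free.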
That’s the overall philosophy here: make the common path delightful, and make the system robust enough to grow.
If you’re curious, try the snippet above on a real CSV you care about. If something feels off, file an issue with the smallest reproduction you can manage. That feedback is the entire point of doing this in public.
Common Lisp doesn’t need a miracle to be useful for data work. It needs a stack that’s practical, iterable, and slightly shameless about optimizing for daily ergonomics.
That’s what I’m trying to build in this project.