Python API
pluckit's Python API is built around three types: Plucker (the entry
point), Selection (a lazy query chain), and Pluckin (the extension
point). Everything else is either a mutation class or a convenience
wrapper around these.
Plucker
The entry point. Wraps a DuckDB connection, loads the sitting_duck
extension on first use, and exposes methods for finding, viewing, and
mutating code.
Constructor

```python
Plucker(
    code: str | list[str] | None = None,
    *,
    plugins: list[Pluckin | type[Pluckin]] = (),
    repo: str | None = None,
    db: duckdb.DuckDBPyConnection | None = None,
    cache: bool | str = False,
)
```

| Parameter | Description |
|---|---|
| `code` | Glob pattern(s) or an explicit file list for the source corpus |
| `plugins` | Pluckin classes or instances to register |
| `repo` | Repository root for relative paths (default: current working directory) |
| `db` | An existing DuckDB connection to reuse (default: create a fresh one) |
| `cache` | Persistent AST cache. `True` uses `.pluckit.duckdb` in the repo root; a `str` sets a custom path. |
Persistent AST caching. When `cache=True`, pluckit opens a persistent DuckDB file (`.pluckit.duckdb` by default) and materializes `read_ast` output into per-pattern tables. Later queries against the same pattern skip re-parsing. Invalidation is incremental and driven by file-stat mtime checks, so only modified files are re-parsed. To enable caching through configuration rather than code, set `cache = true` under `[tool.pluckit]` in pyproject.toml.
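The mtime-driven invalidation described above can be pictured with a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not pluckit's implementation: the `snapshot_mtimes` and `stale_files` helpers and the `{path: mtime}` dict are hypothetical stand-ins for whatever bookkeeping the cache tables actually hold, and deleted files are not handled.

```python
import os
import tempfile
from pathlib import Path


def snapshot_mtimes(paths: list[str]) -> dict[str, float]:
    """Record each file's current modification time (hypothetical cache bookkeeping)."""
    return {p: os.stat(p).st_mtime for p in paths}


def stale_files(paths: list[str], cached: dict[str, float]) -> list[str]:
    """Files that are new or whose mtime changed since the snapshot; only these need re-parsing."""
    return [p for p in paths if cached.get(p) != os.stat(p).st_mtime]


with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp, "a.py"), Path(tmp, "b.py")
    a.write_text("def f():\n    pass\n")
    b.write_text("def g():\n    pass\n")
    files = [str(a), str(b)]
    cache = snapshot_mtimes(files)

    # Simulate an edit to a.py by bumping its mtime explicitly
    # (sidesteps coarse filesystem timestamp resolution).
    st = os.stat(a)
    os.utime(a, (st.st_atime, st.st_mtime + 10))

    print([Path(p).name for p in stale_files(files, cache)])  # ['a.py']
```

Comparing for inequality rather than "newer than" also catches files restored to an older timestamp, which a pure `>` check would miss.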
Methods
find(selector: str) -> Selection
Run a selector against the configured code corpus and return a lazy Selection.
view(selector: str, *, format: str = "markdown") -> View
Render matched nodes as markdown. The AstViewer plugin must be registered. Returns a View object, which prints as the rendered markdown; see View and ViewBlock below.
source(glob: str) -> Source
Create a Source handle for ad-hoc queries against a different glob
without creating a whole new Plucker.
fts_collection(name: str) -> FtsCollection
Get a handle to a named FTS collection (requires fledgling). The returned FtsCollection has .create(query) and .search(query) methods:

```python
col = pluck.fts_collection("tools")

# Populate the collection from a SQL query with the fixed (id, text, metadata) schema
col.create("""
    SELECT 'search' AS id,
           'full-text BM25 search over code and docs' AS text,
           map{'kit': 'fledgling'} AS metadata
""")

results = col.search("search")
for doc_id, text, metadata, score in results:  # doc_id avoids shadowing the id() builtin
    print(f"{score:.2f} {doc_id}: {text}")
```
Each collection gets its own BM25 index with independent IDF statistics.
The fixed schema is (id TEXT, text TEXT, metadata MAP(TEXT, TEXT)).
Creating a collection is idempotent — it replaces the existing table and
index if the collection already exists.
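To see why independent IDF statistics matter, here is a small Okapi BM25 scorer in plain Python. It illustrates the general technique only: the real index is built by fledgling, and the `Collection` class, whitespace tokenizer, and `k1`/`b` defaults below are assumptions chosen for the sketch.

```python
import math
from collections import Counter


class Collection:
    """Toy BM25 index; each instance keeps its own document-frequency statistics."""

    def __init__(self, docs: dict[str, str], k1: float = 1.2, b: float = 0.75):
        # Assumes a non-empty collection; whitespace tokenization is a simplification.
        self.k1, self.b = k1, b
        self.tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
        self.n_docs = len(self.tokens)
        self.avgdl = sum(len(t) for t in self.tokens.values()) / self.n_docs
        # Document frequency: how many documents in THIS collection contain each term.
        self.df = Counter(term for toks in self.tokens.values() for term in set(toks))

    def idf(self, term: str) -> float:
        n = self.df.get(term, 0)
        # BM25 IDF with the "+1" inside the log so it never goes negative.
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1)

    def search(self, query: str) -> list[tuple[str, float]]:
        results = []
        for doc_id, toks in self.tokens.items():
            tf = Counter(toks)
            length_norm = 1 - self.b + self.b * len(toks) / self.avgdl
            score = sum(
                self.idf(term) * tf[term] * (self.k1 + 1) / (tf[term] + self.k1 * length_norm)
                for term in query.lower().split()
                if tf[term]
            )
            if score > 0:
                results.append((doc_id, score))
        return sorted(results, key=lambda r: r[1], reverse=True)


tools = Collection({
    "search": "full-text BM25 search over code and docs",
    "view": "render matched code as markdown",
})
notes = Collection({"a": "search tips", "b": "search syntax", "c": "search ranking"})

# "search" appears in 1 of 2 tool docs but in all 3 notes,
# so the same term carries more weight inside `tools`.
print(round(tools.idf("search"), 3), round(notes.idf("search"), 3))  # 0.693 0.134
print(tools.search("search"))
```

If both collections shared one index, the ubiquity of "search" in the notes would drag down its weight for tool lookups too; separate statistics keep each ranking local.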
pluckins (property) -> list[Pluckin]
The loaded pluckin instances for this Plucker, in load order. Public since 0.13.0 (replaces reaching into the private _registry). Use it to introspect which plugins are active or to pull tool definitions a pluckin contributes:

```python
for p in pluck.pluckins:
    print(p.name)
    # e.g. a pluckin may expose squackit_tools for MCP integration
```
Selector
A validated, serializable CSS-over-AST selector string. Selector subclasses str, so it is backward-compatible everywhere a bare selector string is used today; existing code like pluck.find(".fn:exported") keeps working.

```python
from pluckit import Selector

s = Selector(".fn:exported")
assert isinstance(s, str)
assert s.is_valid

# A selector that resolves to nothing fails validation
Selector(".nonexistent_taxonomy_class").validate()  # raises PluckerError
```

Supports the standard serialization protocol:

| Method | Purpose |
|---|---|
| `.to_dict()` / `.from_dict(d)` | `{"selector": "..."}` dict form |
| `.to_json()` / `.from_json(s)` | JSON string |
| `.to_argv()` / `.from_argv(tokens)` | CLI token list |
| `.validate()` / `.is_valid` | Compile-time check |
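The backward compatibility comes from ordinary subclassing: a `str` subclass is a `str`, so every existing call site still accepts it. The toy class below is hypothetical (it is not pluckit's Selector and omits `.validate()`); it only sketches how the dict, JSON, and argv round-trips in the table can sit on top of `str`. Joining argv tokens with spaces is an assumption.

```python
import json


class ToySelector(str):
    """Hypothetical str subclass mirroring the documented serialization protocol."""

    def to_dict(self) -> dict:
        return {"selector": str(self)}

    @classmethod
    def from_dict(cls, d: dict) -> "ToySelector":
        return cls(d["selector"])

    def to_json(self) -> str:
        return json.dumps(self.to_dict())

    @classmethod
    def from_json(cls, s: str) -> "ToySelector":
        return cls.from_dict(json.loads(s))

    def to_argv(self) -> list[str]:
        # Assumption: the whole selector travels as one CLI token.
        return [str(self)]

    @classmethod
    def from_argv(cls, tokens: list[str]) -> "ToySelector":
        # Assumption: multiple tokens are rejoined with single spaces.
        return cls(" ".join(tokens))


s = ToySelector(".fn:exported")
assert s == ".fn:exported" and s.startswith(".fn")  # still a str everywhere
print(s.to_json())                                   # {"selector": ".fn:exported"}
print(ToySelector.from_json(s.to_json()) == s)       # True
```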
Selection
A Selection wraps a lazy DuckDB relation. Query-composition methods return another Selection, and nothing materializes until you call a terminal method.
Query composition

```python
# Refine a selection
tests = pluck.find(".fn[name^=test_]")
without_try = tests.filter(".fn:not(:has(.try))")

# Navigate
classes = pluck.find(".cls")
methods = classes.descendants(".fn")
```

| Method | Description |
|---|---|
| `find(sel)` | Refine the selection with another selector |
| `filter(sel)` | Alias for `find`, for readability |
| `descendants(sel)` | Matches anywhere under the selection |
| `children(sel)` | Direct children only |
| `ancestors(sel)` | Walk up the AST |
| `siblings(sel)` | Nodes sharing a parent |
| `first()`, `last()`, `nth(n)` | Positional selection |
| `limit(n)`, `offset(n)` | Slice the result set |
Terminal methods
These materialize the relation and return Python data:

| Method | Returns | Description |
|---|---|---|
| `count()` | `int` | Number of matched nodes |
| `names()` | `list[str]` | Identifier names (deduplicated) |
| `files()` | `list[str]` | Distinct source files containing matches |
| `rows()` | `list[Node]` | Full AST rows with all sitting_duck columns |
| `read()` | `list[str]` | Raw source text of each matched node |
| `to_df()` | `pd.DataFrame` | Pandas DataFrame (requires pandas) |
Mutation methods
Every mutation method returns a refreshed Selection, so you can chain further queries (though most callers don't). Transactions are scoped to a single invocation: each mutation call is atomic, and chained mutations run as independent transactions, so a failure in one call rolls back only that call.

| Method | Description |
|---|---|
| `replaceWith(text)` | Replace the entire matched node |
| `replaceWith(old, new)` | String-level replace within the matched node |
| `prepend(text)` | Prepend lines to the matched body |
| `append(text)` | Append lines to the matched body |
| `insertBefore(anchor, text)` | Insert lines before an anchor selector |
| `insertAfter(anchor, text)` | Insert lines after an anchor selector |
| `wrap(before, after)` | Wrap with surrounding text |
| `unwrap()` | Inverse of `wrap` |
| `addParam(param)` | Add a parameter to every matched function |
| `removeParam(name)` | Remove a parameter by name |
| `addArg(expr)` | Add an argument to every matched call |
| `removeArg(name)` | Remove a keyword argument by name |
| `rename(new_name)` | Rename the first name occurrence |
| `clearBody()` | Replace the body with `pass` / `{}` |
| `remove()` | Delete the matched node |
| `patch(content)` | Apply a unified diff or raw replacement |
Example:

```python
from pathlib import Path

pluck.find(".fn#validate_token").replaceWith(
    "return None",
    "raise ValueError('token required')",
)
pluck.find(".fn:exported").addParam("timeout: int = 30")

# Apply a unified diff
diff_content = Path("refactor.patch").read_text()
pluck.find(".fn#handler").patch(diff_content)

# Apply raw replacement text (like replaceWith, but from external content)
new_code = Path("patches/new_handler.py").read_text()
pluck.find(".fn#handler").patch(new_code)
```
patch(content) auto-detects unified diffs (by leading --- or
diff --git) vs raw replacement text. For diffs, context lines must
match exactly or a PluckerError is raised.
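The detection rule is simple enough to sketch. The helper below is hypothetical and only mirrors the stated heuristic (a leading `---` or `diff --git` line marks a diff); whether pluckit ignores blank lines before the header, as this sketch does, is an assumption. The standard library's `difflib` produces a diff to test it against.

```python
import difflib


def looks_like_unified_diff(content: str) -> bool:
    """True if `content` starts like a unified diff; otherwise treat it as raw replacement text."""
    # Assumption: leading blank lines are skipped before inspecting the first line.
    first_line = content.lstrip("\r\n").split("\n", 1)[0]
    return first_line.startswith("---") or first_line.startswith("diff --git")


old = "def handler():\n    return None\n"
new = "def handler():\n    raise ValueError('token required')\n"
diff = "".join(difflib.unified_diff(
    old.splitlines(keepends=True),
    new.splitlines(keepends=True),
    fromfile="a/handler.py",
    tofile="b/handler.py",
))

print(looks_like_unified_diff(diff))  # True: difflib output begins with "--- a/handler.py"
print(looks_like_unified_diff(new))   # False: plain source is raw replacement text
```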
Reading matched source

```python
for node in pluck.find(".fn#validate").rows():
    print(f"{node.file_path}:{node.start_line}")
    print(node.source_text)
```

Selection.rows() returns Node dataclasses carrying all of sitting_duck's columns, including node_id, type, semantic_type, name, start_line, end_line, parent_id, flags, and the native extraction columns (signature_type, parameters, modifiers, annotations).
Module-level shortcuts
For one-shot queries that don't need a persistent Plucker:

```python
from pluckit import view, find

print(view(".fn#main { show: outline; }", code="src/**/*.py"))

for path, line, name in find(".fn:exported", code="src/**/*.py"):
    print(f"{path}:{line}:{name}")
```

These create an ephemeral Plucker, run the query, and tear it down.
View and ViewBlock
Plucker.view() and the module-level pluckit.view() return a View object, not a plain string. A View behaves like a string for the common "print the rendered markdown" case, but also exposes structured metadata about the blocks it contains.

```python
import json

from pluckit import Plucker, AstViewer, View, ViewBlock

pluck = Plucker(code="src/**/*.py", plugins=[AstViewer])
result: View = pluck.view(".fn:exported { show: signature; }")

# Rendered output, backward compatible with the v0.1 bare-string return
print(result)           # prints the markdown
print(str(result))      # same thing
print(result.markdown)  # explicit accessor
assert "def authenticate" in result  # __contains__ checks the markdown

# Structured access
print(result.files)     # ['src/auth.py', 'src/users.py', ...]
print(len(result))      # number of blocks
for block in result:    # iterate as ViewBlock
    print(block.name, block.start_line, block.show)

# JSON export
print(json.dumps(result.to_dict(), indent=2))
```
View methods and properties

| Member | Type | Description |
|---|---|---|
| `markdown` | `str` | Full rendered output |
| `blocks` | `list[ViewBlock]` | Fresh list of contained blocks |
| `files` | `list[str]` | Distinct file paths, in first-seen order |
| `query` | `str` | The query string that produced this view |
| `format` | `str` | Output format (`markdown` in v0.1) |
| `to_dict()` | `dict` | JSON-serializable representation |
| `str(v)` / `print` | `str` | Same as `.markdown` |
| `len(v)` | `int` | Number of blocks |
| `bool(v)` | `bool` | `False` for empty views |
| `for b in v` | `Iterator[ViewBlock]` | Iterate blocks in render order |
| `v[i]` / `v[a:b]` | `ViewBlock` / `list` | Indexing and slicing |
| `"s" in v` | `bool` | Substring check against `.markdown` |
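The split behaviour in this table (string-like for `str()` and `in`, list-like for `len()`, iteration, and indexing) comes down to which dunder method delegates to what. The model below is hypothetical: `MiniView` and `MiniBlock` are not pluckit classes, most ViewBlock fields are omitted, and joining blocks with a blank line is an assumption.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class MiniBlock:
    markdown: str
    file_path: Optional[str] = None  # None marks an aggregate block

    @property
    def is_aggregate(self) -> bool:
        return self.file_path is None


class MiniView:
    def __init__(self, blocks: list[MiniBlock]):
        self._blocks = list(blocks)

    @property
    def markdown(self) -> str:
        # Assumption: blocks are joined with a blank line.
        return "\n\n".join(b.markdown for b in self._blocks)

    @property
    def blocks(self) -> list[MiniBlock]:
        return list(self._blocks)  # a fresh list, so callers can't mutate internal state

    @property
    def files(self) -> list[str]:
        # dict.fromkeys de-duplicates while keeping first-seen order
        return list(dict.fromkeys(b.file_path for b in self._blocks if b.file_path is not None))

    def __str__(self) -> str:  # str(v) and print(v) show the markdown
        return self.markdown

    def __contains__(self, text: str) -> bool:  # "s" in v searches the markdown
        return text in self.markdown

    def __len__(self) -> int:  # len(v) counts blocks, not characters
        return len(self._blocks)

    def __iter__(self) -> Iterator[MiniBlock]:
        return iter(self._blocks)

    def __getitem__(self, index):  # int -> MiniBlock, slice -> list[MiniBlock]
        return self._blocks[index]

    def __bool__(self) -> bool:
        return bool(self._blocks)


view = MiniView([
    MiniBlock("def login(user): ...", "src/auth.py"),
    MiniBlock("def logout(user): ...", "src/auth.py"),
    MiniBlock("def create_user(name): ...", "src/users.py"),
])
print(view)                    # the joined markdown
print(len(view), view.files)   # 3 ['src/auth.py', 'src/users.py']
```

Note that `len(v)` deliberately disagrees with `len(str(v))`: the view counts blocks, while the string counts characters.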
ViewBlock fields
Each ViewBlock is a frozen dataclass with:

| Field | Type | Description |
|---|---|---|
| `markdown` | `str` | Rendered content for this block |
| `rule` | `Rule` | The query rule that produced it |
| `show` | `str` | Resolved show mode (`body`, `signature`, …) |
| `file_path` | `str \| None` | Source file; `None` for aggregates |
| `start_line` | `int \| None` | Start line; `None` for aggregates |
| `end_line` | `int \| None` | End line; `None` for aggregates |
| `name` | `str \| None` | Identifier name, if any |
| `node_type` | `str \| None` | AST node type (`function_definition`, …) |
| `language` | `str \| None` | Source language |
| `is_aggregate` | `bool` | `True` for multi-match signature tables and similar |
Aggregate blocks. When a rule like .fn { show: signature; } matches
many nodes, the viewer auto-collapses the output into a single markdown
table. That collapse produces a single ViewBlock with is_aggregate =
True and file_path, start_line, end_line all None. Use
block.is_aggregate (or block.file_path is None) to distinguish
per-node blocks from aggregates.
Isolated
A scope-aware extraction of a code block with its dependencies, returned by Selection.isolate(). It identifies which identifiers the block reads from outside its own scope, classifies each as imported, parameter, or builtin, and renders the result as a standalone function or a Jupyter cell.

```python
from pluckit import Plucker

pluck = Plucker(code="src/**/*.py")
iso = pluck.find(".fn#outer").isolate()

iso.params         # ['helper']: free variables become function params
iso.imports        # ['import json']: import statements to prepend
iso.builtins_used  # ['len']: builtins used (informational)
iso.body           # original block text

print(iso.as_function("extracted"))  # imports + def extracted(helper): ...
print(iso.as_jupyter_cell())         # imports + "# Required in scope: helper" + body
```
Fields

| Field | Type | Description |
|---|---|---|
| `body` | `str` | Source text of the extracted block |
| `file_path` | `str` | Original source file |
| `start_line` | `int` | Start line (1-indexed) |
| `end_line` | `int` | End line (1-indexed, inclusive) |
| `language` | `str` | Source language (e.g., `"python"`) |
| `params` | `list[str]` | Free-variable names → function parameters |
| `imports` | `list[str]` | Import statements the block depends on |
| `builtins_used` | `list[str]` | Python builtins the block uses |
Renderers
- `as_function(name="extracted")`: a standalone function made of the imports, `def name(params):`, and the body
- `as_jupyter_cell()`: the imports, a `# Required in scope: ...` comment, and the body inline (no function wrapper)

Serialization
Isolated supports to_dict / from_dict / to_json / from_json for MCP transport.

Limitations (v1)
- Handles the first match only; for multi-match selections, iterate and narrow each call with `.limit(1)`.
- Detects module-level imports, but misses conditional or relative imports in some edge cases.
- Assumes Python semantics for builtins (`dir(builtins)` plus `self` / `cls`).
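The core classification (free variables become parameters, imported names become import lines, builtins are reported separately) can be approximated with the standard library's `ast` module. This is a simplified sketch, not pluckit's implementation: `isolate` and `as_function` here are hypothetical helpers, nested scopes such as comprehensions are flattened into one, and, as in the v1 limitations, conditional and relative imports get no special treatment. It requires Python 3.9+ for `ast.unparse`.

```python
import ast
import builtins
import textwrap

# Mirrors the documented v1 assumption: builtins are dir(builtins) plus self/cls.
BUILTINS = set(dir(builtins)) | {"self", "cls"}


def isolate(block: str, module_source: str) -> dict[str, list[str]]:
    """Classify the names `block` reads but does not bind (simplified sketch)."""
    loaded, bound = set(), set()
    for node in ast.walk(ast.parse(textwrap.dedent(block))):
        if isinstance(node, ast.Name):
            (loaded if isinstance(node.ctx, ast.Load) else bound).add(node.id)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)  # parameters of nested defs/lambdas are local
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            bound.update(a.asname or a.name.split(".")[0] for a in node.names)
    free = loaded - bound

    # Which module-level import statement binds each name?
    import_of: dict[str, str] = {}
    for node in ast.parse(module_source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                import_of[alias.asname or alias.name.split(".")[0]] = ast.unparse(node)

    return {
        "params": sorted(n for n in free if n not in import_of and n not in BUILTINS),
        "imports": sorted({import_of[n] for n in free if n in import_of}),
        "builtins_used": sorted(n for n in free if n in BUILTINS and n not in import_of),
    }


def as_function(block: str, info: dict[str, list[str]], name: str = "extracted") -> str:
    """Render imports + def name(params): + the indented body."""
    body = textwrap.indent(textwrap.dedent(block).strip("\n"), "    ")
    header = f"def {name}({', '.join(info['params'])}):"
    return "\n".join([*info["imports"], header, body]) + "\n"


MODULE = '''
import json

def outer(data):
    payload = json.dumps(data)
    return helper(payload), len(payload)
'''

# A statement block lifted from a function body; it reads `data` and `helper` from outside.
BLOCK = '''
payload = json.dumps(data)
result = helper(payload), len(payload)
'''

info = isolate(BLOCK, MODULE)
print(info)  # {'params': ['data', 'helper'], 'imports': ['import json'], 'builtins_used': ['len']}
print(as_function(BLOCK, info))
```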
Chain
The Chain class is the programmatic equivalent of the CLI's chain
syntax. It represents a source, a list of steps, and optional plugin
configuration. Chains can be built from Python dicts, JSON strings, or
parsed directly from sys.argv-style token lists.
ChainStep
A single operation in a chain:

```python
from pluckit.chain import ChainStep

step = ChainStep(op="find", args=[".fn:exported"])
step = ChainStep(op="filter", kwargs={"min_lines": 10})
step = ChainStep(op="count")
```

| Field | Type | Description |
|---|---|---|
| `op` | `str` | Operation name (e.g. `find`, `count`) |
| `args` | `list[str]` | Positional arguments (default: `[]`) |
| `kwargs` | `dict` | Keyword arguments (default: `{}`) |
Chain

```python
from pluckit.chain import Chain, ChainStep

chain = Chain(
    source=["src/**/*.py"],
    steps=[
        ChainStep(op="find", args=[".fn:exported"]),
        ChainStep(op="count"),
    ],
    plugins=["AstViewer"],
)
```

| Field | Type | Description |
|---|---|---|
| `source` | `list[str]` | File paths or glob patterns |
| `steps` | `list[ChainStep]` | Ordered list of operations |
| `plugins` | `list[str]` | Plugin names to load (default: `[]`) |
| `dry_run` | `bool` | Preview changes without writing (default: `False`) |
| `diff` | `bool` | Output mutations as a unified diff (default: `False`) |
Chain.MUTATION_OPS (class attribute) -> frozenset[str]
The public, stable set of operation names that mutate source (public since 0.13.0; previously _MUTATION_OPS). A chain containing any of these is a mutating chain. Callers that gate writes (e.g. squackit blocks mutations unless allow_mutations=True) check membership rather than hard-coding the list:

```python
from pluckit.chain import Chain

is_mutation = any(step.op in Chain.MUTATION_OPS for step in chain.steps)
```

The set: wrap, unwrap, append, prepend, insertBefore, insertAfter, replaceWith, remove, rename, patch, addArg, removeArg, addParam, removeParam.
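A caller-side gate in the spirit of squackit's can be sketched without pluckit installed. The frozenset below is a local copy of the list above, used only so the example runs standalone; real code should reference `Chain.MUTATION_OPS` so it picks up future additions. The `check_chain` helper, the plain-dict step format, and the `allow_mutations` keyword are illustrative assumptions.

```python
# Local stand-in for Chain.MUTATION_OPS, copied from the documented set for a standalone demo.
MUTATION_OPS = frozenset({
    "wrap", "unwrap", "append", "prepend", "insertBefore", "insertAfter",
    "replaceWith", "remove", "rename", "patch",
    "addArg", "removeArg", "addParam", "removeParam",
})


def mutating_ops(steps: list[dict]) -> list[str]:
    """Ops in a chain (dict form) that would modify source files."""
    return [step["op"] for step in steps if step["op"] in MUTATION_OPS]


def check_chain(steps: list[dict], *, allow_mutations: bool = False) -> None:
    """Raise if the chain mutates source and mutations were not explicitly allowed."""
    blocked = mutating_ops(steps)
    if blocked and not allow_mutations:
        raise PermissionError(f"chain contains mutation ops: {', '.join(blocked)}")


read_only = [{"op": "find", "args": [".fn:exported"]}, {"op": "count"}]
writing = [{"op": "find", "args": [".fn#old_name"]}, {"op": "rename", "args": ["new_name"]}]

check_chain(read_only)                      # passes: nothing to gate
check_chain(writing, allow_mutations=True)  # passes: caller opted in
```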
Construction methods
Chain.from_dict(data: dict) -> Chain
Build a chain from a dictionary (the same structure as the JSON I/O format described in the CLI reference):

```python
chain = Chain.from_dict({
    "source": ["src/**/*.py"],
    "steps": [
        {"op": "find", "args": [".fn:exported"]},
        {"op": "count"},
    ],
})
```

Chain.from_json(json_string: str) -> Chain
Parse a JSON string into a chain:

```python
chain = Chain.from_json(
    '{"source": ["src/**/*.py"], '
    '"steps": [{"op": "find", "args": [".fn:exported"]}, {"op": "count"}]}'
)
```

Chain.from_argv(tokens: list[str]) -> Chain
Parse a CLI-style token list into a chain. This is the same parsing the CLI entry point uses; the Execution example below builds a chain this way.
Execution
chain.evaluate() -> Any
Run the chain and return the result. The return type depends on the terminal operation: int for count, list[str] for names, and so on.

```python
chain = Chain.from_argv(["src/**/*.py", "find", ".fn:exported", "count"])
result = chain.evaluate()
print(result)  # e.g. 42
```
Serialization
chain.to_dict() -> dict
Convert the chain to a JSON-serializable dictionary:

```python
data = chain.to_dict()
# {"source": ["src/**/*.py"], "plugins": [], "steps": [{"op": "find", "args": [".fn:exported"]}, {"op": "count"}]}
```

chain.to_json() -> str
Serialize the chain as a JSON string.
Pagination
Chains support limit, offset, and page as ordinary chain ops. When any of them appear in a chain, evaluate() attaches pagination metadata to the result:

```python
chain = Chain(
    source=["src/**/*.py"],
    steps=[
        ChainStep(op="find", args=[".fn"]),
        ChainStep(op="page", args=["0", "20"]),  # page 0, size 20
        ChainStep(op="names"),
    ],
)
result = chain.evaluate()

result["page"]
# {
#     "offset": 0,
#     "limit": 20,
#     "total": None,     # lazy; call with_total() to fill it in
#     "has_more": True,  # heuristic; True if data length >= limit
# }

result["source_chain"]  # the chain with pagination ops stripped, for "give me more"
```
has_more heuristic
- `data_length < limit` → definitively `False` (fewer rows than requested, so nothing is left)
- `data_length >= limit` → conservatively `True` (this might be the last page, but that can't be known without `total`)
- `limit` is `None` → `has_more` is `None` (unknown)
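The three rules translate directly into a small function. The names `has_more` and `has_more_exact` are hypothetical; the exact variant assumes that, once `total` is known, the answer is simply whether `offset + data_length` falls short of it.

```python
from typing import Optional


def has_more(data_length: int, limit: Optional[int]) -> Optional[bool]:
    """Heuristic from the rules above: only a short page proves nothing is left."""
    if limit is None:
        return None              # not paginated: unknown
    return data_length >= limit  # full page: conservatively assume more may follow


def has_more_exact(offset: int, data_length: int, total: int) -> bool:
    """Assumed exact check once the total is known (e.g. after Chain.with_total)."""
    return offset + data_length < total


print(has_more(20, 20), has_more(7, 20), has_more(7, None))  # True False None
print(has_more_exact(0, 20, 20))  # False: that full page was actually the last one
```

The second line shows the heuristic's one false positive: a page that is exactly full and also final, which only the total can resolve.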
Chain.with_total(result): compute the exact total on demand

```python
Chain.with_total(result)    # mutates result in place, returns it
result["page"]["total"]     # now an int
result["page"]["has_more"]  # now exact
```
Runs one extra SQL query against the source_chain. No-op if the
result has no pagination metadata.
Navigation helpers
Each returns a new Chain ready to evaluate, or None when navigation isn't possible (no more pages, already at offset 0, or the result wasn't paginated).

| Method | Returns |
|---|---|
| `Chain.next_page(result)` | Chain for the next page (or `None`) |
| `Chain.prev_page(result)` | Chain for the previous page (or `None`) |
| `Chain.goto_page(result, n)` | Chain for page `n` (0-indexed) |

```python
result = chain.evaluate()
if next_chain := Chain.next_page(result):
    next_result = next_chain.evaluate()
```
Edge cases
- `page N SIZE` followed by `limit` / `offset`: `page` sets both offset and limit, and a later `limit` or `offset` overrides the corresponding value. This is well-defined but confusing, so use one pattern or the other, not both.
- `limit` before a mutation: `find .fn limit 5 rename bar` renames only the first 5 functions. The Selection contains 5 rows at mutation time, so the mutation applies to those 5. This is correct, but it may surprise callers who expected `limit` to apply only to terminal output.
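The offset arithmetic behind `page` and the navigation helpers can be modelled as pure functions over a `{"offset", "limit"}` dict shaped like `result["page"]`. Everything here is a hypothetical model for illustration, assuming `page N SIZE` means `offset = N * SIZE, limit = SIZE` (consistent with `page 0 20` yielding offset 0 and limit 20 in the Pagination example) and that ops apply left to right, so a later `limit` or `offset` overrides the earlier value.

```python
from typing import Optional


def apply_pagination(ops: list[tuple]) -> dict:
    """Fold page/limit/offset ops left to right; later ops override earlier values."""
    state = {"offset": 0, "limit": None}
    for op, *args in ops:
        if op == "page":
            n, size = int(args[0]), int(args[1])
            state = {"offset": n * size, "limit": size}
        elif op == "limit":
            state["limit"] = int(args[0])
        elif op == "offset":
            state["offset"] = int(args[0])
    return state


def next_page(page: dict, has_more: Optional[bool]) -> Optional[dict]:
    if page["limit"] is None or not has_more:
        return None  # unpaginated, or known to be exhausted
    return {"offset": page["offset"] + page["limit"], "limit": page["limit"]}


def prev_page(page: dict) -> Optional[dict]:
    if page["limit"] is None or page["offset"] == 0:
        return None  # unpaginated, or already at the start
    return {"offset": max(0, page["offset"] - page["limit"]), "limit": page["limit"]}


def goto_page(page: dict, n: int) -> Optional[dict]:
    if page["limit"] is None:
        return None
    return {"offset": n * page["limit"], "limit": page["limit"]}  # n is 0-indexed


print(apply_pagination([("page", "2", "20")]))                # {'offset': 40, 'limit': 20}
print(apply_pagination([("page", "2", "20"), ("limit", "5")]))  # {'offset': 40, 'limit': 5}
```

The second call reproduces the "page then limit" edge case: the offset still comes from `page`, but the page size silently shrinks to 5.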
Round-trip example

```python
from pluckit.chain import Chain

# Build from CLI tokens
chain = Chain.from_argv(["src/**/*.py", "find", ".fn:exported", "names"])

# Inspect as JSON
print(chain.to_json())

# Reconstruct from the dict form
chain2 = Chain.from_dict(chain.to_dict())

# Execute
result = chain2.evaluate()
for name in result:
    print(name)
```
Plugins
pluckit is composable. Core capabilities live on Selection; anything that depends on extra infrastructure moves into a plugin.

```python
from pluckit import Plucker, AstViewer, Calls, History, Scope

pluck = Plucker(
    code="src/**/*.py",
    plugins=[
        AstViewer,  # viewer with { show: ... } declarations
        Calls,      # call graph (callers / callees / references)
        History,    # git history via duck_tails
        Scope,      # scope-aware queries (defs / refs / enclosing scope)
    ],
)
```
Writing a plugin
A plugin is a subclass of pluckit.pluckins.Pluckin:

```python
from pluckit.pluckins import Pluckin


class WordCount(Pluckin):
    name = "wordcount"

    # Installed on Selection: sel.word_count() sums word counts over the matched source
    methods = {
        "word_count": lambda self: sum(
            len(text.split()) for text in self.read()
        ),
    }

    # Pseudo-class name -> SQL WHERE fragment
    pseudo_classes = {
        ":long": "end_line - start_line > 50",
    }
```

| Class attribute | Purpose |
|---|---|
| `name` | Unique plugin identifier |
| `methods` | Dict of method name → function to install on Selection |
| `pseudo_classes` | Dict of `:name` → SQL `WHERE` fragment |
| `upgrades` | Dict of method name → function that overrides an existing method |
| `setup(ctx)` | Optional hook called when the plugin is registered |
Plugins can also register new semantic-type aliases by updating
pluckit.selectors.ALIASES, but that's considered advanced — most
plugins only need methods and pseudo_classes.
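To make `methods` and `pseudo_classes` concrete, here is a toy model of what registration could do with them: attach each `methods` entry to a Selection-like class, and splice a pseudo-class's SQL fragment into a `WHERE` clause. `ToySelection`, `register`, `with_pseudo`, and the SQLite `nodes` table are hypothetical stand-ins (pluckit itself runs on DuckDB and sitting_duck's `read_ast`); only the WordCount attributes come from the example above, minus the Pluckin base class.

```python
import sqlite3


class ToySelection:
    """Stand-in for Selection over a `nodes(name, start_line, end_line, source)` table."""

    def __init__(self, conn: sqlite3.Connection, where: str = "1=1"):
        self.conn, self.where = conn, where

    def read(self) -> list[str]:
        return [r[0] for r in self.conn.execute(f"SELECT source FROM nodes WHERE {self.where}")]

    def names(self) -> list[str]:
        sql = f"SELECT name FROM nodes WHERE {self.where} ORDER BY name"
        return [r[0] for r in self.conn.execute(sql)]


PSEUDO_CLASSES: dict[str, str] = {}


def register(plugin: type) -> None:
    """Toy registration: install methods on the class, record pseudo-class fragments."""
    for method_name, fn in plugin.methods.items():
        setattr(ToySelection, method_name, fn)  # plain functions become bound methods
    PSEUDO_CLASSES.update(plugin.pseudo_classes)


def with_pseudo(sel: ToySelection, pseudo: str) -> ToySelection:
    # Parenthesize both sides so arbitrary fragments compose safely with AND.
    return ToySelection(sel.conn, f"({sel.where}) AND ({PSEUDO_CLASSES[pseudo]})")


class WordCount:  # same attributes as the Pluckin subclass above
    name = "wordcount"
    methods = {"word_count": lambda self: sum(len(t.split()) for t in self.read())}
    pseudo_classes = {":long": "end_line - start_line > 50"}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT, start_line INT, end_line INT, source TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?)",
    [
        ("short_fn", 1, 5, "def short_fn(): return 1"),
        ("long_fn", 10, 90, "def long_fn(): ..."),
    ],
)

register(WordCount)
everything = ToySelection(conn)
print(everything.word_count())                   # 7
print(with_pseudo(everything, ":long").names())  # ['long_fn']
```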
History: git history on AST selections

```python
from pluckit import Plucker, History

pluck = Plucker(code="src/**/*.py", plugins=[History])
fn = pluck.find(".fn#validate_token")

# Every commit that touched the function's file, most recent first
for commit in fn.history():
    print(f"{commit.hash[:8]} {commit.author_name}: {commit.message}")

# Distinct authors (email) for those commits
print(fn.authors())

# The function's body as it was at an old revision. This is AST-aware:
# it matches by (name, type), not by today's line range.
print(fn.at("v0.1.0")[0])

# Unified diff between HEAD and the old revision, per matched node
print(fn.diff("v0.1.0")[0])
```
| Method | Returns | Notes |
|---|---|---|
| `history()` | `list[Commit]` | Deduplicated, sorted by date descending |
| `authors()` | `list[str]` (emails) | Sorted |
| `at(rev)` | `list[str]` | One entry per matched node; `""` if not found |
| `diff(rev)` | `list[str]` | Unified diff per matched node |
| `blame()` | (raises) | Deferred; blocked upstream on duck_tails |
Dependencies. History requires the duck_tails DuckDB community
extension (for git_read) and the git binary on PATH (for git log
--follow). pluckit auto-installs duck_tails on first use; run
pluckit init to provision eagerly.
Rename handling. history() uses git log --follow, so commits
that touched a file under a previous name are included. at(rev) /
diff(rev) locate the node at the historical revision by name+type,
so a pure rename is tracked as long as the node's name survives.
Structural refactors (a method being pulled out of a class, a
function being split) are not automatically tracked.
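Matching by name and type rather than by line range is what lets `at(rev)` and `diff(rev)` survive code moving within a file. A rough standard-library model of the idea, assuming Python sources: `ast` locates the node, `difflib` diffs it, and fetching old file contents from git is left out. The `find_node_source` and `node_diff` helpers and the node-type mapping are hypothetical.

```python
import ast
import difflib
from typing import Optional

# Assumed mapping from tree-sitter-style node types to Python ast classes.
NODE_TYPES = {
    "function_definition": (ast.FunctionDef, ast.AsyncFunctionDef),
    "class_definition": (ast.ClassDef,),
}


def find_node_source(source: str, name: str, node_type: str) -> Optional[str]:
    """Source of the first node with this (name, type), wherever it now sits in the file."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, NODE_TYPES[node_type]) and node.name == name:
            return ast.get_source_segment(source, node)
    return None


def node_diff(old_source: str, new_source: str, name: str,
              node_type: str = "function_definition") -> str:
    """Unified diff of one named node between two versions of a file ("" if unchanged)."""
    def lines(text: Optional[str]) -> list[str]:
        return [line + "\n" for line in (text or "").splitlines()]

    return "".join(difflib.unified_diff(
        lines(find_node_source(old_source, name, node_type)),
        lines(find_node_source(new_source, name, node_type)),
        fromfile=f"old/{name}",
        tofile=f"new/{name}",
    ))


OLD = "def validate_token(t):\n    return None\n"
# In the new version the function moved below a new helper, so its line range changed.
NEW = (
    "def helper():\n    pass\n\n\n"
    "def validate_token(t):\n    raise ValueError('token required')\n"
)
print(node_diff(OLD, NEW, "validate_token"))
```

A line-range lookup would have compared `validate_token` against `helper` here; the name-based lookup diffs the right function. As the paragraph above notes, this approach cannot follow a node whose name changes or that is split apart.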
Calls: call-graph operations on selections

```python
from pluckit import Plucker, Calls

pluck = Plucker(code="src/**/*.py", plugins=[Calls])

# Who calls validate_token?
callers = pluck.find(".fn#validate_token").callers()
print(callers.names())

# What does authenticate call?
callees = pluck.find(".fn#authenticate").callees()

# All references to a name (call sites + bare uses)
refs = pluck.find(".fn#config").references()
```

| Method | Returns | Description |
|---|---|---|
| `callers()` | `Selection` | Functions that call matched nodes |
| `callees()` | `Selection` | Functions called by matched nodes |
| `references()` | `Selection` | All references to matched nodes |
Dependencies. Calls wraps sitting_duck's ::callers /
::callees / ::references pseudo-elements. No extra extensions
needed.
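The relationship between `callers()` and `callees()` is easiest to see on a toy call graph. The sketch below builds one for a single Python module with `ast` and derives callers by inverting callees. Only direct calls to plain names are recorded (methods, imports, and dynamic dispatch are ignored), and it is an illustration of the idea, not a description of how sitting_duck computes `::callers` / `::callees`.

```python
import ast
from collections import defaultdict


def callees_by_function(source: str) -> dict[str, set[str]]:
    """Map each top-level function name to the plain names it calls."""
    graph: dict[str, set[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            graph[node.name] = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
    return graph


def invert(graph: dict[str, set[str]]) -> dict[str, set[str]]:
    """callers[x] is every function whose callee set contains x."""
    callers: dict[str, set[str]] = defaultdict(set)
    for fn, called in graph.items():
        for name in called:
            callers[name].add(fn)
    return dict(callers)


SOURCE = '''
def validate_token(token):
    return bool(token)

def authenticate(user, token):
    if not validate_token(token):
        raise PermissionError(user)
    return lookup(user)

def refresh(token):
    return validate_token(token)
'''

callees = callees_by_function(SOURCE)
callers = invert(callees)
print(sorted(callers["validate_token"]))  # ['authenticate', 'refresh']
print(sorted(callees["authenticate"]))    # ['PermissionError', 'lookup', 'validate_token']
```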
Scope: scope-aware queries

```python
from pluckit import Plucker, Scope

pluck = Plucker(code="src/**/*.py", plugins=[Scope])

# Enclosing scope chain (module → class → function)
scope_chain = pluck.find(".fn#inner").scope()

# Names DEFINED in the scope containing each match
defs = pluck.find(".fn#outer").defs()

# Name REFERENCES within the scope containing each match
refs = pluck.find(".fn#outer").refs()
```

| Method | Returns | Description |
|---|---|---|
| `scope()` | `Selection` | Enclosing scope hierarchy for each match |
| `defs()` | `Selection` | Definitions in the scope containing each match |
| `refs()` | `Selection` | References in the scope containing each match |
Dependencies. Uses sitting_duck's ::scope pseudo-element and
the scope_id / scope_stack columns on read_ast.
Error handling
Every recoverable error raises PluckerError:

```python
from pluckit import Plucker, PluckerError

try:
    pluck = Plucker(code="src/**/*.py")
    pluck.find(".fn").replaceWith("def broken(:::")
except PluckerError as e:
    print(f"Mutation failed: {e}")
    # All affected files have already been rolled back to their
    # pre-mutation state.
```

PluckerError is raised for:
- Failed extension installation (running `pluckit init` will reproduce this)
- Selector compilation errors
- Mutation syntax errors (with automatic rollback)
- Invalid paths, missing files, and parse failures
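The rollback guarantee can be pictured as a snapshot-and-restore around each mutation call. The context manager below is a hypothetical, simplified model: it keeps whole-file snapshots in memory and restores them on any exception. pluckit's actual transaction mechanism is not described here and may differ.

```python
import contextlib
import tempfile
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def atomic_files(paths: list[Path]) -> Iterator[None]:
    """Restore every file in `paths` to its original contents if the body raises."""
    snapshot = {p: p.read_text() for p in paths}
    try:
        yield
    except BaseException:
        for p, text in snapshot.items():
            p.write_text(text)
        raise  # restore first, then let the caller see the original error


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "auth.py")
    target.write_text("def validate():\n    return None\n")
    try:
        with atomic_files([target]):
            target.write_text("def broken(:::\n")             # the "mutation"
            compile(target.read_text(), str(target), "exec")  # post-write syntax check fails
    except SyntaxError as e:
        print(f"Mutation failed: {e.msg}")
    print(target.read_text(), end="")  # original contents are back
```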