Python API

pluckit's Python API is built around three types: Plucker (the entry point), Selection (a lazy query chain), and Pluckin (the extension point). Everything else is either a mutation class or a convenience wrapper around these.

from pluckit import Plucker, AstViewer

pluck = Plucker(code="src/**/*.py", plugins=[AstViewer])

Plucker

The entry point. Wraps a DuckDB connection, loads the sitting_duck extension on first use, and exposes methods for finding, viewing, and mutating code.

Constructor

Plucker(
    code: str | list[str] | None = None,
    *,
    plugins: list[Pluckin | type[Pluckin]] = (),
    repo: str | None = None,
    db: duckdb.DuckDBPyConnection | None = None,
    cache: bool | str = False,
)

| Parameter | Description |
| --- | --- |
| code | Glob pattern(s) or explicit file list for the source corpus |
| plugins | Pluckin classes or instances to register |
| repo | Repository root for relative paths (default: current working directory) |
| db | An existing DuckDB connection to reuse (default: create a fresh one) |
| cache | Persistent AST cache. True → .pluckit.duckdb in repo root; str → custom path |

Persistent AST caching. When cache=True, pluckit opens a persistent DuckDB file (.pluckit.duckdb by default) and materializes read_ast output into per-pattern tables. Subsequent queries against the same pattern skip re-parsing. File-stat mtime checks drive incremental invalidation — only modified files are re-parsed. See [tool.pluckit] cache = true in pyproject.toml for the config path.
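
As a rough illustration of the mtime-driven invalidation described above, the sketch below tracks file modification times in a plain dict and reports which files would need re-parsing. It is not pluckit's implementation: stale_files, refresh, and the in-memory dict are hypothetical names chosen for the example, whereas pluckit keeps its cache in DuckDB tables.

```python
import os
import tempfile
from pathlib import Path


def stale_files(paths, cached_mtimes):
    """Return paths whose current mtime differs from the cached value."""
    return [p for p in paths if cached_mtimes.get(p) != os.stat(p).st_mtime_ns]


def refresh(paths, cached_mtimes):
    """Record current mtimes (a stand-in for re-parsing each file)."""
    for p in paths:
        cached_mtimes[p] = os.stat(p).st_mtime_ns


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.py")
    b = os.path.join(tmp, "b.py")
    Path(a).write_text("x = 1\n")
    Path(b).write_text("y = 2\n")

    cache = {}
    first_pass = [os.path.basename(p) for p in stale_files([a, b], cache)]
    refresh([a, b], cache)
    second_pass = stale_files([a, b], cache)  # nothing changed since refresh

    # Simulate an edit to a.py by moving its mtime forward ten seconds.
    st = os.stat(a)
    os.utime(a, ns=(st.st_atime_ns, st.st_mtime_ns + 10_000_000_000))
    third_pass = [os.path.basename(p) for p in stale_files([a, b], cache)]
```

Only the file whose mtime moved shows up in the third pass, which is the property that lets a cache skip re-parsing untouched files.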

Methods

find(selector: str) -> Selection

Run a selector against the configured code corpus and return a lazy Selection:

fns = pluck.find(".fn:exported")

view(selector: str, *, format: str = "markdown") -> View

Render matched nodes as markdown (the AstViewer plugin must be registered). Returns a View object, which prints as the rendered markdown; see "View and ViewBlock" below.

print(pluck.view(".fn#main { show: signature; }"))

source(glob: str) -> Source

Create a Source handle for ad-hoc queries against a different glob without creating a whole new Plucker.

fts_collection(name: str) -> FtsCollection

Get a handle to a named FTS collection. Requires fledgling. Returns an FtsCollection with .create(query) and .search(query) methods:

col = pluck.fts_collection("tools")
col.create("""
    SELECT 'search' AS id,
           'full-text BM25 search over code and docs' AS text,
           map{'kit': 'fledgling'} AS metadata
""")
results = col.search("search")
for id, text, metadata, score in results:
    print(f"{score:.2f}  {id}: {text}")

Each collection gets its own BM25 index with independent IDF statistics. The fixed schema is (id TEXT, text TEXT, metadata MAP(TEXT, TEXT)). Creating a collection is idempotent — it replaces the existing table and index if the collection already exists.

pluckins (property) -> list[Pluckin]

The loaded pluckin instances for this Plucker, in load order. Public since 0.13.0 (replaces reaching into the private _registry). Use it to introspect which plugins are active or to pull tool definitions a pluckin contributes:

for p in pluck.pluckins:
    print(p.name)
    # e.g. a pluckin may expose squackit_tools for MCP integration

Selector

A validated, serializable CSS-over-AST selector string. Subclasses str so it's backward-compatible everywhere a bare selector string is used today — all existing code like pluck.find(".fn:exported") keeps working.

from pluckit import Selector

s = Selector(".fn:exported")
assert isinstance(s, str)
assert s.is_valid

# Invalid selector resolving to nothing
Selector(".nonexistent_taxonomy_class").validate()  # raises PluckerError

Supports the standard serialization protocol:

| Method | Purpose |
| --- | --- |
| .to_dict() / .from_dict(d) | {"selector": "..."} dict form |
| .to_json() / .from_json(s) | JSON string |
| .to_argv() / .from_argv(tokens) | CLI token list |
| .validate() / .is_valid | Compile-time check |
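
To make the "subclasses str" design concrete, here is a minimal sketch of a str subclass that round-trips through the dict and JSON forms listed above. SketchSelector is a hypothetical stand-in written for this example; it performs no validation and is not pluckit's Selector.

```python
import json


class SketchSelector(str):
    """Hypothetical stand-in: a str subclass with dict/JSON round-tripping."""

    def to_dict(self):
        return {"selector": str(self)}

    @classmethod
    def from_dict(cls, data):
        return cls(data["selector"])

    def to_json(self):
        return json.dumps(self.to_dict())

    @classmethod
    def from_json(cls, text):
        return cls.from_dict(json.loads(text))


s = SketchSelector(".fn:exported")
round_tripped = SketchSelector.from_json(s.to_json())
```

Because the instance is still a str, it compares equal to the bare string and can be passed anywhere a plain selector string is accepted.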

Selection

A lazy DuckDB relation. Every method on Selection returns another Selection — nothing materializes until you call a terminal method.

Query composition

# Refine a selection
tests = pluck.find(".fn[name^=test_]")
without_try = tests.filter(".fn:not(:has(.try))")

# Navigate
classes = pluck.find(".cls")
methods = classes.descendants(".fn")

| Method | Description |
| --- | --- |
| find(sel) | Refine the selection with another selector |
| filter(sel) | Alias for find; semantic clarity |
| descendants(sel) | Matches anywhere under the selection |
| children(sel) | Direct children only |
| ancestors(sel) | Walk up the AST |
| siblings(sel) | Nodes sharing a parent |
| first(), last(), nth(n) | Positional selection |
| limit(n), offset(n) | Slice the result set |

Terminal methods

These materialize the relation and return Python data:

| Method | Returns | Description |
| --- | --- | --- |
| count() | int | Number of matched nodes |
| names() | list[str] | Identifier names (deduplicated) |
| files() | list[str] | Distinct source files containing matches |
| rows() | list[Node] | Full AST rows with all sitting_duck cols |
| read() | list[str] | Raw source text of each matched node |
| to_df() | pd.DataFrame | Pandas DataFrame (requires pandas) |

Mutation methods

Every mutation method returns a refreshed Selection (so you can chain further queries, though most callers don't). All mutations are transactional at the invocation level — the enclosing call is atomic, and multiple fluent mutations are independent transactions.

| Method | Description |
| --- | --- |
| replaceWith(text) | Replace entire matched node |
| replaceWith(old, new) | String-level replace within matched node |
| prepend(text) | Prepend lines to the matched body |
| append(text) | Append lines to the matched body |
| insertBefore(anchor, text) | Insert lines before an anchor selector |
| insertAfter(anchor, text) | Insert lines after an anchor selector |
| wrap(before, after) | Wrap with surrounding text |
| unwrap() | Inverse of wrap |
| addParam(param) | Add a parameter to every matched function |
| removeParam(name) | Remove a parameter by name |
| addArg(expr) | Add an argument to every matched call |
| removeArg(name) | Remove a keyword argument by name |
| rename(new_name) | Rename the first name occurrence |
| clearBody() | Replace body with pass / {} |
| remove() | Delete the matched node |
| patch(content) | Apply a unified diff or raw replacement |

Example:

pluck.find(".fn#validate_token").replaceWith(
    "return None",
    "raise ValueError('token required')",
)
pluck.find(".fn:exported").addParam("timeout: int = 30")

from pathlib import Path

# Apply a unified diff (read_text closes the file handle for us)
diff_content = Path("refactor.patch").read_text()
pluck.find(".fn#handler").patch(diff_content)

# Apply raw replacement text (like replaceWith, but from external content)
new_code = Path("patches/new_handler.py").read_text()
pluck.find(".fn#handler").patch(new_code)

patch(content) auto-detects unified diffs (by leading --- or diff --git) vs raw replacement text. For diffs, context lines must match exactly or a PluckerError is raised.
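
The detection rule above can be approximated in a few lines. This is a sketch under assumptions, not pluckit's code: looks_like_unified_diff is a hypothetical helper, and skipping leading blank lines before checking the prefix is a guess about how "leading" is interpreted.

```python
def looks_like_unified_diff(content: str) -> bool:
    """Guess whether content is a unified diff from its first non-blank line."""
    for line in content.splitlines():
        if line.strip():
            # Assumption: a diff starts with a '---' file header or 'diff --git'.
            return line.startswith(("---", "diff --git"))
    return False  # empty or whitespace-only content is treated as raw text


diff_text = "--- a/handler.py\n+++ b/handler.py\n@@ -1 +1 @@\n-old\n+new\n"
git_text = "diff --git a/handler.py b/handler.py\n"
raw_text = "def handler():\n    return 1\n"
```

A prefix check like this is cheap but heuristic: raw replacement text that happens to begin with --- would be misread as a diff.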

Reading matched source

for node in pluck.find(".fn#validate").rows():
    print(f"{node.file_path}:{node.start_line}")
    print(node.source_text)

Selection.rows() returns Node dataclasses with all of sitting_duck's columns — node_id, type, semantic_type, name, start_line, end_line, parent_id, flags, and the native extraction columns (signature_type, parameters, modifiers, annotations).


Module-level shortcuts

For one-shot queries that don't need a persistent Plucker:

from pluckit import view, find

print(view(".fn#main { show: outline; }", code="src/**/*.py"))

for path, line, name in find(".fn:exported", code="src/**/*.py"):
    print(f"{path}:{line}:{name}")

These create an ephemeral Plucker, run the query, and tear it down.


View and ViewBlock

Plucker.view() and the module-level pluckit.view() return a View object — not a plain string. A View behaves like a string for the common "print the rendered markdown" case, but also exposes structured metadata about the blocks it contains.

from pluckit import Plucker, AstViewer, View, ViewBlock

pluck = Plucker(code="src/**/*.py", plugins=[AstViewer])
result: View = pluck.view(".fn:exported { show: signature; }")

# Rendered output — backward compatible with the v0.1 bare-string return
print(result)                    # prints the markdown
print(str(result))               # same thing
print(result.markdown)           # explicit accessor
assert "def authenticate" in result   # __contains__ checks the markdown

# Structured access
print(result.files)              # ['src/auth.py', 'src/users.py', ...]
print(len(result))               # number of blocks
for block in result:             # iterate as ViewBlock
    print(block.name, block.start_line, block.show)

# JSON export
import json
print(json.dumps(result.to_dict(), indent=2))

View methods and properties

| Member | Type | Description |
| --- | --- | --- |
| markdown | str | Full rendered output |
| blocks | list[ViewBlock] | Fresh list of contained blocks |
| files | list[str] | Distinct file paths, in first-seen order |
| query | str | The query string that produced this view |
| format | str | Output format (markdown in v0.1) |
| to_dict() | dict | JSON-serializable representation |
| str(v) / print | str | Same as .markdown |
| len(v) | int | Number of blocks |
| bool(v) | bool | False for empty views |
| for b in v | Iterator[ViewBlock] | Iterate blocks in render order |
| v[i] / v[a:b] | ViewBlock / list | Indexing and slicing |
| "s" in v | bool | Substring check against .markdown |

ViewBlock fields

Each ViewBlock is a frozen dataclass with:

| Field | Type | Description |
| --- | --- | --- |
| markdown | str | Rendered content for this block |
| rule | Rule | The query rule that produced it |
| show | str | Resolved show mode (body, signature, …) |
| file_path | str \| None | Source file; None for aggregates |
| start_line | int \| None | Start line; None for aggregates |
| end_line | int \| None | End line; None for aggregates |
| name | str \| None | Identifier name, if any |
| node_type | str \| None | AST node type (function_definition, …) |
| language | str \| None | Source language |
| is_aggregate | bool | True for multi-match signature tables and such |

Aggregate blocks. When a rule like .fn { show: signature; } matches many nodes, the viewer auto-collapses the output into a single markdown table. That collapse produces a single ViewBlock with is_aggregate = True and file_path, start_line, end_line all None. Use block.is_aggregate (or block.file_path is None) to distinguish per-node blocks from aggregates.
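
A short sketch of how a caller might separate per-node blocks from aggregates before building file:line links. SketchBlock is a hypothetical stand-in that carries only a few of the ViewBlock fields listed above, so the example runs without pluckit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchBlock:
    """Hypothetical stand-in carrying a subset of the ViewBlock fields."""
    markdown: str
    file_path: Optional[str] = None
    start_line: Optional[int] = None
    is_aggregate: bool = False


def split_blocks(blocks):
    """Partition blocks into (per-node, aggregate) lists."""
    per_node = [b for b in blocks if not b.is_aggregate]
    aggregates = [b for b in blocks if b.is_aggregate]
    return per_node, aggregates


blocks = [
    SketchBlock("def main(): ...", file_path="src/app.py", start_line=10),
    SketchBlock("| name | signature |", is_aggregate=True),
]
per_node, aggregates = split_blocks(blocks)
# Only per-node blocks have a location to link to.
locations = [f"{b.file_path}:{b.start_line}" for b in per_node]
```

Filtering first avoids formatting None:None for aggregate blocks.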


Isolated

A scope-aware extraction of a code block with its dependencies. Returned by Selection.isolate(). Identifies which identifiers the block reads from outside its own scope, classifies each as imported / parameter / builtin, and renders the result as a standalone function or a Jupyter cell.

from pluckit import Plucker

pluck = Plucker(code="src/**/*.py")
iso = pluck.find(".fn#outer").isolate()

iso.params         # ['helper']           — free variables → function params
iso.imports        # ['import json']      — import statements to prepend
iso.builtins_used  # ['len']              — builtins used (informational)
iso.body           # original block text

print(iso.as_function("extracted"))   # imports + def extracted(helper): ...
print(iso.as_jupyter_cell())          # imports + "# Required in scope: helper" + body

Fields

| Field | Type | Description |
| --- | --- | --- |
| body | str | Source text of the extracted block |
| file_path | str | Original source file |
| start_line | int | Start line (1-indexed) |
| end_line | int | End line (1-indexed, inclusive) |
| language | str | Source language (e.g., "python") |
| params | list[str] | Free-variable names → function parameters |
| imports | list[str] | Import statements the block depends on |
| builtins_used | list[str] | Python builtins the block uses |

Renderers

  • as_function(name="extracted") — standalone function: imports + def name(params) + body
  • as_jupyter_cell() — imports + # Required in scope: ... comment + inline body (no function wrap)

Serialization

to_dict / from_dict / to_json / from_json for MCP transport.

Limitations (v1)

  • Handles the first match only. For multi-match selections, narrow to a single node per call (for example with .limit(1)) and call isolate() on each
  • Detects module-level imports, but may miss conditional or relative imports in edge cases
  • Assumes Python semantics for builtins (dir(builtins) + self/cls)
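
The classification that isolate() performs (imported / parameter / builtin) can be approximated with the standard-library ast module. The sketch below is an assumption-laden illustration, not pluckit's algorithm: classify_free_names is a hypothetical helper, it only handles top-level functions with plain positional parameters, and it ignores nested scopes and comprehensions.

```python
import ast
import builtins

SOURCE = '''
import json

def outer(data):
    text = json.dumps(data)
    return helper(text, len(text))
'''


def classify_free_names(module_source, func_name):
    """Classify names a function reads from outside its own scope."""
    tree = ast.parse(module_source)

    # Map each module-level imported name to the statement that binds it.
    imports = {}
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound_name = alias.asname or alias.name.split(".")[0]
                imports[bound_name] = ast.unparse(node)

    func = next(
        n for n in tree.body
        if isinstance(n, ast.FunctionDef) and n.name == func_name
    )
    bound = {a.arg for a in func.args.args}  # parameters are locally bound
    loads = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                bound.add(node.id)   # assigned inside the function
            else:
                loads.add(node.id)   # read (or deleted) inside the function

    result = {"imports": [], "builtins": [], "params": []}
    for name in sorted(loads - bound):
        if name in imports:
            result["imports"].append(imports[name])
        elif hasattr(builtins, name):
            result["builtins"].append(name)
        else:
            result["params"].append(name)  # free variable: becomes a parameter
    return result


classified = classify_free_names(SOURCE, "outer")
```

For this input the three buckets line up with the params / imports / builtins_used example shown earlier in this section.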

Chain

The Chain class is the programmatic equivalent of the CLI's chain syntax. It represents a source, a list of steps, and optional plugin configuration. Chains can be built from Python dicts, JSON strings, or parsed directly from sys.argv-style token lists.

ChainStep

A single operation in a chain:

from pluckit.chain import ChainStep

step = ChainStep(op="find", args=[".fn:exported"])
step = ChainStep(op="filter", kwargs={"min_lines": 10})
step = ChainStep(op="count")

| Field | Type | Description |
| --- | --- | --- |
| op | str | Operation name (e.g. find, count) |
| args | list[str] | Positional arguments (default: []) |
| kwargs | dict | Keyword arguments (default: {}) |

Chain

from pluckit.chain import Chain, ChainStep

chain = Chain(
    source=["src/**/*.py"],
    steps=[
        ChainStep(op="find", args=[".fn:exported"]),
        ChainStep(op="count"),
    ],
    plugins=["AstViewer"],
)

| Field | Type | Description |
| --- | --- | --- |
| source | list[str] | File paths or glob patterns |
| steps | list[ChainStep] | Ordered list of operations |
| plugins | list[str] | Plugin names to load (default: []) |
| dry_run | bool | Preview changes without writing (default: False) |
| diff | bool | Output mutations as unified diff (default: False) |

Chain.MUTATION_OPS (class attribute) -> frozenset[str]

The public, stable set of operation names that mutate source (public since 0.13.0; was _MUTATION_OPS). A chain containing any of these is a mutating chain — callers that gate writes (e.g. squackit blocks mutations unless allow_mutations=True) check membership rather than hard-coding the list:

from pluckit.chain import Chain

is_mutation = any(step.op in Chain.MUTATION_OPS for step in chain.steps)

The set: wrap, unwrap, append, prepend, insertBefore, insertAfter, replaceWith, remove, rename, patch, addArg, removeArg, addParam, removeParam.
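
A write gate of the kind described above might look like the following sketch. It works on the dict form of a chain so that it runs without pluckit; the local MUTATION_OPS copy, is_mutating, and check_allowed are hypothetical names for this example, and real callers would read Chain.MUTATION_OPS rather than duplicating the list.

```python
# Copied from the set listed above for illustration only;
# real code should check membership in Chain.MUTATION_OPS instead.
MUTATION_OPS = frozenset({
    "wrap", "unwrap", "append", "prepend", "insertBefore", "insertAfter",
    "replaceWith", "remove", "rename", "patch",
    "addArg", "removeArg", "addParam", "removeParam",
})


def is_mutating(chain_dict):
    """True if any step in the chain's dict form is a mutation op."""
    return any(step["op"] in MUTATION_OPS for step in chain_dict.get("steps", []))


def check_allowed(chain_dict, allow_mutations=False):
    """Raise unless the chain is read-only or mutations are explicitly allowed."""
    if is_mutating(chain_dict) and not allow_mutations:
        raise PermissionError("chain contains mutation ops")
    return chain_dict


read_only = {"source": ["src/**/*.py"],
             "steps": [{"op": "find", "args": [".fn"]}, {"op": "count"}]}
mutating = {"source": ["src/**/*.py"],
            "steps": [{"op": "find", "args": [".fn#foo"]},
                      {"op": "rename", "args": ["bar"]}]}

try:
    check_allowed(mutating)
    blocked = False
except PermissionError:
    blocked = True
```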

Construction methods

Chain.from_dict(data: dict) -> Chain

Build a chain from a dictionary (the same structure as the JSON I/O format described in the CLI reference):

chain = Chain.from_dict({
    "source": ["src/**/*.py"],
    "steps": [
        {"op": "find", "args": [".fn:exported"]},
        {"op": "count"},
    ],
})

Chain.from_json(json_string: str) -> Chain

Parse a JSON string into a chain:

chain = Chain.from_json('{"source": ["src/**/*.py"], "steps": [{"op": "find", "args": [".fn:exported"]}, {"op": "count"}]}')

Chain.from_argv(tokens: list[str]) -> Chain

Parse a CLI-style token list into a chain. This is the same parsing the CLI entry point uses:

chain = Chain.from_argv(["src/**/*.py", "find", ".fn:exported", "count"])

Execution

chain.evaluate() -> Any

Run the chain and return the result. The return type depends on the terminal operation: int for count, list[str] for names, and so on.

chain = Chain.from_argv(["src/**/*.py", "find", ".fn:exported", "count"])
result = chain.evaluate()
print(result)  # e.g. 42

Serialization

chain.to_dict() -> dict

Convert the chain to a JSON-serializable dictionary:

data = chain.to_dict()
# {"source": ["src/**/*.py"], "plugins": [], "steps": [{"op": "find", "args": [".fn:exported"]}, {"op": "count"}]}

chain.to_json() -> str

Serialize the chain as a JSON string:

json_str = chain.to_json()

Pagination

Chains support limit, offset, and page as ordinary chain ops. When any of them appear in a chain, evaluate() attaches pagination metadata to the result:

chain = Chain(
    source=["src/**/*.py"],
    steps=[
        ChainStep(op="find", args=[".fn"]),
        ChainStep(op="page", args=["0", "20"]),  # page 0, size 20
        ChainStep(op="names"),
    ],
)
result = chain.evaluate()

result["page"]
# {
#   "offset": 0,
#   "limit": 20,
#   "total": None,        # lazy — call with_total() to fill in
#   "has_more": True,     # heuristic — True if data length >= limit
# }
result["source_chain"]    # the chain with pagination ops stripped — for "give me more"

has_more heuristic

  • data_length < limit → definitively False (got fewer than asked — no more)
  • data_length >= limit → conservatively True (might be the last page, but we can't know without total)
  • limit is None → has_more is None (unknown)
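
The three rules above fit in a single function. This is a direct transcription of the stated heuristic; the function name and signature are hypothetical and not part of pluckit's API.

```python
from typing import Optional


def has_more(data_length: int, limit: Optional[int]) -> Optional[bool]:
    """Three-valued has_more: False, True (conservative), or None (unknown)."""
    if limit is None:
        return None  # no limit was applied, so nothing can be inferred
    # Fewer rows than requested means this was the last page;
    # a full page may or may not be followed by more rows.
    return data_length >= limit
```

Note that a full final page still reports True, which is why an exact answer requires the total.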

Chain.with_total(result) — compute the exact total on demand

Chain.with_total(result)  # mutates result in place, returns it
result["page"]["total"]   # now an int
result["page"]["has_more"] # now exact

Runs one extra SQL query against the source_chain. No-op if the result has no pagination metadata.

Page navigation helpers take an evaluated result and build the chain for another page. Each returns a new Chain ready to evaluate, or None when navigation isn't possible (no more pages / already at offset 0 / result wasn't paginated).

| Method | Returns |
| --- | --- |
| Chain.next_page(result) | Chain for the next page (or None) |
| Chain.prev_page(result) | Chain for the previous page (or None) |
| Chain.goto_page(result, n) | Chain for page n (0-indexed) |

result = chain.evaluate()
if next_chain := Chain.next_page(result):
    next_result = next_chain.evaluate()
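
For intuition, the offset arithmetic behind next/previous navigation can be sketched over the page metadata dict shown earlier. next_offset and prev_offset are hypothetical helpers, and the assumption that pages advance by exactly limit rows is mine, not something the library documents here.

```python
def next_offset(page):
    """Offset of the next page, or None when there is nowhere to go."""
    if page is None or page["limit"] is None or not page["has_more"]:
        return None
    return page["offset"] + page["limit"]


def prev_offset(page):
    """Offset of the previous page, or None when already at the start."""
    if page is None or page["limit"] is None or page["offset"] == 0:
        return None
    return max(0, page["offset"] - page["limit"])


first_page = {"offset": 0, "limit": 20, "total": None, "has_more": True}
last_page = {"offset": 40, "limit": 20, "total": None, "has_more": False}
```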

Edge cases

  • page N SIZE + subsequent limit/offset: page sets both offset and limit; a later limit or offset overrides the corresponding value. Well-defined but confusing, so use one pattern or the other, not both.
  • limit before a mutation: find .fn limit 5 rename bar renames only the first 5 functions. The Selection contains 5 rows at mutation time, so the mutation applies to those 5. Correct but may surprise callers who expected limit to apply only to terminal output.
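
The override rule in the first edge case can be modelled as a left-to-right fold over the pagination ops. This is a sketch, not the library's resolver: resolve_window is a hypothetical name, and mapping page N SIZE to offset N × SIZE is an assumption that agrees with the page 0, size 20 example above.

```python
def resolve_window(steps):
    """Fold pagination ops in order; later ops override earlier values."""
    offset, limit = 0, None
    for op, *args in steps:
        if op == "page":
            number, size = int(args[0]), int(args[1])
            offset, limit = number * size, size  # assumed mapping
        elif op == "limit":
            limit = int(args[0])
        elif op == "offset":
            offset = int(args[0])
        # any other op (find, names, ...) leaves the window untouched
    return offset, limit


plain_page = resolve_window([("find", ".fn"), ("page", "2", "20")])
overridden = resolve_window([("page", "2", "20"), ("limit", "5")])
unpaginated = resolve_window([("find", ".fn"), ("names",)])
```

In the overridden case the offset still comes from page while the limit comes from the later op, which is the mixed state the edge case warns about.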

Round-trip example

from pluckit.chain import Chain

# Build from CLI tokens
chain = Chain.from_argv(["src/**/*.py", "find", ".fn:exported", "names"])

# Inspect as JSON
print(chain.to_json())

# Reconstruct from the dict form
chain2 = Chain.from_dict(chain.to_dict())

# Execute
result = chain2.evaluate()
for name in result:
    print(name)

Plugins

pluckit is composable. Core capabilities live on Selection; anything that depends on extra infrastructure moves into a plugin.

from pluckit import Plucker, AstViewer, Calls, History, Scope

pluck = Plucker(
    code="src/**/*.py",
    plugins=[
        AstViewer,   # viewer with { show: ... } declarations
        Calls,       # call graph (callers / callees / references)
        History,     # git history via duck_tails
        Scope,       # scope-aware queries (defs / refs / enclosing scope)
    ],
)

Writing a plugin

A plugin is a subclass of pluckit.pluckins.Pluckin:

from pluckit.pluckins import Pluckin

class WordCount(Pluckin):
    name = "wordcount"

    methods = {
        "word_count": lambda self: sum(
            len(text.split()) for text in self.read()
        ),
    }

    pseudo_classes = {
        ":long": "end_line - start_line > 50",
    }

| Class attribute | Purpose |
| --- | --- |
| name | Unique plugin identifier |
| methods | Dict of method name → function to install on Selection |
| pseudo_classes | Dict of :name → SQL WHERE fragment |
| upgrades | Dict of method name → function to override an existing method |
| setup(ctx) | Optional hook called when the plugin is registered |

Plugins can also register new semantic-type aliases by updating pluckit.selectors.ALIASES, but that's considered advanced — most plugins only need methods and pseudo_classes.
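
To show what a ":name → SQL WHERE fragment" mapping amounts to, the sketch below splices the :long fragment from the WordCount example into a query. It uses the standard-library sqlite3 module and a made-up nodes table purely as a stand-in so the example runs anywhere; pluckit itself runs on DuckDB, and select_names is a hypothetical helper.

```python
import sqlite3

PSEUDO_CLASSES = {":long": "end_line - start_line > 50"}

# Stand-in table with just the columns the fragment refers to.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT, start_line INTEGER, end_line INTEGER)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [("short_fn", 1, 10), ("long_fn", 20, 120)],
)


def select_names(connection, pseudo_class):
    """Apply a pseudo-class by splicing its fragment into the WHERE clause."""
    where = PSEUDO_CLASSES[pseudo_class]
    rows = connection.execute(
        f"SELECT name FROM nodes WHERE {where} ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


long_names = select_names(conn, ":long")
conn.close()
```

Because the fragment is interpolated into the SQL text, a scheme like this only suits fragments that come from trusted plugin code.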

History — git history on AST selections

from pluckit import Plucker, History

pluck = Plucker(code="src/**/*.py", plugins=[History])
fn = pluck.find(".fn#validate_token")

# Every commit that touched the function's file, most-recent-first
for commit in fn.history():
    print(f"{commit.hash[:8]} {commit.author_name}: {commit.message}")

# Distinct authors (email) for those commits
print(fn.authors())

# The function's body as it was at an old revision — AST-aware, so
# it matches by (name, type), not by today's line range.
print(fn.at("v0.1.0")[0])

# Unified diff between HEAD and the old revision, per matched node.
print(fn.diff("v0.1.0")[0])

| Method | Returns | Notes |
| --- | --- | --- |
| history() | list[Commit] | Deduplicated, sorted by date descending |
| authors() | list[str] | Email addresses, sorted |
| at(rev) | list[str] | One entry per matched node; "" if not found |
| diff(rev) | list[str] | Unified diff per matched node |
| blame() | (raises) | Deferred; upstream-blocked on duck_tails |

Dependencies. History requires the duck_tails DuckDB community extension (for git_read) and the git binary on PATH (for git log --follow). pluckit auto-installs duck_tails on first use; run pluckit init to provision eagerly.

Rename handling. history() uses git log --follow, so commits that touched a file under a previous name are included. at(rev) / diff(rev) locate the node at the historical revision by name+type, so a pure rename is tracked as long as the node's name survives. Structural refactors (a method being pulled out of a class, a function being split) are not automatically tracked.

Calls — call-graph operations on selections

from pluckit import Plucker, Calls

pluck = Plucker(code="src/**/*.py", plugins=[Calls])

# Who calls validate_token?
callers = pluck.find(".fn#validate_token").callers()
print(callers.names())

# What does authenticate call?
callees = pluck.find(".fn#authenticate").callees()

# All references to a name (call sites + bare uses)
refs = pluck.find(".fn#config").references()

| Method | Returns | Description |
| --- | --- | --- |
| callers() | Selection | Functions that call matched nodes |
| callees() | Selection | Functions called by matched nodes |
| references() | Selection | All references to matched nodes |

Dependencies. Calls wraps sitting_duck's ::callers / ::callees / ::references pseudo-elements. No extra extensions needed.

Scope — scope-aware queries

from pluckit import Plucker, Scope

pluck = Plucker(code="src/**/*.py", plugins=[Scope])

# Enclosing scope chain (module → class → function)
scope_chain = pluck.find(".fn#inner").scope()

# Names DEFINED in the scope containing each match
defs = pluck.find(".fn#outer").defs()

# Name REFERENCES within the scope containing each match
refs = pluck.find(".fn#outer").refs()

| Method | Returns | Description |
| --- | --- | --- |
| scope() | Selection | Enclosing scope hierarchy for each match |
| defs() | Selection | Definitions in the scope containing each match |
| refs() | Selection | References in the scope containing each match |

Dependencies. Uses sitting_duck's ::scope pseudo-element and the scope_id / scope_stack columns on read_ast.


Error handling

Every recoverable error raises PluckerError:

from pluckit import Plucker, PluckerError

try:
    pluck = Plucker(code="src/**/*.py")
    pluck.find(".fn").replaceWith("def broken(:::")
except PluckerError as e:
    print(f"Mutation failed: {e}")
    # All affected files have already been rolled back to their
    # pre-mutation state.

PluckerError is raised for:

  • Failed extension installation (pluckit init will reproduce this)
  • Selector compilation errors
  • Mutation syntax errors (with automatic rollback)
  • Invalid paths, missing files, parse failures
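
The "automatic rollback" behaviour can be pictured as snapshot, write, validate, restore. The following is a simplified sketch under assumptions: mutate_with_rollback is a hypothetical helper, it validates with Python's ast.parse and raises SyntaxError rather than PluckerError, and it says nothing about how pluckit actually stages its writes.

```python
import ast
import tempfile
from pathlib import Path


def mutate_with_rollback(paths, transform):
    """Rewrite each file; if any result fails to parse, restore every file."""
    originals = {p: Path(p).read_text() for p in paths}  # snapshot first
    try:
        for p in paths:
            new_text = transform(originals[p])
            ast.parse(new_text)          # reject output that no longer parses
            Path(p).write_text(new_text)
    except SyntaxError:
        for p, text in originals.items():
            Path(p).write_text(text)     # roll back, including files already written
        raise


def transform(text):
    """Valid edit for the first file, deliberately broken output for the second."""
    if text.startswith("x"):
        return text + "z = 3\n"
    return "def broken(:::"


with tempfile.TemporaryDirectory() as tmp:
    first, second = Path(tmp, "a.py"), Path(tmp, "b.py")
    first.write_text("x = 1\n")
    second.write_text("y = 2\n")
    try:
        mutate_with_rollback([first, second], transform)
        failed = False
    except SyntaxError:
        failed = True
    restored = (first.read_text(), second.read_text())
```

The first file is written successfully before the second one fails, so the restore step has to cover both, which is what leaves the caller with pre-mutation state.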