bioquik package

Submodules

bioquik.cli module

bioquik.cli.count(patterns: str, seq_dir: Path, workers: int, out_dir: Path, json_out: bool, plot: bool) → None

Every parameter is declared as a Typer option (typer.models.OptionInfo), so the actual default values are defined in the source and are not rendered in this signature.

bioquik.fasta_worker module

Per-file FASTA motif counter (intended for parallel use via concurrent.futures).

bioquik.fasta_worker.process_fasta_file(fasta_path: str | PathLike, pattern_to_motifs: dict[str, list[str]], *, out_dir: str | PathLike = 'bioquik_results') → str

Count motifs in fasta_path and write the results as a CSV file in out_dir.

Returns the output CSV filepath.
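The per-file, parallel usage described above can be sketched as follows. This is a minimal illustration, not the package's code: count_motifs_stub is a hypothetical stand-in for process_fasta_file, the CSV columns (motif, count) and the stem-based output name are assumptions, and a ThreadPoolExecutor is used only to keep the sketch self-contained (a ProcessPoolExecutor exposes the same submit/result interface).

```python
import csv
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_motifs_stub(fasta_path, motifs, out_dir):
    """Hypothetical stand-in worker: count motifs, write a CSV, return its path."""
    # Join all sequence lines, skipping FASTA headers that start with '>'.
    # Simplification: the whole file is treated as one sequence.
    seq = "".join(
        line.strip().upper()
        for line in Path(fasta_path).read_text().splitlines()
        if line and not line.startswith(">")
    )
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: <input stem>_motif_counts.csv
    out_path = out_dir / f"{Path(fasta_path).stem}_motif_counts.csv"
    with out_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["motif", "count"])  # assumed column layout
        for motif in motifs:
            # Count overlapping occurrences by testing every start position.
            n = sum(seq.startswith(motif, i) for i in range(len(seq)))
            writer.writerow([motif, n])
    return str(out_path)


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "a.fasta").write_text(">a\nACGCG\n")
    (tmp / "b.fasta").write_text(">b\nTTCGA\n")
    fasta_files = sorted(tmp.glob("*.fasta"))

    # One task per file; each task returns the path of the CSV it wrote.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(count_motifs_stub, f, ["CG"], tmp / "out")
            for f in fasta_files
        ]
        csv_paths = [f.result() for f in futures]

    # Read the results back before the temporary directory is removed.
    results = {Path(p).name: Path(p).read_text().splitlines() for p in csv_paths}

print(results)
```

Because each task touches only its own input and output file, the workers need no shared state, which is what makes the per-file design easy to parallelise.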

bioquik.fmindex module

FM-index backed by pydivsufsort and WaveletTree.

class bioquik.fmindex.FMIndex(seq: str, *, sa_sample_rate: int = 32)

Bases: object

Succinct FM-index supporting count and locate queries.

C

alphabet

bwt

count(pattern: bytes) → int: Return the number of occurrences of pattern in seq.

locate(pattern: bytes) → List[int]: Return the 0-based start positions of all occurrences of pattern in seq.

sa_sample_rate

sa_samples: Dict[int, int]

seq

seq_b

wt
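For readers unfamiliar with how count and locate queries work on an FM-index, the following is a naive, self-contained sketch of backward search. It is not the FMIndex implementation: it builds the suffix array with sorted() instead of pydivsufsort, answers rank queries by plain counting instead of a wavelet tree, keeps the full suffix array instead of sampling it, and works on str rather than bytes. All function names are hypothetical.

```python
def build_fm(text: str):
    """Build naive FM-index pieces: suffix array, BWT and C table."""
    t = text + "$"  # sentinel, assumed smaller than every other symbol
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    # BWT symbol = character preceding each sorted suffix (t[-1] is '$').
    bwt = "".join(t[i - 1] for i in sa)
    # C[c] = number of symbols in t that are strictly smaller than c.
    C, total = {}, 0
    for c in sorted(set(t)):
        C[c] = total
        total += t.count(c)
    return sa, bwt, C


def backward_search(pattern: str, bwt: str, C: dict):
    """Return the half-open suffix-array interval [lo, hi) matching pattern."""
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in C:
            return 0, 0
        # bwt.count(c, 0, i) is rank(c, i); a wavelet tree would answer
        # this without scanning the prefix.
        lo = C[c] + bwt.count(c, 0, lo)
        hi = C[c] + bwt.count(c, 0, hi)
        if lo >= hi:
            return 0, 0
    return lo, hi


def fm_count(pattern: str, bwt: str, C: dict) -> int:
    lo, hi = backward_search(pattern, bwt, C)
    return hi - lo


def fm_locate(pattern: str, sa: list, bwt: str, C: dict) -> list:
    lo, hi = backward_search(pattern, bwt, C)
    return sorted(sa[i] for i in range(lo, hi))


sa, bwt, C = build_fm("ACGCGT")
print(fm_count("CG", bwt, C))       # 2
print(fm_locate("CG", sa, bwt, C))  # [1, 3]
```

A production index replaces the two linear-time pieces here (sorting suffixes and counting in the BWT prefix) with a dedicated suffix-array builder and a rank structure, and stores only every k-th suffix-array entry to save memory.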

bioquik.motifs module

Pattern → motif expansion utilities.

bioquik.motifs.build_pattern_to_motifs(patterns: list[str]) → Dict[str, List[str]]: Return a mapping from each pattern to its list of motifs, using N as the wildcard stand-in.

bioquik.motifs.generate_motifs(patterns: list[str]): Expand patterns containing the * wildcard into concrete motifs (deduplicated and sorted).
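As an illustration of wildcard expansion, here is a minimal sketch. It assumes that each * stands for exactly one of the four bases A, C, G, T; that assumption, and the helper names expand_pattern and generate_motifs_sketch, are not taken from the package.

```python
from itertools import product

BASES = "ACGT"  # assumption: '*' matches exactly one of these bases


def expand_pattern(pattern: str) -> list[str]:
    """Replace every '*' with each base in turn (hypothetical helper)."""
    choices = [BASES if ch == "*" else ch for ch in pattern]
    return ["".join(combo) for combo in product(*choices)]


def generate_motifs_sketch(patterns: list[str]) -> list[str]:
    """Return the deduplicated, sorted union of all expansions."""
    return sorted({m for p in patterns for m in expand_pattern(p)})


motifs = generate_motifs_sketch(["*CG", "ACG"])
print(motifs)  # ['ACG', 'CCG', 'GCG', 'TCG']
```

Note that a pattern with k wildcards expands to 4^k motifs under this assumption, so the result grows quickly with the number of wildcards.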

bioquik.plotter module

bioquik.plotter.plot_distribution(df: DataFrame, out_dir: Path) → None: Bar chart of total counts per motif.

bioquik.plotter.plot_heatmap(df: DataFrame, out_dir: Path) → None: Heatmap of motif counts by file.

bioquik.processor module

bioquik.processor.run_count(pattern_list: List[str], seq_dir: Path, out_dir: Path, workers: int) → None: Expand the patterns, then process every .fasta file in parallel.

bioquik.reports module

bioquik.reports.combine_csv(out_dir: Path) → DataFrame: Read every *_motif_counts.csv file in out_dir and concatenate them in sorted order.

bioquik.reports.write_summary(df: DataFrame, out_dir: Path, json_out: bool = False) → None: Always write combined_counts.csv; optionally also write a JSON summary.

bioquik.validate module

bioquik.validate.validate_dir(path: Path, name: str) → None: Ensure that path exists and is a directory.

bioquik.validate.validate_patterns(patterns: str) → List[str]: Split patterns into a list and ensure that at least one pattern includes 'CG'.

bioquik.wavelettree module

Succinct binary-recursive wavelet tree with rank-support.

class bioquik.wavelettree.WaveletTree(data: bytes, alphabet: bytes, *, sample_rate: int = 32)

Bases: object

Wavelet tree over a bytes sequence providing rank1 queries.

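The implementation stores a bitvector at each internal node and samples prefix ranks every sample_rate bits (default 32), which gives O(1) rank queries on a node's bitvector. Memory usage is approximately n log σ bits, where n is the length of data and σ is the alphabet size.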

alphabet

bitvec

left

length

prefix_ranks

rank(symbol: int, i: int) → int: Return the number of occurrences of symbol in the half-open range [0, i) of data.

right

sample_rate
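The combination of per-node bitvectors and sampled prefix ranks can be sketched as follows. This is a simplified illustration rather than the WaveletTree source: bits are kept in a Python list instead of a packed representation, the class names RankBitVector and WaveletTreeSketch are hypothetical, and data is assumed to contain only symbols from alphabet.

```python
class RankBitVector:
    """Bit list with the number of 1s sampled every `sample_rate` bits."""

    def __init__(self, bits, sample_rate=32):
        self.bits = bits
        self.sample_rate = sample_rate
        # prefix_ranks[k] = number of 1s in bits[0 : k * sample_rate]
        self.prefix_ranks = [0]
        for start in range(0, len(bits), sample_rate):
            block_ones = sum(bits[start:start + sample_rate])
            self.prefix_ranks.append(self.prefix_ranks[-1] + block_ones)

    def rank1(self, i):
        """Number of 1s in bits[0:i]: one sample plus at most one block scan."""
        block = i // self.sample_rate
        start = block * self.sample_rate
        return self.prefix_ranks[block] + sum(self.bits[start:i])


class WaveletTreeSketch:
    def __init__(self, data: bytes, alphabet: bytes, sample_rate: int = 32):
        self.alphabet = bytes(sorted(set(alphabet)))
        self.bitvec = self.left = self.right = None
        if len(self.alphabet) > 1:
            mid = len(self.alphabet) // 2
            lo, hi = self.alphabet[:mid], self.alphabet[mid:]
            # Bit 0: symbol is in the lower half; bit 1: upper half.
            self.bitvec = RankBitVector([int(s in hi) for s in data], sample_rate)
            self.left = WaveletTreeSketch(
                bytes(s for s in data if s in lo), lo, sample_rate)
            self.right = WaveletTreeSketch(
                bytes(s for s in data if s in hi), hi, sample_rate)

    def rank(self, symbol: int, i: int) -> int:
        """Number of occurrences of symbol in data[0:i]."""
        if symbol not in self.alphabet:
            return 0
        if self.bitvec is None:  # leaf: every position holds `symbol`
            return i
        ones = self.bitvec.rank1(i)
        if symbol in self.alphabet[len(self.alphabet) // 2:]:
            return self.right.rank(symbol, ones)
        return self.left.rank(symbol, i - ones)


data = b"ACGTACGA"
wt = WaveletTreeSketch(data, b"ACGT", sample_rate=2)
print(wt.rank(ord("A"), 8))  # 3
print(wt.rank(ord("C"), 5))  # 1
```

Each rank call visits one node per tree level, so a symbol rank costs about log σ bitvector rank operations, each of which reads one stored sample and scans at most sample_rate bits.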

Module contents

Top-level package for bioquik.

class bioquik.FMIndex(seq: str, *, sa_sample_rate: int = 32)

Bases: object

Succinct FM-index supporting count and locate queries.

C

alphabet

bwt

count(pattern: bytes) → int: Return the number of occurrences of pattern in seq.

locate(pattern: bytes) → List[int]: Return the 0-based start positions of all occurrences of pattern in seq.

sa_sample_rate

sa_samples: Dict[int, int]

seq

seq_b

wt

class bioquik.WaveletTree(data: bytes, alphabet: bytes, *, sample_rate: int = 32)

Bases: object

Wavelet tree over a bytes sequence providing rank1 queries.

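The implementation stores a bitvector at each internal node and samples prefix ranks every sample_rate bits (default 32), which gives O(1) rank queries on a node's bitvector. Memory usage is approximately n log σ bits, where n is the length of data and σ is the alphabet size.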

alphabet

bitvec

left

length

prefix_ranks

rank(symbol: int, i: int) → int: Return the number of occurrences of symbol in the half-open range [0, i) of data.

right

sample_rate

bioquik.build_pattern_to_motifs(patterns: list[str]) → Dict[str, List[str]]: Return a mapping from each pattern to its list of motifs, using N as the wildcard stand-in.

bioquik.generate_motifs(patterns: list[str]): Expand patterns containing the * wildcard into concrete motifs (deduplicated and sorted).

bioquik.process_fasta_file(fasta_path: str | PathLike, pattern_to_motifs: dict[str, list[str]], *, out_dir: str | PathLike = 'bioquik_results') → str

Count motifs in fasta_path and write the results as a CSV file in out_dir.

Returns the output CSV filepath.