bioquik package

Submodules

bioquik.cli module

bioquik.cli.count(patterns: str = <typer.models.OptionInfo object>, seq_dir: Path = <typer.models.OptionInfo object>, workers: int = <typer.models.OptionInfo object>, out_dir: Path = <typer.models.OptionInfo object>, json_out: bool = <typer.models.OptionInfo object>, plot: bool = <typer.models.OptionInfo object>) → None

bioquik.fasta_worker module

Per-file FASTA motif counter (intended for parallel use via concurrent.futures).

bioquik.fasta_worker.process_fasta_file(fasta_path: str | PathLike, pattern_to_motifs: dict[str, list[str]], *, out_dir: str | PathLike = 'bioquik_results') → str

Count motifs in fasta_path and save the counts as a CSV file in out_dir.

Returns the output CSV filepath.

bioquik.fmindex module

FM-index backed by pydivsufsort and WaveletTree.

class bioquik.fmindex.FMIndex(seq: str, *, sa_sample_rate: int = 32)

Bases: object

Succinct FM-index supporting count and locate queries.

C
alphabet
bwt
count(pattern: bytes) → int

Return the number of occurrences of pattern in seq.

locate(pattern: bytes) → List[int]

Return all start positions of pattern (0-based).

sa_sample_rate
sa_samples: Dict[int, int]
seq
seq_b
wt
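
The count and locate queries above rest on FM-index backward search. The following is a minimal sketch of that technique, not bioquik's code: it swaps in a naive sorted suffix array and a full occurrence table where the module description names pydivsufsort and a WaveletTree, and it works on str rather than bytes. The names build_fm, backward_search, count and locate are hypothetical helpers for illustration only.

```python
def build_fm(text: str):
    """Build a toy FM-index: suffix array, C table and occurrence table."""
    text += "$"  # sentinel; assumed to sort before every other symbol
    # Naive suffix array (illustration only; real indexes use a linear-time builder).
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch] = number of symbols in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = occurrences of ch in bwt[:i] (a wavelet tree would answer this by rank).
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ


def backward_search(pattern: str, c_table, occ, n: int):
    """Return the half-open suffix-array interval [lo, hi) matching pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):  # extend the match one symbol to the left
        if ch not in c_table:
            return 0, 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def count(pattern: str, index) -> int:
    sa, c_table, occ = index
    lo, hi = backward_search(pattern, c_table, occ, len(sa))
    return hi - lo


def locate(pattern: str, index) -> list[int]:
    sa, c_table, occ = index
    lo, hi = backward_search(pattern, c_table, occ, len(sa))
    return sorted(sa[lo:hi])  # 0-based start positions


index = build_fm("ACGCGT")
print(count("CG", index), locate("CG", index))
```

A sampled suffix array (as the sa_sample_rate parameter suggests) would store only some entries of sa and recover the rest by walking the BWT; the sketch keeps the whole array for brevity.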

bioquik.motifs module

Pattern → motif expansion utilities.

bioquik.motifs.build_pattern_to_motifs(patterns: list[str]) → Dict[str, List[str]]

Return a mapping from each pattern to its list of motifs, using N as the wildcard stand-in.

bioquik.motifs.generate_motifs(patterns: list[str])

Expand wildcard * patterns into concrete motifs (deduplicated, sorted).
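
As an illustration of wildcard expansion, here is a minimal sketch. It assumes that each * stands for exactly one base drawn from A, C, G and T; the reference above does not state the alphabet, so BASES is an assumption. The helper names expand_pattern and expand_all are hypothetical and are not part of bioquik.

```python
from itertools import product

BASES = "ACGT"  # assumption: the wildcard ranges over the four DNA bases


def expand_pattern(pattern: str, wildcard: str = "*") -> list[str]:
    """Expand one pattern into every concrete motif it can match."""
    # Each position contributes either its literal character or all bases.
    slots = [BASES if ch == wildcard else ch for ch in pattern]
    return ["".join(combo) for combo in product(*slots)]


def expand_all(patterns: list[str]) -> list[str]:
    """Expand several patterns, then deduplicate and sort the motifs."""
    return sorted({motif for p in patterns for motif in expand_pattern(p)})


print(expand_pattern("*CG"))
print(len(expand_all(["**CG", "*CG"])))
```

A pattern with k wildcards expands to 4^k motifs under this assumption, so long runs of wildcards grow quickly.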

bioquik.plotter module

bioquik.plotter.plot_distribution(df: DataFrame, out_dir: Path) → None

Bar chart of total counts per motif.

bioquik.plotter.plot_heatmap(df: DataFrame, out_dir: Path) → None

Heatmap of motif counts by file.

bioquik.processor module

bioquik.processor.run_count(pattern_list: List[str], seq_dir: Path, out_dir: Path, workers: int) → None

Expand patterns, then process every .fasta in parallel.
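
The fan-out over .fasta files can be sketched with concurrent.futures as follows. This is an assumption-laden illustration, not bioquik's code: it uses ThreadPoolExecutor so the example stays self-contained, whereas a CPU-bound counter would more likely use ProcessPoolExecutor, and count_cg is a hypothetical stand-in worker that only counts overlapping occurrences of 'CG'.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def count_cg(fasta_path: Path) -> tuple[str, int]:
    """Hypothetical worker: count overlapping 'CG' in one FASTA file."""
    lines = fasta_path.read_text().splitlines()
    # Skip header lines (those starting with '>') and join the sequence lines.
    seq = "".join(ln.strip() for ln in lines if not ln.startswith(">")).upper()
    hits = sum(seq.startswith("CG", i) for i in range(len(seq)))
    return fasta_path.name, hits


def run_parallel(seq_dir: Path, workers: int) -> dict[str, int]:
    """Submit one task per .fasta file and collect results as they finish."""
    files = sorted(seq_dir.glob("*.fasta"))
    results: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(count_cg, f) for f in files]
        for fut in as_completed(futures):
            name, hits = fut.result()  # re-raises any exception from the worker
            results[name] = hits
    return results
```

Calling run_parallel(Path("sequences"), workers=4) on a hypothetical sequences directory would return one count per .fasta file and ignore files with other extensions.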

bioquik.reports module

bioquik.reports.combine_csv(out_dir: Path) → DataFrame

Read all *_motif_counts.csv in out_dir and concatenate in sorted order.

bioquik.reports.write_summary(df: DataFrame, out_dir: Path, json_out: bool = False) → None

Always write combined_counts.csv. If json_out is true, also write a JSON summary.

bioquik.validate module

bioquik.validate.validate_dir(path: Path, name: str) → None

Ensure that path exists and is a directory.

bioquik.validate.validate_patterns(patterns: str) → List[str]

Split patterns into a list and ensure that at least one pattern includes 'CG'.
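
A minimal sketch of this kind of check is shown below. The comma separator, the upper-casing, and the ValueError are all assumptions made for illustration; the reference above does not say how bioquik splits the string or which exception it raises. split_and_check is a hypothetical name.

```python
def split_and_check(patterns: str, sep: str = ",") -> list[str]:
    """Split a separator-delimited string and require one 'CG' pattern."""
    # Drop empty items so that trailing separators are tolerated.
    items = [p.strip().upper() for p in patterns.split(sep) if p.strip()]
    if not any("CG" in p for p in items):
        raise ValueError("at least one pattern must contain 'CG'")
    return items


print(split_and_check("*CG, AT*"))
```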

bioquik.wavelettree module

Succinct binary-recursive wavelet tree with rank-support.

class bioquik.wavelettree.WaveletTree(data: bytes, alphabet: bytes, *, sample_rate: int = 32)

Bases: object

Wavelet tree over a bytes sequence providing rank1 queries.

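The implementation stores a bitvector at each internal node and samples prefix ranks every sample_rate bits (default 32), giving O(1) rank queries per node for a fixed sample_rate. Memory usage is approximately n log₂ σ bits, where n is the sequence length and σ is the alphabet size.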

alphabet
bitvec
left
length
prefix_ranks
rank(symbol: int, i: int) → int

Return the number of occurrences of symbol in positions [0, i).

right
sample_rate
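
The sampled prefix-rank idea described above can be sketched for a single bitvector as follows. This is an illustration under stated assumptions, not the WaveletTree implementation: SampledBitVector is a hypothetical class, bits are held in a plain Python list, and the attribute names merely mirror those listed for the class.

```python
class SampledBitVector:
    """Bitvector with prefix ranks stored every sample_rate bits."""

    def __init__(self, bits: list[int], sample_rate: int = 32):
        self.bits = bits
        self.sample_rate = sample_rate
        # prefix_ranks[k] = number of 1-bits in bits[: k * sample_rate]
        self.prefix_ranks = [0]
        running = 0
        for i, b in enumerate(bits, start=1):
            running += b
            if i % sample_rate == 0:
                self.prefix_ranks.append(running)

    def rank1(self, i: int) -> int:
        """Return the number of 1-bits in bits[0:i]."""
        block = i // self.sample_rate
        start = block * self.sample_rate
        # One sample lookup plus a scan of fewer than sample_rate bits.
        return self.prefix_ranks[block] + sum(self.bits[start:i])


bv = SampledBitVector([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], sample_rate=4)
print(bv.rank1(6))
```

In a wavelet tree, a symbol rank would apply one such bit rank at each level, following the left or right child according to the symbol's position in the alphabet split.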

Module contents

Top-level package for bioquik.

class bioquik.FMIndex(seq: str, *, sa_sample_rate: int = 32)

Bases: object

Succinct FM-index supporting count and locate queries.

C
alphabet
bwt
count(pattern: bytes) → int

Return the number of occurrences of pattern in seq.

locate(pattern: bytes) → List[int]

Return all start positions of pattern (0-based).

sa_sample_rate
sa_samples: Dict[int, int]
seq
seq_b
wt
class bioquik.WaveletTree(data: bytes, alphabet: bytes, *, sample_rate: int = 32)

Bases: object

Wavelet tree over a bytes sequence providing rank1 queries.

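The implementation stores a bitvector at each internal node and samples prefix ranks every sample_rate bits (default 32), giving O(1) rank queries per node for a fixed sample_rate. Memory usage is approximately n log₂ σ bits, where n is the sequence length and σ is the alphabet size.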

alphabet
bitvec
left
length
prefix_ranks
rank(symbol: int, i: int) → int

Return the number of occurrences of symbol in positions [0, i).

right
sample_rate
bioquik.build_pattern_to_motifs(patterns: list[str]) → Dict[str, List[str]]

Return a mapping from each pattern to its list of motifs, using N as the wildcard stand-in.

bioquik.generate_motifs(patterns: list[str])

Expand wildcard * patterns into concrete motifs (deduplicated, sorted).

bioquik.process_fasta_file(fasta_path: str | PathLike, pattern_to_motifs: dict[str, list[str]], *, out_dir: str | PathLike = 'bioquik_results') → str

Count motifs in fasta_path and save the counts as a CSV file in out_dir.

Returns the output CSV filepath.