Sequence Analysis

AI Scientist provides a suite of sequence analysis tools that help you characterize DNA and protein sequences before and after cloning. These analyses run directly on sequences in your current session, producing interactive visualizations and downloadable results.

Search for restriction enzyme cut sites within any sequence. You can:

  • Specify individual enzymes (e.g., EcoRI, BamHI, NotI) to check for specific sites
  • Use predefined enzyme sets (e.g., common cloning enzymes, Type IIS enzymes) to screen for compatible sites
  • Identify sites that need to be removed for domestication workflows (e.g., Golden Gate assembly)

Results show the exact cut positions, overhang sequences, and fragment sizes produced by each enzyme.
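
To make the reported values concrete, here is a minimal sketch of how cut positions, overhangs, and fragment sizes could be derived for a linear sequence. It is not AI Scientist's implementation. The `ENZYMES` table and the function names are hypothetical, and the sketch only handles palindromic recognition sites such as EcoRI, BamHI, and NotI.

```python
# Hypothetical, simplified enzyme table (an assumption for illustration):
# name -> (recognition site, cut offset on the top strand within the site).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "NotI": ("GCGGCCGC", 2),   # GC^GGCCGC
}

def find_cut_sites(seq, enzyme):
    """Return 0-based top-strand cut positions in a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)  # allow overlapping sites
    return cuts

def overhang(enzyme):
    """Single-stranded overhang left by a palindromic site ('' if blunt)."""
    site, offset = ENZYMES[enzyme]
    lo, hi = sorted((offset, len(site) - offset))
    return site[lo:hi]

def fragment_sizes(seq, cuts):
    """Fragment lengths after cutting a linear sequence at the given positions."""
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

seq = "AAAGAATTCTTTGGATCCAAA"
cuts = find_cut_sites(seq, "EcoRI") + find_cut_sites(seq, "BamHI")
print(cuts, fragment_sizes(seq, cuts), overhang("EcoRI"))
```

A circular plasmid would need the first and last fragments joined, and non-palindromic enzymes (including Type IIS enzymes) would also require a search on the reverse strand; both are left out of this sketch.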

Analyze the overall GC percentage and visualize the distribution across the sequence using a sliding window plot. This is useful for:

  • Identifying GC-rich or AT-rich regions that may affect cloning, sequencing, or expression
  • Evaluating codon-optimized sequences for balanced GC distribution
  • Spotting potential issues for gene synthesis (extreme GC regions are often flagged by synthesis providers)
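
As a rough illustration of the sliding-window calculation behind such a plot, the sketch below computes the overall GC fraction and a per-window GC fraction. The function names and the default `window` and `step` values are assumptions made for the example, not the tool's actual settings.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def sliding_gc(seq, window=50, step=1):
    """Return (window start, GC fraction) pairs for each full window."""
    seq = seq.upper()
    return [
        (i, gc_fraction(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]

# Small window sizes are used here only to keep the example readable.
for start, gc in sliding_gc("GGGGAAAA", window=4, step=2):
    print(start, f"{gc:.0%}")
```

Windows with a GC fraction far above or below the sequence-wide value are the regions a plot of these pairs would highlight.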

Perform a six-frame search for open reading frames across the entire sequence. ORFs are sorted by length and displayed with their start/stop codon positions, frame, and strand. This helps you verify that your intended CDS is the dominant ORF and detect any unintended reading frames.
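
A six-frame ORF search can be sketched as follows. This is a simplified illustration rather than the tool's algorithm: it assumes ORFs begin at ATG and end at the first in-frame stop codon, and the function names, the `min_codons` parameter, and the result fields are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=2):
    """Find ATG...stop ORFs in all six frames, longest first.

    Positions are 0-based and, for the '-' strand, are given relative to
    the reverse-complemented sequence. Length includes the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        orfs.append({"strand": strand, "frame": frame + 1,
                                     "start": start, "end": end,
                                     "length": end - start})
                    start = None  # look for the next ORF in this frame
    return sorted(orfs, key=lambda o: o["length"], reverse=True)

for orf in find_orfs("CCATGAAATAGCC"):
    print(orf)
```

If the intended CDS is the dominant ORF, it should appear first in the sorted list, on the expected strand and frame.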

Calculate codon frequency statistics for any coding sequence. Compare the codon usage profile against the target host’s preferred codon table to identify rare codons that may limit translation efficiency.
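
The comparison can be pictured with the sketch below, which counts codons in a coding sequence and flags those that fall under a frequency cutoff in a host table. The function names, the `host_table` format (codon mapped to relative frequency), and the `threshold` default are assumptions for illustration; a real host table must be supplied by the reader, and codons missing from it are treated as rare here.

```python
from collections import Counter

def codon_counts(cds):
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def codon_frequencies(cds):
    """Relative frequency of each codon observed in the CDS."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def rare_codons(cds, host_table, threshold=0.1):
    """Codons used in the CDS whose host frequency is below threshold.

    host_table is a hypothetical mapping of codon -> relative frequency
    in the target host; codons absent from it count as frequency 0.0.
    """
    return sorted(c for c in codon_counts(cds)
                  if host_table.get(c, 0.0) < threshold)

print(codon_frequencies("ATGAAAAAATAG"))
```

Frequencies here are computed over all codons in the sequence; a per-amino-acid normalization (relative synonymous usage) is a common alternative that this sketch does not attempt.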

Detect sequence features that could cause problems during cloning or synthesis:

  • Repeat sequences: direct and inverted repeats that may cause recombination
  • Homopolymer runs: long stretches of a single nucleotide that reduce sequencing accuracy
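
One simple way to flag both kinds of feature is sketched below, assuming exact-match repeats of a fixed length `k` and a minimum homopolymer length `min_len`. These names and defaults are hypothetical, and the tool may well use different criteria (for example, allowing mismatches).

```python
import re
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def kmer_positions(seq, k):
    """Map each k-mer to the list of 0-based positions where it starts."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions

def direct_repeats(seq, k=8):
    """k-mers that occur more than once on the same strand."""
    pos = kmer_positions(seq.upper(), k)
    return {kmer: p for kmer, p in pos.items() if len(p) > 1}

def inverted_repeats(seq, k=8):
    """(k-mer, its positions, positions of its reverse complement)."""
    pos = kmer_positions(seq.upper(), k)
    found = []
    for kmer, p in pos.items():
        rc = kmer.translate(COMPLEMENT)[::-1]
        # kmer < rc reports each pair once and skips self-complementary k-mers
        if kmer < rc and rc in pos:
            found.append((kmer, p, pos[rc]))
    return found

def homopolymer_runs(seq, min_len=6):
    """(start, base, length) for runs of one base at least min_len long."""
    pattern = r"([ACGT])\1{%d,}" % (min_len - 1)
    return [(m.start(), m.group(1), len(m.group()))
            for m in re.finditer(pattern, seq.upper())]

print(homopolymer_runs("ACGTAAAAAAACGT"))
print(direct_repeats("GATTACAGTTTGATTACAG", k=8))
```

Longer repeats show up as several overlapping k-mers, so a fuller version would merge adjacent hits into a single reported region.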

Translate DNA sequences to protein in all six reading frames or in a specified frame. This is useful for verifying that your construct encodes the intended protein after modifications such as codon optimization, mutations, or assembly.
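
For reference, six-frame translation can be sketched in a few lines with the standard genetic code. The function names and the frame numbering (1 to 3 for the forward strand, -1 to -3 for the reverse complement) are conventions chosen for this example, not necessarily those used by the tool; codons containing ambiguous bases are shown as `X`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq, frame=1):
    """Translate one frame: 1..3 forward, -1..-3 on the reverse complement."""
    seq = seq.upper()
    if frame < 0:
        seq = seq.translate(COMPLEMENT)[::-1]
    offset = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    # Stop codons are rendered as '*'; unknown codons as 'X'.
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

def six_frame(seq):
    """Translations for all six reading frames, keyed by frame number."""
    return {f: translate(seq, f) for f in (1, 2, 3, -1, -2, -3)}

for frame, protein in six_frame("ATGAAATAG").items():
    print(frame, protein)
```

Comparing the frame 1 translation before and after a modification such as codon optimization is a quick check that the encoded protein is unchanged.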