Sequence Analysis
AI Scientist provides a suite of sequence analysis tools that help you characterize DNA and protein sequences before and after cloning. These analyses run directly on sequences in your current session, producing interactive visualizations and downloadable results.
Restriction Enzyme Analysis
Search for restriction enzyme cut sites within any sequence. You can:
- Specify individual enzymes (e.g., EcoRI, BamHI, NotI) to check for specific sites
- Use predefined enzyme sets (e.g., common cloning enzymes, Type IIS enzymes) to screen for compatible sites
- Identify sites that need to be removed for domestication workflows (e.g., Golden Gate assembly)
Results show the exact cut positions, overhang sequences, and fragment sizes produced by each enzyme.
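To make the reported cut positions and fragment sizes concrete, here is a minimal sketch of a cut-site search on a linear sequence. It is not the tool's implementation: the `ENZYMES` table and both function names are assumptions made for illustration, and the table holds only three palindromic sites, so scanning the top strand is enough.

```python
# Hand-picked recognition sites (illustrative, not the tool's enzyme database).
# The offset is the number of bases of the site that lie left of the top-strand cut.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "NotI": ("GCGGCCGC", 2),   # GC^GGCCGC
}


def find_cut_positions(seq, enzyme):
    """Return 0-based top-strand cut positions for one enzyme in a linear sequence.

    A cut position p means the cut falls between seq[p - 1] and seq[p].
    """
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        # Step by one base so overlapping sites are still found.
        start = seq.find(site, start + 1)
    return cuts


def fragment_sizes(seq_len, cuts):
    """Top-strand fragment lengths for a linear molecule cut at the given positions."""
    bounds = [0] + sorted(cuts) + [seq_len]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

For example, `find_cut_positions("AAAGAATTCAAAGGATCCAAA", "EcoRI")` returns `[4]`, and cutting that 21-base sequence at positions 4 and 13 gives fragments of 4, 9, and 8 bases. A circular plasmid would need the first and last fragments joined, which this sketch does not do.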
GC Content Analysis
Analyze the overall GC percentage and visualize the distribution across the sequence using a sliding window plot. This is useful for:
- Identifying GC-rich or AT-rich regions that may affect cloning, sequencing, or expression
- Evaluating codon-optimized sequences for balanced GC distribution
- Spotting potential issues for gene synthesis (extreme GC regions are often flagged by synthesis providers)
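The numbers behind a sliding window plot can be sketched as follows. The function names and the default window and step sizes are assumptions chosen for illustration, not the tool's actual settings.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq, window=50, step=1):
    """Return (window_start, gc_fraction) for every full window along seq."""
    return [
        (i, gc_fraction(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]
```

With `window=4` and `step=2`, the sequence `"GGGGAAAA"` yields `[(0, 1.0), (2, 0.5), (4, 0.0)]`. Plotting the second value against the first gives the distribution curve; a sequence shorter than the window yields no points.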
ORF Prediction
Perform a six-frame search for open reading frames across the entire sequence. ORFs are sorted by length and displayed with their start/stop codon positions, frame, and strand. This helps you verify that your intended CDS is the dominant ORF and detect any unintended reading frames.
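A six-frame search of this kind can be sketched as below. This is one simple reading of the description, under stated assumptions: an ORF runs from an ATG to the first in-frame stop codon, the standard stop codons apply, and `find_orfs` with its `min_len` parameter is a hypothetical name, not the tool's API.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=30):
    """Return ORFs (ATG through stop codon, inclusive) from all six frames, longest first.

    start/end are 0-based, end-exclusive coordinates on the input sequence.
    For the "-" strand, frame is counted on the reverse complement.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    end = i + 3
                    if end - start >= min_len:
                        # Map reverse-strand coordinates back onto the input sequence.
                        lo, hi = (start, end) if strand == "+" else (n - end, n - start)
                        orfs.append({
                            "strand": strand,
                            "frame": frame + 1,
                            "start": lo,
                            "end": hi,
                            "length": end - start,
                        })
                    start = None
    return sorted(orfs, key=lambda o: o["length"], reverse=True)
```

If the intended CDS is the dominant ORF, it should be the first entry in the returned list. ORFs that lack a stop codon before the end of the sequence are ignored here; a real tool may choose to report them.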
Codon Usage Analysis
Calculate codon frequency statistics for any coding sequence. Compare the codon usage profile against the target host’s preferred codon table to identify rare codons that may limit translation efficiency.
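The comparison can be sketched as follows. The function names, the `host_table` argument (a mapping from codon to its relative frequency in the host), and the `threshold` default are all hypothetical. No real host data is included, so you would supply a table from a source you trust.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds):
    """Fraction of all codons in the CDS accounted for by each codon."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def flag_rare_codons(cds, host_table, threshold=0.1):
    """Return (position, codon) for codons the host table lists below threshold.

    Codons missing from host_table are skipped rather than treated as rare.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    flagged = []
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon in host_table and host_table[codon] < threshold:
            flagged.append((i, codon))
    return flagged
```

Reporting positions rather than only counts makes it easier to see whether rare codons cluster, for instance near the start of the CDS.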
Sequence Complexity Analysis
Detect sequence features that could cause problems during cloning or synthesis:
- Repeat sequences — direct and inverted repeats that may cause recombination
- Homopolymer runs — long stretches of a single nucleotide that challenge sequencing accuracy
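Both checks can be approximated with short helpers. This is a rough sketch, not the tool's detection method: it only finds exact matches of a fixed length `k`, the function names and default thresholds are assumptions, and self-complementary k-mers are skipped in the inverted-repeat search.

```python
import re
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def homopolymer_runs(seq, min_len=6):
    """Return (start, base, length) for each single-base run of at least min_len."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


def kmer_positions(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def direct_repeats(seq, k=12):
    """Exact k-mers that occur more than once on the same strand."""
    positions = kmer_positions(seq.upper(), k)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def inverted_repeats(seq, k=12):
    """Return (kmer, kmer_positions, revcomp_positions) for each k-mer whose
    reverse complement also occurs in seq. Each pair is reported once."""
    positions = kmer_positions(seq.upper(), k)
    hits = []
    for kmer, pos in positions.items():
        rc = kmer.translate(COMPLEMENT)[::-1]
        if kmer < rc and rc in positions:
            hits.append((kmer, pos, positions[rc]))
    return hits
```

For example, `homopolymer_runs("ACGAAAAAAATCG")` returns `[(3, "A", 7)]`. Longer repeats show up as several overlapping k-mer hits, which a fuller implementation would merge into single regions.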
Translation
Translate DNA sequences to protein in all six reading frames or in a specified frame. This is useful for verifying that your construct encodes the intended protein after modifications such as codon optimization, mutations, or assembly.
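A minimal sketch of single-frame and six-frame translation follows, assuming the standard genetic code. The function names and the frame labels (`+1` to `+3`, `-1` to `-3`) are illustrative choices, not the tool's interface.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate seq starting at offset frame (0, 1, or 2).

    Stop codons become '*', codons with unknown bases become 'X',
    and a trailing partial codon is dropped.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return translations keyed by frame label: +1..+3 and -1..-3."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames
```

To verify a construct, compare `translate(cds)` against the expected protein sequence. For instance, `translate("ATGAAATAG")` returns `"MK*"`.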