You are a bioinformatics researcher working on analysing DNA sequences. Your task is to write a Python program that can perform various analyses on a given DNA sequence. The program should be able to count nucleotides, find complementary strands, identify specific patterns within the DNA sequence, transcribe DNA into RNA and calculate GC content.
What is DNA?
DNA, or deoxyribonucleic acid, is a molecule that carries the genetic instructions used in the growth, development, functioning, and reproduction of all known living organisms and many viruses. DNA is often referred to as the “blueprint of life” because it contains the information needed to build and maintain an organism.
What is a DNA Sequence Made Of?
A DNA sequence is a chain of nucleotides, which are the basic building blocks of DNA. There are four types of nucleotide bases in DNA, each with a different chemical composition:
- Adenine (A)
- Thymine (T)
- Cytosine (C)
- Guanine (G)
The sequence of these nitrogenous bases along the DNA strand encodes genetic information. The bases pair specifically: Adenine pairs with Thymine (A-T), and Cytosine pairs with Guanine (C-G). This pairing is crucial for the replication and transcription of DNA.
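The pairing rule can be captured in a small lookup table, which later tasks can reuse. This is a minimal sketch that assumes sequences are written in upper case; the names `BASE_PAIRS` and `is_valid_dna` are hypothetical helpers, not part of the challenge.

```python
# Hypothetical lookup table encoding the A-T and C-G pairing rules
BASE_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_valid_dna(sequence: str) -> bool:
    """Return True if every character is one of the four DNA bases (upper case assumed)."""
    return all(base in BASE_PAIRS for base in sequence)


print(BASE_PAIRS["A"])          # T
print(is_valid_dna("ATGCGAT"))  # True
print(is_valid_dna("ATGXGAT"))  # False
```

Because the pairing is symmetric, looking a base up twice returns the original base.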
DNA has a double-helix structure, which means it is made up of two long strands that twist around each other like a spiral staircase. Each strand is composed of a sequence of nucleotides, and the two strands are held together by hydrogen bonds between the paired bases.
Python Challenges
The five Python challenges below are all based on using string handling techniques to analyse a DNA sequence/strand.
Task 1: Counting Nucleotides
Write a function called count_nucleotides that takes a DNA sequence as input and returns a dictionary with the counts of each nucleotide (A, T, C, G).
Example:
dna_sequence = "ATGCGATCCATGACAAT"
nucleotides = count_nucleotides(dna_sequence)
print(nucleotides) # Output: {'A': 6, 'T': 4, 'C': 4, 'G': 3}
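One possible approach, sketched below, starts a dictionary with all four bases at zero and increments the matching entry for each character. It assumes an upper-case sequence; any character other than A, T, C or G is skipped rather than counted.

```python
def count_nucleotides(dna_sequence: str) -> dict:
    """Count each of the four DNA bases in the sequence."""
    # Start every base at zero so all four keys appear, in A, T, C, G order
    counts = {"A": 0, "T": 0, "C": 0, "G": 0}
    for base in dna_sequence:
        if base in counts:  # skip any unexpected characters
            counts[base] += 1
    return counts


print(count_nucleotides("ATGCGATCCATGACAAT"))
# {'A': 6, 'T': 4, 'C': 4, 'G': 3}
```

An equivalent one-liner uses the built-in `str.count` method: `{base: dna_sequence.count(base) for base in "ATCG"}`.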
Task 2: Finding the Complementary Strand
Write a function called complementary_strand() that takes a DNA sequence as input and returns its complementary strand. In DNA, the complementary base pairs are A-T and C-G.
Example:
dna_sequence = "ATGCGATTCA"
complementary_dna_sequence = complementary_strand(dna_sequence)
print(complementary_dna_sequence) # Output: "TACGCTAAGT"
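A compact way to swap every base in one pass is Python's `str.maketrans` / `str.translate` pair. This sketch assumes upper-case input; the table name `COMPLEMENT_TABLE` is illustrative.

```python
# Translation table mapping each base to its complementary base
COMPLEMENT_TABLE = str.maketrans("ATCG", "TAGC")


def complementary_strand(dna_sequence: str) -> str:
    """Return the complementary strand by swapping A<->T and C<->G."""
    return dna_sequence.translate(COMPLEMENT_TABLE)


print(complementary_strand("ATGCGATTCA"))  # TACGCTAAGT
```

Note that this keeps the original reading order, as the challenge's examples do. In practice biologists often use the reverse complement (the complementary strand read in the opposite direction), which would be `complementary_strand(seq)[::-1]`.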
Task 3: Identifying Patterns
Within a DNA sequence we can identify patterns that code for the different amino acids, the building blocks of proteins. Each amino acid is encoded by a sequence of 3 nucleotides (called a codon).
From the diagram below we can see that the codons GTA, GTC, GTT and GTG are all valid codons for the Valine amino acid, whereas AGC and AGT are two of the six codons for Serine (the others are TCA, TCC, TCG and TCT).
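To read a sequence codon by codon, it can be split into consecutive groups of three bases. This sketch assumes reading starts at index 0 and uses a hypothetical, deliberately tiny lookup table holding only GTA, GTC, GTT and GTG (Valine) and AGC and AGT (Serine); any other codon is reported as `"?"`. The names `CODON_TABLE` and `split_into_codons` are illustrative.

```python
# Partial, hypothetical codon table: only the Valine codons and two Serine codons
CODON_TABLE = {
    "GTA": "Valine", "GTC": "Valine", "GTT": "Valine", "GTG": "Valine",
    "AGC": "Serine", "AGT": "Serine",
}


def split_into_codons(dna_sequence: str) -> list:
    """Split the sequence into consecutive 3-base codons, dropping any incomplete tail."""
    return [dna_sequence[i:i + 3] for i in range(0, len(dna_sequence) - 2, 3)]


codons = split_into_codons("GTAAGCGTTCAT")
print(codons)                                     # ['GTA', 'AGC', 'GTT', 'CAT']
print([CODON_TABLE.get(c, "?") for c in codons])  # ['Valine', 'Serine', 'Valine', '?']
```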
Write a function called find_pattern() that takes a DNA sequence and a pattern (e.g. a three-letter codon) as input, and returns the starting indices of all occurrences of the pattern in the DNA sequence.
Example:
dna_sequence = "ATAGCGATATCGAGCTAC"
pattern = "AGC"
positions = find_pattern(dna_sequence, pattern)
print(positions) # Output: [2, 12]
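A simple sliding-window search, sketched below, compares every slice of the sequence that has the same length as the pattern. It reports overlapping matches too, which is one reasonable reading of "all occurrences".

```python
def find_pattern(dna_sequence: str, pattern: str) -> list:
    """Return the starting index of every occurrence of pattern (overlaps included)."""
    positions = []
    # Slide a window the length of the pattern along the sequence
    for i in range(len(dna_sequence) - len(pattern) + 1):
        if dna_sequence[i:i + len(pattern)] == pattern:
            positions.append(i)
    return positions


print(find_pattern("ATAGCGATATCGAGCTAC", "AGC"))  # [2, 12]
print(find_pattern("AAAA", "AAA"))                # [0, 1]
```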
Task 4: Transcribing DNA to RNA
DNA (Deoxyribonucleic Acid) is a double-stranded molecule with a deoxyribose sugar and the nitrogenous bases Adenine (A), Thymine (T), Cytosine (C), and Guanine (G). It stores genetic information long-term and, in eukaryotic cells, is found primarily in the nucleus. RNA (Ribonucleic Acid), on the other hand, is typically single-stranded, contains a ribose sugar, and uses Uracil (U) instead of Thymine. RNA carries and expresses genetic information, playing roles in protein synthesis and gene regulation. DNA is more stable, which suits long-term storage, while RNA is less stable and more reactive, which fits its dynamic roles in cellular processes.
Write a function called transcribe_DNA_to_RNA() that takes a DNA sequence as input and returns the corresponding RNA sequence. In RNA, thymine (T) is replaced by uracil (U).
Example:
dna_sequence = "ATGCTAGCT"
rna_sequence = transcribe_DNA_to_RNA(dna_sequence)
print(rna_sequence) # Output: "AUGCUAGCU"
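Following the challenge's simplified rule, transcription reduces to a single `str.replace` call. This gives the RNA matching the sequence as written (the coding strand); it does not model the cell's use of the template strand.

```python
def transcribe_DNA_to_RNA(dna_sequence: str) -> str:
    """Return the RNA sequence by replacing every thymine (T) with uracil (U)."""
    return dna_sequence.replace("T", "U")


print(transcribe_DNA_to_RNA("ATGCTAGCT"))  # AUGCUAGCU
```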
Task 5: Calculating GC Content
Write a function called gc_content() that takes a DNA sequence as input and returns the GC content as a percentage. The GC content is the percentage of nucleotides in the DNA sequence that are either G or C.
Example:
dna_sequence = "ATGCGAT"
gc = gc_content(dna_sequence)
print(gc) # Output: approximately 42.86 (3 of the 7 bases are G or C)
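The GC content is the number of G and C bases divided by the sequence length, multiplied by 100. This sketch assumes upper-case input and, as an arbitrary choice, returns 0.0 for an empty sequence to avoid dividing by zero.

```python
def gc_content(dna_sequence: str) -> float:
    """Return the percentage of bases in the sequence that are G or C."""
    if not dna_sequence:
        return 0.0  # assumption: treat an empty sequence as 0 % GC
    gc_count = dna_sequence.count("G") + dna_sequence.count("C")
    return gc_count / len(dna_sequence) * 100


print(round(gc_content("ATGCGAT"), 2))  # 42.86
```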
Python Code
Complete the code for the five functions described above.
Here is the expected output for your code:
DNA Sequence: ATAGCGATCGTAGTTCATAGCTACGTGCGATAGCTCAA
Count Nucleotides: {'A': 11, 'T': 10, 'C': 8, 'G': 9}
Complementary Strand: TATCGCTAGCATCAAGTATCGATGCACGCTATCGAGTT
Find Pattern 'AGC': [2, 18, 31]
Transcribe DNA to RNA: AUAGCGAUCGUAGUUCAUAGCUACGUGCGAUAGCUCAA
GC Content: 44.73684210526316 %
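As an unofficial sketch, the five functions can be combined into one program that prints its results in the format of the expected output. It assumes an upper-case sequence and uses simple string methods (`count`, `translate`, slicing, `replace`); other implementations are equally valid.

```python
def count_nucleotides(dna_sequence):
    # Count each base, keeping the A, T, C, G key order used in the expected output
    return {base: dna_sequence.count(base) for base in "ATCG"}


def complementary_strand(dna_sequence):
    # Swap A<->T and C<->G in a single pass
    return dna_sequence.translate(str.maketrans("ATCG", "TAGC"))


def find_pattern(dna_sequence, pattern):
    # Compare every window of len(pattern) characters against the pattern
    size = len(pattern)
    return [i for i in range(len(dna_sequence) - size + 1)
            if dna_sequence[i:i + size] == pattern]


def transcribe_DNA_to_RNA(dna_sequence):
    # Simplified transcription: replace thymine (T) with uracil (U)
    return dna_sequence.replace("T", "U")


def gc_content(dna_sequence):
    # Percentage of bases that are G or C (0.0 for an empty sequence)
    if not dna_sequence:
        return 0.0
    gc_count = dna_sequence.count("G") + dna_sequence.count("C")
    return gc_count / len(dna_sequence) * 100


dna_sequence = "ATAGCGATCGTAGTTCATAGCTACGTGCGATAGCTCAA"
print("DNA Sequence:", dna_sequence)
print("Count Nucleotides:", count_nucleotides(dna_sequence))
print("Complementary Strand:", complementary_strand(dna_sequence))
print("Find Pattern 'AGC':", find_pattern(dna_sequence, "AGC"))
print("Transcribe DNA to RNA:", transcribe_DNA_to_RNA(dna_sequence))
print("GC Content:", gc_content(dna_sequence), "%")
```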
