Biosim lab tips

Generate ssh key for UB CCR

To use the secure shell (SSH) to connect to CCR’s login servers, you will need to generate a new SSH key on your local machine:

ssh-keygen -t ed25519 -C "email@buffalo.edu"

The -t option specifies which type of secure key to generate. ed25519 refers to speicifc elliptic curve algorithm that you can read more about here. The -C option expects you to add a comment to the key, here you should put your Buffalo email address.

Here is an example: image

At the end, in the ~/.ssh directory, you should find the public key and the private key. In my case, I have testing.pub as my public key and testing as my private key Copy the content of the public key to the CCR. Wait for approval and then you can login to CCR using ssh. You should do this process for every machine that you wish to connect to CCR from.

Here is an example of my CCR account, I have three machine registered: image

References: Generate New SSH Key

Login to UB CCR and nevigate to our group folder

Now you have uploaded your public key to CCR and your matching private key in the .ssh folder, you can fire up a terminal window and log into CCR with the command below!


ssh -i your-key-name your-username@vortex-future.ccr.buffalo.edu -o ServerAliveInterval=60

The -o ServerAliveInterval=60 option sends a keepalive message to the server every 60 seconds, prevent you from disconnected when idel.

I recommend to add the ssh command to your bash resource file .bashrc or .zshrc if you are on Mac. Nevigate to the file by typing: vim ~/.bashrc, add alias sshccr=ssh -i your-key-name your-username@vortex-future.ccr.buffalo.edu -o ServerAliveInterval=60', then activate it by typing source ~/.bashrc or restart the terminal. Now you can log into CCR by typing sshccr instead of the whole command. Be sure to not delete or change anything else in your bash resource file.

To get to our group project directory:

cd /projects/academic/nguyenh

To get to our scratch directory:

cd /vscratch/grp-nguyenh

Trajectory Alignment using MD Analysis

Sometimes, you might need to align a trajectory to a reference frame and save it for analysis. This can come in handy when calculating things like root-mean-sqaure deviation (RMSD) or root-mean-sqaure fluctuation (RMSF). Here, I’ve shared a script that aligns every frame of an RNA trajectory to its center of mass to make these analyses easier.

import os
import numpy as np
import MDAnalysis as mda
from MDAnalysis import transformations as trans
from MDAnalysis.analysis import align

def align_COM(directory, pdb, dcd)
    os.chdir(directory)
    
    # Create an Universe with the trajectory(dcd) and topology file(pdb)
    u = mda.Universe(pdb, dcd)
    
    # Define your box size (here all edge is 720 angstrom with 90 degree angles)
    dim = np.array([720, 720, 720, 90, 90, 90])
    
    # Select the molecule you want to align (I am working with RNA, so Adenine in my case, you can print out your selection to see if you are selecting your intended molecule)
    ADE = u.select_atoms('resname ADE')

    # Define your workflow, here I set my box dimension, unwrap all atoms, center ADE's (I defined earlier) center of mass to the center of box, then I wrap all the atoms.
    workflow = [trans.boxdimensions.set_dimensions(dim),
                trans.unwrap(u.atoms),
                trans.center_in_box(ADE, center='mass'),
                trans.wrap(u.atoms)]
    
    # use add_transformations to apply the workflow you defined earlier to the trajectory
    u.trajectory.add_transformations(*workflow)

    # now your trajectory is aligned, to save it I use the align.AlignTraj function. Here I save my new trajectory as "md-align-wrap.dcd" 
    align.AlignTraj(u, u, select='resname ADE', filename="md-align-wrap.dcd", match_atoms=True).run()

References: Aligning a trajectory to a reference, On-the-fly transformations, Centering a trajectory in the box

Alignment to principal axes

For flexible molecules, it can be useful to align the molecule to its principal axes. I have used this technique to visualize the ion distribution surrounding the flexible single stranded RNA. I do this by utilizing the orient package in VMD, and save the principally aligned trajectory for analysis.

  1. Download: orient and la. You will need la, the linear algebra package to run orient.

  2. unpack the package:

gunzip orient.tar.gz
tar -xf orient.tar

gunzip la101psx.tar.gz
tar -xf la101psx.tar
  1. Open VMD and go to VMD TkConsole:

Go to la1.0 folder you got from previous step

cd la1.0
source la.tcl

Go to orient folder you got from previous step

cd orient
source orient.tcl
  1. I provide the script I have to align the principal axes of my RNA to the z-axis of the box
namespace import Orient::orient
# Get the number of frames in the trajectory
set num_frames [molinfo top get numframes]

# Select all atoms
set sys [atomselect top "all"]

# Set the molecule you are working with, here I select everything that is not Magnesium
set rA [atomselect top "all not name Mg"]

# Iterate over each frame in the trajectory
for {set i 0} {$i < $num_frames} {incr i} {
    # Set the frame
    animate goto $i
    
    # Compute the principal axes and align the molecule
    set I [draw principalaxes $rA]

    # I align the first principal axis to the z axis
    set A [orient $rA [lindex $I 2] {0 0 1}]
    $sys move $A

    # I align the second principal axis to the y axis
    set I [draw principalaxes $rA]
    set A [orient $rA [lindex $I 1] {0 1 0}]
    $sys move $A

    # Update the coordinates for the current frame
    $sys update
    $rA update
}

# Write the modified trajectory to a new file
animate write dcd principal-aligned.dcd

References: Alignment to principal axes in VMD

Tips for using Arena to map coarse-grained structures to atomistic structures

The coarse-grained structures can be backmapped to the corresponding atomistic structures with Arena. There are many ways you can use this program to fit your needs, here I provide some of my scripts and procedures for you to use as a guidance.

I extracted 10,000 random-frames from my coarse-grained simulations with the script below - my goal was to eventually calculate 10,000 SAXS profiles to get the ensemble average.

# writen by Peter Zhang - 09/18/2024

# Load the required packages
package require pbctools
package require topotools
package require Tcl

# Define trajectory and output directory
set traj "/home/pz/remote/RNA-homopolymers/rA30/experiment-comparison/Na20/Mg2/rA30-4-1/md-align-wrap.dcd"
set output_dir "/home/pz/Projects/SAXS-calculation-homopolymer/rA30-2mM-calculation/cg-10000-frames"

# Load the trajectory
mol new /home/pz/remote/RNA-homopolymers/rA30/experiment-comparison/Na20/Mg2/rA30-4-1/rA30-Mg2-Na20-1.pdb type pdb
mol addfile $traj type dcd waitfor all first 3000 last -1

# Get the total number of frames in the traj
set num_frames [molinfo top get numframes]

# Set the number of random frames to select
set num_random_frames 10000

# Initialize a list to store the selected frames
set selected_frames {}

# Ensure that the number of requested frames does not exceed the total number of frames
if {$num_random_frames > $num_frames} {
    set num_random_frames $num_frames
}

# Randomly select frames without replacement
for {set i 0} {$i < $num_random_frames} {incr i} {
    set frame [expr {int(rand() * $num_frames)}]
    
    # Ensure no duplicate frames are selected
    while {[lsearch -exact $selected_frames $frame] != -1} {
        set frame [expr {int(rand() * $num_frames)}]
    }
    lappend selected_frames $frame
}

# Export each selected frame as a PDB file with a unique name
foreach frame $selected_frames {
    # Set the current frame
    molinfo top set frame $frame

    set sel [atomselect top "all not name Mg"]

    # Define the output file name
    set output_file [format "%s/frame_%05d.pdb" $output_dir $frame]

    # Write the selected atoms to the PDB file
    $sel writepdb $output_file

    # Delete the atom selection to free up memory
    $sel delete
}

puts "Completed exporting to $output_dir"

Arena will only recognize standard RNA atom naming system (P, OP1, OP2, O5’, C5’, C4’, O4’, C3’, O3’, C2’, O2’, C1’, N9, C8, N7, C5, C6, N6, N1, C2, N3, C4) - so if you named your coarse-grained beads differently, you need to change the bead name. For my three bead system, I chose to map the phosphate bead to the phosphate atom (P), the sugar bead to C2’, the nucleobase to C4. If you also named your residue differently in the coarse-grained pdb file, you should change them to the one letter code (A, U, C, or G).

The naming change can be done using the stream editor (sed) from your terminal, below is an example:

sed -i 's/\(P___\)ADE/\|_____A/' *.pdb

If after those changes, Arena cannot recognize your naming, make sure to check the spacing.

Now, once you input those coarse-grained structures and have the output from Arena. You can load them to vmd to visualize them. And you will likely see “bad bonds”, they are either too long or they are merged with some other atoms. To correct those “bad bonds”, we will need to perform an energy minimization step. I use GROMACS and Amber force field χOL3. Here is the script:

#!/bin/bash

# Written by Peter Zhang
# 07/23/24
# Referenced from Magdalena A. Jonikas (RNA gromacs tips)

FILE_PATTERN=$1 # pattern to match pdb files
NP=$2 # number of ions to add

export GMX_MAXBACKUP=-1

for FILENAME in $FILE_PATTERN; do
    NAME="${FILENAME%.pdb}"

    if [ ! -e $NAME-em-final.pdb ]; then
        echo "Running $NAME"
        
        echo "Checking for needed files"
        
        if [ ! -e em.mdp ]; then
            echo "Missing em.mdp"
            exit 1
        fi
        
        if [ ! -e $NAME.pdb ]; then
            echo "Missing $NAME.pdb"
            exit 1
        fi
        
        echo "Fixing the pdb file syntax"
        if head -n 3 $NAME.pdb | grep -E '(P|OP1|OP2)' > /dev/null; then
            echo "Deleting the first three lines of $NAME.pdb"
            sed -i '1,3d' $NAME.pdb
        fi
        
        echo "Making topology file from pdb file"
        gmx pdb2gmx -f $NAME.pdb -p $NAME.top -o $NAME.gro -ff amber03 -water tip3p -ignh >& out-pdb2gmx-$NAME.txt
        
        if [ ! -e $NAME.gro ]; then
            echo "FAIL: pdb2gmx"
            exit 1
        fi
        
        echo "Making box and adding water"
        gmx editconf -f $NAME.gro -o -d 2.0
        gmx solvate -cp out.gro -cs spc216.gro -p $NAME.top -o $NAME-solv.gro >& genbox-solvate-$NAME.txt
        
        if [ ! -e $NAME-solv.gro ]; then
            echo "FAIL: editconf or solvate"
            exit 1
        fi
        
        echo "Performing first energy minimization"
        gmx grompp -v -f em.mdp -c $NAME-solv.gro -o $NAME-em -p $NAME.top -maxwarn 10 >& grompp-em-$NAME.txt
        
        if [ ! -e $NAME-em.tpr ]; then
            echo "FAIL: grompp em"
            exit 1
        fi
        
        gmx mdrun -v -s $NAME-em -o $NAME-em -g $NAME-emlog >& out-mdrun-em-$NAME.txt
        
        if [ ! -e $NAME-em.trr ]; then
            echo "FAIL: mdrun em"
            exit 1
        fi
        
        # Add ions (30 for homopolymers with 30 nucleotides)
        echo "Adding ions"
        echo 3 | gmx genion -s $NAME-em.tpr -o $NAME-ion -p $NAME.top -np $NP >& out-genion-$NAME.txt
        
        if [ ! -e $NAME-ion.gro ]; then
            echo "FAIL: genion"
            exit 1
        fi
        
        echo "Performing final energy minimization"
        gmx grompp -v -f em.mdp -o $NAME-em-final -c $NAME-ion.gro -p $NAME.top -maxwarn 10
        
        if [ ! -e $NAME-em-final.tpr ]; then
            echo "FAIL: grompp em-final"
            exit 1
        fi
        
        gmx mdrun -v -s $NAME-em-final -o $NAME-em-final
        
        if [ ! -e $NAME-em-final.trr ]; then
            echo "FAIL: mdrun em-final"
            exit 1
        fi
        
        echo "Converting trajectory to pdb"
        
        echo 1 | gmx trjconv -f $NAME-em-final.trr -s $NAME-em-final.tpr -o $NAME-em-final.pdb >& out-convert-$NAME.txt
        
        if [ ! -e $NAME-em-final.pdb ]; then
            echo "FAIL: em convert"
            exit 1
        fi
        
        echo "Completed processing $NAME"
    fi
done

For additional details, please check out these references:

  1. Magdalena A. Jonikas’s RNA GROMACS Tips
  2. Arena Github.
  3. χOL3 force field

results matching ""

    No results matching ""