Welcome to the Software Carpentry at PSU
June 22-23, 2015
Instructors: Brad Taber-Thomas and Emily Davenport
Helpers:
------------------------------------------------------------
Links:
Bootcamp webpage: http://swcarpentry.github.io/matlab-novice-inflammation/
Bootcamp repository: https://github.com/erdavenport/2015-06-22-psu
Shell history: https://www.dropbox.com/s/5cywdtvhvwhj3z4/history.txt?dl=0 # This has been removed, but see Emily's terminal output below:
Emily's terminal output: https://www.dropbox.com/s/cl9ibo20awl6j4z/shell_terminal_output.txt?dl=0
git history: https://www.dropbox.com/s/5cywdtvhvwhj3z4/history.txt?dl=0 # This has also been moved; it is now located here:
https://www.dropbox.com/s/c94htqqgs4se5hu/git_history.txt?dl=0
The plan:
Day 1:
- Shell
- Break
- More Shell
- Lunch
- Hammer intro by ICS
- Break
- Matlab
Day 2:
- More Matlab
- Break
- Even more Matlab
- Lunch
- Git
- Break
- More Git
- Wrap-up
Further reading:
Shell:
Set up:
- Download files: http://swcarpentry.github.io/shell-novice/shell-novice-data.zip
- Place files into a folder called "shell-novice" on your desktop
- unzip file -> should see a folder called "data"
- cd
- echo Goodmorning sunshine
- whoami (shows the user's current identity)
- pwd (present working directory)
- ls (list contents of folder)
- ls -F (adds a / to folders; letters behind dash are flags)
- ls -F molecules/ (lists contents of molecules folder)
- ls -a (shows hidden files)
- ls -lh (combine flags, l="long format, way too much info", h="human readable")
- ls -t (list by time, last modified)
- ls -R (list contents of directory recursively)
- cat (display contents, also concatenate file contents)
- nano (text editor software in the shell)
- mkdir (make a directory)
- rm (delete file)
- rm -i (delete, with prompt to confirm deletion)
- rm -r folder_name (delete files/folders recursively, that folder and everything in it)
- mv (move or rename a file)
- cp (make a copy of a file)
- wc filename (give the lines, words & character count for the file)
- wc -l file_name (give just the number of lines in a file)
- * is the wildcard character. It can be used in combination with lots of other commands, e.g., mv *.pdf some_folder/ (move all the pdf files into some_folder).
- sort (sorts the lines of a file alphabetically; the -n flag sorts numerically)
- | (pipe) lets you chain commands on a single line; the output of the first command becomes the input of the next (e.g., wc -l * | sort -n )
- [] allow you to specify 'or' in a command e.g., cp *[AB].txt will copy all files that end in A.txt or B.txt
- > sends to a file (and will overwrite what's there if the file already exists)
- >> appends to the file (adds to what's already there)
- Notes on naming files - try to stick with alphanumeric characters, stay away from spaces in file names, or special characters (e.g., @, $)
- You can preview what your script is doing by using the echo command - which will output the commands that will run.
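A quick sketch that strings several of the commands above together. It assumes a throwaway folder with three made-up files (a.txt, b.txt, c.txt are illustrations only, not part of the workshop data):

```shell
# work in a scratch folder so nothing real gets touched
scratch=$(mktemp -d)
cd "$scratch"

# > creates (or overwrites) a file, >> appends to it
echo "one"   >  a.txt
echo "two"   >> a.txt
echo "three" >> a.txt
echo "only line" > b.txt
printf 'x\ny\n' > c.txt

# * matches every .txt file; | hands the line counts to sort -n
wc -l *.txt | sort -n > lengths.out
cat lengths.out

# the first line of the sorted output is the shortest file
shortest=$(head -n 1 lengths.out | awk '{print $2}')
echo "shortest file: $shortest"

# put echo in front of a command to preview what the wildcard expands to
preview=$(echo rm *.txt)
echo "$preview"
```

Nothing is deleted by the last step: echo only prints the rm command that would have run.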
Lesson Notes:
References:
http://software-carpentry.org/v5/novice/ref/01-shell.html
http://swcarpentry.github.io/matlab-novice-inflammation/
When naming files with dates, put the year first, then month, then day: YYYY-MM-DD (or YYYYMMDD)
but no slashes, right? <--- Correct, don't use slashes in file names (because the shell will interpret those as folder separators)
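The date command can build a slash-free, year-first date for a file name. A small sketch (the "notes" prefix is just an example name):

```shell
# %Y-%m-%d prints year-month-day with dashes instead of slashes
today=$(date +%Y-%m-%d)
file_name="notes_${today}.txt"
echo "$file_name"
```

Names built this way also sort in date order when you run ls.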
(The following code is for GitBash users only)
To color your files in windows enter:
cd
ls -a
nano .bash_profile
in new line enter:
LS_COLORS='di=1:fi=0:ln=31:pi=5:so=5:bd=5:cd=5:or=31:mi=0:ex=35:*.rpm=90'
export LS_COLORS
alias ls='ls -F --color --show-control-chars'
Save
Exit text file
Restart GitBash
For terminal users
cd
ls -a
nano .bash_profile
add new line with:
alias ls='ls -G' <-- adds color only; use alias ls='ls -FG' <-- to add color and the folder slash
save and restart terminal
Matlab
Go to the bootcamp webpage in Hammer: http://erdavenport.github.io/2015-06-22-psu/
Under the Matlab section you'll see a link to the data: http://erdavenport.github.io/2015-06-22-psu/matlab_data.zip
Save that into your home folder (if you're downloading it via firefox, it'll automatically download into your "Downloads" folder. In your terminal, cd to your home directory, then "mv Downloads/matlab_data.zip ~") <- relative path!! shell makes an entrance into our matlab lesson
Open a terminal.
cd work
ls
mkdir swc_workshop
cd swc_workshop
ls
cd ../..
ls
Is your data file there? If not, move it to your home directory.
Make sure you're in the same directory, then unzip:
unzip matlab_data.zip
If you see a matlab_data folder now, you're good to go.
Challenge! Move the matlab_data folder into the swc_workshop folder (which is in the work directory)
rm matlab_data.zip # (we don't need it anymore)
cd work/swc_workshop
Load a module:
module avail
module avail matlab
module load matlab
module list
Each time you open a new terminal window, you'll need to load matlab.
Why matlab??
- Operates on large matrices well
- Default for neuroscience and other fields for both data collection and presentations
- Drawbacks: $$$$
matlab
(matlab should've opened up)
Go to Desktop in the menu bar and uncheck everything but "Command window"
The question mark button up at the top will open up the documentation/help files (or go to "Help" on the menu -> "Product Help")
Go back to the terminal window. Let's glance through the data we're going to work with.
matlab
We'll need to open a new terminal tab to actually navigate around the shell.
cd matlab_data
ls
(should see inflammation files)
Each row is a subject, each column is a different timepoint
head inflammation-01.csv
(should see a bunch of numbers, comma separated)
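You can also check the shape of a comma separated file from the shell before reading it into matlab. A minimal sketch, using a tiny made-up file (toy.csv, 3 subjects by 4 timepoints) in place of a real inflammation file:

```shell
# a tiny made-up file standing in for an inflammation file
scratch=$(mktemp -d)
cat > "$scratch/toy.csv" <<'EOF'
0,1,2,1
0,2,3,2
0,1,1,0
EOF

# rows = number of lines in the file
rows=$(wc -l < "$scratch/toy.csv")

# columns = number of comma separated values on the first line
cols=$(head -n 1 "$scratch/toy.csv" | tr ',' '\n' | wc -l)

echo "rows:" $rows "columns:" $cols
```

Swapping in inflammation-01.csv for toy.csv should report one row per subject and one column per timepoint.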
Go to that other tab that is matlab, we're going to read in the files.
Challenge! Go through the help documentation and try to find a function that will read files that are comma separated?
Let's use csvread(filename), where filename is the name of the file we want to read into matlab
In the matlab command window:
Go to File -> Set Path -> Add with Subfolders -> (add the path to the swc_workshop folder)
You should see it add two paths to that list of paths. Close out (don't need to save for the future)
Let's load one of our files:
csvread('inflammation-01.csv')
You should see a big matrix printed to your screen; however, that isn't saved in matlab.
clc
clc will clear the screen of all the numbers (make it nice and pretty)
csvread is a function
functions take parameters (in our case, the file name)
Let's save the data into matlab so that we can work with it in the program:
patient_data = csvread('inflammation-01.csv');
The semi-colon suppresses any output to the screen. Allows you to run things without a ton of numbers being printed to the screen. Use the semi-colon!
Now, the variable 'patient_data' contains the contents of the array. DISPlay the variable
disp(patient_data)
Variables:
- must start with a LETTER, but can contain numbers and underscores.
weight_kg = 55;
Go up to Desktop, click on workspace. This will show you what variables you have stored in your current matlab session.
weight_lb = 2.2*weight_kg;
disp(['Weight in pounds: ', num2str(weight_lb)]);
num2str(weight_lb)
weight_lb
Variables:
- Variables in programming are just like variables in math class
- In math, you can assign any number to x in an equation (y = mx + b, for instance, the equation of a line) to get an answer. Say the slope is 2 and the intercept is 1 (y = 2x + 1): you can enter any number for x and get y.
disp(['Weight in pounds: ', num2str(weight_lb)]);
Challenge! Change the previous command to say 'Weight in kg' and then list out weight in kg
disp(['Weight in kg: ', num2str(weight_kg)]);
weight_kg = 100
disp(['Weight in pounds: ', num2str(weight_lb)]);
Ooops, haven't updated the weight_lb variable! Let's rerun that:
weight_lb = 2.2 * weight_kg;
disp(['Weight in pounds: ', num2str(weight_lb)]);
Yay! It's updated!
who
This lists what variables you have in the workspace
We don't need the weight variables, so let's get rid of those:
clear weight_lb
clear weight_kg
The ans variable is usually there too. It holds the result of the last command whose output you didn't assign to a variable
ans gets overwritten every time you run another command like that.
clear all (would get rid of everything, but let's not do that!)
Exercise! http://swcarpentry.github.io/matlab-novice-inflammation/01-intro.html 1/4 the way down the page, do the Predicting Variable Values Challenge (it's green)
Answers:
mass = 95
age = 102
clear age mass
who
whos
whos will tell you a bit more info than who (size of each variable, the class of the variable, etc)
How much data do we have?
size(patient_data)
This displays 60 40 (these are rows and columns - always in that order!)
All data in matlab is stored as an array. If you have a list of numbers (1,2,3,4), that's a 1 dimensional array called a vector.
class(patient_data)
class tells us what the data type is. patient_data is of class double: numbers that are allowed to have decimal points. If you don't need decimal points, you can use an integer type (whole numbers). If you convert a number with decimals to an integer, it is rounded and stored as the nearest whole number.
Sometimes we might want to make a toy dataset to play with:
M = magic(8)
This created an 8x8 array, saved as M, where all of the columns, rows, and main diagonals add up to the same number.
Indexing:
- What if we want just part of an array?
M(5,6)
This will have us grab the 5th row, 6th column data point.
- What if we want all of row 5?
M(5,:)
The colon tells matlab to grab all of the columns
- What about all of column 6?
M(:,6)
- What about rows 1 - 4 and everything in between?
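M(1:4)
Ooops, with a single index matlab counts straight down the columns and hands back just the first 4 elements. We need to say which columns we want: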
M(1:4, :)
This gives us rows 1-4 and all columns.
- What if you want all rows, and then every column from the 6th column on?
M(:, 6:end)
The end designates the last column
- What if we want to skip rows?
M(2:3:end, :)
We're starting at the 2nd row and taking every 3rd row until we reach the last row
- What about skipping rows and columns?
M(2:3:end, 2:2:end)
We can "slice" with numbers, but we can also slice with characters (or text)
element = 'oxygen';
Challenge! "Slicing" green box on this page: http://swcarpentry.github.io/matlab-novice-inflammation/01-intro.html
Answers:
1. gen, oye, xyge
2. element(:): You're telling it to return every element of the array. Each is put on its own line. Showing the contents of a variable is not the same as indexing into a variable.
Back from coffee break.
We're about 2/3 through the lesson if you're following along.
http://swcarpentry.github.io/matlab-novice-inflammation/01-intro.html
Let's start analyzing the patient data. Let's find the average of all the datapoints in the data set:
mean(patient_data(:))
Neat tip: if you put your cursor next to a parenthesis, an underline will show up under the matching parenthesis
mean(patient_data)
Will give you the mean of each column of the data
Display the maximum data point:
disp(['Max inflammation: ', num2str(max(patient_data(:)))])
Display the minimum data point:
disp(['Min inflammation: ', num2str(min(patient_data(:)))])
Display the standard deviation of all the points:
disp(['Standard deviation: ', num2str(std(patient_data(:)))])
Open up the command history tab, highlight the last three disp commands, and re-run them:
Mac - Function F9
PC - F9
Let's pull all data for patient 1
patient_1 = patient_data(1, :);
disp(['Max inflammation: ', num2str(max(patient_1))])
max(patient_data(1,:))
What if we want the mean across all subjects at each timepoint?
mean(patient_data, 1)
We are passing two parameters here: the first is the data, the second is the dimension to average along (1 averages down the rows, giving one mean per column; 2 averages across the columns, giving one mean per row)
What's the size of the data that was output by the last command above?
size(mean(patient_data, 1))
One thing matlab is really good at is visualizing data.
imagesc(patient_data)
This takes a matrix, and shows it to you in color (so you see the higher values have a different color than the lower values)
Let's calculate the means at each time point and plot that:
ave_inflammation = mean(patient_data, 1);
plot(ave_inflammation)
disp(ave_inflammation)
plot(max(patient_data, [], 1));
Why the extra bracket [] on that max function? max has a mode where, if you give it two equally sized arrays, it outputs a new array holding the larger of the two values at each position. The empty [] tells max that we aren't passing a second array, so the third parameter is read as the dimension.
Challenge! The green Plots box on this page: http://swcarpentry.github.io/matlab-novice-inflammation/01-intro.html
Answers:
1. There is no patient 0 or day 0. The lines are slightly slanted because they're jumping from day to day.
2. plot(std(patient_data, 0, 1)); (the second parameter of std is a weighting flag, so the dimension goes third)
How can we do 2 plots side by side?
subplot(1, 2, 1)
subplot sets up the plotting space, but doesn't actually plot anything
plot(max(patient_data, [], 1));
subplot(1, 2, 2)
plot(min(patient_data, [], 1));
subplot(2,2,1)
The first two parameters of subplot stand for the number of rows and the number of columns of plots; the third picks which position the next plot goes into
If you're following along with the notes, we're now going to make scripts:
http://swcarpentry.github.io/matlab-novice-inflammation/02-scripts.html
Go to Desktop in the top bar and click Editor. The scripts you save have a .m extension, but they're just text files.
Click the little page in the upper left corner to open up a new script.
% signifies that the text following it is just a comment. Matlab will ignore anything you write after that.
Comment your code a lot!! Future you will forget what you were trying to do.
Some comments to consider putting at the top of your script:
note: use rm to remove folders in MATLAB on Hammer --> goes to home file when you just delete using the 'Desktop' Tab, but it's hidden and takes up unnecessary space
%%%%%%%%%%%% This should all be in your script %%%%%%%%%%%%
% script analyze.m for inflammation patient data
% bct3 wrote this script
patient_data = csvread('inflammation-01.csv');
disp(['Analyzing inflammation-01.csv:'])
disp(['Max inflammation: ', num2str(max(patient_data(:)))])
disp(['Min inflammation: ', num2str(min(patient_data(:)))])
disp(['Standard Deviation inflammation: ', num2str(std(patient_data(:)))])
ave_inflammation = mean(patient_data, 1);
plot(ave_inflammation)
ylabel('average')
print -dpng 'average.png'
subplot(1,2,1)
plot(max(patient_data, [], 1));
ylabel('max')
subplot(1,2,2)
plot(min(patient_data, [], 1));
ylabel('min')
print -dpng 'patient_data-01.png'
%%%%%%%%%%%% End of what is in script %%%%%%%%%%%%%%%%%%
Matlab (Day 2)
Loops: http://swcarpentry.github.io/matlab-novice-inflammation/03-loops.html
Open a terminal
mmlsquota
This command will tell you your storage quotas for different spaces on the cluster.
2 numbers to pay attention to are the "size" and the "limit". If your size is getting close to the limit, you may start to have problems running matlab.
Go to Applications (top left), Accessories, right click on terminal and it'll add the launcher to desktop.
Right click on the terminal icon, go to Properties, go to the Launcher tab. Under command it should say gnome-terminal --working-directory=work/swc_workshop
You can set up multiple terminal launchers with different paths (if you're working on multiple projects)
In the terminal, load the matlab module: module load matlab
Then type matlab to open up matlab
It should open up to the script that you were making anyway.
Make sure to add the data folder to the path
Try running the first line of the script:
patient_data = csvread('inflammation-01.csv');
whos
Make sure the patient_data variable is in there
clear all
Run the whole script.
analyze
You should get output that displays file name, max, min, sd, and the figure
We're going to make a change to the script: We only want it to output one figure rather than two figures.
%%%%%%%%%%%% This should all be in your script %%%%%%%%%%%%
% script analyze.m for inflammation patient data
% bct3 wrote this script
patient_data = csvread('inflammation-01.csv');
disp(['Analyzing inflammation-01.csv:'])
disp(['Max inflammation: ', num2str(max(patient_data(:)))])
disp(['Min inflammation: ', num2str(min(patient_data(:)))])
disp(['Standard Deviation inflammation: ', num2str(std(patient_data(:)))])
subplot(1,3,1)
ave_inflammation = mean(patient_data, 1);
plot(ave_inflammation)
ylabel('average')
subplot(1,3,2)
plot(max(patient_data, [], 1));
ylabel('max')
subplot(1,3,3)
plot(min(patient_data, [], 1));
ylabel('min')
print -dpng 'patient_data-01.png'
%%%%%%%%%%%% End of what is in script %%%%%%%%%%%%%%%%%%
That's great, but we only analyzed one data file. Let's write some loops to analyze everything.
Open a new script.
word = 'brain'
Let's say we wanted to print one letter at a time:
disp(word(1))
disp(word(2))
disp(word(3))
Ugh. Boring. I'd rather automate the code so that it prints out each letter. That way, it doesn't matter how long the word is, or if we add letters to the end of the word later.
word = 'ofc'
for letter = 1:4 % the 1:4 just stands for 1,2,3,4
disp(word(letter)) % we use the value of letter to index (letter will equal 1,2,3,4 as the loop runs)
end
Darn, this still isn't as flexible as we'd like it. If the word is only 3 letters, when it gets to the fourth iteration of the loop it gives us an error.
for letter = 1:length(word) % length will find the size of the variable we're looping over.
disp(word(letter))
end
Great, by using length(), we now have a flexible loop that will adjust to the word we give it.
Sometimes we'll want to add a counter inside a loop (let's go back to our five letter word):
word = 'brain';
len = 0;
for letter = 1:length(word)
len = len + 1;
disp(word(letter))
end
How did len get to five? Each time through the loop we added 1 to the value of len. So, for each iteration of the loop it looks like this:
1 = 0 + 1
2 = 1 + 1
3 = 2 + 1
4 = 3 + 1
5 = 4 + 1
Challenge! Incrementing with loops (green box in the notes about the word aluminum).
Answer:
word = 'aluminum';
len = 0; % reset the counter before the loop starts
for letter = 1:length(word)
len = len + 1;
disp(word(1:len));
end
Striding:
disp(1:3:11)
Can go backwards too:
disp(11:-3:1)
Challenge! Display the letters of 'brain' backwards, one per line.
Answer:
for letter = length(word):-1:1
disp(word(letter));
end
Great, we now know the basic structure of a loop in matlab. Let's open a new script and start to loop over our inflammation files:
for idx = 1:12
file_name = sprintf('inflammation-%d.csv', idx); % the %d is a placeholder for an integer, here the looping variable idx
disp(file_name)
end
Ok, that's close, but we need to pad the single digit numbers with a leading zero. We can add the '02' right before the d, which tells it that we want at least two digits, padded with leading zeros.
for idx = 1:12
file_name = sprintf('inflammation-%02d.csv', idx); % %02d pads the integer idx to two digits with a leading zero
disp(file_name)
end
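The same format codes work in the shell's printf, which is a quick way to check what a pattern produces before putting it into a script. A small sketch (the loop is shortened to 1-3 just for illustration):

```shell
# %02d = an integer at least two characters wide, padded with zeros
for idx in 1 2 3
do
    printf 'inflammation-%02d.csv\n' "$idx"
done

# single digit numbers get a leading zero, two digit numbers are left alone
first=$(printf 'inflammation-%02d.csv' 1)
tenth=$(printf 'inflammation-%02d.csv' 10)
echo "$first $tenth"
```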
Let's go back to our analyze.m script and update it:
Highlight all your code in the script and hit tab. We're going to encase all of that code in a for loop:
%%%%%%%%%%%% This should all be in your script %%%%%%%%%%%%
% script analyze.m for inflammation patient data
% bct3 wrote this script
for idx = 1:12
file_name = sprintf('inflammation-%02d.csv', idx);
img_name = sprintf('patient_data-%02d.png', idx);
patient_data = csvread(file_name);
disp(['Analyzing ', file_name, ':'])
disp(['Max inflammation: ', num2str(max(patient_data(:)))])
disp(['Min inflammation: ', num2str(min(patient_data(:)))])
disp(['Standard Deviation inflammation: ', num2str(std(patient_data(:)))])
subplot(1,3,1)
ave_inflammation = mean(patient_data, 1);
plot(ave_inflammation)
ylabel('average')
subplot(1,3,2)
plot(max(patient_data, [], 1));
ylabel('max')
subplot(1,3,3)
plot(min(patient_data, [], 1));
ylabel('min')
print('-dpng', img_name)
end
%%%%%%%%%%%% End of what is in script %%%%%%%%%%%%%%%%%%
gnome-open is useful on this cluster to open up the files
So, our script is pretty good, but it still isn't flexible to how many files are in the folder. What if we add more patient data? We'll need to re-write the script.
Navigate to the folder where the data are stored.
We've added the files variable below. This is a new variable type: struct. It's a variable that can have other variables in it.
files.name will show all of the names of the files.
length(files) will show how many files we have
strrep stands for string replace. Below, we're using it to replace '.csv' in the file name with '.png' so we can save the image with the same name as the input file.
%%%%%%%%%%%% This should all be in your script %%%%%%%%%%%%
% script analyze.m for inflammation patient data
% bct3 wrote this script
files = dir('inflammation*.csv');
for idx = 1:length(files)
%file_name = sprintf('inflammation-%02d.csv', idx);
file_name = files(idx).name;
%img_name = sprintf('patient_data-%02d.png', idx);
img_name = strrep(files(idx).name, '.csv', '.png');
patient_data = csvread(file_name);
disp(['Analyzing ', file_name, ':'])
disp(['Max inflammation: ', num2str(max(patient_data(:)))])
disp(['Min inflammation: ', num2str(min(patient_data(:)))])
disp(['Standard Deviation inflammation: ', num2str(std(patient_data(:)))])
subplot(1,3,1)
ave_inflammation = mean(patient_data, 1);
plot(ave_inflammation)
ylabel('average')
subplot(1,3,2)
plot(max(patient_data, [], 1));
ylabel('max')
subplot(1,3,3)
plot(min(patient_data, [], 1));
ylabel('min')
print('-dpng', img_name)
end
%%%%%%%%%%%% End of what is in script %%%%%%%%%%%%%%%%%%
Yay! We can loop! And we can write scripts.
We've been using built in functions in matlab, but we can write our own functions.
Navigate to the swc_workshop folder in the terminal and let's do some spring cleaning.
rm patient_data*
Let's make some simple functions:
%%%%%%%%%%%% This should be in your new script %%%%%%%%%%%%%
% file fahr_to_kelvin.m
function ktemp = fahr_to_kelvin(ftemp)
ktemp = ((ftemp - 32)*(5/9)) + 273.15;
end
%%%%%%%%%%%% End of script %%%%%%%%%%%%%%%%%%%%%%%
Now run the function:
fahr_to_kelvin(32)
Let's try another function:
%%%%%%%%%% This should be another new script %%%%%%%%%%%%%%
% file kelvin_to_celsius.m
function ctemp = kelvin_to_celsius(ktemp)
ctemp = ktemp - 273.15;
end
%%%%%%%%%%% End of script %%%%%%%%%%%%%%%%%%%%%%%
Now run that function:
kelvin_to_celsius(0.0)
Let's combine a couple of the functions we already have into one last function:
%%%%%%%%%%%% Another separate script %%%%%%%%%%%%%%%%%
% file fahr_to_celsius.m
function ctemp = fahr_to_celsius(ftemp)
ktemp = fahr_to_kelvin(ftemp);
ctemp = kelvin_to_celsius(ktemp);
end
%%%%%%%%%%%% End of script %%%%%%%%%%%%%%%%%%%%%%
fahr_to_celsius(32)
Try to keep functions between 20-40 lines. Much larger than that, they become really hard to manage and typos get pretty hard to spot.
COFFEEEEE! Be back at 10:50 please.
Now that you're caffeinated, a challenge!
Start at Concatenating in a function green box here: http://swcarpentry.github.io/matlab-novice-inflammation/04-func.html
disp(['the', 'brain', 'rocks!', num2str(1)])
%%%%%%%%%%%%%% Function from the challenge %%%%%%%%%%%%%
% file fence.m
function output = fence(original, wrapper)
output = strcat(wrapper, original, wrapper);
end
%%%%%%%%%%%%%% End of script %%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%% Outer function from challenge %%%%%%%%%%%%%
function output = outer(word)
output = strcat(word(1), word(end)); % joins the first and last letters of the word
end
% type outer('helium') into the command window
% it gives you 'hm'
%%%%%%%%%%%%%% End of script %%%%%%%%%%%%%%%%%%%%%
outer('brain')
%%%%%%%%%%%%%% center.m script %%%%%%%%%%%%%%%%%%%%
% file center.m
function out = center(data, desired)
out = (data - mean(data(:))) + desired;
end
%%%%%%%%%%% End of script %%%%%%%%%%%%%%%%%%%%%%%%
z = zeros(2,2)
center(z, 3)
z = z + 1
center(z, 3)
Let's try the center function on our data. Let's start with just one file, so that we know it's doing what we think it should be doing.
data = csvread('inflammation-01.csv');
centered = center(data(:), 0);
size(centered)
Let's add some help comments to our function. Put these right under your function definition. The function needs to be on line one:
%%%%%%%%%%%%%% center.m script %%%%%%%%%%%%%%%%%%%%
function out = center(data, desired)
% Center data around desired
% out = center(data, desired)
% returns "out" array of centered data
out = (data - mean(data(:))) + desired;
end
%%%%%%%%%%% End of script %%%%%%%%%%%%%%%%%%%%%%%%
Challenge! "Testing a Function" question #3 on the bottom of this page: http://swcarpentry.github.io/matlab-novice-inflammation/04-func.html
%%%%%%%%%%%%%%%% Challenge script %%%%%%%%%%%%%%%%%%
function [] = run_analysis(filename)
disp(['running...', filename])
patient_data = csvread(filename);
pause(2)
close()
ave_inflammation = mean(patient_data, 1);
subplot(1,3,1)
plot(ave_inflammation)
ylabel('average')
subplot(1,3,2)
plot(max(patient_data, [], 1));
ylabel('max')
subplot(1,3,3)
plot(min(patient_data, [], 1));
ylabel('min')
end
%%%%%%%%%%%%%%%% End of script %%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%% Batch analysis script %%%%%%%%%%%%%%%%%
function [] = batch_analysis()
files = dir('inflammation*.csv');
for idx = 1:length(files)
disp(files(idx).name);
run_analysis(files(idx).name)
end
%%%%%%%%%%%%%%% End of script %%%%%%%%%%%%%%%%%%%%
batch_analysis
When we do defensive programming, we want to try to catch errors before they happen. For instance, you might want to only process files that have exactly 60 rows. You can use something called an assert statement to stop with an error message when a file has a different number of rows.
%%%%%%%%%%%%%%%% Challenge script %%%%%%%%%%%%%%%%%%
function [] = run_analysis(filename)
disp(['running...', filename])
patient_data = csvread(filename);
assert(size(patient_data, 1) == 60, 'Files must have 60 rows!')
pause(2)
close()
ave_inflammation = mean(patient_data, 1);
subplot(1,3,1)
plot(ave_inflammation)
ylabel('average')
subplot(1,3,2)
plot(max(patient_data, [], 1));
ylabel('max')
subplot(1,3,3)
plot(min(patient_data, [], 1));
ylabel('min')
end
%%%%%%%%%%%%%%%% End of script %%%%%%%%%%%%%%%%%%%%
Control-C will cancel any loop or process running in matlab. Useful when you accidentally start something that takes a long time to run.
What's the difference between "=" and "=="
- "=" is used to assign to a variable
- "==" is used to test if two variables or objects are the same/equal
GIT
Lesson material: http://swcarpentry.github.io/git-novice/
Open terminal, type git (tells you all the git options)
on a Mac, if you get an Xcode error, install it:)
Why use git?
multiple versions of the same file get annoying!
E.g., ... http://www.phdcomics.com/comics/archive.php?comicid=1531
version1.doc, version2.doc, final.doc, final24.doc
is very good at managing files/versions through the cloud
never lose ANYTHING...EVER!
time travel (go back in time and see previous versions you presented at the World Conference 2013)
detects when you might overwrite changes and will prompt you to reconcile changes
git config --global core.editor "nano -w"
git config --global user.name "Your Name"
git config --global user.email "your@email.com"
git config --global color.ui "auto"
git config --list
shows you your configuration settings
Let's set up a git repository!
cd (to go Home)
mkdir planets
cd planets
git init
this will initialize your repository
should see message about git repository (aka "repo") being initialized
ls -a
lists all files/folders, even hidden ones (which have names that start with "." dot)
should see .git (that's your hidden git repo folder; if you delete it, your git repo/version history is gonezo)
git status
tells you what's happening in your git repo
you're on branch master (we're staying on "master" for these lessons, so it should always say master today)
Problem: Is it a good idea to make a folder "mars" inside planets, and initialize mars as a git repo?
pretty much always a bad idea to make repos within repos!
nano mars.txt
type some notes in there about mars
ctrl + o to save, enter to save as mars.txt
ctrl + x to exit
ls and you'll see your mars.txt file
cat mars.txt
shows contents of file
git status
you'll see your mars.txt file as an untracked file
git add mars.txt
adds mars so git will track it
git commit -m "Starts notes on Mars as a base"
you'll see a note about what you did, files changed
that was the git add - commit cycle: you update a file, put it in the "staging area" (with git add), and then commit (changes to whatever's in your staging area is committed)
committing only what is in the staging area is helpful so you can commit only those changes you want; changes that you might not be done with can be staged/committed later
git status
shows you status of repo (should be clean, you don't have any new changes to commit)
git log
shows log of your commits, commit messages, so you can see what/when/who committed
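The add - commit cycle above can be replayed from start to finish in a throwaway repository. This is only a sketch: the temporary folder, the demo name/email, and the line written into mars.txt are placeholders, not part of the lesson.

```shell
# throwaway repo so this doesn't touch your real planets folder
demo=$(mktemp -d)
cd "$demo"
git init -q
# identity set only for this demo repo (placeholder values)
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Cold and dry, but everything is my favorite color" > mars.txt
git status --short          # mars.txt shows up as untracked

git add mars.txt            # put the file in the staging area
git commit -q -m "Starts notes on Mars as a base"

git status --short          # prints nothing once the repo is clean
n_commits=$(git log --oneline | wc -l)
echo "commits so far:" $n_commits
```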
nano mars.txt
add another line of text to edit the file
ctrl + o, enter, ctrl + x (save, accept name, exit)
git status
modified mars.txt that aren't staged for commit (we made changes, but haven't staged them for commit)
but before we commit, let's see what we changed in our file (e.g., maybe the changes we made messed up the file and we want to compare it to the old version)...
git diff
first line shows you "diff" (the command you ran) and the two files you're comparing
- indicates deletions, + additions to the file
git commit -m "add concerns about effects of mars' moons on Wolfman"
oops, something went wrong there, all we see is the git status again with mars.txt still unstaged, and message that "no changes added to commit"
we forgot to git add!
git add mars.txt
git commit -m "add concerns about effects of mars' moons on Wolfman"
you'll see nice commit message, # files changed, # lines inserted
nano mars.txt
add a third line of text
save/exit
cat mars.txt
git diff
shows line added in green with +
git add mars.txt
git diff
shows nothing because the change is now in the staging area (plain git diff compares your working files against the staged version, and right now they're identical)
you can edit mars.txt again even though it hasn't been committed; the staged snapshot stays staged, and the new edits show up as unstaged changes until you git add again
you can still diff a staged file against the last commit with "git diff --staged"
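A small sketch of how git diff behaves before and after staging, run in a throwaway repository (the folder, identity, and file contents are made up for the demo):

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

echo "line one" > mars.txt
git add mars.txt
git commit -q -m "first line"

# change the file but don't stage it yet: plain git diff shows the new line
echo "line two" >> mars.txt
unstaged_before=$(git diff | grep -c '^+line two')

# after staging, plain git diff has nothing left to show...
git add mars.txt
unstaged_after=$(git diff | wc -l)

# ...but the staged change is still visible with --staged
staged=$(git diff --staged | grep -c '^+line two')

echo $unstaged_before $unstaged_after $staged
```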
git commit -m "Discuss concerns about mars' climate for Mummy"
you'll see your nice commit message
git status
on master, nothing to commit, we're all caught up
git log
we've made three commits, first/initial commit is at bottom, most recent commit at top
Problem: Committing changes to git: which would save changes to myfile.txt to local git repo?
$ git add myfile.txt
$ git commit -m "my recent changes"
Problem: Make "bio" git repository
cd ..
get out of planets so we don't make a repository in a repository (and get stuck in a black hole)
mkdir bio
cd bio
git init
nano me.txt
write a three line bio
git add me.txt
git commit -m "edited my life"
nano me.txt
modify one of the lines, and add a fourth line
git diff
Let's go back to planets
cd ../planets
ls
should see mars.txt
git diff HEAD~1 mars.txt
head = current commit, ~1 = go 1 commit back (from head), and compare mars.txt in that commit to current version
git diff HEAD~2 mars.txt
compare current mars.txt to mars.txt from 2 commits ago
git log
get commit id (long crazy string next to "commit") for the commit you want to compare to
git diff fae08aekje83 mars.txt
or whatever your crazy long string is:) (you usually only need to use the first 10ish characters of it, enough so you know it's going to uniquely identify the commit)
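Here is a sketch of comparing against older commits, again in a throwaway repository with made-up contents. It uses git rev-parse --short to get a shortened commit ID instead of copying one by hand from git log:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

# three commits, one line added each time
for n in 1 2 3
do
    echo "note $n" >> mars.txt
    git add mars.txt
    git commit -q -m "add note $n"
done

# count the lines added since one commit ago and since two commits ago
since_1=$(git diff HEAD~1 mars.txt | grep -c '^+note')
since_2=$(git diff HEAD~2 mars.txt | grep -c '^+note')
echo "added since HEAD~1:" $since_1 "since HEAD~2:" $since_2

# a shortened commit ID works anywhere HEAD~1 does
short_id=$(git rev-parse --short HEAD~1)
same=$(git diff "$short_id" mars.txt | grep -c '^+note')
echo "using $short_id:" $same
```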
nano mars.txt
add line about needing to manufacture oxygen or whatever
save/exit
cat mars.txt
OOPS, we didn't like that change, we want to go back to the last committed version...
git checkout HEAD mars.txt
head = last commit, and then specify which file you want to checkout from that commit
the stuff we didn't want in mars.txt will be GONE FOREVER! because we didn't commit it
you can replace HEAD with a commit ID or HEAD~3 (or any number of commits ago)
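The recovery step can be tried safely in a throwaway repository first. A minimal sketch (folder, identity, and file contents are placeholders):

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

echo "good notes" > mars.txt
git add mars.txt
git commit -q -m "good version"

# make a change we will regret, without committing it
echo "bad idea" >> mars.txt
before=$(wc -l < mars.txt)

# throw the uncommitted change away and restore the committed version
git checkout HEAD mars.txt
after=$(wc -l < mars.txt)
restored=$(cat mars.txt)
echo "$restored"
```

The "bad idea" line is gone for good after the checkout, since it was never committed.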
PROBLEM: Recovering older versions of a file...
Which commands below will let Jennifer recover the last committed version of her Python script called data_cruncher.py? Answer is 5 (both 2 and 4)...
$ git checkout HEAD data_cruncher.py
$ git checkout <unique ID of last commit> data_cruncher.py
Let's ignore things we don't want to be version controlled by git (big data files, non-text files, images, raw data, etc.); version controlling just SCRIPTS is common (results files can be reproduced from those scripts at any time). Here's how we ignore....
mkdir results
touch a.dat b.dat c.dat results/a.out results/b.out
ls results
see a.out and b.out
git status
lots of new files, but we don't want to see those every time and we don't want to control them
so let's make a git ignore file that lists the things we want git to ignore...
nano .gitignore
add these two lines, then save/exit...
*.dat
results/
git status
all those new files are gone from the status output (they are still on disk), and we just see .gitignore
it's up to you if you version control .gitignore; Emily chooses to do so, Brad does too (because I mess it up sometimes and want to go back to my previous versions:)
git add .gitignore
git commit -m "added git ignore file"
git status
git add a.dat
note: you will get an error if you try to stage a file that you have in your git ignore file
you can use "git add -f" to force staging that file, but you probably just want to remove it from your git ignore
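A quick way to double check which files git is ignoring, sketched in a throwaway sandbox. Note that git check-ignore was not covered in the workshop; it is added here as an optional extra, and the temp directory is hypothetical.

```shell
# Throwaway sandbox so we do not touch the real planets repo
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q

# Same files and ignore rules as in the notes above
mkdir results
touch a.dat b.dat c.dat results/a.out results/b.out
printf '%s\n' '*.dat' 'results/' > .gitignore

# Short-format status: only .gitignore should be listed as untracked (??)
status_output=$(git status --porcelain)
echo "$status_output"

# Ask git which rule in which file causes each path to be ignored
git check-ignore -v a.dat results/a.out
```

Each line printed by check-ignore names the ignore file, the line number of the matching rule, the rule itself, and the path it matched.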
GITHUB
sign up for an account at github.com
pros: cloud based repositories, free
cons: pretty much only allows public repositories
alternatives: https://bitbucket.org/
In web browser...
log in to GitHub, click + in top right corner to create a new repository "planets"
under quick setup, click "ssh" and copy LINK.git
Back in terminal window...let's connect our planets repo to the online github planets repo
cd ~/planets
git remote add origin LINK.git
Now let's "push" our repository to the github repository
git push origin master
origin = what we're calling our github repository (pretty standard to just use that name, must match what you used when you added the remote link above)
master = which branch you're on--we aren't doing any "branching" today, you're always on the master branch
Pair up with neighbor to share repository:
Person A-- on github go to your planets repository page, settings link on right, and click collaborators
add Person B's github username
Then Person B-- on your computer, go to your Desktop folder (or any folder other than where you have your own planets directory)
on github, click search bar at the top and search for Person A's username, go to their planets repo, and in lower left panel copy the https link for cloning the repo
back on your computer in the shell...
git clone https://github.com/PersonA/planets (or whatever their https link is)
ls
you'll see you have a planets directory now
cd planets
let's make a new file...
nano pluto.txt
add some text to it, save/exit
git add pluto.txt
git commit -m "added notes on pluto"
git push origin master
this pushes your new files up to Person A's planets repository on github
Person A can now get those changes Person B made into A's planets directory on A's machine
Person A, in their shell on their computer, from in their planets directory that they pushed up to github earlier...
git pull origin master
pulls down new commits that have been made to Person A's planets repo on github
ls
you should now see the new pluto.txt file
Now let's create a conflict...
Person A
make some edits to mars.txt
git add mars.txt
git commit -m "person A made some changes"
git push origin master
push those changes up to github
After that, Person B
make some edits to mars.txt
git add mars.txt
git commit -m "person B made some OTHER changes!"
git push origin master
try to push those changes up to github
Rejected! Git knows--Person A made some edits that Person B didn't have in their repository; before trying to push to github, Person B needed to git pull to get latest version of github planets repository
Person B:
git pull origin master
message about a conflict in mars.txt that you need to resolve, do it NOW!
nano mars.txt
now it looks kind of crazy, you see both Person A's version and Person B's version, and you can pick which you want (or delete both and type something totally new)
save/exit
git add mars.txt
git commit -m "resolved merge conflict"
git push origin master
now it should push just fine; git defers to you, the human, when you've resolved a conflict--that means you can mess it up, and that the person who pushes changes to a repository first has no problem, and the person who gets there second has to deal with the conflict:)
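If you don't have a neighbor handy, the whole conflict exercise can be rehearsed on one machine. This is only a sketch under some assumptions: a local bare repository (hub.git) stands in for GitHub, the personA/personB directories and the file contents are made up, git branch -M master and clone -b master are there to force the branch name used in these notes, and --no-rebase asks for a plain merge because newer versions of git refuse to pull divergent branches without being told how.

```shell
# Hypothetical identity for every commit made in this sandbox
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q --bare hub.git            # stand-in for the GitHub repository

# Person A creates the repository and pushes it
git init -q personA
cd personA
echo "Mars is cold" > mars.txt
git add mars.txt
git commit -q -m "initial notes"
git branch -M master                  # make sure the branch is named master
git remote add origin "$sandbox/hub.git"
git push -q origin master

# Person B clones it
cd "$sandbox"
git clone -q -b master hub.git personB

# Person A edits mars.txt and pushes first
cd "$sandbox/personA"
echo "Person A: Mars is red" >> mars.txt
git add mars.txt
git commit -q -m "person A made some changes"
git push -q origin master

# Person B edits the same place, and the push is rejected
cd "$sandbox/personB"
echo "Person B: Mars is dusty" >> mars.txt
git add mars.txt
git commit -q -m "person B made some OTHER changes!"
git push -q origin master || echo "push rejected: pull first"

# Pulling reports the conflict and writes conflict markers into mars.txt
git pull -q --no-rebase origin master || echo "conflict in mars.txt"
conflict_markers=$(grep -c '^<<<<<<<' mars.txt)

# Resolve by writing the version we want (instead of editing in nano)
printf '%s\n' "Mars is cold" "Mars is red and dusty" > mars.txt
git add mars.txt
git commit -q -m "resolved merge conflict"
git push -q origin master

# Count the commits that ended up in the stand-in for GitHub
pushed_commits=$(git --git-dir="$sandbox/hub.git" log --oneline master | wc -l | tr -d ' ')
echo "$conflict_markers $pushed_commits"
```

Person A would then run git pull origin master in personA to receive the merged result.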
SHELL SCRIPTS:
http://swcarpentry.github.io/shell-novice/05-script.html
cd
cd Desktop/shell-novice/data/users/nelle/molecules/
ls
see cubane.pdb, ethane.pdb, etc. files
Script to take lines 11-15 of file (the last 5 of the first 15 lines)...
head -15 octane.pdb | tail -5
nano middle.sh
type "head -15 octane.pdb | tail -5" in there, save/exit
bash middle.sh
see same output as just typing those commands in terminal
nano middle.sh
replace octane.pdb with "$1" (include quotes! necessary when there are spaces in filenames--which you should avoid at all costs :)...
head -15 "$1" | tail -5
bash middle.sh octane.pdb
same output again:)
bash middle.sh cubane.pdb
now you see lines 11-15 of cubane.pdb
nano middle.sh
head "$2" "$1" | tail "$3"
bash middle.sh octane.pdb -10 -3
Future you: Will be very confused about what the heck this script was for, so let's add comments to the beginning of the script to let future us know what this thing did...
# Select lines from the middle of a file
# Usage: middle.sh filename -end_line -num_lines
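To see exactly which lines the three-argument version picks out, here is a small sketch that runs the commented script on a stand-in file. numbers.txt (one number per line, made with seq) and the temp directory are assumptions for illustration, so the output is easy to check by eye.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
seq 1 20 > numbers.txt            # stand-in data file: line N contains the number N

# Write the script from the notes, comments included
cat > middle.sh <<'EOF'
# Select lines from the middle of a file
# Usage: middle.sh filename -end_line -num_lines
head "$2" "$1" | tail "$3"
EOF

# First 10 lines, then the last 3 of those, i.e. lines 8-10
middle_output=$(bash middle.sh numbers.txt -10 -3)
echo "$middle_output"
```

Inside the script, $1 is the filename, $2 is the flag passed to head, and $3 is the flag passed to tail.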
nano sorted.sh
wc -l "$@" | sort -n
save/exit; note--the $@ will take as many parameters as you want (whereas $1 or $2 can hold only a single parameter)
bash sorted.sh *.pdb ../creatures/*.dat
we're passing lots of files to the sorted.sh script! (all files matching *.pdb, and all files in the creatures directory matching *.dat)
nano sorted.sh
add some comments for future you!
#list files sorted by number of lines
#usage: sorted.sh filelist
#note: file list can contain any number of files to sort by number of lines
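A sketch of sorted.sh running on a few made-up files, one of which has a space in its name to show why the quotes around $@ matter. The file names and the temp directory are hypothetical.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
seq 1 5 > five.txt
seq 1 2 > two.txt
seq 1 9 > "nine lines.txt"        # a space in the name, which unquoted $@ would split in two

cat > sorted.sh <<'EOF'
# List files sorted by number of lines
# Usage: sorted.sh filelist
wc -l "$@" | sort -n
EOF

# The shell expands *.txt into three filenames; "$@" hands all of them to wc
sorted_output=$(bash sorted.sh *.txt)
echo "$sorted_output"

# The file with the fewest lines is on the first row of the sorted output
shortest=$(echo "$sorted_output" | head -n 1 | awk '{print $2}')
echo "$shortest"
```

The last row of the output is the "total" line that wc prints when it is given more than one file.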
history | tail -4 > redo-figure-3.sh
this grabs your most recent 4 commands (including the history command itself) and dumps them into the redo-figure-3.sh file, which you can then open in a text editor and clean up (e.g. delete the line numbers that history adds) to turn into a script
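The cleanup can also be done from the command line instead of a text editor. This is only a sketch: raw-history.txt is a made-up stand-in for what history | tail -4 would print (the history numbers are invented), and the sed/grep approach is just one option that was not shown in the workshop.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# Simulated output of `history | tail -4`; the numbers on the left are hypothetical
cat > raw-history.txt <<'EOF'
  101  head -15 octane.pdb | tail -5
  102  bash middle.sh octane.pdb
  103  wc -l *.pdb | sort -n
  104  history | tail -4 > redo-figure-3.sh
EOF

# Strip the leading history numbers, then drop the history command itself
sed -E 's/^ *[0-9]+ +//' raw-history.txt | grep -v '^history' > redo-figure-3.sh

script_lines=$(wc -l < redo-figure-3.sh | tr -d ' ')
cat redo-figure-3.sh
```

What remains is just the three commands, ready to be run with bash redo-figure-3.sh from the directory holding the .pdb files.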