Introduction to code versioning with git¶

schedule¶

  • introduction to file history
  • introduction to Git, a version control system for code
  • the Git cycle
  • branches

support¶

Jupyter notebook running in the jupyter/minimal-notebook Docker container

  1. docker run -it --rm -p 8888:8888 --user root -e NB_USER="tutoriel_git" -e CHOWN_HOME=yes -v "${PWD}:/home/${NB_USER}" jupyter/minimal-notebook
  2. in a browser, open the last URL printed in the terminal, of the form http://127.0.0.1:8888/lab?token=xxx
  3. copy the notebook tutoriel_git.ipynb and the images folder into the ${PWD}/tutoriel_git/ folder

note:

  • notebook with a Python kernel: use %%sh for shell (bash) in code cells
  • Docker container: use cd ${PWD}/xxx/ in code cells to work in xxx
  • at the end of the notebook, the tutoriel_git folder will look like:
    ├── FAIR_bioinfo_github
    │   └── README.md
    ├── first_git_example
    │   ├── file1.txt
    │   └── file2.txt
    └── tutoriel_git.ipynb

Do you really need a file history?¶

Final?

“Most researchers are primarily collaborating with themselves,” Tracy Teal explains. “So, we teach it from the perspective of being helpful to a ‘future you’.”

File history, a good practice for reproducible research¶

“Rule 4: Version Control All Custom Scripts”

PLOS Computational Biology, 2013

Code version control¶

Definition: version control, revision control, source control, or source code management: a class of systems responsible for managing changes to files

Feature: each revision is associated with a timestamp and the person making the change. Revisions can be compared, restored, and merged

Software: SVN, Git, Mercurial, GNU arch, etc.

We choose Git.
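To make "compared and restored" concrete, here is a minimal sketch with Git. It runs in a throwaway folder created with mktemp; the file name notes.txt, the identity "Demo User", and the commit messages are assumptions made up for the illustration.

```shell
# Work in a throwaway repository so that none of your own files are touched.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"            # hypothetical identity, local to this repo
git config user.email "demo@example.org"

# Revision 1
echo "first version" > notes.txt
git add notes.txt
git commit -q -m "docs: first version of notes"

# Revision 2 (-a stages the already tracked, modified file)
echo "second version" > notes.txt
git commit -q -a -m "docs: second version of notes"

# Each revision records who made the change and when.
git log --format='%h | %an | %ad | %s'

# Compare the two revisions.
git diff HEAD~1 HEAD -- notes.txt

# Restore the file as it was in the previous revision.
git checkout HEAD~1 -- notes.txt
cat notes.txt
```

Merging, the third operation, needs branches and is covered later in the tutorial.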

Git vs. GitHub¶

Git

  • will track and version your files
  • enables you to collaborate with ... yourself (and others if they have local access)
  • open source, released under the GPL (GNU General Public License)
  • created in 2005 by Linus Torvalds for the development of the Linux kernel

GitHub

  • stores online Git repositories
  • enables you to collaborate with others (and yourself!)
  • proprietary platform owned by Microsoft
  • code pushed to it is stored on Microsoft infrastructure (beware of sensitive data)
  • first commit in 2007 by Chris Wanstrath

Git concepts, Git objects¶

  • working directory: a user's private copy of the files of a repository of interest
  • clone: a local copy of a repository (with all commits and branches); the original repository can be local or remote (e.g. HTTP access)
  • commit: a Git object, a compressed snapshot of your entire repository; also the command that saves changes by creating that snapshot
  • HEAD: pointer to your current working commit. It can be moved (git checkout) to branches, tags, or commits
  • branch: a lightweight movable pointer to a commit
  • merge: combines the history of another branch (local or remote-tracking) into the current local branch
  • staging area: list of files of the working directory that will be considered for the next commit (i.e. not necessarily all the modified files)
  • tag: a name attached to a version you want to remember
  • revision graph: [figure: Revision Graph]
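Several of the concepts above are simply names that point to a commit. The following sketch, run in a throwaway repository, illustrates this; the names file.txt, demo_branch, v0.1, and the identity are hypothetical.

```shell
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.org"

echo "hello" > file.txt
git add file.txt                           # staging area: file.txt is selected for the next commit
git commit -q -m "feat: add file.txt"      # commit: creates the snapshot object

git cat-file -t HEAD                       # type of the object HEAD resolves to
git symbolic-ref --short HEAD              # HEAD currently points to this branch
git branch demo_branch                     # branch: a movable pointer to a commit
git tag v0.1                               # tag: a fixed name for a commit

# HEAD, the new branch, and the tag all resolve to the same commit hash.
git rev-parse HEAD demo_branch v0.1
```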

Git setup¶

Git configuration: check the configuration of your git user.name with:

In [ ]:
%%sh
git config --list

If it is not yet set (nothing displayed), tell Git your identity:

In [ ]:
%%sh
git config --global user.name 'clairetn'
git config --global user.email 'claire.ctn@gmail.com'
git config --list
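The global identity can also be overridden for a single repository. The sketch below illustrates this under stated assumptions: it points HOME at a throwaway folder so that your real ~/.gitconfig is not modified, and the names "Demo User" and "Project Identity" are made up.

```shell
# Use a throwaway HOME so the sketch does not touch your real configuration.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

git config --global user.name "Demo User"          # hypothetical identity
git config --global user.email "demo@example.org"

# Read a single key instead of listing everything.
git config --global --get user.name

# A repository can override the global value for itself only.
repo="$(mktemp -d)"
git init -q "$repo"
git -C "$repo" config user.name "Project Identity"
git -C "$repo" config --get user.name               # the local value wins inside the repo
git config --global --get user.name                 # the global value is unchanged
```

In a notebook, each %%sh cell runs in its own process, so the modified HOME does not leak into later cells.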

Git repository initialisation: the initialisation (red arrow) is the creation of a .git folder:

3 ways to initialise a git repository:

  • git init: inside an existing folder (possibly containing files)
  • git init myproject: create myproject folder + initialize the .git subfolder inside it
  • git clone /gitfolder/path /new/path: copy the existing git repository to a new one
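The three ways listed above can be tried side by side in a throwaway folder. This is a minimal sketch; the folder names existing_folder, myproject, and myproject_copy are assumptions for the example.

```shell
base="$(mktemp -d)"
cd "$base" || exit 1

# 1. git init inside an existing folder (which may already contain files)
mkdir existing_folder
echo "some data" > existing_folder/data.txt
(cd existing_folder && git init -q)

# 2. git init <name>: create the folder and its .git subfolder in one step
git init -q myproject

# 3. git clone <source> <destination>: copy an existing repository
#    (Git warns that the source is empty, which is expected here)
git clone -q "$base/myproject" "$base/myproject_copy"

# Each of the three folders now contains a .git subfolder.
ls -d */.git
```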

Initialise a git repository:

In [ ]:
%%sh
git init first_git_example

Observe the git folder:

In [ ]:
%%sh
ls -lah first_git_example

Git work cycle, 3 steps¶

  1. create/delete/change files
  2. place the files to track in a special space, the staging area, with git add myfiles
  3. record the current version of the files included in the staging area with git commit -m "my reason of change"
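The three steps can be condensed into one short run. The sketch below uses git status --short, a compact form of git status, to show the state after each step; the file name report.txt and the identity are hypothetical.

```shell
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.org"

# Step 1: create a file
echo "draft" > report.txt
git status --short        # '??' marks an untracked file

# Step 2: add it to the staging area
git add report.txt
git status --short        # 'A ' marks a newly staged file

# Step 3: commit the staged version
git commit -q -m "docs: add report draft"
git status --short        # empty output: nothing left to commit
git log --oneline         # the new commit appears in the history
```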

The git status command reports the Git state of each file in the folder:

In [ ]:
%%sh
cd ${PWD}/first_git_example
git status

Now, try out one Git cycle.

Create 2 files:

In [ ]:
%%sh
cd ${PWD}/first_git_example
for i in 1 2 ; do
   # printf handles the newline the same way in every shell, unlike echo "...\n"
   printf 'text of file %s\n' "${i}" > "file${i}.txt"
done
ls

Check the git status:

In [ ]:
%%sh
cd ${PWD}/first_git_example
git status

Observe: the 2 new files are included in the list of untracked files.

Add file1.txt to the list of tracked files, the staging area:

In [ ]:
%%sh
cd ${PWD}/first_git_example
git add file1.txt
git status

file1.txt moves from untracked to staged (i.e. to be committed).

Change the content of file1.txt again:

In [ ]:
%%sh
cd ${PWD}/first_git_example
sed 's/text/text change /' file1.txt > tmp ; mv tmp file1.txt
git status

Observe the 3 states. Note that file1.txt appears both under "Changes to be committed" and under "Changes not staged for commit". Why?
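One way to explore this question is to compare what the staging area holds with what the working file holds. The sketch below reproduces the situation in a throwaway repository (the identity is hypothetical): git add records the content of the file at the moment it is run, so a later edit is not staged until the file is added again.

```shell
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q

echo "text of file 1" > file1.txt
git add file1.txt                          # the staging area now holds this content
echo "text change of file 1" > file1.txt   # edit made after staging

git status --short                         # 'AM': added to the staging area, then modified
git diff --cached -- file1.txt             # what would be committed right now
git diff -- file1.txt                      # what changed since the last git add
git show :file1.txt                        # content stored in the staging area
```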

Stage all files:

In [ ]:
%%sh
cd ${PWD}/first_git_example
git add file?.txt
git status

And commit:

In [ ]:
%%sh
cd ${PWD}/first_git_example
git commit -m "commit with all files"
git status

Note on the commit message (-m):
follow the Conventional Commits specification to add human- and machine-readable meaning:

<type>[optional scope]: <description>
[optional body]
[optional footer(s)]

type examples: feat fix build chore ci docs style perf test ...
Here is an example in French.
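As an illustration of this message layout, git commit accepts several -m options, and each one becomes a separate paragraph (description, body, footer). The sketch below is only an example: the scope "parser", the message text, and the issue number #123 are made up.

```shell
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.org"

echo "parser code" > parser.txt
git add parser.txt

# <type>(<scope>): <description>, then the body, then a footer
git commit -q \
  -m "fix(parser): handle empty input lines" \
  -m "Empty lines were previously reported as errors." \
  -m "Refs: #123"

git log -1 --format=%s    # first line only (type, scope, description)
git log -1 --format=%b    # body and footer
```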

Interim conclusion¶

So far, you have started a new project whose code is versioned by Git.
You have created files, and all their successive changes have been tracked.

To avoid harmful code changes, it is good practice to test a new code version before using it, and therefore to separate development code from production code. The Git branch concept lets you manage this separation: develop code from an initial copy of the master code.

Use branches¶

We will now create a second project by copying an existing one (from an online Git hosting site, e.g. GitHub):

In [ ]:
%%sh
git clone https://github.com/clairetn/FAIR_bioinfo_github.git
ls -lah FAIR_bioinfo_github/

Observe the result:

  • a new folder has been created
  • its name is directly deduced from the URL
  • it contains a .git folder and a README.md file: it is a minimal project!
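The same observations can be made without network access by cloning a local repository. In this sketch, source_project is a hypothetical stand-in for the online project; the clone also records where it came from under the remote name origin.

```shell
base="$(mktemp -d)"

# Build a small source repository to clone from (stand-in for an online project).
git init -q "$base/source_project"
git -C "$base/source_project" config user.name "Demo User"       # hypothetical identity
git -C "$base/source_project" config user.email "demo@example.org"
echo "# Source project" > "$base/source_project/README.md"
git -C "$base/source_project" add README.md
git -C "$base/source_project" commit -q -m "docs: add README"

# Clone it from another folder, without giving a destination name.
mkdir "$base/work"
cd "$base/work" || exit 1
git clone -q "$base/source_project"

ls -a source_project                     # README.md and the .git folder
git -C source_project remote -v          # 'origin' remembers where the clone came from
git -C source_project log --oneline      # the history came along with the files
```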

To develop a new feature, create a branch with git branch:

In [ ]:
%%sh
cd ${PWD}/FAIR_bioinfo_github
git branch branch_myfn # create a branch
git branch # list all branches

The default branch is named master. The star denotes the working branch.

Move to the new branch with git checkout:

In [ ]:
%%sh
cd ${PWD}/FAIR_bioinfo_github
git checkout branch_myfn
git branch
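Creating a branch and moving to it can also be done in a single command. The sketch below shows this in a throwaway repository; the branch name branch_demo and the identity are hypothetical.

```shell
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.org"

echo "# Demo project" > README.md
git add README.md
git commit -q -m "docs: add README"

# Create the branch and move to it in one step.
# On Git 2.23 or later, 'git switch -c branch_demo' does the same.
git checkout -q -b branch_demo
git branch                                 # the star is now on branch_demo
```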

Explore the branch (ls, git status):

In [ ]:
%%sh
cd ${PWD}/FAIR_bioinfo_github
ls -lah
git status

The branch branch_myfn looks like an exact copy of its origin, the master branch.

Run a Git cycle: i) change the README.md file by adding your first name to the authors list, ii) add the file to the staging area, and iii) commit:

In [ ]:
%%sh
cd ${PWD}/FAIR_bioinfo_github
echo "- my firstname" >> README.md # append your first name
cat README.md ; echo "----------" # check the addition
git status # check status
In [ ]:
%%sh
cd ${PWD}/FAIR_bioinfo_github
git add README.md # add to the staged area
git status
In [ ]:
%%sh
cd ${PWD}/FAIR_bioinfo_github
git commit -m "add firstname" # commit step
git status

Once you have checked that the changes are correct, go back to the master branch. Check the version of the README.md file:

In [ ]:
%%sh
cd ${PWD}/FAIR_bioinfo_github
git checkout master
more README.md

It is the version before the change in the branch_myfn branch.
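Before merging, it can help to see exactly what the branch contains that the main branch does not. The sketch below is one way to check this, in a throwaway repository; branch_demo and the identity are hypothetical, and the main branch name is read from Git because it may be master or main depending on the configuration.

```shell
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.org"
main_branch="$(git symbolic-ref --short HEAD)"   # master or main

echo "# Demo project" > README.md
git add README.md
git commit -q -m "docs: add README"

git checkout -q -b branch_demo
echo "- a contributor" >> README.md
git commit -q -a -m "docs: add a contributor"
git checkout -q "$main_branch"

# Commits that exist on branch_demo but not on the main branch
git log --oneline "$main_branch..branch_demo"
# Line-by-line difference between the tips of the two branches
git diff "$main_branch" branch_demo -- README.md
# The working copy still shows the main branch version
cat README.md
```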

Merge and delete the branch_myfn branch:

In [ ]:
%%sh
cd ${PWD}/FAIR_bioinfo_github
git merge branch_myfn # merge the branch to master
echo "------------"
more README.md
echo "------------"
git branch -d branch_myfn # -d = delete
git branch
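The order of the two operations matters: git branch -d refuses to delete a branch whose commits have not been merged yet. The sketch below illustrates this safeguard in a throwaway repository; branch_demo and the identity are hypothetical, and --ff-only is an optional flag that makes the merge fail rather than create a merge commit if the main branch has moved in the meantime.

```shell
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.org"
main_branch="$(git symbolic-ref --short HEAD)"   # master or main

echo "# Demo project" > README.md
git add README.md
git commit -q -m "docs: add README"

git checkout -q -b branch_demo
echo "- a contributor" >> README.md
git commit -q -a -m "docs: add a contributor"
git checkout -q "$main_branch"

# Deleting before merging is refused, so no work is lost by accident.
git branch -d branch_demo || echo "refused: branch_demo is not merged yet"

git merge -q --ff-only branch_demo         # move the main branch pointer forward
git branch -d branch_demo                  # now allowed
git log --oneline                          # both commits are on the main branch
```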

Conclusion¶

You now know how to version a project with Git commands in a Git cycle (change / add to the staging area / commit).

You have also used a Git branch to test a new feature or code version before merging it into the master branch.

References¶

  • version control, wikipedia
  • git quick guide, tutorial point
  • git getting started