The goal of this notebook is to create a small dataframe, explore it, plot it, and save it to disk.
We start the notebook by loading packages of interest:
library(ggplot2) # for nice figures
.libPaths()
## [1] "/shared/ifbstor1/software/miniconda/envs/r-4.5.1/lib/R/library"
We define the directory to work in:
save_dir = "/shared/projects/2538_eb3i_n1_2025/atelier_scrnaseq/cours_intro_rmd/"
We create a dataframe:
my_data = data.frame(A = c(1,1,2,4,4,4,6),
                     B = c(2,2,1,3,4,5,6))
my_data
##   A B
## 1 1 2
## 2 1 2
## 3 2 1
## 4 4 3
## 5 4 4
## 6 4 5
## 7 6 6
In this section, we explore the dataframe.
What are the dimensions of the dataframe?
dim(my_data)
## [1] 7 2
We compute a descriptive summary of each column:
summary(my_data)
##        A               B
##  Min.   :1.000   Min.   :1.000
##  1st Qu.:1.500   1st Qu.:2.000
##  Median :4.000   Median :3.000
##  Mean   :3.143   Mean   :3.286
##  3rd Qu.:4.000   3rd Qu.:4.500
##  Max.   :6.000   Max.   :6.000
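To see where the quartiles and the mean reported by summary() come from, here is a minimal sketch using base R's quantile() and mean() on column A. The dataframe is recreated so that the snippet runs on its own, and the name quartiles_A is introduced only for this illustration.

```r
# Recreate the dataframe from above so the snippet is self-contained
my_data <- data.frame(A = c(1, 1, 2, 4, 4, 4, 6),
                      B = c(2, 2, 1, 3, 4, 5, 6))

# quantile() interpolates linearly between the sorted values by default
# (type = 7), which is also the default used by summary() for numeric data
quartiles_A <- quantile(my_data$A, probs = c(0.25, 0.5, 0.75))
quartiles_A

# The mean is the sum of the values divided by their count: 22 / 7
mean(my_data$A)
```

The first quartile of A is 1.5 rather than an observed value because it falls halfway between the second and third sorted values (1 and 2).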
We draw a histogram to visualize the distribution of column
B. For this purpose, we use hist() from the graphics package,
which ships with every R installation.
hist(my_data$B)
We draw a similar histogram with the ggplot2 package,
which gives finer control over the appearance of the figure. ggplot2 builds
a plot from layers, combined with +.
ggplot(my_data, aes(x = B)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
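The message above appears because no bin width was given, so ggplot2 falls back to 30 bins. A minimal sketch of one way to address it follows, assuming that one bin per unit suits the small integer values of column B. The choice binwidth = 1 and the name p are illustrative, not part of the original analysis.

```r
library(ggplot2)

# Recreate the dataframe from above so the snippet is self-contained
my_data <- data.frame(A = c(1, 1, 2, 4, 4, 4, 6),
                      B = c(2, 2, 1, 3, 4, 5, 6))

# Setting binwidth explicitly silences the stat_bin() message;
# 1 is an assumed value chosen because B holds small integers
p <- ggplot(my_data, aes(x = B)) +
  geom_histogram(binwidth = 1)
p
```

Storing the plot in a variable before printing it also makes it easy to add further layers later with +.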
At the end of the notebook, we save the object of interest, here
my_data.
We prepare the output path for the file:
filename_to_save = paste0(save_dir, "my_data.csv")
filename_to_save
## [1] "/shared/projects/2538_eb3i_n1_2025/atelier_scrnaseq/cours_intro_rmd/my_data.csv"
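Note that paste0() produces a valid path here only because save_dir ends with a slash. As a sketch of a more forgiving alternative, base R's file.path() inserts the separator itself. The directory name "results" and the variable names below are hypothetical and used only for illustration.

```r
# Hypothetical directory name, written without a trailing slash
example_dir <- "results"

# paste0() glues the strings together as they are, so no separator is added
glued_path <- paste0(example_dir, "my_data.csv")
glued_path
## [1] "resultsmy_data.csv"

# file.path() joins its arguments with "/" between them
example_path <- file.path(example_dir, "my_data.csv")
example_path
## [1] "results/my_data.csv"
```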
We save the dataframe at this location:
write.csv(my_data, file = filename_to_save)
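By default, write.csv() also writes the row names (here 1 to 7) as an extra unnamed first column. The sketch below shows how row.names = FALSE avoids this and how the file can be read back to check the result. It writes to a temporary file rather than to save_dir, and the names tmp_file and my_data_reloaded are introduced only for this illustration.

```r
# Recreate the dataframe from above so the snippet is self-contained
my_data <- data.frame(A = c(1, 1, 2, 4, 4, 4, 6),
                      B = c(2, 2, 1, 3, 4, 5, 6))

# Temporary file, so the example does not touch the project directory
tmp_file <- tempfile(fileext = ".csv")

# row.names = FALSE keeps only the columns A and B in the file
write.csv(my_data, file = tmp_file, row.names = FALSE)

# Read the file back and inspect its shape and column names
my_data_reloaded <- read.csv(tmp_file)
dim(my_data_reloaded)
names(my_data_reloaded)
```

Whether to keep row names depends on the data: for a table where rows are simply numbered, dropping them avoids a spurious column when the file is reloaded.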
The following packages and their versions were loaded:
sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-conda-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.5.1/lib/libopenblasp-r0.3.30.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Paris
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_4.0.0
##
## loaded via a namespace (and not attached):
## [1] vctrs_0.6.5 cli_3.6.5 knitr_1.50 rlang_1.1.6
## [5] xfun_0.54 generics_0.1.4 S7_0.2.0 jsonlite_2.0.0
## [9] labeling_0.4.3 glue_1.8.0 htmltools_0.5.8.1 sass_0.4.10
## [13] scales_1.4.0 rmarkdown_2.30 grid_4.5.1 tibble_3.3.0
## [17] evaluate_1.0.5 jquerylib_0.1.4 fastmap_1.2.0 yaml_2.3.10
## [21] lifecycle_1.0.4 compiler_4.5.1 dplyr_1.1.4 RColorBrewer_1.1-3
## [25] pkgconfig_2.0.3 rstudioapi_0.17.1 farver_2.1.2 digest_0.6.37
## [29] R6_2.6.1 tidyselect_1.2.1 dichromat_2.0-0.1 pillar_1.11.1
## [33] magrittr_2.0.4 bslib_0.9.0 withr_3.0.2 tools_4.5.1
## [37] gtable_0.3.6 cachem_1.1.0