MLBA Tools & Lab Setup
03.03.2024
Install the GitHub Desktop app to make working with GitHub easier. See also our FAQ on obtaining professional accounts.
{here}
Use the {here} package for easy file path management within projects.
{renv}
renv is a package management tool that helps you manage the packages used in an R project.
Initialize a project with renv::init().
Use renv::restore() to install the packages recorded in the renv.lock file.
Use renv::snapshot() whenever you add or update packages, to record the current state of the project library in renv.lock.
Use renv::status() to check whether renv.lock needs updating.
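For reference, renv.lock is a JSON file that lists the R version and, for each package, the version renv::restore() should install. As a minimal sketch, assuming a hypothetical lockfile excerpt (all version numbers are made up, and real lockfiles carry extra fields such as Hash), the Python code below reads the pinned versions and runs a toy comparison against a hypothetical project library, roughly the question renv::status() answers. It is not how renv itself is implemented.

```python
import json

# Hypothetical renv.lock excerpt: versions are placeholders, and real
# lockfiles also contain fields such as "Hash".
LOCKFILE_TEXT = """
{
  "R": {
    "Version": "4.0.0",
    "Repositories": [{"Name": "CRAN", "URL": "https://cloud.r-project.org"}]
  },
  "Packages": {
    "here": {"Package": "here", "Version": "1.0.0", "Source": "Repository", "Repository": "CRAN"},
    "reticulate": {"Package": "reticulate", "Version": "2.0.0", "Source": "Repository", "Repository": "CRAN"}
  }
}
"""


def pinned_versions(lock_text):
    """Return {package: version} for every package recorded in the lockfile."""
    lock = json.loads(lock_text)
    return {name: record["Version"] for name, record in lock["Packages"].items()}


def out_of_sync(pinned, installed):
    """Toy check (not renv's algorithm): packages whose lockfile version differs
    from the library version; None means the package is missing on that side."""
    report = {}
    for name in sorted(set(pinned) | set(installed)):
        locked, in_library = pinned.get(name), installed.get(name)
        if locked != in_library:
            report[name] = (locked, in_library)
    return report


pinned = pinned_versions(LOCKFILE_TEXT)
# Hypothetical project library: "here" was upgraded and "dplyr" was added.
installed = {"here": "1.1.0", "reticulate": "2.0.0", "dplyr": "1.0.0"}
print(pinned)
print(out_of_sync(pinned, installed))
```

In this toy model, renv::snapshot() roughly corresponds to rewriting the lockfile so it matches the library, while renv::restore() goes the other way and installs the locked versions into the library.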
{reticulate}
Install the {reticulate} package in R.
Use reticulate::use_python() or reticulate::use_condaenv() to specify the location of your Python environment.
Use reticulate::import() to import Python modules into R.
Use reticulate::py_run_string() to execute Python code from R.
Write {python} at the beginning of a code chunk to run that chunk as Python.
From Python, access R objects with r.OBJECT_NAME; from R, access Python objects with py$OBJECT_NAME.
Use reticulate::r_to_py() and reticulate::py_to_r() to convert objects explicitly between the two languages.
Loading the data (R)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3.0 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> 5 5.0 3.6 1.4 0.2
#> 6 5.4 3.9 1.7 0.4
Modelling in pure R
#>
#> Call:
#> lm(formula = "Sepal.Length ~. ", data = iris)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.82816 -0.21989 0.01875 0.19709 0.84570
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.85600 0.25078 7.401 9.85e-12 ***
#> Sepal.Width 0.65084 0.06665 9.765 < 2e-16 ***
#> Petal.Length 0.70913 0.05672 12.502 < 2e-16 ***
#> Petal.Width -0.55648 0.12755 -4.363 2.41e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.3145 on 146 degrees of freedom
#> Multiple R-squared: 0.8586, Adjusted R-squared: 0.8557
#> F-statistic: 295.5 on 3 and 146 DF, p-value: < 2.2e-16
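The last lines of this summary are linked by standard formulas for a model with an intercept (as this one has): with n observations and p predictors, the residual degrees of freedom are n - p - 1, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), and F = (R²/p) / ((1 - R²)/(n - p - 1)). A small sketch plugging in the printed values; since R² is rounded to four decimals, the results match only to the printed precision.

```python
# Values printed by summary(lm(...)) above.
n = 150        # observations
p = 3          # predictors, excluding the intercept
r2 = 0.8586    # Multiple R-squared (rounded as printed)

df_resid = n - p - 1  # residual degrees of freedom

# Adjusted R-squared: R-squared penalized for the number of predictors.
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid

# Overall F-statistic: explained share per predictor over
# unexplained share per residual degree of freedom.
f_stat = (r2 / p) / ((1 - r2) / df_resid)

print(f"residual df = {df_resid}")
print(f"adjusted R-squared = {adj_r2:.4f}")
print(f"F = {f_stat:.1f} on {p} and {df_resid} DF")
```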
Modelling in pure Python
# load the required libraries
import statsmodels.api as sm
import pandas as pd
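# Fit a linear regression to the iris data frame coming from R
# (inside a {python} chunk, r.iris is the R object converted to pandas)
X = r.iris[['Sepal.Width','Petal.Length','Petal.Width']]
y = r.iris['Sepal.Length']
# add an intercept column; sm.OLS does not add one automatically
X = sm.add_constant(X)
py_lm_fit = sm.OLS(y, X).fit()
# print regression results
print(py_lm_fit.summary())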
#> OLS Regression Results
#> ==============================================================================
#> Dep. Variable: Sepal.Length R-squared: 0.859
#> Model: OLS Adj. R-squared: 0.856
#> Method: Least Squares F-statistic: 295.5
#> Date: Fri, 31 May 2024 Prob (F-statistic): 8.59e-62
#> Time: 08:25:17 Log-Likelihood: -37.321
#> No. Observations: 150 AIC: 82.64
#> Df Residuals: 146 BIC: 94.69
#> Df Model: 3
#> Covariance Type: nonrobust
#> ================================================================================
#> coef std err t P>|t| [0.025 0.975]
#> --------------------------------------------------------------------------------
#> const 1.8560 0.251 7.401 0.000 1.360 2.352
#> Sepal.Width 0.6508 0.067 9.765 0.000 0.519 0.783
#> Petal.Length 0.7091 0.057 12.502 0.000 0.597 0.821
#> Petal.Width -0.5565 0.128 -4.363 0.000 -0.809 -0.304
#> ==============================================================================
#> Omnibus: 0.345 Durbin-Watson: 2.060
#> Prob(Omnibus): 0.842 Jarque-Bera (JB): 0.504
#> Skew: 0.007 Prob(JB): 0.777
#> Kurtosis: 2.716 Cond. No. 54.7
#> ==============================================================================
#>
#> Notes:
#> [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
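The coefficients match the pure R fit, but the statsmodels summary also prints [0.025 0.975] confidence-interval columns, which R's summary() omits (in R, confint() on the fitted model returns them). Both the t column and the intervals follow from the estimates and standard errors: t = estimate / std. error, and the 95% interval is estimate ± t(0.975, 146) × std. error. The sketch below recomputes them with the standard library only; the critical value 1.9763 is a hardcoded approximation and the inputs are the rounded values printed by R, so agreement is to about three decimals.

```python
# Estimates and standard errors as printed by R's summary(lm(...)) earlier.
coefs = {
    "(Intercept)": (1.85600, 0.25078),
    "Sepal.Width": (0.65084, 0.06665),
    "Petal.Length": (0.70913, 0.05672),
    "Petal.Width": (-0.55648, 0.12755),
}

# Residual df = 150 - 3 - 1 = 146. Approximate 97.5% quantile of the
# t distribution with 146 df (hardcoded: the standard library has no t quantile).
T_CRIT = 1.9763

results = {}
for name, (est, se) in coefs.items():
    t_value = est / se                                # the "t" column
    lower, upper = est - T_CRIT * se, est + T_CRIT * se  # the 95% interval
    results[name] = (t_value, lower, upper)
    print(f"{name:<13} t = {t_value:7.3f}   95% CI [{lower:.3f}, {upper:.3f}]")
```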
Modelling in R with Python libraries
#> <class 'statsmodels.iolib.summary.Summary'>
#> """
#> OLS Regression Results
#> =======================================================================================
#> Dep. Variable: y R-squared (uncentered): 0.996
#> Model: OLS Adj. R-squared (uncentered): 0.996
#> Method: Least Squares F-statistic: 1.284e+04
#> Date: Fri, 31 May 2024 Prob (F-statistic): 1.33e-177
#> Time: 08:25:18 Log-Likelihood: -61.215
#> No. Observations: 150 AIC: 128.4
#> Df Residuals: 147 BIC: 137.5
#> Df Model: 3
#> Covariance Type: nonrobust
#> ================================================================================
#> coef std err t P>|t| [0.025 0.975]
#> --------------------------------------------------------------------------------
#> Sepal.Width 1.1211 0.024 47.658 0.000 1.075 1.168
#> Petal.Length 0.9235 0.057 16.205 0.000 0.811 1.036
#> Petal.Width -0.8957 0.139 -6.439 0.000 -1.171 -0.621
#> ==============================================================================
#> Omnibus: 0.421 Durbin-Watson: 2.007
#> Prob(Omnibus): 0.810 Jarque-Bera (JB): 0.570
#> Skew: 0.026 Prob(JB): 0.752
#> Kurtosis: 2.703 Cond. No. 26.0
#> ==============================================================================
#>
#> Notes:
#> [1] R² is computed without centering (uncentered) since the model does not contain a constant.
#> [2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
#> """
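Unlike the two fits above, this summary has no const/(Intercept) row, Df Residuals is 147 instead of 146, and statsmodels reports R-squared (uncentered). That is what statsmodels produces when the design matrix has no constant column, so the R code behind this slide presumably did not add one (for example with statsmodels' add_constant); the code is not shown, so this is an inference. Without a constant, R² is measured against the sum of y² rather than the sum of (y - mean)², which makes the fit look far better (0.996) than the model with an intercept (0.859). A minimal standard-library sketch of the effect, using the first six rows of iris printed earlier and Sepal.Width as the only predictor (a simplification of the three-predictor model):

```python
# First six rows of iris printed earlier: Sepal.Width (x) and Sepal.Length (y).
x = [3.5, 3.0, 3.2, 3.1, 3.6, 3.9]
y = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4]


def fit_with_intercept(x, y):
    """OLS for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def fit_through_origin(x, y):
    """OLS without a constant, y = b*x; returns b."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def ssr(x, y, a, b):
    """Sum of squared residuals for the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


mean_y = sum(y) / len(y)
centered_tss = sum((yi - mean_y) ** 2 for yi in y)  # total variation around the mean
uncentered_tss = sum(yi ** 2 for yi in y)           # total variation around zero

a1, b1 = fit_with_intercept(x, y)
ssr1 = ssr(x, y, a1, b1)
b0 = fit_through_origin(x, y)
ssr0 = ssr(x, y, 0.0, b0)

r2_with_const = 1 - ssr1 / centered_tss          # usual (centered) R-squared
r2_uncentered = 1 - ssr0 / uncentered_tss        # what statsmodels reports without a constant
r2_no_const_centered = 1 - ssr0 / centered_tss   # same no-constant fit, judged around the mean

print(f"with constant, centered R-squared:   {r2_with_const:.3f}")
print(f"no constant, uncentered R-squared:   {r2_uncentered:.3f}")
print(f"no constant, centered R-squared:     {r2_no_const_centered:.3f}")
```

The no-constant fit has the larger residual sum of squares, yet its uncentered R² is close to 1 because the sum of y² is dominated by the mean of Sepal.Length. To compare with the other two fits, add the constant column before calling OLS.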
renv helps you manage packages in R, keeps your work reproducible, and makes it easier to share.
reticulate lets you use Python in R and combine the strengths of both languages.
Questions?
MLBA 2024