Today, I am going to write about something that I have been playing around with and have found extremely useful and fun: deep Clojure and Python interop. This is done via libpython-clj. You may find the detailed documentation here. The library is but a small part of a larger project called scicloj, which aims to expand the use of Clojure in data science.
Let us start with the dependencies
{:deps {clj-python/libpython-clj {:mvn/version "2.024"}}}
Next, the namespace
(ns cl-py
  (:require [libpython-clj2.require :refer [require-python]]
            [libpython-clj2.python :refer [py. py.. py.-] :as py]))
and global imports.
(require-python '[numpy :as np])
(require-python '[pandas :as pd])
(require-python '[matplotlib.pyplot :as plt])
Let me start with a simple matplotlib visualization example. I am going to use yfinance, which talks to the [Yahoo Finance API], to download the market data for a ticker and then visualize it as a time series. The data will be ingested as a pandas dataframe.
(py/from-import yfinance download)
(plt/figure :figsize [12 4])
(-> (download "AAPL" :start "2012-01-01")
    (py/get-attr :Open)
    (py/call-attr :plot))
(plt/savefig "result.png")
#'cl-py/download
Figure(1200x400)
AxesSubplot(0.125,0.2;0.775x0.68)
In the following example, I am going to pull passenger data from the Istanbul Municipality Data Portal. The specific dataset is about the number of passengers taking the sea route in the year 2022. The columns are 'year', 'month', 'company', 'station' and 'passenger'. I am going to display the monthly sums in calendar order.
(let [months ["jan" "feb" "mar" "apr" "may" "jun" "jul" "aug" "sep" "oct" "nov" "dec"]
      passengers (as-> "https://data.ibb.gov.tr/dataset/20f33ff0-1ab3-4378-9998-486e28242f48/resource/6fbdd928-8c37-43a4-8e6a-ba0fa7f767fb/download/istanbul-deniz-iskeleleri-yolcu-saylar.csv" $
                   (pd/read_csv $ :encoding "iso8859-15")
                   (py/call-attr $ :to_numpy)
                   (py/->jvm $)
                   (filter (fn [x] (= (x 0) 2022)) $) ;; select year 2022
                   (map (fn [x] {(x 1) (x 4)}) $)     ;; select month and passenger number
                   (apply merge-with + $)             ;; monthly sums
                   (sort $)                           ;; sort wrt month
                   (map #(% 1) $))]                   ;; get monthly counts only
  (plt/figure :figsize [12 6])
  (plt/bar months passengers)
  (plt/savefig "passengers.png"))
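The Clojure half of the pipeline above (everything after py/->jvm) can be tried without Python at all. Here is a minimal sketch on a handful of made-up rows; the rows and the names rows and monthly-sums are my own assumptions for illustration, and only the column order [year month company station passenger] follows the dataset.

```clojure
;; Hypothetical rows in the dataset's column order:
;; [year month company station passenger]. The values are made up.
(def rows
  [[2021 12 "A" "s1" 100]
   [2022 2  "A" "s1" 30]
   [2022 1  "A" "s1" 10]
   [2022 1  "B" "s2" 5]
   [2022 2  "B" "s2" 7]])

(def monthly-sums
  (->> rows
       (filter (fn [x] (= (x 0) 2022))) ; keep rows from year 2022
       (map (fn [x] {(x 1) (x 4)}))     ; one {month passenger} map per row
       (apply merge-with +)             ; add up counts that share a month
       (sort)                           ; map entries compare by key, so this sorts by month
       (map #(% 1))))                   ; keep only the sums

(println monthly-sums) ; prints (15 37)
```

The trick is that merge-with + does the group-by and the sum in one step, and sorting the resulting map orders its entries by month number.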
In our next example, I am going to work with the Olivetti Faces Dataset. I am going to compute the eigen-face of a randomly chosen person.
(py/from-import sklearn.datasets fetch_olivetti_faces)
(py/from-import sklearn.decomposition PCA)
(def faces (fetch_olivetti_faces :data_home "/home/kaygun/local/data/scikit_learn_data/"))
(let [N (* 10 (rand-int 40))
      X (-> faces
            (py/get-attr :data)
            (py/get-item (range N (+ N 10)))
            (py/call-attr :transpose))
      Y (-> (PCA :n_components 1)
            (py/call-attr :fit_transform X)
            (py/call-attr :reshape [64 64]))]
  (plt/figure :figsize [4 4])
  (plt/imshow Y :cmap "gray_r")
  (plt/savefig "eigen-face.png"))
#'cl-py/fetch_olivetti_faces
#'cl-py/PCA
#'cl-py/faces
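The indexing in the code above relies on the layout of the dataset: 40 people with 10 images each, every image a 64x64 picture flattened into 4096 values. A small sketch of that index arithmetic follows, assuming the rows are grouped by person (which is what I expect when the data is loaded without shuffling). The function name person-rows is my own and not part of any library.

```clojure
(defn person-rows
  "Row indices of the 10 images of `person` (0 to 39),
  assuming the rows of the data matrix are grouped by person."
  [person]
  (let [start (* 10 person)]
    (range start (+ start 10))))

(println (person-rows 3)) ; prints (30 31 32 33 34 35 36 37 38 39)
```

Selecting those 10 rows gives a 10x4096 matrix. After the transpose it is 4096x10, so a PCA with one component returns 4096 values, which is exactly what reshapes into a 64x64 image.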
In our next example, I am going to construct a decision tree model using scikit-learn on the Iris dataset. First, I'll split the dataset into train and test sets, and after training the model on the train set, I'll show the confusion matrix on the test set.
(py/from-import sklearn.datasets load_iris)
(py/from-import sklearn.model_selection train_test_split)
(py/from-import sklearn.tree DecisionTreeClassifier)
(py/from-import sklearn.metrics confusion_matrix)
(def iris (load_iris))
(let [X (py/get-attr iris :data)
      y (py/get-attr iris :target)
      [X-train X-test y-train y-test] (train_test_split X y :test_size 0.2)
      model (DecisionTreeClassifier)]
  (py/call-attr model :fit X-train y-train)
  (->> X-test (py/call-attr model :predict) (confusion_matrix y-test)))
#'cl-py/load_iris
#'cl-py/train_test_split
#'cl-py/DecisionTreeClassifier
#'cl-py/confusion_matrix
#'cl-py/iris
[[ 9 0 0]
[ 0 11 1]
[ 0 1 8]]
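To read a confusion matrix like the one above, it helps to boil it down to a few numbers. Here is a minimal sketch in plain Clojure that copies the matrix by hand into a vector of vectors; the names cm, accuracy and recall-per-class are my own. Since the true labels are passed first, rows are true classes and columns are predicted classes. Keep in mind that the split is random, so another run will likely give a different matrix.

```clojure
;; The confusion matrix printed above, copied by hand.
(def cm [[9 0 0]
         [0 11 1]
         [0 1 8]])

(defn accuracy
  "Correct predictions (the diagonal) divided by all predictions."
  [m]
  (let [correct (reduce + (map-indexed (fn [i row] (nth row i)) m))
        total   (reduce + (apply concat m))]
    (/ correct total)))

(defn recall-per-class
  "For each true class (row), the fraction that was predicted correctly.
  Does not guard against a class with no test samples."
  [m]
  (map-indexed (fn [i row] (/ (nth row i) (reduce + row))) m))

(println (accuracy cm))         ; prints 14/15
(println (recall-per-class cm)) ; prints (1 11/12 8/9)
```

The entries add up to 30, which matches a 20% test split of the 150 Iris samples, and 28 of them sit on the diagonal.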
In our next example, I am going to train a support vector classifier using scikit-learn on the Iris dataset.
(py/from-import sklearn.svm SVC)
(let [X (py/get-attr iris :data)
      y (py/get-attr iris :target)
      [X-train X-test y-train y-test] (train_test_split X y :test_size 0.2)
      model (SVC :max_iter 1000)]
  (py/call-attr model :fit X-train y-train)
  (->> X-test (py/call-attr model :predict) (confusion_matrix y-test)))
#'cl-py/SVC
[[ 8 0 0]
[ 0 17 0]
[ 0 0 5]]