The Kitchen Sink and Other Oddities

Atabey Kaygun

Clojure/Python Interop Examples

Description of the problem

Today, I am going to write about something that I have been playing around with and have found extremely useful and fun: deep Clojure and Python interop. This is done via libpython-clj. You may find the detailed documentation here. The library is a small part of a larger project called scicloj that aims to expand the use of Clojure in data science.

The dependencies and the namespace

Let us start with the dependencies

{:deps {clj-python/libpython-clj {:mvn/version "2.024"}}}

Next, the namespace

(ns cl-py (:require [libpython-clj2.require :refer [require-python]]
                    [libpython-clj2.python :refer [py. py.. py.-] :as py]))

and global imports.

(require-python '[numpy :as np])
(require-python '[pandas :as pd])
(require-python '[matplotlib.pyplot :as plt])
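
The namespace refers the macros py., py.. and py.- but the examples below use py/get-attr and py/call-attr instead. Here is a minimal sketch of how the two styles correspond, assuming numpy is installed in the Python environment that libpython-clj finds. The namespace name cl-py-sketch and the array arr are made up for illustration.

```clojure
(ns cl-py-sketch
  (:require [libpython-clj2.require :refer [require-python]]
            [libpython-clj2.python :refer [py. py.. py.-] :as py]))

(require-python '[numpy :as np])

;; hypothetical array, used only for illustration
(def arr (np/arange 6))

;; attribute access; corresponds to (py/get-attr arr :size)
(def n (py.- arr size))

;; method call; corresponds to (py/call-attr arr :reshape [2 3])
(def grid (py. arr reshape [2 3]))

;; chained access in the style of Clojure's .. macro:
;; reshape, then read the transpose attribute T, then sum all entries
(def total (py.. arr (reshape [2 3]) -T (sum)))
```

Which style to use is mostly a matter of taste: the macros read closer to the Python original, while get-attr and call-attr are ordinary functions and compose well with the threading macros.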

A Simple Visualization Example

Let me start with a simple matplotlib visualization example. I am going to use yfinance, which wraps the Yahoo Finance API, to download the market data for a ticker and then visualize it as a time series. The data will be ingested as a pandas dataframe.

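;; bring yfinance.download into the current namespace
(py/from-import yfinance download)

(plt/figure :figsize [12 4])
(-> (download "AAPL" :start "2012-01-01") ;; daily prices as a pandas dataframe
    (py/get-attr :Open)                   ;; the Open column
    (py/call-attr :plot))                 ;; draw it on the current figure
(plt/savefig "result.png")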

#'cl-py/download
Figure(1200x400)
AxesSubplot(0.125,0.2;0.775x0.68)

A More Complicated Visualization

In the following example, I am going to pull passenger data from the Istanbul Municipality Data Portal. The specific dataset is about the number of passengers taking the sea route in the year 2022. The columns are 'year', 'month', 'company', 'station' and 'passenger'. I am going to display the monthly sums in order.

(let [months ["jan" "feb" "mar" "apr" "may" "jun" "jul" "aug" "sep" "oct" "nov" "dec"]
      passengers (as-> "https://data.ibb.gov.tr/dataset/20f33ff0-1ab3-4378-9998-486e28242f48/resource/6fbdd928-8c37-43a4-8e6a-ba0fa7f767fb/download/istanbul-deniz-iskeleleri-yolcu-saylar.csv" $
                       (pd/read_csv $ :encoding "iso8859-15")
                       (py/call-attr $ :to_numpy)
                       (py/->jvm $)
                       (filter (fn [x] (= (x 0) 2022)) $) ;; select year 2022
                       (map (fn [x] {(x 1) (x 4)}) $)     ;; select month and passenger number
                       (apply merge-with + $)             ;; monthly sums
                       (sort $)                           ;; sort wrt month
                       (map #(% 1) $))]                   ;; get monthly counts only
  (plt/figure :figsize [12 6])
  (plt/bar months passengers)
  (plt/savefig "passengers.png"))
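
The pipeline above does its aggregation on the Clojure side once the data has been copied over with py/->jvm. Here is a small pure-Clojure sketch of the same filter, merge-with and sort steps, run on a handful of made-up rows. The sample data and the function name monthly-sums are hypothetical and not taken from the dataset.

```clojure
;; hypothetical rows in the column order of the dataset:
;; [year month company station passenger]
(def sample-rows
  [[2022 1 "A" "X" 100]
   [2022 1 "B" "Y" 50]
   [2022 2 "A" "X" 70]
   [2021 1 "A" "X" 999]
   [2022 2 "B" "Y" 30]])

(defn monthly-sums
  "Passenger totals per month for the given year, ordered by month."
  [rows year]
  (->> rows
       (filter (fn [row] (= (row 0) year)))   ;; keep the requested year
       (map (fn [row] {(row 1) (row 4)}))     ;; one {month passengers} map per row
       (apply merge-with +)                   ;; add up the counts per month
       (sort)                                 ;; order the entries by month
       (map second)))                         ;; keep the totals only

(monthly-sums sample-rows 2022)
;; => (150 100)
```

Note that plt/bar needs as many bar heights as labels, so the plotting code implicitly assumes that all twelve months appear in the data.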

An Image Processing Example

In our next example, I am going to work with the Olivetti Faces Dataset. I am going to compute the eigen-face of a randomly chosen person.

(py/from-import sklearn.datasets fetch_olivetti_faces)
(py/from-import sklearn.decomposition PCA)

(def faces (fetch_olivetti_faces :data_home "/home/kaygun/local/data/scikit_learn_data/"))

(let [N (* 10 (rand-int 40))
      X (-> faces
            (py/get-attr :data)
            (py/get-item (range N (+ N 10)))
            (py/call-attr :transpose))      
      Y (-> (PCA :n_components 1)
            (py/call-attr :fit_transform X)
            (py/call-attr :reshape [64 64]))]
  (plt/figure :figsize [4 4])
  (plt/imshow Y :cmap "gray_r")
  (plt/savefig "eigen-face.png"))

#'cl-py/fetch_olivetti_faces
#'cl-py/PCA
#'cl-py/faces
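
The index arithmetic in the let block relies on the layout of the dataset: 40 people with 10 images each, which I assume are stored consecutively per person (as far as I know this is the case when the shuffle option is left at its default). Under that assumption, the row selection can be sketched in plain Clojure with a hypothetical helper person-rows:

```clojure
(defn person-rows
  "Row indices of the 10 images of the given person (0 to 39),
  assuming the images are stored consecutively per person."
  [person]
  (let [start (* 10 person)]
    (range start (+ start 10))))

(person-rows 3)
;; => (30 31 32 33 34 35 36 37 38 39)
```

Passing such a range to py/get-item selects ten rows of 4096 pixels each, and after the transpose every pixel becomes a sample with ten features, which is why the single PCA component can be reshaped into a 64 by 64 image.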

A Decision Tree Model

In our next example, I am going to construct a Decision Tree Model using scikit-learn on the Iris dataset. First, I'll split the dataset into train and test sets. Then, after training the model on the train set, I'll show the confusion matrix on the test set.

(py/from-import sklearn.datasets load_iris)
(py/from-import sklearn.model_selection train_test_split)
(py/from-import sklearn.tree DecisionTreeClassifier)
(py/from-import sklearn.metrics confusion_matrix)

(def iris (load_iris))

(let [X (py/get-attr iris :data)
      y (py/get-attr iris :target)
      [X-train X-test y-train y-test] (train_test_split X y :test_size 0.2)
      model (DecisionTreeClassifier)]
  (py/call-attr model :fit X-train y-train)
  (->> X-test (py/call-attr model :predict) (confusion_matrix y-test)))

#'cl-py/load_iris
#'cl-py/train_test_split
#'cl-py/DecisionTreeClassifier
#'cl-py/confusion_matrix
#'cl-py/iris
[[ 9  0  0]
 [ 0 11  1]
 [ 0  1  8]]
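
A confusion matrix is easier to compare across models when it is reduced to a single number. As a small sketch in plain Clojure, here is the accuracy (the diagonal sum divided by the total) for the matrix printed above. The names cm and accuracy are hypothetical, and the matrix is copied by hand rather than converted from the numpy result.

```clojure
;; confusion matrix copied from the printed output
(def cm [[9 0 0]
         [0 11 1]
         [0 1 8]])

(defn accuracy
  "Fraction of correctly classified samples: diagonal sum over total."
  [m]
  (let [correct (reduce + (map-indexed (fn [i row] (nth row i)) m))
        total   (reduce + (apply concat m))]
    (/ correct total)))

(accuracy cm)
;; => 14/15
```

That is 28 correct predictions out of 30 test samples, or roughly 93%.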

A Support Vector Classifier

In our next example, I am going to train a support vector classifier using scikit-learn on the Iris dataset.

(py/from-import sklearn.svm SVC)

(let [X (py/get-attr iris :data)
      y (py/get-attr iris :target)
      [X-train X-test y-train y-test] (train_test_split X y :test_size 0.2)
      model (SVC :max_iter 1000)]
  (py/call-attr model :fit X-train y-train)
  (->> X-test (py/call-attr model :predict) (confusion_matrix y-test)))

#'cl-py/SVC
[[ 8  0  0]
 [ 0 17  0]
 [ 0  0  5]]
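
The two confusion matrices have different row sums (9, 12, 9 versus 8, 17, 5) because train_test_split draws a fresh random split on every call, so the two models were not evaluated on the same test set. Here is a sketch of making the split repeatable and balanced with the random_state and stratify parameters of train_test_split. The seed 42 and the namespace name cl-py-split are arbitrary choices for illustration.

```clojure
(ns cl-py-split
  (:require [libpython-clj2.python :as py]))

;; make sure the embedded Python interpreter is running
(py/initialize!)

(py/from-import sklearn.datasets load_iris)
(py/from-import sklearn.model_selection train_test_split)

(def split
  (let [iris (load_iris)
        X (py/get-attr iris :data)
        y (py/get-attr iris :target)]
    ;; :random_state fixes the shuffle, :stratify keeps the class proportions
    (train_test_split X y :test_size 0.2 :random_state 42 :stratify y)))

;; number of test labels: 20% of the 150 iris samples
(def n-test (py/get-attr (nth split 3) :size))
```

With stratification each of the three classes should contribute 10 test samples, so every row of the resulting confusion matrix should sum to 10, and reusing the same split for both classifiers makes their matrices directly comparable.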