The Kitchen Sink and Other Oddities

Atabey Kaygun

Turkish Hyphenation in Common Lisp

Description of the problem

Unlike English, Turkish has a very regular orthography, especially when it comes to hyphenation rules. Today, I am going to write a Common Lisp program that gives the proper hyphenation of a Turkish word. Back in the day, I wrote the original in C for the ISO 8859-9 encoding. I should extend it for Unicode, but dealing with Unicode in C feels like cleaning hair from a shower drain.

The code

First, I need a function that tells us whether a given character is a vowel or a consonant:

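;; Lowercase Turkish vowels, including the circumflexed â, î and û.
(let ((vowels '(#\a #\â #\e #\ı #\i #\î #\o #\ö #\u #\ü #\û)))
  (defun vowelp (ch)
    "Return true if CH is a lowercase Turkish vowel."
    (member ch vowels)))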

VOWELP
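
The predicate above only knows lowercase letters. If capitalized input matters, one option is to list the uppercase vowels explicitly rather than lean on the implementation's case conversion, which is not guaranteed to follow the Turkish I/ı and İ/i pairing. A minimal sketch, in which the name turkish-vowel-p and the uppercase letters (and î) are assumptions rather than part of the original program:

```lisp
;; Hypothetical variant of the vowel test; not part of the original program.
(defun turkish-vowel-p (ch)
  "Return T if CH is a Turkish vowel in either case, NIL otherwise."
  ;; FIND compares with EQL, so every accepted character is listed as-is.
  (and (find ch "aâeıiîoöuüûAÂEIİÎOÖUÜÛ") t))

(mapcar #'turkish-vowel-p (coerce "İlk" 'list))   ; => (T NIL NIL)
```

The hyphenation function would still need its own handling of capital letters; this only covers the vowel test.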

Next, the function that hyphenates a given word. It relies on split from the CL-PPCRE library, so that library needs to be loaded first:

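(defun hyphen (raw)
  "Return the syllables of the Turkish word RAW as a list of strings."
  ;; W is RAW padded with two spaces so that looking ahead never runs
  ;; off the end.  RES collects the output characters in reverse order.
  (let ((w (format nil "~a  " raw))
        res flag dash)
    (dotimes (i (length raw))
      (if (vowelp (elt w i))
          ;; Vowel: break right after it when one of the next two
          ;; characters is also a vowel.
          (setf flag nil
                dash (some #'vowelp (subseq w (1+ i) (+ i 3))))
          ;; Consonant: FLAG becomes true on the first consonant of a run
          ;; and toggles after that.  When it toggles back to false, break
          ;; before the consonant if a vowel follows, otherwise after it.
          (when (not (or (setf flag (not flag))
                         (setf dash (not (vowelp (elt w (1+ i)))))))
            (push #\- res)))
      (push (elt w i) res)
      (when dash (push #\- res))
      (setf dash nil))
    ;; RES is reversed, so flip it before splitting at the break marks.
    (ppcre:split #\- (concatenate 'string (reverse res)))))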

HYPHEN
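
The function above is compact, so it may help to spell out the rule it appears to rely on: every syllable holds exactly one vowel, and when consonants precede a vowel, only the last of them opens that vowel's syllable. The following is a sketch of that reading, not the original algorithm. It scans the word from right to left and needs no CL-PPCRE. The name syllabify, the inclusion of î among the vowels, and the choice to keep word-initial consonants with the first syllable are all assumptions of this sketch.

```lisp
;; Hypothetical alternative to HYPHEN; assumes lowercase input.
(defun syllabify (word)
  "Return the syllables of the lowercase Turkish WORD as a list of strings."
  (flet ((vowel-p (ch) (find ch "aâeıiîoöuüû")))
    (let ((first-vowel (position-if #'vowel-p word))
          (end (length word))        ; exclusive end of the current syllable
          (syllables '()))
      (when first-vowel
        ;; Visit every vowel after the first one, from right to left.
        (loop for i from (1- (length word)) above first-vowel
              when (vowel-p (char word i))
                ;; A syllable starts at the consonant just before its
                ;; vowel, or at the vowel itself if another vowel precedes.
                do (let ((start (if (vowel-p (char word (1- i))) i (1- i))))
                     (push (subseq word start end) syllables)
                     (setf end start))))
      ;; Whatever remains on the left is the first syllable.
      (if (plusp end)
          (cons (subseq word 0 end) syllables)
          syllables))))

(mapcar #'syllabify '("tarımsal" "ionya" "tank"))
;; => (("ta" "rım" "sal") ("i" "on" "ya") ("tank"))
```

Walking from the right means each syllable can be pushed onto the front of the result list, so no final reversal or string splitting is needed.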

And a few tests:

(mapcar #'hyphen '("işkillendim" "ağrılarımsa" "erkekle" "tarımsal" "yap" "üre" "tank" "ionya" "çekoslavakyalılaştıramadıklarımızdanmışcasınaymışsa"))

((iş kil len dim) (ağ rı la rım sa) (er kek le) (ta rım sal) (yap) (ü re)
 (tank) (i on ya)
 (çe kos la vak ya lı laş tı ra ma dık la rı mız dan mış ca sı nay mış sa))
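
Each result above is a list of syllables. To display a word with its break points marked, the syllables can be joined back together with hyphens. A minimal sketch, where join-syllables is a hypothetical helper name and the input is assumed to be a list of strings:

```lisp
;; Hypothetical helper; not part of the original program.
(defun join-syllables (syllables)
  "Join a list of syllable strings into one hyphen-separated string."
  ;; ~{ ... ~} iterates over the list; ~^ stops before the trailing hyphen.
  (format nil "~{~a~^-~}" syllables))

(join-syllables '("ta" "rım" "sal"))   ; => "ta-rım-sal"
```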