Unlike English, Turkish has a very regular orthography, especially when it comes to hyphenation rules. Today I am going to write a Common Lisp program that gives you the proper hyphenation of a Turkish word. Back in the day, I wrote the original in C for the ISO 8859-9 encoding. I should extend it to Unicode, but dealing with Unicode in C feels like cleaning hair out of a shower drain.
First, I need a function that tells us whether a given character is a vowel or a consonant:
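;; Lowercase Turkish vowels, including the circumflexed â, î and û.
(let ((vowels '(#\a #\â #\e #\ı #\i #\î #\o #\ö #\u #\ü #\û)))
  (defun vowelp (ch)
    "Return true if CH is a lowercase Turkish vowel."
    (member ch vowels)))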
VOWELP
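To see what the hyphenator has to work with, it can help to look at a word's vowel/consonant skeleton. The following is only a minimal sketch: `cv-pattern` and `*tr-vowels*` are hypothetical names that are not part of the original program, and the vowel set is restated (with î included as an assumption) so the snippet stands on its own.

```lisp
;; Hypothetical helper for illustration; not part of the original program.
(defparameter *tr-vowels* "aâeıiîoöuüû"
  "Lowercase Turkish vowels, restated so this snippet is self-contained.")

(defun cv-pattern (word)
  "Return WORD's skeleton as a string: V for each vowel, C for anything else."
  (map 'string
       (lambda (ch) (if (find ch *tr-vowels*) #\V #\C))
       word))

(cv-pattern "tarımsal") ; => "CVCVCCVC"
(cv-pattern "ionya")    ; => "VVCCV"
```

Every V in the skeleton ends up in a syllable of its own, so the only real question is where the C runs between two Vs get cut.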
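Next, the function that hyphenates a given word. It relies on the fact that every Turkish syllable contains exactly one vowel, so it only needs to decide where the consonants between two vowels split: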
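;; Needs CL-PPCRE for the final split; if you use Quicklisp, (ql:quickload :cl-ppcre).
(defun hyphen (raw)
  "Split the lowercase Turkish word RAW into a list of syllables."
  ;; Pad with two spaces so the two-character lookahead never runs off the end.
  (let ((w (format nil "~a  " raw))
        res    ; output characters, collected in reverse order
        flag   ; toggled by each consonant, reset by a vowel
        dash)  ; true when a break belongs right after the current character
    (dotimes (i (length raw))
      (if (vowelp (elt w i))
          ;; Vowel: break after it if one of the next two characters is a vowel.
          (setf flag nil
                dash (some #'vowelp (subseq w (1+ i) (+ i 3))))
          ;; Consonant: on every second consonant of a run, break before it
          ;; when a vowel follows, otherwise after it.
          (when (not (or (setf flag (not flag))
                         (setf dash (not (vowelp (elt w (1+ i)))))))
            (push #\- res)))
      (push (elt w i) res)
      (when dash (push #\- res))
      (setf dash nil))
    ;; RES was built backwards; restore the order and split on the breaks.
    (ppcre:split #\- (concatenate 'string (reverse res)))))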
HYPHEN
And a few tests:
(mapcar #'hyphen '("işkillendim" "ağrılarımsa" "erkekle" "tarımsal" "yap" "üre" "tank" "ionya" "çekoslavakyalılaştıramadıklarımızdanmışcasınaymışsa"))
((iş kil len dim) (ağ rı la rım sa) (er kek le) (ta rım sal) (yap) (ü re)
(tank) (i on ya)
(çe kos la vak ya lı laş tı ra ma dık la rı mız dan mış ca sı nay mış sa))
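For comparison, the same rule can also be stated without the character-by-character flags and without CL-PPCRE: every vowel after the first opens a new syllable, and that syllable begins at the consonant right before the vowel, if there is one. What follows is a sketch under that assumption, not the function above rewritten. The names `syllable-starts`, `syllables`, `hyphenated`, `tr-vowel-p` and `*tr-vowels*` are all hypothetical, and only lowercase input is handled.

```lisp
;; A sketch only: these names are hypothetical and this restates the rule
;; in a different way; it is not the HYPHEN function from the post.
(defparameter *tr-vowels* "aâeıiîoöuüû"
  "Lowercase Turkish vowels (î included as an assumption).")

(defun tr-vowel-p (ch)
  "Return true if CH is one of the characters in *TR-VOWELS*."
  (find ch *tr-vowels*))

(defun syllable-starts (word)
  "Return the indices in WORD at which a syllable begins."
  (let ((starts (list 0))
        (seen-vowel nil))
    (dotimes (i (length word) (nreverse starts))
      (when (tr-vowel-p (char word i))
        ;; Every vowel after the first opens a new syllable. It begins at
        ;; the consonant right before the vowel, or at the vowel itself
        ;; when the previous character is also a vowel.
        (when seen-vowel
          (push (if (tr-vowel-p (char word (1- i))) i (1- i)) starts))
        (setf seen-vowel t)))))

(defun syllables (word)
  "Split WORD into a list of syllable strings."
  ;; END is NIL for the last start index, so SUBSEQ runs to the end of WORD.
  (loop for (start end) on (syllable-starts word)
        collect (subseq word start end)))

(defun hyphenated (word)
  "Join the syllables of WORD with hyphens."
  (format nil "~{~a~^-~}" (syllables word)))

(syllables "işkillendim") ; => ("iş" "kil" "len" "dim")
(syllables "ionya")       ; => ("i" "on" "ya")
(hyphenated "tarımsal")   ; => "ta-rım-sal"
```

Because the first syllable always starts at index 0, this version never breaks before the first vowel, so a loanword with an initial consonant cluster such as "tren" comes back as a single syllable.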