Word segmentation library in Golang


I’ve been into Golang lately, and today I’m glad to announce my second open source project in Golang, following the feature flags API. This package is all about word segmentation.

What is the word segmentation problem?

Word segmentation is the process of dividing a phrase without spaces back into its constituent parts. For example, consider a phrase like thisisatest. Humans can immediately identify that the correct phrase should be this is a test. But for machines, this is a tricky problem.

An approach to this problem

A basic idea would be to use a dictionary and to split the phrase as soon as the current chunk of letters is a valid word. But then you run into issues with phrases like peanutbutter, which this approach splits as pea nut butter instead of peanut butter.

A better idea is to take advantage of word frequencies in a corpus. This is where the concept of an n-gram comes in. In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application.

For example, here is an extract of unigram counts (each word followed by the number of times it appears) from a corpus of 1,024,908,267,229 words distributed by the Linguistic Data Consortium.

used 421438139
go 421086358
b 419765694
work 419483948
last 417601616
most 416210411
music 414028837
buy 410780176
data 406908328
make 405084642
them 403000411
should 402028056

Using unigram and bigram counts, we can score an arrangement of words. This is what the score method does, for example.

Concurrency and channels

This was also a great opportunity for me to work with channels, because some parts of the program can run in parallel. I’m just starting to work with goroutines and channels, but I really like them!

Take a look at the source code and the documentation on GitHub: github.com/AntoineAugusti/wordsegmentation

Feature flags API in Golang


Over the last few months, I’ve been interested in Golang (the Go language), but I didn’t know what to build to really try it. Sure, I’ve done the exercises from the online tutorial and read the awesome website Go by Example, but I didn’t have a real use case, until a few days ago, when I decided to build an API for feature flags!

What are feature flags?

Feature flags let you enable or disable some features of your application, for example when you’re facing unexpected traffic or when you want to let some users try a new feature you’ve been working on. They decouple feature release from code deployment, so you can release features whenever you want instead of whenever the code happens to ship.

With this package, you can enable the access of a feature for:

  • specific user IDs
  • specific groups
  • a percentage of your user base
  • everyone
  • no one

And you can combine things! You can give access to a feature for users in the group dev or admin and for users 1337 and 42 if you want to.
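The access rules above can be combined into a single check. The sketch below is hypothetical: the Feature type, its field names, and the CRC32-based percentage bucketing are assumptions for illustration, not the package's actual API. Hashing the user ID keeps a given user in the same bucket, so a user who sees a feature at 20% keeps seeing it.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"strconv"
)

// Feature is a hypothetical model of the access rules listed above.
// Its zero value gives access to no one.
type Feature struct {
	Enabled    bool     // everyone has access
	Users      []uint32 // specific user IDs
	Groups     []string // specific groups
	Percentage uint32   // share of users with access, 0 to 100
}

// IsEnabledFor reports whether a user, given their groups, can access the feature.
func (f Feature) IsEnabledFor(userID uint32, groups []string) bool {
	if f.Enabled {
		return true
	}
	for _, u := range f.Users {
		if u == userID {
			return true
		}
	}
	for _, want := range f.Groups {
		for _, g := range groups {
			if g == want {
				return true
			}
		}
	}
	// Deterministic bucket in [0, 100) so a user's answer is stable.
	bucket := crc32.ChecksumIEEE([]byte(strconv.FormatUint(uint64(userID), 10))) % 100
	return bucket < f.Percentage
}

func main() {
	// Groups dev or admin, plus users 1337 and 42, as in the example above.
	f := Feature{Users: []uint32{1337, 42}, Groups: []string{"dev", "admin"}}
	fmt.Println(f.IsEnabledFor(42, nil))              // true: listed user
	fmt.Println(f.IsEnabledFor(7, []string{"admin"})) // true: listed group
	fmt.Println(f.IsEnabledFor(7, []string{"sales"})) // false
}
```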

What I’ve learned

I guess it’s a rather complete project because it involves a storage layer (a key-value store, with bolt), some logic around a simple model (what is a feature? How do we control access to a feature?) and an HTTP layer (with the default HTTP server and gorilla/mux). I also tried to write some tests, and it was really interesting to discover the “Go way” of doing it!

Anyway, I’ve learned a lot and I’m fairly happy with the codebase, but if you spot anything that can be improved or that is wrong, please do get in touch with me (GitHub issues and tweets are perfect).

Here is the source code: github.com/AntoineAugusti/feature-flags.