On word boundaries

We've just had an interesting discussion with rxi about how ctrl+right (next-word-boundary) and ctrl+left (previous-word-boundary) should work in code editors. Here are some musings from that discussion.

What others do

I'll be using | to denote where the caret would end up when moving in the given direction.

Sublime does this:

  • void|*| alloc|(intptr_t| size|)| {| (to right; 7 steps)
  • |void* |alloc|(|intptr_t |size) |{ (to left; 6 steps)

VSCode does this:

  • void|*| alloc|(intptr_t| size|)| {| (to right; 7 steps)
  • |void|* |alloc|(|intptr_t |size|) |{ (to left; 8 steps)

Try it yourself and, with every ctrl+right, try to guess where the caret will end up. My score was quite bad; I had to go step by step to actually construct the examples. Yes, it's 7 steps, but if I can't predict the stops I will most likely overshoot and have to compensate with additional keystrokes.

I remember times in UX when people used "number of clicks to get to the goal" as the main metric. The number of steps is irrelevant if the user feels lost for most of the route. It's better to have more steps but keep the user reassured that they're on the right path. That way they don't even have to remember "where to click"; they can simply use the same logic they used before.

I feel this is the same problem: it's less important how many steps I have to take if I know what the next step is.

Both implementations probably started from something like the Unicode word-boundary algorithm, trying to mimic classic word processors. Code is different: a "word" in code is a token. Tokens use symbol characters for brevity, but they carry the same meaning as a word would:

Void pointer alloc start arguments intptr_t size end arguments start block.

My approach

My approach is always very granular: it stops at every start and end of a lexer-provided token. For the same example it would do this (regardless of direction):

  • void|*| |alloc|(|intptr_t| |size|)| |{| (11 steps)

The stops are exactly where you'd see the syntax highlighting color change, so it's very easy to predict. But 11 steps is noticeably more than the 6 to 8 steps the other editors take.
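
As a minimal sketch of this granular rule in Python, assuming a toy regex lexer in place of a real language lexer (TOKEN_RE, granular_stops, move and walk are names made up for illustration):

```python
import re

# Assumption: a toy lexer. Tokens are word runs, whitespace runs, or single symbols.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")

def granular_stops(line):
    """Every offset where a token (whitespace included) starts or ends."""
    stops = set()
    for m in TOKEN_RE.finditer(line):
        stops.update(m.span())
    return sorted(stops)

def move(line, caret, direction):
    """Move the caret to the nearest stop to the right (+1) or left (-1)."""
    stops = granular_stops(line)
    if direction > 0:
        return next((s for s in stops if s > caret), caret)
    return next((s for s in reversed(stops) if s < caret), caret)

def walk(line, direction):
    """Collect every caret position visited while crossing the whole line."""
    caret = 0 if direction > 0 else len(line)
    visited = []
    while True:
        nxt = move(line, caret, direction)
        if nxt == caret:  # no further stop in this direction
            return visited
        visited.append(nxt)
        caret = nxt

LINE = "void* alloc(intptr_t size) {"
print(len(walk(LINE, +1)), len(walk(LINE, -1)))
```

With this toy lexer the example line takes 11 steps in either direction, the same count as above.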


There are two things that can improve the approach:

  • Ignoring whitespace.
  • Direction.

Ignoring whitespace fits the idea of a token in many languages, where whitespace is lexed out and generally ignored. We can simply treat whitespace as a continuation of the token in the given direction. Doing that, we end up with the following two results (depending on direction):

  • void|* |alloc|(|intptr_t |size|) |{| (stops at token starts; 8 steps)
  • |void|*| alloc|(|intptr_t| size|)| { (stops at token ends; 8 steps)

Not only is it symmetrical (the same number of steps regardless of direction), but it also stops at every token.
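
The two results can be reproduced with a small Python sketch. It assumes a toy regex lexer that never matches whitespace (a real editor would ask its language lexer instead), and the helper names are hypothetical:

```python
import re

# Assumption: a toy lexer. Whitespace is never matched, so it belongs to no token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def stops_at_starts(line):
    """Stops when walking right: every later token start, then the end of the line."""
    starts = [m.start() for m in TOKEN_RE.finditer(line)]
    return [s for s in starts if s > 0] + [len(line)]

def stops_at_ends(line):
    """Stops when walking left: the start of the line plus every earlier token end."""
    ends = [m.end() for m in TOKEN_RE.finditer(line)]
    return [0] + [e for e in ends if e < len(line)]

def render(line, stops):
    """Draw a | at every stop, using the same notation as the examples."""
    out, prev = [], 0
    for s in sorted(stops):
        out.append(line[prev:s])
        out.append("|")
        prev = s
    out.append(line[prev:])
    return "".join(out)

LINE = "void* alloc(intptr_t size) {"
print(render(LINE, stops_at_starts(LINE)))
print(render(LINE, stops_at_ends(LINE)))
```

Both lists hold 8 stops for the example line. Whitespace never produces a stop of its own, because it is simply skipped until the next token boundary of the chosen kind.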


Now the question is which direction should use which of the two sets of stops. One set stops at token starts, the other stops at token ends. Other code editors seem to associate right with ends more than anything. But I don't think that's the correct approach.

I associate right with next, and that means I expect to end up at the start of the next thing. This, however, might not be the correct approach for everybody. People map movement differently, especially nowadays.

There are essentially two mental models: moving the paper vs. moving the viewport. Touch-screen devices flipped us from the viewport-centered model to the paper-centered one. You may look at it as a move from the abstract to the physical.

Touch-screen devices, physical world:

  • If you want to go to the next page, you'd swipe left; you move the paper left.
  • If you want to continue reading, you'd swipe up; you move the paper up; in a book, you turn the right page to the left.

UIs, mouse, scrollbars, keyboards, old-school touch-pads, abstract world:

  • If you want to go to the next page, you'd click a right arrow; drag the scrollbar to the right; swipe right on the touch-pad; hit the right key; you move the viewport to the right.
  • If you want to continue reading, you'd click a down arrow; drag the scrollbar down; swipe down on the touch-pad; hit the down key; you move the viewport down.

There are also other situations where flipping my subjective preference might be valid:

  • On Mac laptops there's no dedicated forward-delete key. You typically delete backwards with the backspace key. So if your intent is to delete, you'd prefer "ends" to be associated with "right".
  • There are cultural differences: for example, people from cultures with right-to-left languages might completely disagree (even when working with left-to-right text). But even within the left-to-right Western world, not all people would agree.

Hence, there need to be four commands:

  • move-to-next-word-start
  • move-to-previous-word-start
  • move-to-next-word-end
  • move-to-previous-word-end
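
A rough Python sketch of the four commands as separate functions, so that the key bindings become a user's choice. The toy regex lexer and the KEYMAP dictionary are assumptions for illustration only, not any editor's actual API:

```python
import re

# Assumption: a toy lexer that skips whitespace; a real editor would use its own.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def _starts(text):
    return [m.start() for m in TOKEN_RE.finditer(text)]

def _ends(text):
    return [m.end() for m in TOKEN_RE.finditer(text)]

def move_to_next_word_start(text, caret):
    """First token start to the right of the caret, else the end of the text."""
    return next((s for s in _starts(text) if s > caret), len(text))

def move_to_previous_word_start(text, caret):
    """Last token start to the left of the caret, else the start of the text."""
    return next((s for s in reversed(_starts(text)) if s < caret), 0)

def move_to_next_word_end(text, caret):
    """First token end to the right of the caret, else the end of the text."""
    return next((e for e in _ends(text) if e > caret), len(text))

def move_to_previous_word_end(text, caret):
    """Last token end to the left of the caret, else the start of the text."""
    return next((e for e in reversed(_ends(text)) if e < caret), 0)

# Hypothetical binding: one possible mapping, freely swappable by the user.
KEYMAP = {
    "ctrl+right": move_to_next_word_start,
    "ctrl+left": move_to_previous_word_end,
}

LINE = "void* alloc(intptr_t size) {"
caret = KEYMAP["ctrl+right"](LINE, 0)
print(caret)
```

Someone who associates right with ends would only need to rebind ctrl+right to move_to_next_word_end; the commands themselves stay the same.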