Morse and Remorse with Raku

by Arne Sommer

Morse and Remorse with Raku

[43] Published 20. November 2019

This is my response to the Perl Weekly Challenge #35.

Challenge 35.1

Contributed by Paul Johnson

Write a program to encode text into binary encoded morse code.

Pay attention to any changes which might need to be made to the text to make it valid morse code.

Morse code consists of dots, dashes and gaps. It can be encoded in binary in the following fashion:

dot: 1
dash: 111
intra-character gap: 0
character gap: 000
word gap: 0000000
An intra-character gap is inserted between the dots and dashes in a character.

I'll start with setting up the data structures to convert between the formats. The wiki page (given in the challenge) has the mappings between characters and morse code, so we start with that:

File: lib/BinaryMorse.pm6 (partial)
unit module BinaryMorse;

our %morse =
(
   A  => '.-',
   B  => '-...',
   C  => '-.-.',
   D  => '-..',
   E  => '.',
   F  => '..-.',
   G  => '--.',
   H  => '....',
   I  => '..',
   J  => '.---',
   K  => '-.-',
   L  => '.-..',
   M  => '--',
   N  => '-.',
   O  => '---',
   P  => '.--.',
   Q  => '--.-',
   R  => '.-.',
   S  => '...',
   T  => '-',
   U  => '..-',
   V  => '...-',
   W  => '.--',
   X  => '-..-',
   Y  => '-.--',
   Z  => '--..',
   0  => '-----',
   1  => '.----',
   2  => '..---',
   3  => '...--',
   4  => '....-',
   5  => '.....',
   6  => '-....',
   7  => '--...',
   8  => '---..',
   9  => '----.',
  '.' => '.-.-.',
  ',' => '--..--'
);

I have chosen to use upper case letters only. I have used the basic latin (or english) letters only, the digits and the period and comma. The wiki page specifies several other characters, and implementing them is as simple as adding one line to this hash declaration for each.

We don't really need the reverse lookup to answer the challenge, but here it is anyway:

File: lib/BinaryMorse.pm6 (partial)
our %remorse = %morse.antipairs;

The «antipairs» method takes all the pairs (of keys and values), swaps them around, and gives us back the lot (as a sequence of pair objects. Assigning it to a hash coerces it to a hash). See docs.raku.org/routine/antipairs for more details about «antipairs».

The «our» keyword gives a variable that we can access from outside the module. See docs.raku.org/syntax/our for details about «our».

The next step is the mapping between characters and the binary encoded morse code. I chose to use the actual morse code (as done above), as that is shorter (and easier to get right). The binary encoded morse code can easily be calculated given the morse code:

File: lib/BinaryMorse.pm6 (partial)
our %binary-morse;

for %morse.kv -> $char, $value
{
  %binary-morse{$char}
    = $value.comb.map({ $_ eq '.' ?? '1' !! '111' }).join('0');
}

The loop iterates over the characters. Let us e.g. look at «P», where «$char» has the value «P» and «$value» is «.--.». The «comb» method split the value into a list of single characters; «.», «-», «-» and «.». The «map» method returns «1» if the character was a dot («.»), and «111» if it was a dash («-»). And finally the «join» glues the individual characters (now in binary form) together, with a «0» between them. («P» gives us «10111011101».)

This time we do need the reverse lookup (for part 2 of the challenge), and it is just as easy as the first time:

File: lib/BinaryMorse.pm6 (partial)
our %binary-remorse = %binary-morse.antipairs;

Then a test program to show that it works, before we can go on with the challenge. It iterates over all three data structures (the characters, the morse code, and the binary encoded morse code) showing the other two types for each value:

File: bem-test
use lib "lib";

use BinaryMorse;

for %BinaryMorse::morse.keys.sort -> $key
{
  say "$key | { (%BinaryMorse::morse{$key}).fmt('%-6s') } | ",
      "%BinaryMorse::binary-morse{$key}";
}

say "";

for %BinaryMorse::remorse.keys.sort -> $key
{
  say "{ $key.fmt('%-6s') } | { %BinaryMorse::remorse{$key} } | ",
      "{ %BinaryMorse::binary-morse{%BinaryMorse::remorse{$key}} }";
}

say "";

for %BinaryMorse::binary-remorse.keys.sort -> $key
{
  say "{ $key.fmt('%-20s') } | { (%BinaryMorse::binary-remorse{$key}) } | "
      "%BinaryMorse::morse{%BinaryMorse::binary-remorse{$key}}";
}

Running it:

$ raku bem-test
, | --..-- | 1110111010101110111
. | .-.-.  | 1011101011101
0 | -----  | 1110111011101110111
1 | .----  | 10111011101110111
2 | ..---  | 101011101110111
3 | ...--  | 1010101110111
4 | ....-  | 10101010111
5 | .....  | 101010101
6 | -....  | 11101010101
7 | --...  | 1110111010101
8 | ---..  | 111011101110101
9 | ----.  | 11101110111011101
A | .-     | 10111
B | -...   | 111010101
C | -.-.   | 11101011101
D | -..    | 1110101
E | .      | 1
F | ..-.   | 101011101
G | --.    | 111011101
H | ....   | 1010101
I | ..     | 101
J | .---   | 1011101110111
K | -.-    | 111010111
L | .-..   | 101110101
M | --     | 1110111
N | -.     | 11101
O | ---    | 11101110111
P | .--.   | 10111011101
Q | --.-   | 1110111010111
R | .-.    | 1011101
S | ...    | 10101
T | -      | 111
U | ..-    | 1010111
V | ...-   | 101010111
W | .--    | 101110111
X | -..-   | 11101010111
Y | -.--   | 1110101110111
Z | --..   | 11101110101

-      | T | 111
--     | M | 1110111
---    | O | 11101110111
-----  | 0 | 1110111011101110111
----.  | 9 | 11101110111011101
---..  | 8 | 111011101110101
--.    | G | 111011101
--.-   | Q | 1110111010111
--..   | Z | 11101110101
--..-- | , | 1110111010101110111
--...  | 7 | 1110111010101
-.     | N | 11101
-.-    | K | 111010111
-.--   | Y | 1110101110111
-.-.   | C | 11101011101
-..    | D | 1110101
-..-   | X | 11101010111
-...   | B | 111010101
-....  | 6 | 11101010101
.      | E | 1
.-     | A | 10111
.--    | W | 101110111
.---   | J | 1011101110111
.----  | 1 | 10111011101110111
.--.   | P | 10111011101
.-.    | R | 1011101
.-.-.  | . | 1011101011101
.-..   | L | 101110101
..     | I | 101
..-    | U | 1010111
..---  | 2 | 101011101110111
..-.   | F | 101011101
...    | S | 10101
...-   | V | 101010111
...--  | 3 | 1010101110111
....   | H | 1010101
....-  | 4 | 10101010111
.....  | 5 | 101010101

1                    | E | .
101                  | I | ..
10101                | S | ...
1010101              | H | ....
101010101            | 5 | .....
10101010111          | 4 | ....-
101010111            | V | ...-
1010101110111        | 3 | ...--
1010111              | U | ..-
101011101            | F | ..-.
101011101110111      | 2 | ..---
10111                | A | .-
1011101              | R | .-.
101110101            | L | .-..
1011101011101        | . | .-.-.
101110111            | W | .--
10111011101          | P | .--.
1011101110111        | J | .---
10111011101110111    | 1 | .----
111                  | T | -
11101                | N | -.
1110101              | D | -..
111010101            | B | -...
11101010101          | 6 | -....
11101010111          | X | -..-
111010111            | K | -.-
11101011101          | C | -.-.
1110101110111        | Y | -.--
1110111              | M | --
111011101            | G | --.
11101110101          | Z | --..
1110111010101        | 7 | --...
1110111010101110111  | , | --..--
1110111010111        | Q | --.-
11101110111          | O | ---
111011101110101      | 8 | ---..
11101110111011101    | 9 | ----.
1110111011101110111  | 0 | -----

The mappings look fine.

Now on to the challenge itself. Let us start with a slightly modified version of the module, with the changes highlighted:

File: lib/BinaryMorse2.pm6 (partial)
unit module BinaryMorse2;

my %morse =
(
   A  => '.-',
   B  => '-...',
   C  => '-.-.',
   D  => '-..',
   E  => '.',
   F  => '..-.',
   G  => '--.',
   H  => '....',
   I  => '..',
   J  => '.---',
   K  => '-.-',
   L  => '.-..',
   M  => '--',
   N  => '-.',
   O  => '---',
   P  => '.--.',
   Q  => '--.-',
   R  => '.-.',
   S  => '...',
   T  => '-',
   U  => '..-',
   V  => '...-',
   W  => '.--',
   X  => '-..-',
   Y  => '-.--',
   Z  => '--..',
   0  => '-----',
   1  => '.----',
   2  => '..---',
   3  => '...--',
   4  => '....-',
   5  => '.....',
   6  => '-....',
   7  => '--...',
   8  => '---..',
   9  => '----.',
  '.' => '.-.-.',
  ',' => '--..--',
  '?' => '..--..'
);

my %remorse = %morse.antipairs;

my %binary-morse;

for %morse.kv -> $char, $value
{
  %binary-morse{$char}
    = $value.comb.map({ $_ eq '.' ?? '1' !! '111' }).join('0');
}

my %binary-remorse = %binary-morse.antipairs;

I have made all the data structure private (not accessible outside of the module), by replacing «our» with «my». The additional morse character «?» has been added as I need something to represent unknown values:

File: lib/BinaryMorse2.pm6 (partial)
constant unknown = '?';

We can now write the encoding procedure, that is turning readable text into binary morse code:

File: lib/BinaryMorse2.pm6 (partial)
our sub morsify (Str $text)
{
  my @words;                             # [1]
  
  for $text.split(/\s+/) -> $word        # [2]
  {
    my @new-word;                        # [3]
    for $word.comb -> $character         # [4]
    {
      @new-word.push(%binary-morse{$character}    //  # [5]
                     %binary-morse{$character.uc} //  # [5a]
                     %binary-morse{unknown} );        # [5b]
    }
    @words.push(@new-word.join('000')); # [6]
  }

  return @words.join("0000000");        # [7]
}

[1] We collect the encoded words in this array.

[2] Get the individual words, and iterate over them.

[3] We collect the individual characters of the new encoded word in this array.

[4] Get the individual characters, and iterate over them.

[5] Save the encoded character. If that fails, try turning it to upper case (assuming it is in lower case). if that also fail, bail out with the aforementioned «?».

[6] Merge the characters into a word, separated by the character separator «000», and add it to the word array. The array is cleared in [3], ready for a new word.

[7] Return the encoded words as a single string, separated by the word separator «0000000».

Note that I had to decide what consitutes a word, as the challenge and the wiki page didn't say. I decided to use one or more space characters (including tabs), and the «\s+» regex does the trick. (This assumption is actually based on personal experience, as I have served as a CW Operator in the Norwegian army.)

I'll wait with the program using the module until I have presented the decoder.

Challenge 35.2

Contributed by Paul Johnson

Write a program to decode binary morse code.

Consider how it might be possible to recover from badly formed morse code.
  1. by splitting the morse code on gaps
  2. without looking further than one digit ahead

A custom type that only allows binary morse code is useful, so here it is:

File: lib/BinaryMorse2.pm6 (partial)
subset BinaryMorse of Str where * ~~ /^<[01]>*$/;

Then the decoder, which looks somewhat similar to the encoder:

File: lib/BinaryMorse2.pm6 (partial)
our sub demorsify (BinaryMorse $text)     # [1]
{
  my @words;

  for $text.split("0000000") -> $word     # [2]
  {
    my $new-word = "";
    for $word.split("000") -> $character  # [3]
    {
      $new-word ~= %binary-remorse{$character} //unknown;  # [4]
    }
    @words.push($new-word);
  }
  return @words.join(" ");               # [5]
}

[1] This will fail if we try to pass it noncompliant characters (anything besides combinations of «0» and «1»).

[2] The word separator is «0000000», so we split on that to get the words.

[3] The character separator is «000», so we split on that to get the individual characters.

[4] Decode the character. If that fail, use the default «?»

[5] Return the decoded string, with a space between the words.

Then a program using the module:

File: bem
use lib "lib";

use BinaryMorse2;

multi sub MAIN (BinaryMorse2::BinaryMorse $binary-morse)
{
  say BinaryMorse2::demorsify($binary-morse);
}

multi sub MAIN (Str $text)
{
  say BinaryMorse2::morsify($text);
}

The «multi»s is set up so that if we specify a string consisting of «0» and «1» only we decode it. If not, it is encoded.

Running it:

$ raku bem ABC
1011100011101010100011101011101

$ raku bem abc
1011100011101010100011101011101

$ raku bem 1011100011101010100011101011101
ABC

Lower case letters are upper cased, and the whole thing round trips.

Manual round tripping is cumbersome, but we can let the program do it for us:

File: bem2
use lib "lib";

use BinaryMorse2;

multi sub MAIN (BinaryMorse2::BinaryMorse $binary-morse)
{
  say BinaryMorse2::demorsify($binary-morse);
}

multi sub MAIN (Str $text, :$roundtrip)
{
  my $encoded = BinaryMorse2::morsify($text);
  say $encoded;
  say BinaryMorse2::demorsify($encoded) if $roundtrip;
}

Running it:

$ raku bem2 abc
1011100011101010100011101011101

$ raku bem2 --roundtrip abc
1011100011101010100011101011101
ABC

$ raku bem2 --roundtrip "This is it"
1110001010101000101000101010000000101000101010000000101000111
THIS IS IT

Let us have a go at illegal characters:

$ raku bem2 --roundtrip "Mäd Süß"
11101110001010111011101010001110101000000010101000101011101110101000101011101110101
M?D S??

Corrupt data in the binary encoded morse code is also handled:

$ raku bem2 1111111111111111111111111111111111111111111111
?

$ raku bem2 1111111111111
?

$ raku bem2 111
T

$ raku bem2 1110000111
T?

Wrapping Up

Then we can have a look at the gotchas presented in the challenges.

«Pay attention to any changes which might need to be made to the text to make it valid morse code.» Morse code has no distinction of case, and I have chosen to use the upper case versions of the letters only. Any lower case letters are converted to upper case. Unknown characters are replaced by a «?».

«Consider how it might be possible to recover from badly formed morse code. a/ by splitting the morse code on gaps.» I have done just that. Unknown sequences are replaced with a «?» in the decoded output.

«b/without looking further than one digit ahead». My code doesn't look ahead at all, but uses «split» on the boundaries between words and characters. (I don't split the characters into dashes and dots, as we have a hash lookup for just that.)

Wrapping Up, Binary Edition

A possible problem with the expression «binary encoded morse code» is the very first word: «binary». «1» and «0» certainly are the binary digits, but we should perhaps convert the result to binary data?

$ raku bem2 ABC
1011100011101010100011101011101

$ raku
> "1011100011101010100011101011101".parse-base(2);
1551189853

Converting binary digits to binary data actually requires converting the sequence to a decimal number. Then we could apply «polymod» to split it into 8-bit chunks, like this:

> "1011100011101010100011101011101".parse-base(2).polymod(256 xx 20)
(93 71 117 92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)

See docs.raku.org/routine/polymod for more information about «polymod».

Note that we had to specify a list of enough 256's to make the last non-zero value smaller than 256. The unfortunate result (in this case) is that we get that many values, as padded zeros at the end. But we can get rid of them, like this:

> "1011100011101010100011101011101".parse-base(2).polymod(256 xx 20).grep(* > 0)
(93 71 117 92)

Except that it would also get rid of zeroes inside the list of values, where they would be perfectly legal (and have actual meaning). Fixing that isn't trivial with «grep», so I'll do this another way.

But before we do that, I'd like to point out that this list of values is the reverse representation of the initial value, so we would have to flip it to get a correct representation. (This will turn the trailing zeroes into leading zeroes, where they are harmless - and thus prove this point.)

> "1011100011101010100011101011101".parse-base(2).polymod(256 xx 20).reverse
(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 92 117 71 93)

Now let us do this as a program:

File: bem3
use lib "lib";

use BinaryMorse2;

multi sub MAIN (BinaryMorse2::BinaryMorse $binary-morse)
{
  say BinaryMorse2::demorsify($binary-morse);
}

multi sub MAIN (:$binary where $binary.IO.f && $binary.IO.r)
{
  my $fh = open $binary, :bin;
  my $data = $fh.read;
  $fh.close;

  my $result = "";
  $result ~= $_.base(2).fmt("%08d") for @($data);

  $result.substr-rw(0,1) = "" while $result.chars && $result.substr(0,1) eq "0";
  
  say BinaryMorse2::demorsify($result);
}

multi sub MAIN (Str $text, :$roundtrip, :$binary)
{
  if $binary
  {
    if my $fh = open $binary, :w, :bin
    {
      my @values = BinaryMorse2::morsify($text).parse-base(2).polymod(256 xx 200).reverse;
      @values.shift while @values && @values[0] == 0;
      die "Set a higher polymod value" if @values[0] > 255;
      $fh.write: Blob.new(@values);
      $fh.close;

      MAIN(:$binary) if $roundtrip;
    }
    else
    {
      die "Unable to open file $binary";
    }
  }
  
  else
  {
    my $encoded = BinaryMorse2::morsify($text);
    say $encoded;
    say BinaryMorse2::demorsify($encoded) if $roundtrip;
  }
}

I have added a «--binary» command line option. It takes a file name as argument, and the binary data is saved or read from that file.

The second «multi MAIN» is new, and it handles the case of reading the binary data from a file. Note that we have to read and write the data in binary mode (:bin), so that the Unicode layer doesn't interfere (and mess up).

Storing binary data (which really is (8 bit) bytes) is done with a Blob (Binary Large OBject). See docs.raku.org/type/Blob for details.

And finnaly, note the (recursive) call to «MAIN» in the last «multi MAIN». The «MAIN» function is executed automatically, but we can call it explicitlty as well. As done her to execute the second «multi MAIN».

Running it:

$ raku bem3 --binary=binary.bin ABCaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasda

$raku bem3 --binary=binary.bin
ABCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAASDA

Let us take a look at the binary data, with my «hex-dump» program from Challenge 24:

$ raku ../Challenge24/hex-dump binary.bin 
01 71 D5 1D 74 5C 5C 5C 5C 5C | ░q░░t\\\\\
5C 5C 5C 5C 5C 5C 5C 5C 5C 5C | \\\\\\\\\\
5C 5C 5C 5C 5C 5C 5C 5C 5C 5C | \\\\\\\\\\
5C 5C 5C 5C 5C 5C 5C 5C 5C 5C | \\\\\\\\\\
5C 5C 5C 5C 5C 5C 5C 5C 5C 5C | \\\\\\\\\\
5C 5C 5C 5C 5C 5C 5C 5C 5C 5C | \\\\\\\\\\
5C 5C 5C 5C 5C 5C 54 75 17    | \\\\\\Tu░

This is binary data, so isn't very interesting. Turning the values into decimal wouldn't help much, so 'll refrain from adding that feature to the «hex-dump» program.

Automatic round tripping is also supported:

$ raku bem3 --binary=binary.bin --roundtrip AAAAA
AAAAA

$raku bem3 --binary=binary.bin --roundtrip AAAAAe
AAAAAE

The hex dump is different this time:

../Challenge24/hex-dump binary.bin
01 71 71 71 71 71             | ░qqqqq

It does round trip, but I cannot say with absolute certainty that this is the - or rather a - correct binary representation. (Things like Little Endian and Big Endian tend to make thinks like this hard.)

On the other hand, that may not actually matter as the encoder and decoder combination does round trip.

And that's really it.