[ViewVC] Diff of: cvs/rxvt-unicode/src/perl/matcher

Comparing rxvt-unicode/src/perl/matcher (file contents):
Revision 1.37 by root, Sat Jul 24 09:48:43 2021 UTC vs.
Revision 1.39 by root, Sun Nov 21 19:33:32 2021 UTC

     URxvt.matcher.button:     1
     URxvt.matcher.pattern.1:  \\bwww\\.[\\w-]+\\.[\\w./?&@#-]*[\\w/-]
     URxvt.matcher.pattern.2:  \\B(/\\S+?):(\\d+)(?=:|$)
     URxvt.matcher.launcher.2: gvim +$2 $1
+=head2 Regex encoding/wide character matching
+Urxvt stores all text as unicode, in a special encoding that uses
+one character/code point per column. For various reasons, the regular
+expressions are matched directly against this encoding, which means there are a few things
+you need to keep in mind:
+=over
+=item X resources/command line arguments are locale-encoded
+The regexes taken from the command line or resources will be converted
+from locale encoding to unicode. This can change the number of code points
+per character.
+=item Wide characters are column-padded with C<$urxvt::NOCHAR>
+Wide characters (such as kanji and sometimes tabs) are padded with
+a special character value (C<$urxvt::NOCHAR>). That means that
+constructs such as C<\w> or C<.> will only match part of a character, as
+C<$urxvt::NOCHAR> is not matched by C<\w> and both only match the first
+"column" of a wide character.
+That means you have to incorporate C<$urxvt::NOCHAR> into parts of regexes
+that may match wide characters. For example, to match C<\w+> you might
+want to use C<[\w$urxvt::NOCHAR]+> instead, and to match a single character
+(C<.>) you might want to use C<.$urxvt::NOCHAR*> instead.
+=back
 =cut
 my $url =
    qr{
       (?:https?://|ftp://|news://|mailto:|file://|\bwww\.)
-      [\w\-\@;\/?:&=%\$.+!*\x27,~#]*
+      [\w\-\@;\/?:&=%\$.+!*\x27,~#$urxvt::NOCHAR]*
       (
-         \([\w\-\@;\/?:&=%\$.+!*\x27,~#]*\)| # Allow a pair of matched parentheses
+         \([\w\-\@;\/?:&=%\$.+!*\x27,~#$urxvt::NOCHAR]*\)| # Allow a pair of matched parentheses
          [\w\-\@;\/?:&=%\$+*~]  # exclude some trailing characters (heuristic)
       )+
    }x;
 sub matchlist_key_press {
       }
    }
    my @defaults = ($url);
    my @matchers;
-   for (my $idx = 0; defined (my $res = $self->my_resource ("pattern.$idx") || $defaults[$idx]); $idx++) {
+   for (my $idx = 0; defined (my $res = $self->locale_decode ($self->my_resource ("pattern.$idx")) || $defaults[$idx]); $idx++) {
-      $res = $self->locale_decode ($res);
-      utf8::encode $res;
       my $launcher = $self->my_resource ("launcher.$idx");
       $launcher =~ s/\$&|\$\{&\}/\${0}/g if $launcher;
       my $rend = $self->parse_rend($self->my_resource ("rend.$idx"));
       unshift @matchers, [qr($res)x,$launcher,$rend];
    }
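
The two POD items added in this revision can be tried outside urxvt with a small sketch. It is an illustration under stated assumptions, not part of the matcher extension: `$NOCHAR` is a hypothetical stand-in for `$urxvt::NOCHAR` (the value `chr 0xFFFF` is assumed here), `Encode::decode` with UTF-8 stands in for `$self->locale_decode`, and the sample line is made up.

```perl
use strict;
use warnings;
use Encode ();

# Stand-in for $urxvt::NOCHAR, which only exists inside urxvt's embedded
# Perl interpreter. The value chr(0xFFFF) is an assumption for this sketch.
my $NOCHAR = chr 0xFFFF;

# Simulated terminal line: each wide character (U+6F22, U+5B57) occupies
# two columns, the second column holding the padding character.
my $line = "\x{6f22}${NOCHAR}\x{5b57}${NOCHAR} ok";

# \w does not match the padding, so \w+ stops after the first column.
my ($plain)  = $line =~ /(\w+)/;

# Adding the padding character to the class lets the match span both
# wide characters, as the POD suggests.
my ($padded) = $line =~ /([\w$NOCHAR]+)/;

# "." matches one code point; appending NOCHAR* covers the padding too.
my ($one)    = $line =~ /^(.$NOCHAR*)/;

# Locale decoding changes the code point count: the two UTF-8 bytes of
# U+00E9 become a single character. UTF-8 is assumed as the locale here.
my $bytes   = "\xc3\xa9";
my $decoded = Encode::decode ('UTF-8', $bytes);

printf "plain=%d padded=%d one=%d bytes=%d decoded=%d\n",
   length $plain, length $padded, length $one,
   length $bytes, length $decoded;
```

Only lengths are printed, so the padding character itself is never written to the output. In a real pattern resource, the same idea applies by writing `$urxvt::NOCHAR` inside the character class, as the `$url` regex in this revision does.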
