Perl output messed up 

Joined: 2007-01-17 05:46:17
Posts: 145
Location: Tokyo, Japan
Hello,

I want to write a macro like the following:
Code:
$res = ""
Begin Perl
   $res = `/usr/bin/perl ~/Desktop/myPerlScript.pl`;
End

New
Insert Text $res


If the perl script "~/Desktop/myPerlScript.pl" contains code like the following:
Code:
#!/usr/bin/perl
binmode (STDOUT, ":utf8");
print "あいうえお";


the result will be a mess (something like "ãÂ￾‚ãÂ￾„ãÂ￾†ãÂ￾ˆãÂ￾Š").

Even if I remove the line 'binmode (STDOUT, ":utf8")', I get an even worse result.

How can I get the expected result?

Thank you in advance.

_________________
Best regards,

Nobumi Iyanaga
Tokyo,
Japan


2008-05-14 04:17:17

Joined: 2007-01-17 03:25:42
Posts: 729
This should work. I only changed utf8 to utf-8:
Code:
#!/usr/bin/perl
binmode (STDOUT, ":utf-8");
print "あいうえお";


2008-05-14 06:02:22
Official Nisus Person

Joined: 2002-07-11 17:14:10
Posts: 4251
Location: San Diego, CA
Hamid wrote:
This should work. I only changed utf8 to utf-8:
Code:
#!/usr/bin/perl
binmode (STDOUT, ":utf-8");
print "あいうえお";

This may be a misleading solution. I don't believe the string "utf-8" is valid to designate an encoding; you should use "utf8" without a dash. To see this, try changing the line to:
Code:
binmode (STDOUT, ":xxx");

You should obtain the same results as if you had used "utf-8".

I think the proper thing to do is:
Code:
use utf8;
binmode (STDOUT, ":utf8");

The first line tells perl the encoding of the actual ".pl" file. The second tells perl to use the UTF-8 encoding for standard output, which NWP expects. Of course, none of this will work if you didn't save your ".pl" file using UTF-8 in the first place.
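A minimal standalone sketch of the point above (an assumption: it is run as an ordinary Perl 5.8+ script saved as UTF-8, not inside a Nisus Perl block). It writes into an in-memory buffer instead of STDOUT so the resulting bytes can be inspected, and it checks binmode's return value, because binmode returns false for a layer name perl does not recognise. ":encoding(UTF-8)" is used as the stricter relative of ":utf8"; for valid text both write the same bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the literal below is UTF-8 source text, so it is 5 characters

my $text = "あいうえお";

# A layer name perl does not know: binmode reports failure by returning false.
open(my $scratch, '>', \my $scratch_buf) or die "open: $!";
my $bad_layer_ok = do { no warnings; binmode($scratch, ':xxx') };
close($scratch);

# The intended setup: encode characters as UTF-8 on output.
open(my $out, '>', \my $buf) or die "open: $!";
binmode($out, ':encoding(UTF-8)') or die "binmode failed: $!";
print $out $text;
close($out);

printf "unknown layer accepted? %s\n", $bad_layer_ok ? 'yes' : 'no';
printf "%d characters -> %d bytes\n", length($text), length($buf);
```

Each hiragana letter takes three bytes in UTF-8, so the five characters become 15 bytes.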


2008-05-14 14:10:17
Official Nisus Person

Joined: 2002-07-11 17:14:10
Posts: 4251
Location: San Diego, CA
There's unfortunately something else going on here. It seems the string returned using perl's backtick operator is defaulting to the system encoding. Using just perl from the command line I run into problems. The first script "print.pl" is:
Code:
#!/usr/bin/perl
use utf8;
binmode(STDOUT, ":utf8");
print "あいうえお\n";


The second script which calls the above script is:
Code:
use utf8;
binmode(STDOUT, ":utf8");
$res = `/usr/bin/perl ~/Desktop/print.pl`;
print $res;
print "だい\n";


The output is unfortunately:
Code:
ããããã
だい


The backtick operator is garbling the text, I assume because it doesn't respect the command's stdout encoding. I'm not sure there's a way to fix this without getting complicated: using perl's "system" command and configuring/reading the pipes yourself.
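A sketch of the "read the pipe yourself" route, assuming a Unix-like system (list-form pipe open is not available on Windows). A second perl started via $^X is only a placeholder standing in for print.pl. Putting an :encoding(UTF-8) layer on the read end makes the captured text arrive as characters, so printing it through a :utf8 STDOUT does not encode it a second time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder for print.pl: a child perl that prints three hiragana as UTF-8.
my @cmd = ($^X, '-e', q{binmode STDOUT, ':utf8'; print "\x{3042}\x{3044}\x{3046}";});

# List-form pipe open runs the command without a shell (Unix-like systems).
open(my $pipe, '-|', @cmd) or die "cannot start child: $!";
binmode($pipe, ':encoding(UTF-8)') or die "binmode failed: $!";  # decode while reading
my $res = do { local $/; <$pipe> };   # slurp everything the child printed
close($pipe) or die "child exited with status $?";

binmode(STDOUT, ':utf8');              # the output setup a Nisus Perl block uses
print $res, "\n";
printf "%d characters received\n", length $res;
```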


2008-05-14 15:22:29

Joined: 2007-01-17 05:46:17
Posts: 145
Location: Tokyo, Japan
Hello Martin,

Thank you for your reply.

martin wrote:
There's unfortunately something else going on here. It seems the string returned using perl's backtick operator is defaulting to the system encoding. Using just perl from the command line I run into problems. The first script "print.pl" is:
Code:
#!/usr/bin/perl
use utf8;
binmode(STDOUT, ":utf8");
print "あいうえお\n";


The second script which calls the above script is:
Code:
use utf8;
binmode(STDOUT, ":utf8");
$res = `/usr/bin/perl ~/Desktop/print.pl`;
print $res;
print "だい\n";


The output is unfortunately:
Code:
ããããã
だい


The backtick operator is garbling the text, I assume because it doesn't respect the command's stdout encoding. I'm not sure there's a way to fix this without getting complicated: using perl's "system" command and configuring/reading the pipes yourself.


I have not yet tested the "system" and pipe/redirection method, but I think the problem is different. Try this code:

Code:
#!/usr/bin/perl

use utf8;

$res = `/usr/bin/perl ~/Desktop/print.pl`;

binmode (STDOUT, ":utf8");
print $res;

#binmode (STDOUT, ":utf8");

print "だい";


The output will be something like:

Code:
ã￾‚ã￾„ã￾†ã￾ˆã￾Š
だい


but if you do:

Code:
#!/usr/bin/perl

use utf8;

$res = `/usr/bin/perl ~/Desktop/print.pl`;

#binmode (STDOUT, ":utf8");
print $res;

binmode (STDOUT, ":utf8");

print "だい";


you will get the correct result. I think this means that the result of

Code:
`/usr/bin/perl ~/Desktop/print.pl`;


is already encoded in UTF-8, so printing it through 'binmode (STDOUT, ":utf8")' encodes it into UTF-8 a second time, which gives a garbled result.
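The double-encoding explanation can be checked byte by byte. This plain-Perl sketch (no Nisus involved) encodes あ (U+3042) once, then encodes the resulting bytes again, which is what happens when a byte string is printed to a :utf8 handle: each byte is taken as a Latin-1 character and encoded on its own. The leading C3 A3 is the UTF-8 form of "ã", the letter the garbled output keeps starting with.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $char = "\x{3042}";            # HIRAGANA LETTER A as a character string

my $once = $char;
utf8::encode($once);              # UTF-8 bytes: what backticks hand back

my $twice = $once;
utf8::encode($twice);             # each byte treated as a Latin-1 char, encoded again

printf "once:  %s\n", join ' ', map { sprintf '%02X', ord } split //, $once;
printf "twice: %s\n", join ' ', map { sprintf '%02X', ord } split //, $twice;
# once:  E3 81 82
# twice: C3 A3 C2 81 C2 82
```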

Now, it seems that your "Begin Perl" block auto-generates a temporary Perl script having at the beginning something like:

Code:
use utf8;
binmode (STDOUT, ":utf8");


I think this is the culprit. You should add an option that prevents these two lines from being inserted into the generated temporary Perl script...

By the way, another option that is needed is one that keeps this temporary Perl script from being deleted immediately. Because you delete it, it becomes impossible to debug it...!

Thank you in advance.

_________________
Best regards,

Nobumi Iyanaga
Tokyo,
Japan


2008-05-14 16:46:56
Official Nisus Person

Joined: 2002-07-11 17:14:10
Posts: 4251
Location: San Diego, CA
Nobumi Iyanaga wrote:
Now, it seems that your "Begin Perl" block auto-generates a temporary Perl script having at the beginning something like:

Code:
use utf8;
binmode (STDOUT, ":utf8");

This is exactly how embedded perl blocks work. However, it is necessary for the correct functioning of the NWP macro system, which requires that the perl output be encoded using UTF-8.

Quote:
Code:
#!/usr/bin/perl

use utf8;

$res = `/usr/bin/perl ~/Desktop/print.pl`;

#binmode (STDOUT, ":utf8");
print $res;

binmode (STDOUT, ":utf8");

print "だい";


you will get the correct result. I think this means that the result of

Code:
`/usr/bin/perl ~/Desktop/print.pl`;


is already encoded in UTF-8, so printing it through 'binmode (STDOUT, ":utf8")' encodes it into UTF-8 a second time, which gives a garbled result.

Honestly I would consider this a bug in perl. The identity of the characters stored in $res should not change because the variable is printed to a file handle whose encoding has changed. Perl should convert between two encodings seamlessly. Or even if not, it should recognize that the output of the "print.pl" script is already UTF-8 and leave it as is.

At this time I don't see a way to resolve this, other than perhaps handling the system call yourself, i.e. not using backticks.


2008-05-15 15:02:18

Joined: 2007-01-17 05:46:17
Posts: 145
Location: Tokyo, Japan
Hello Martin,

Thank you for your reply.

martin wrote:
This is exactly how embedded perl blocks work. However, it is necessary for the correct functioning of the NWP macro system, which requires that the perl output be encoded using UTF-8.

....
Honestly I would consider this a bug in perl. The identity of the characters stored in $res should not change because the variable is printed to a file handle whose encoding has changed. Perl should convert between two encodings seamlessly. Or even if not, it should recognize that the output of the "print.pl" script is already UTF-8 and leave it as is.

At this time I don't see a way to resolve this, other than perhaps handling the system call yourself, i.e. not using backticks.


I understand your position, but...

I know that in many cases, it is necessary for Nisus embedded Perl scripts to have the two lines:
Code:
use utf8;
binmode (STDOUT, ":utf8");

But if all the strings to be dealt with are pure ASCII, they are not really needed. And in some (perhaps rare) cases, these two lines can cause real problems, as in the problem we are discussing.

The ideal would be for you to create a kind of "expert mode", in which users could decide whether or not they want the two lines.

And in that "expert mode", you would not immediately delete the generated Perl scripts, so that users can debug them. That would be something like:
Code:
Begin Perl -Expert
       [your_perl_code]
End


For the time being, I could work around the problem using a temporary file, in this way:

Code:
$temp_fpath = File.temporaryPathWithName “temp.txt”
Begin Perl
   `/usr/bin/perl $my_script_fpath $args > $temp_fpath`;
End
$my_res = File.readDataFromPath $temp_fpath
...

But this really should not be necessary...

_________________
Best regards,

Nobumi Iyanaga
Tokyo,
Japan


2008-05-15 17:14:23
Official Nisus Person

Joined: 2002-07-11 17:14:10
Posts: 4251
Location: San Diego, CA
Thanks for your thoughts, Nobumi. There would still be some difficulties with such an expert mode, e.g. transferring text variables whose characters cannot be represented in ASCII or the system text encoding. But I suppose that if such limitations were documented, the mode might have some use, as in this case.


2008-05-16 13:05:06

Joined: 2008-05-17 04:02:32
Posts: 400
Quote:
the result will be a mess (something like "ãÂ￾‚ãÂ￾„ãÂ￾†ãÂ￾ˆãÂ￾Š").


That is not a NW Pro problem; it is what you always get, even from a script run in Terminal, when backticks are used to receive something from an external program. As perl does not know the encoding of the output, the UTF-8 flag is not turned on automatically. I don't know if this is the proper way, but I usually use "utf8::decode".

Code:
begin Perl
   $res = `~/Desktop/myPerlScript.pl`;
   # /usr/bin/perl is unnecessary as long as the script is executable.
   utf8::decode $res;
end
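A standalone version of the same idea that can be run in Terminal. Assumptions: a Unix-like shell, and a child perl started via $^X standing in for myPerlScript.pl. The length checks show what the decode step changes: backticks return 9 bytes, and utf8::decode turns them into 3 characters. utf8::decode returns false if the bytes are not valid UTF-8, so the sketch checks that too.

```perl
#!/usr/bin/perl
use strict;
use warnings;
binmode(STDOUT, ':utf8');

# Placeholder for `~/Desktop/myPerlScript.pl`: a child perl printing three hiragana as UTF-8.
my $res = `"$^X" -e 'binmode STDOUT, q(:utf8); print qq(\\x{3042}\\x{3044}\\x{3046})'`;
die "child failed with status $?" if $?;

my $byte_len = length $res;       # 9: backticks return raw octets
utf8::decode($res) or die "output is not valid UTF-8";
my $char_len = length $res;       # 3: now a character string

print "$res ($byte_len bytes -> $char_len characters)\n";
```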


Quote:
By the way, another option that is needed is one that keeps this temporary Perl script from being deleted immediately. Because you delete it, it becomes impossible to debug it...!


I too have complained to Martin about the deletion repeatedly, but recently I realised that I can see the script very easily, just by putting `open -e "$0"`; somewhere in the perl block. Alternatively, with NW Pro 1.1, you can do:

Code:
Debug.setDestination 'new'
Debug.setIncludePerl true

begin Perl
   [your code]
   $NisusPL = `cat "$0"`;
   utf8::decode $NisusPL;
   print STDOUT $NisusPL;
end


Quote:
The ideal would be for you to create a kind of "expert mode", in which users could decide whether or not they want the two lines.


Once I sent feedback requesting the same feature, but I have begun to think such an option is perhaps unnecessary, because I have not had a single occasion on which I really needed it and because I think (not tested) you can override those three lines by inserting your own preambles at the top of a perl block, for example...

Code:
   no utf8;
   binmode (STDIN, ":bytes");
   binmode (STDOUT, ":bytes");


OTOH, what looks urgent to me now is, as I requested via feedback some hours ago, a new command for controlling which variables are sent to a perl block, excluding all the others.

With the excellent and innovative new macro language introduced by NW Pro 1.1, many of the macros I write now begin with

Code:
$doc = Document.active
$text = $doc.text


Then, if there is a perl block, the whole $doc.text will be written to a temporary perl script file. I think this would be a serious performance hit, especially when $doc.text is large and the perl block is executed repeatedly in a loop. Of course, you could sometimes work around it by putting "$text = $doc.text" after the perl block(s), but not always. So I'd like to have a way to specify which variables are written to the script file.

I think such a command would not cause any trouble to any user, as long as the restriction of variables sent to perl occurs only when the hypothetical "Perl.variablesAvailable ($var1, $var2...)" command is present in the macro.

And, on this occasion, I'd like to repeat a past feature request of mine. I'd like to have "begin Shell... end". It is absurd and inelegant to call perl just to execute an external program via backticks.


Kino


2008-05-17 04:21:51

Joined: 2007-01-17 05:46:17
Posts: 145
Location: Tokyo, Japan
Hello Kino,

Thank you for your reply.

Kino wrote:
That is not a NW Pro problem; it is what you always get, even from a script run in Terminal, when backticks are used to receive something from an external program. As perl does not know the encoding of the output, the UTF-8 flag is not turned on automatically. I don't know if this is the proper way, but I usually use "utf8::decode".

Code:
begin Perl
   $res = `~/Desktop/myPerlScript.pl`;
   # /usr/bin/perl is unnecessary as long as the script is executable.
   utf8::decode $res;
end


This seems to be the answer to my problem. I tried this, and it worked perfectly. Thank you!


Quote:
Quote:
By the way, another option that is needed is one that keeps this temporary Perl script from being deleted immediately. Because you delete it, it becomes impossible to debug it...!


I too have complained to Martin about the deletion repeatedly, but recently I realised that I can see the script very easily, just by putting `open -e "$0"`; somewhere in the perl block. Alternatively, with NW Pro 1.1, you can do:

Code:
Debug.setDestination 'new'
Debug.setIncludePerl true

begin Perl
   [your code]
   $NisusPL = `cat "$0"`;
   utf8::decode $NisusPL;
   print STDOUT $NisusPL;
end


I tried this, and it worked perfectly. This is very good to know!

Quote:
Quote:
The ideal would be for you to create a kind of "expert mode", in which users could decide whether or not they want the two lines.


Once I sent feedback requesting the same feature, but I have begun to think such an option is perhaps unnecessary, because I have not had a single occasion on which I really needed it and because I think (not tested) you can override those three lines by inserting your own preambles at the top of a perl block, for example...

Code:
   no utf8;
   binmode (STDIN, ":bytes");
   binmode (STDOUT, ":bytes");


I tried this, and it doesn't seem to work. When I put 'binmode (STDOUT, ":byte")' in the script, the Perl block doesn't return any output; I can, for example, do:
Code:
binmode (STDOUT, ":bytes");
$res = `perl $myScript`;
binmode (STDOUT, ":utf8");

In this case I get the result, but it can be garbled, just as if I had not put in either of these 'binmode' lines. I think this is due to the final line of the Nisus Perl block (which I could verify using your '`open -e "$0"`;' technique...):
Code:
print "
NisusPerlBoundary-AB3F2CA0-35C7-4528-8293-CB88BD60146B
";

which I don't understand. In any case, this is not how a normal Perl script behaves.

This seems to mean that my suggestion of the "expert mode" would not work with the current implementation of the Nisus Perl block... So the only options in this kind of case are either to use 'utf8::decode' or to redirect the output to a temporary file and read it afterward... Sigh!

But all this should be documented in the Reference.

Quote:
OTOH, what looks urgent to me now is, as I requested via feedback some hours ago, a new command for controlling which variables are sent to a perl block, excluding all the others.

With the excellent and innovative new macro language introduced by NW Pro 1.1, many of the macros I write now begin with

Code:
$doc = Document.active
$text = $doc.text


Then, if there is a perl block, the whole $doc.text will be written to a temporary perl script file. I think this would be a serious performance hit, especially when $doc.text is large and the perl block is executed repeatedly in a loop. Of course, you could sometimes work around it by putting "$text = $doc.text" after the perl block(s), but not always. So I'd like to have a way to specify which variables are written to the script file.


Ah, this is a serious problem. I think you realized the existence of this problem because you could look at the actual Perl code using your "debug mode".

Quote:
I think such a command would not cause any trouble to any user, as long as the restriction of variables sent to perl occurs only when the hypothetical "Perl.variablesAvailable ($var1, $var2...)" command is present in the macro.


This is a very good suggestion, and I second it!

Quote:
And, on this occasion, I'd like to repeat a past feature request of mine. I'd like to have "begin Shell... end". It is absurd and inelegant to call perl just to execute an external program via backticks.


And of course, I agree on this as well!

_________________
Best regards,

Nobumi Iyanaga
Tokyo,
Japan


2008-05-19 06:57:44

Joined: 2008-05-17 04:02:32
Posts: 400
Nobumi Iyanaga wrote:
Kino wrote:
Code:
   no utf8;
   binmode (STDIN, ":bytes");
   binmode (STDOUT, ":bytes");

I tried this, and it doesn't seem to work.


Oh, sorry. I suspect those preambles fail because the UTF-8 characters in variable values have already been written into the temporary script file before them, but I'm not sure.

I cannot figure out exactly what you are planning to do with your real script (not the sample script you posted here), but perhaps you could use "utf8::encode" as a workaround. (All those command names are confusing to me, btw.) According to the utf8 man page, it "Converts in-place the character sequence to the corresponding octet sequence in UTF-X. The UTF-8 flag is turned off." For example, this seems to work:

Code:
### Show UTF-8 bytes for non-ASCII chars ###

$str = 'un élève'
begin Perl
   local undef $/;
   utf8::encode $str;
   $str =~ s/([^\x00-\x7F])/sprintf ("\\x%02X", ord ($1))/eg;
end
Exit $str
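For comparison, the same byte dump as a plain Perl script outside NW Pro (an assumption made only so it can run in Terminal; "un élève" is written with \x escapes so the file can stay ASCII). It also checks utf8::is_utf8 after the call, to show the "UTF-8 flag is turned off" part of the man page description.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "un \x{e9}l\x{e8}ve";     # "un élève" as characters

utf8::encode($str);                  # in place: characters -> UTF-8 octets, flag off
my $flag_on = utf8::is_utf8($str) ? 1 : 0;

# Show every non-ASCII byte as \xHH, as in the macro above.
(my $shown = $str) =~ s/([^\x00-\x7F])/sprintf("\\x%02X", ord $1)/eg;

print "$shown\n";                              # un \xC3\xA9l\xC3\xA8ve
print "UTF-8 flag after encode: $flag_on\n";   # 0
```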

Please see
http://web.archive.org/web/20041230004348/http://homepage1.nifty.com/nomenclator/perl/unicode.htm
http://www.lr.pi.titech.ac.jp/~abekawa/perl/perl_unicode.html
(both in Japanese)

Quote:
But all this should be documented in the Reference.

Agreed. I'm very eager to have something like NW Pro Macro technical notes. At the same time, I'm afraid Martin and the other Nisus people working on the NW Pro macro language are probably too busy. Having been so surprised and impressed by the richness of the new commands introduced in the NW Pro 1.1 beta, I feel exceptionally forgiving of the relative thinness of the documentation...

Quote:
Quote:
And, on this occasion, I'd like to repeat a past feature request of mine. I'd like to have "begin Shell... end". It is absurd and inelegant to call perl just to execute an external program via backticks.

And of course, I agree on this as well!

Thank you for your support ;-) And now I think my argument for requesting the shell block was not good. The most important reason is that it is very difficult, if not impossible, for me to embed a shell script (not a shell command) in a perl block.

And I strongly second your request for direct support of AppleScript in the NW Pro macro language. It is really annoying to be forced to run an AS script via osascript via backticks from within a perl block.

I hope a future version of the NW Pro macro language will support the main, frequently used features of perl, AppleScript, shell commands, etc. But obviously it would be impossible to support them all, so easy and simple ways to use external programs would be very welcome.


2008-05-19 08:55:45
Official Nisus Person

Joined: 2002-07-11 17:14:10
Posts: 4251
Location: San Diego, CA
Thanks for all your suggestions, guys. If only there were infinite time to add all the features we want for 1.1! But we will of course consider them for future versions.

Nobumi Iyanaga wrote:
I think this is due to the final line of the Nisus Perl block (which I could verify using your '`open -e "$0"`;' technique...):
Code:
print "
NisusPerlBoundary-AB3F2CA0-35C7-4528-8293-CB88BD60146B
";

which I don't understand.

This print statement is used by NWP to update the state of your variables, in case they were changed by perl. NWP scans the perl script's output looking for the boundary, after which all the variable values are printed.
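To picture the mechanism described above, here is a rough sketch of splitting a script's output at such a boundary line. It is not NWP's code: the sample output and everything after the boundary are made up, since the format of the variable dump isn't described in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up output of a generated script: the block's own output, then the
# boundary printed by the final statement, then (hypothetical) variable data.
my $output = "hello from the block\n"
           . "\nNisusPerlBoundary-AB3F2CA0-35C7-4528-8293-CB88BD60146B\n"
           . "...variable values...\n";

# Split once, at the first line that looks like a boundary marker.
my ($user_part, $var_part) =
    split /\n?^NisusPerlBoundary-[0-9A-F-]+\n/m, $output, 2;

print "block output: $user_part";
print "after boundary: $var_part";
```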


2008-05-19 15:16:19

Joined: 2007-01-17 05:46:17
Posts: 145
Location: Tokyo, Japan
Hello Kino and Martin,

Thanks for your replies and your consideration. I too am very much surprised and impressed by the richness of the new commands introduced in the NW Pro 1.1 beta; the documentation still needs improvement, but it covers the essentials. I think we users should also work on making some kind of tutorials, although it would be a very hard task...

As I said, Kino's technique for displaying the contents of the Perl block is very useful -- but there is still a problem. That is, when the Perl block contains error(s), it does not work. And this is precisely when Perl produces errors that we want to see and understand the reason...!

So I would still like to repeat my feature request to add an option to the Perl block so that the temporary script is not deleted immediately.

Thank you for your consideration.

_________________
Best regards,

Nobumi Iyanaga
Tokyo,
Japan


2008-05-21 18:09:28

Joined: 2008-05-17 04:02:32
Posts: 400
Many thanks to Martin and other Nisus people for the very quick addition of "Set Exported Perl Variables" and "Set Include Perl UTF Preamble" commands to NW Pro 1.1 and for numerous fantastic macro enhancements.

Nobumi Iyanaga wrote:
That is, when the Perl block contains error(s), it does not work. And this is precisely when Perl produces errors that we want to see and understand the reason...!


Well, that is a problem. I think (though I may be saying something absurd) the ideal behaviour would be...

1. NW Pro would not delete a temporary perl script file immediately but would overwrite it when the macro reaches a perl block (the same perl block in a loop, or a different block).

2. NW Pro would show the temporary perl script file if an error occurs.

3. NW Pro would delete temporary perl script file(s) [a] when NW Pro finishes running a macro containing a perl block normally, [b] when NW Pro quits and [c] when NW Pro starts up.

In this way, NW Pro would always have a single temporary perl script file, so it would not need to check whether a file named "nisus.pl" already exists.

I may be wrong, but I think overwriting a file is faster than deleting and creating one, so this would improve performance a little bit.

I think there would be nothing bad in deleting nisus.pl (and nisus-1.pl, nisus-2.pl...?) unconditionally, as long as it is located in a special subfolder never used by any other application. However, if there might be a problem with a folder under /var/tmp, perhaps it would not be a bad idea to use a folder inside Nisus Writer Pro.app to store the temporary perl script file?

[edit: The last idea was stupid. A non-admin user could not write a file in a folder within an application in /Applications.]
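A rough sketch of points 1 to 3 above, written in Perl only to make the proposal concrete; it is not how NW Pro works. The file name nisus-sketch.pl and the run_block helper are hypothetical. One script file in the system temp folder is overwritten for every block, kept (with its path reported) when perl exits with an error, and deleted once the "macro" has finished.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical fixed location: a single script file reused for every block.
my $script_path = File::Spec->catfile(File::Spec->tmpdir, 'nisus-sketch.pl');

# Hypothetical helper: write one block's code and run it with the current perl.
sub run_block {
    my ($code) = @_;
    open(my $fh, '>:encoding(UTF-8)', $script_path)   # '>' truncates: overwrite
        or die "cannot write $script_path: $!";
    print $fh $code;
    close($fh) or die "cannot close $script_path: $!";

    my $out = `"$^X" "$script_path" 2>&1`;
    if ($? != 0) {
        # Point 2: leave the script in place so it can be opened and debugged.
        warn "perl block failed; script kept at $script_path\n$out";
        return;
    }
    return $out;
}

my $good = run_block("print 'ok';\n");
my $bad  = run_block("die 'boom';\n");      # overwrites the same file, then fails
my $kept = -e $script_path ? 1 : 0;

# Point 3: delete the script once the whole macro has finished.
unlink $script_path;
print "good: $good, bad failed: ", (defined $bad ? 'no' : 'yes'), ", kept: $kept\n";
```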


2008-05-24 05:12:05