Perl output messed up

Get help using and writing Nisus Writer Pro macros.
Nobumi Iyanaga
Posts: 158
Joined: 2007-01-17 05:46:17
Location: Tokyo, Japan

Perl output messed up

Post by Nobumi Iyanaga

Hello,

I want to write a macro like the following:

Code: Select all

$res = ""
Begin Perl
   $res = `/usr/bin/perl ~/Desktop/myPerlScript.pl`;
End

New
Insert Text $res
If the perl script "~/Desktop/myPerlScript.pl" has a code like the following:

Code: Select all

#!/usr/bin/perl
binmode (STDOUT, ":utf8");
print "あいうえお";
the result will be a mess (something like "ãÂ￾‚ãÂ￾„ãÂ￾†ãÂ￾ˆãÂ￾Š").

Even if I remove the line 'binmode (STDOUT, ":utf8")', I get an even worse result.

How can I get the expected result?

Thank you in advance.
Best regards,

Nobumi Iyanaga
Tokyo,
Japan
Hamid
Posts: 777
Joined: 2007-01-17 03:25:42

Re: Perl output messed up

Post by Hamid

This should work. I only changed utf8 to utf-8:

Code: Select all

#!/usr/bin/perl
binmode (STDOUT, ":utf-8");
print "あいうえお";
martin
Official Nisus Person
Posts: 5227
Joined: 2002-07-11 17:14:10
Location: San Diego, CA

Re: Perl output messed up

Post by martin

Hamid wrote:This should work. I only changed utf8 to utf-8:

Code: Select all

#!/usr/bin/perl
binmode (STDOUT, ":utf-8");
print "あいうえお";
This may be a misleading solution. I don't believe the string "utf-8" is valid for designating an encoding; you should use "utf8" without a dash. To see this, try changing the line to:

Code: Select all

binmode (STDOUT, ":xxx");
You should obtain the same results as if you had used "utf-8".
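One way to confirm this outside NWP, assuming a stock perl: binmode returns false when it rejects a layer specification, so checking its return value shows that ":utf-8" is treated as an unknown layer (named "utf") rather than as UTF-8. This is a minimal sketch; under use warnings Perl also prints an "Unknown PerlIO layer" warning for the bad spelling.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# binmode returns true on success and false (undef) on failure.
my $dash_ok = binmode(STDOUT, ':utf-8');  # parsed as unknown layer "utf": expected to fail
my $utf8_ok = binmode(STDOUT, ':utf8');   # a real layer: expected to succeed

print "':utf-8' accepted: ", ($dash_ok ? 'yes' : 'no'), "\n";
print "':utf8'  accepted: ", ($utf8_ok ? 'yes' : 'no'), "\n";
```

So a misspelled layer does not switch STDOUT to UTF-8; it simply leaves the handle as it was, which is why ":xxx" behaves the same way.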

I think the proper thing to do is:

Code: Select all

use utf8;
binmode (STDOUT, ":utf8");
The first line tells perl the encoding of the actual ".pl" file. The second tells perl to use the UTF-8 encoding for standard output, which NWP expects. Of course, none of this will work if you didn't save your ".pl" file using UTF-8 in the first place.
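To see what "use utf8" changes, here is a small standalone sketch (it assumes the .pl file is saved as UTF-8): with the pragma, the literal is a string of 5 characters, while its UTF-8 encoding, obtained with the core utf8::encode function, is 15 bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                  # the literals in this file are UTF-8 text
binmode(STDOUT, ':utf8');  # write characters to STDOUT as UTF-8

my $text   = "あいうえお";  # 5 characters, thanks to 'use utf8'
my $octets = $text;
utf8::encode($octets);      # the same text as raw UTF-8 bytes (3 per hiragana)

print length($text), " characters, ", length($octets), " bytes\n";  # 5 characters, 15 bytes
print "$text\n";
```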
martin
Official Nisus Person
Posts: 5227
Joined: 2002-07-11 17:14:10
Location: San Diego, CA

Post by martin

There's unfortunately something else going on here. It seems the string returned using perl's backtick operator is defaulting to the system encoding. Using just perl from the command line I run into problems. The first script "print.pl" is:

Code: Select all

#!/usr/bin/perl 
use utf8;
binmode(STDOUT, ":utf8");
print "あいうえお\n";
The second script which calls the above script is:

Code: Select all

use utf8;
binmode(STDOUT, ":utf8");
$res = `/usr/bin/perl ~/Desktop/print.pl`;
print $res;
print "だい\n";
The output is unfortunately:

Code: Select all

ããããã
だい
The backtick operator is garbling the text, I assume because it isn't respecting the command's stdout encoding. I'm not sure there's a way to fix this without getting complicated: using perl's "system" command and configuring/reading the pipes yourself.
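A minimal sketch of the "read the pipes yourself" route, assuming Perl 5.8 or later on macOS/Unix: open the command with a piped open and push an :encoding(UTF-8) layer onto the read handle, so perl decodes the child's output into characters. run_utf8 is a hypothetical helper name, and the child command below is only a stand-in for print.pl; for the thread's script you would pass something like ("/usr/bin/perl", "$ENV{HOME}/Desktop/print.pl"), because the list form bypasses the shell and does not expand "~".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                  # this file contains literal Japanese text
binmode(STDOUT, ':utf8');  # NWP expects UTF-8 on standard output

# Hypothetical helper: run a command and return its standard output decoded
# from UTF-8 into a character string (backticks would return raw bytes).
sub run_utf8 {
    my @cmd = @_;
    # List form of the piped open: no shell is involved, so '~' is not expanded.
    open(my $pipe, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    binmode($pipe, ':encoding(UTF-8)');   # decode the child's output as UTF-8
    my $out = do { local $/; <$pipe> };   # slurp all of it
    close($pipe) or warn "$cmd[0] exited with status $?\n";
    return defined $out ? $out : '';
}

# Stand-in for print.pl: a child perl that writes the UTF-8 bytes of "あい".
my $res = run_utf8($^X, '-e', 'print "\xE3\x81\x82\xE3\x81\x84\n"');
print $res;      # あい, not mojibake
print "だい\n";
```

Because $res already holds characters, printing it through the :utf8 STDOUT encodes it exactly once.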
Nobumi Iyanaga
Posts: 158
Joined: 2007-01-17 05:46:17
Location: Tokyo, Japan

Post by Nobumi Iyanaga


Best regards,

Nobumi Iyanaga
Tokyo,
Japan
martin
Official Nisus Person
Posts: 5227
Joined: 2002-07-11 17:14:10
Location: San Diego, CA

Post by martin

Nobumi Iyanaga wrote:Now, it seems that your "Begin Perl" block auto-generates a temporary Perl script having at the beginning something like:

Code: Select all

use utf8;
binmode (STDOUT, ":utf8");
This is exactly how embedded perl blocks work. However, it is necessary for the correct functioning of the NWP macro system, which requires that the perl output be encoded using UTF-8.

Code: Select all

#!/usr/bin/perl

use utf8;

$res = `/usr/bin/perl ~/Desktop/print.pl`;

#binmode (STDOUT, ":utf8");
print $res;

binmode (STDOUT, ":utf8");

print "だい";
you will get the correct result. I think this means that the result of

Code: Select all

`/usr/bin/perl ~/Desktop/print.pl`;
is already encoded in UTF-8, so printing it with 'binmode (STDOUT, ":utf8")' re-encodes it into UTF-8 a second time, which gives a garbled result.
Honestly I would consider this a bug in perl. The identity of the characters stored in $res should not change because the variable is printed to a file handle whose encoding has changed. Perl should convert between two encodings seamlessly. Or even if not, it should recognize that the output of the "print.pl" script is already UTF-8 and leave it as is.

At this time I don't see a way to resolve this, besides perhaps handling the system call yourself, e.g. by not using backticks.
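The re-encoding described above can be reproduced with the core Encode module. This sketch assumes the backtick result is the raw UTF-8 bytes for あ. Encoding those bytes again (roughly what printing a byte string to a :utf8 handle does, since each byte is taken as a Latin-1 character) doubles their length and yields the "ã"-style mojibake seen in this thread, while decoding them first restores the original character.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars  = "\x{3042}";               # あ as one character
my $bytes  = encode('UTF-8', $chars);  # what backticks hand back: E3 81 82
my $double = encode('UTF-8', $bytes);  # each byte re-encoded as if it were Latin-1
my $fixed  = decode('UTF-8', $bytes);  # decoding first gives back the character

printf "%d bytes once, %d bytes twice\n", length($bytes), length($double);  # 3 bytes once, 6 bytes twice
print join(' ', map { sprintf '%02X', ord } split //, $double), "\n";      # C3 A3 C2 81 C2 82
```

The leading C3 A3 is the UTF-8 encoding of "ã", which is why each garbled hiragana starts with that letter.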
Nobumi Iyanaga
Posts: 158
Joined: 2007-01-17 05:46:17
Location: Tokyo, Japan

Post by Nobumi Iyanaga

Hello Martin,

Thank you for your reply.
martin wrote:This is exactly how embedded perl blocks work. However, it is necessary for the correct functioning of the NWP macro system, which requires that the perl output be encoded using UTF-8.

....
Honestly I would consider this a bug in perl. The identity of the characters stored in $res should not change because the variable is printed to a file handle whose encoding has changed. Perl should convert between two encodings seamlessly. Or even if not, it should recognize that the output of the "print.pl" script is already UTF-8 and leave it as is.

At this time I don't see a way to resolve this, besides perhaps handling the system call yourself, e.g. by not using backticks.
I understand your position, but...

I know that in many cases, it is necessary for Nisus embedded Perl scripts to have the two lines:

Code: Select all

use utf8;
binmode (STDOUT, ":utf8");
But if all the strings to be dealt with are ASCII, they are not really needed. And in some (perhaps rare) cases, these two lines can cause real problems, as in the problem we are discussing.

The ideal would be for you to create a kind of "expert mode", in which you would let users decide whether or not they want the two lines.

And in that "expert mode", you would not immediately delete the generated Perl scripts, so that users can debug them. That would be something like:

Code: Select all

Begin Perl -Expert
       [your_perl_code]
End
For the time being, I could work around the problem using a temporary file, in this way:

Code: Select all

$temp_fpath = File.temporaryPathWithName “temp.txt”
Begin Perl
	`/usr/bin/perl $my_script_fpath $args > $temp_fpath`;
End
$my_res = File.readDataFromPath $temp_fpath
...
But this really shouldn't be necessary...
Best regards,

Nobumi Iyanaga
Tokyo,
Japan
martin
Official Nisus Person
Posts: 5227
Joined: 2002-07-11 17:14:10
Location: San Diego, CA

Post by martin

Thanks for your thoughts, Nobumi. There would still be some difficulties with such an expert mode, e.g. transferring text variables whose characters cannot be represented in ASCII or the system text encoding. But I suppose if such limitations were documented, the mode might have some use, as in this case.
Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Perl output messed up

Post by Kino


Nobumi Iyanaga
Posts: 158
Joined: 2007-01-17 05:46:17
Location: Tokyo, Japan

Re: Perl output messed up

Post by Nobumi Iyanaga

Hello Kino,

Thank you for your reply.
Kino wrote:That is not a NW Pro problem but what you always get even from a script run in Terminal, when backticks are used to receive something from an external program. As perl does not know the encoding of the output, the UTF-8 flag is not turned on automatically. I don't know if this is a proper way but usually I use "utf8::decode".

Code: Select all

begin Perl
	$res = `~/Desktop/myPerlScript.pl`;
	# /usr/bin/perl is unnecessary as far as the script is executable.
	utf8::decode $res;
end
This seems to be the answer to my problem. I tried this, and it worked perfectly. Thank you!
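Why utf8::decode helps, in a standalone sketch: backticks hand back bytes, so perl counts six "characters" for the two hiragana below; utf8::decode converts them in place into two real characters, which then print correctly through a :utf8 STDOUT. The literal byte string is an assumption standing in for the script's output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
binmode(STDOUT, ':utf8');      # as in an NWP perl block

# Assumed stand-in for what backticks return: the UTF-8 bytes of "あい".
my $res = "\xE3\x81\x82\xE3\x81\x84";
my $before = length $res;      # 6: perl sees six separate bytes

my $ok = utf8::decode($res);   # in place: bytes -> characters (UTF-8 flag on)
my $after = length $res;       # 2: two real characters

print "$before -> $after\n";   # 6 -> 2
print "$res\n" if $ok;         # prints あい without mojibake
```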

By the way, another option which is needed is to not delete immediately this temporary Perl script. As you delete it, it becomes impossible to debug it...!
I too have complained to Martin about the deletion repeatedly, but recently I realised that I can see the script very easily, just by putting `open -e "$0"`; somewhere in the perl block. Alternatively, with NW Pro 1.1, you can do:

Code: Select all

Debug.setDestination 'new'
Debug.setIncludePerl true

begin Perl
	[your code]
	$NisusPL = `cat "$0"`;
	utf8::decode $NisusPL;
	print STDOUT $NisusPL;
end
I tried this, and it worked perfectly. It is very good to know this!
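If you prefer not to spawn cat for this, the same dump can be done in plain Perl by reading $0 through a UTF-8 decoding layer. A minimal sketch, assuming (as the cat "$0" trick implies) that $0 holds the path of the temporary script, and that STDOUT has a :utf8 layer as in an NWP perl block:

```perl
#!/usr/bin/perl
use strict;
use warnings;
binmode(STDOUT, ':utf8');   # NWP's preamble already does this inside a perl block

# $0 is the path of the running script (in NWP, the temporary script file).
open(my $self, '<:encoding(UTF-8)', $0) or die "Cannot read $0: $!";
my $source = do { local $/; <$self> };   # slurp the whole file as characters
close($self);

print STDOUT $source;
```

Reading with an explicit decoding layer replaces the separate utf8::decode step, since the text arrives as characters.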
The ideal would be that you create a kind of "expert mode", in which you would leave the users to decide if they want or not the two lines.
I once sent feedback requesting the same feature, but I have begun to think such an option is perhaps unnecessary, because I have not had a single occasion on which I really needed it, and because I think (not tested) you can override those three lines by inserting your own preambles at the top of a perl block, for example...

Code: Select all

	no utf8; 
	binmode (STDIN, ":bytes");
	binmode (STDOUT, ":bytes");
I tried this, and it seems to not work. When I put 'binmode (STDOUT, ":bytes")' in the script, the Perl block doesn't return any output. I can do, for example:

Code: Select all

binmode (STDOUT, ":bytes");
$res = `perl $myScript`;
binmode (STDOUT, ":utf8");
In this case, I get the result, but it can be garbled, just as if I had not put either of these 'binmode' lines. I think this is due to the final line of the Nisus Perl block (which I could verify using your '`open -e "$0"`;' technique...):

Code: Select all

print "
NisusPerlBoundary-AB3F2CA0-35C7-4528-8293-CB88BD60146B
"; 
which I don't understand. In any case, this is not how a normal Perl script behaves.

This seems to mean that my suggestion of an "expert mode" would not work with the current implementation of the Nisus Perl block... So the only options in this kind of case are either to use 'utf8::decode' or to redirect the output to a temporary file and read it afterward... Sigh!

But all this should be documented in the Reference.
OTOH, what looks urgent to me now is, as I requested via feedback some hours ago, a new command for controlling which variables are sent to a perl block, excluding all the others.

With the excellent and innovative new macro language introduced by NW Pro 1.1, many of the macros I write now begin with

Code: Select all

$doc = Document.active
$text = $doc.text
Then, if there is a perl block, the whole $doc.text will be written to a temporary perl script file. I think this would be a serious performance hit, especially when $doc.text is large and the perl block is executed repeatedly in a loop. Of course, you could sometimes work around it by putting "$text = $doc.text" after the perl block(s), but not always. So I'd like to have a way to specify which variables are written to the script file.
Ah, this is a serious problem. I think you realized the existence of this problem because you could look at the actual Perl code using your "debug mode".
I think such a command would not cause any trouble to any user, as long as the restriction of variables sent to perl applies only when the hypothetical "Perl.variablesAvailable ($var1, $var2...)" command is present in the macro.
This is a very good suggestion, and I second it!
And, on this occasion, I'd like to repeat my past feature request here again. I'd like to have "begin Shell... end". It is absurd and inelegant to call perl just to execute an external program via backticks.
And of course, I agree on this as well!
Best regards,

Nobumi Iyanaga
Tokyo,
Japan
Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Perl output messed up

Post by Kino

Nobumi Iyanaga wrote:
Kino wrote:

Code: Select all

	no utf8; 
	binmode (STDIN, ":bytes");
	binmode (STDOUT, ":bytes");
I tried this, and it seems to not work.
Oh, sorry. I suspect the reason those preambles fail is that UTF-8 characters, as values of variables, have already been written into the temporary script file before them, but I'm not sure.

I cannot figure out exactly what you are planning to do with your real script (not the sample script you posted here) but perhaps you could use "utf8::encode" as a workaround. (All those command names are confusing to me, btw.) According to the man page of utf8, it "Converts in-place the character sequence to the corresponding octet sequence in UTF-X. The UTF-8 flag is turned off." For example, this seems to work.

Code: Select all

### Show UTF-8 bytes for non-ASCII chars ###

$str = 'un élève'
begin Perl
	local undef $/;
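	utf8::encode $str;	# characters -> UTF-8 octets; the UTF-8 flag is turned off
	# replace each non-ASCII octet with its \xHH notation
	$str =~ s/([^\x00-\x7F])/sprintf ("\\x%02X", ord ($1))/eg;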
end
Exit $str
Please see
http://web.archive.org/web/200412300043 ... nicode.htm
http://www.lr.pi.titech.ac.jp/~abekawa/ ... icode.html
(both in Japanese)
But all this should be documented in the Reference.
Agreed. I'm very eager to have something like NW Pro Macro technical notes. At the same time, I'm afraid Martin and the other Nisus people working on the NW Pro macro language are probably too busy. Since I was very much surprised and impressed by the richness of the new commands introduced in the NW Pro 1.1 beta, I feel exceptionally indulgent toward the relatively sparse documentation...
And, on this occasion, I'd like to repeat my past feature request here again. I'd like to have "begin Shell... end". It is absurd and inelegant to call perl just to execute an external program via backticks.
And of course, I agree on this as well!
Thank you for your support ;-) Now I think the argument I gave for requesting the shell block was not a good one. The most important reason is that it is very difficult, if not impossible, for me to embed a shell script (as opposed to a single shell command) in a perl block.

And I strongly second your request for direct support of AppleScript in NW Pro macros. It is really annoying to be forced to run an AppleScript via osascript via backticks from within a perl block.

I hope a future version of the NW Pro macro language will support the main, frequently used features of perl, AppleScript, shell commands, etc. But obviously it would be impossible to support them all. So easy and simple ways to use external programs would be very welcome.
martin
Official Nisus Person
Posts: 5227
Joined: 2002-07-11 17:14:10
Location: San Diego, CA

Re: Perl output messed up

Post by martin

Thanks for all your suggestions, guys. If only there were infinite time to add all the features we want for 1.1! But we will of course consider them for future versions.
Nobumi Iyanaga wrote:I think this is due to the final line of the Nisus Perl block (that I could verify using your '`open -e "$0"`;' technique...):

Code: Select all

print "
NisusPerlBoundary-AB3F2CA0-35C7-4528-8293-CB88BD60146B
"; 
that I don't understand.
This print statement is used by NWP to update the state of your variables, in case they were changed by perl. NWP scans the perl script's output looking for the boundary, after which all the variable values are printed.
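To illustrate how such a boundary can separate ordinary output from the variable dump, here is a sketch of the splitting step. It is not NWP's actual code: the captured output and everything after the boundary are made-up placeholders, since the real format of the variable dump is not described in this thread. Only the boundary string is taken from the generated script quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Boundary string as it appears in the generated script.
my $boundary = 'NisusPerlBoundary-AB3F2CA0-35C7-4528-8293-CB88BD60146B';

# Hypothetical captured output: the user's own prints, then the boundary
# printed by the script's final line, then a placeholder for the variables.
my $output = "hello from the block\n" . "\n$boundary\n" . "<variable values...>\n";

# Split once at the first boundary: everything before it is ordinary output.
my ($user_part, $var_part) = split /\n\Q$boundary\E\n/, $output, 2;

print "user output: $user_part";
print "variables:   $var_part";
```

This also suggests why changing STDOUT's layer inside the block can interfere: the boundary and the values after it travel through the same handle.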
Nobumi Iyanaga
Posts: 158
Joined: 2007-01-17 05:46:17
Location: Tokyo, Japan

Post by Nobumi Iyanaga

Hello Kino and Martin,

Thanks for your replies and your consideration. I too am very much surprised and impressed by the richness of the new commands introduced in the NW Pro 1.1 beta; the documentation still needs improvement, but it covers the essentials. I think we users should also work on some kind of tutorials, although it would be a very hard task...

As I said, Kino's technique for displaying the contents of the Perl block is very useful, but there is still a problem. That is, when the Perl block contains error(s), it does not work. And it is precisely when Perl produces errors that we want to see and understand the reason...!

So I would still like to repeat my feature request for an option to the Perl block, so that the temporary script is not deleted immediately.

Thank you for your consideration.
Best regards,

Nobumi Iyanaga
Tokyo,
Japan
Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Post by Kino

Many thanks to Martin and other Nisus people for the very quick addition of "Set Exported Perl Variables" and "Set Include Perl UTF Preamble" commands to NW Pro 1.1 and for numerous fantastic macro enhancements.
Nobumi Iyanaga wrote:That is, when the Perl block contains error(s), it does not work. And it is precisely when Perl produces errors that we want to see and understand the reason...!
Well, that is a problem. I think (though I'm not sure I'm not saying something absurd) the ideal behaviour would be...

1. NW Pro will not delete the temporary perl script file immediately; instead, when the macro reaches a perl block (the same perl block in a loop, or a different block), it will overwrite the file.

2. NW Pro will show the temporary perl script file if an error occurs.

3. NW Pro will delete temporary perl script file(s) [a] when NW Pro finishes running a macro containing a perl block normally, [b] when NW Pro quits, and [c] when NW Pro starts up.

In this way, NW Pro would always have a single temporary perl script file, so it would not need to check whether a file named "nisus.pl" already exists.

I may be wrong, but I think overwriting a file is faster than deleting and creating one, so this would improve performance a little.

I think there would be nothing wrong with deleting nisus.pl (and nisus-1.pl, nisus-2.pl...?) unconditionally, as long as it is located in a special sub-folder never used by any other application. However, if there might be a problem with a folder under /var/tmp, perhaps it would not be a bad idea to store the temporary perl script file in a folder inside Nisus Writer Pro.app?

[edit: The last idea was stupid. A non-admin user could not write a file in a folder within an application in /Applications.]