Output Devanagari (Hindi) from raw unicode using luatex
I can get the following code to compile, using luatex, with the Hindi/Devanagari characters correctly printed in the pdf:
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Times New Roman}
\newfontscript{Devanagari}{deva,dev2}
\newfontface{\hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}
\begin{document}
Here is normal text.
{\hindi नमस्ते }
\end{document}
However, I'm using a program that outputs the tex and that won't allow me to type the Hindi script into my tex editor; instead, it will only give me the unicode version of the word, "नमस्ते", which is "<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>".
How can I get luatex to compile correctly from these raw code characters? What I want to compile (to produce a pdf with the single word "नमस्ते") is something like this:
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Times New Roman}
\newfontscript{Devanagari}{deva,dev2}
\newfontface{\hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}
\begin{document}
Here is normal text.
{\hindi <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> }
\end{document}
...but that won't work.
fonts luatex languages characters indic
asked 3 hours ago by lethalSinger; edited 53 mins ago by ShreevatsaR
Can you get your program to output \char"0928\char"092E\char"0938\char"094D\char"0924 \char"0947 instead of <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>?
– Mico
3 hours ago
Yes, I could do that! What would the full script then need to look like?
– lethalSinger
3 hours ago
I'm afraid I cannot answer your question as I don't know which scripting tool you employ. I just posted an answer, though, which creates a Lua function that converts <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> to \char"0928\char"092E\char"0938\char"094D\char"0924 \char"0947.
– Mico
2 hours ago
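For readers who can change what the generating program emits, the \char form suggested in the first comment can be tried directly. The following is only a minimal sketch: it reuses the font setup from the question, drops the Times New Roman main font to keep the example short, and assumes Lohit-Devanagari.ttf can be found by LuaLaTeX.

```latex
\documentclass{article}
\usepackage{fontspec}
\newfontscript{Devanagari}{deva,dev2}
\newfontface{\hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}
\begin{document}
% Each \char"XXXX selects the glyph for one hexadecimal code point.
{\hindi \char"0928\char"092E\char"0938\char"094D\char"0924\char"0947}
\end{document}
```

Note that TeX expects the hexadecimal digits A-F after \char" in upper case, which matches the <U+092E> style of the program's output.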
1 Answer
(added an extra operation in the Lua function 'conv' to address the OP's follow-up request)
Since you're using LuaLaTeX, here's a solution that employs a Lua function to convert strings of the form '<U%+(.-)>' to '\\char"%1'; here, %+ represents the literal character + and %1 represents the non-greedy "capture" of the pattern (.-) -- in words: "0 or more characters other than >". In a second step, the Lua function converts any whitespace characters present in the string to explicit (interword) whitespace. (Without this step, TeX would treat a space that follows the hexadecimal digits as the end of the \char number and discard it, so the words would run together.)
In addition, the code also sets up a LaTeX macro that acts as a front-end for the Lua function. Thus, one may call the Lua function via a \conv{<your string here>} directive.
You can either manually encase the sequences of Unicode code points in \conv{...} statements or, depending on how far you can get your program to do the work for you, instruct the scripting program to encase the sequences in \conv{...} statements automatically.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Times New Roman}
\newfontscript{Devanagari}{deva,dev2}
\newfontface{\hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}
%%%% -- copy the next eight lines of code to your document --
\usepackage{luacode} % for 'luacode' env. and '\luastringN' macro
\begin{luacode}
function conv ( s )
  s = s:gsub ( '<U%+(.-)>' , '\\char"%1' )
  tex.sprint ( ( s:gsub( '%s+' , '\\ ' ) ) )
end
\end{luacode}
\newcommand\conv[1]{\directlua{conv(\luastringN{#1})}}
\begin{document}
Latin-alphabet text.
{\hindi नमस्ते }
{\hindi \conv{<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>} }
{\hindi \conv{<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>}}
\end{document}
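To see what the two substitutions actually hand to TeX, it can help to write the converted string to the log instead of typesetting it. The sketch below is an illustration only: conv_debug is a hypothetical helper name that is not part of the answer, and it applies the same two gsub calls but returns the string instead of passing it to tex.sprint.

```latex
\documentclass{article}
\usepackage{luacode}
\begin{luacode}
-- conv_debug is a hypothetical helper, not part of the answer.
-- It runs the same two substitutions and returns the result.
function conv_debug ( s )
  s = s:gsub ( '<U%+(.-)>' , '\\char"%1' ) -- code point markers
  s = s:gsub ( '%s+' , '\\ ' )             -- explicit interword space
  return s
end
-- Write the converted string to the terminal and the log file.
texio.write_nl ( conv_debug ( '<U+0928><U+092E> <U+0930>' ) )
\end{luacode}
\begin{document}
See the log file for the converted string.
\end{document}
```

For this input the log line should read \char"0928\char"092E\ \char"0930, which shows the space between the two words arriving as an explicit control space.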
answered 2 hours ago by Mico; edited 1 hour ago
This gets incredibly close. The only problem now is with breaks between words, which get ignored. E.g. "नमस्ते राज" (2 words) gets printed as "नमस्तेराज" (1 single word) even though there is the proper space between the unicode characters: "<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>". How can I fix the spacing issue?
– lethalSinger
2 hours ago
@lethalSinger - Please see the updated answer I just posted. (The solution is to add a second gsub (short for "global substitution") operation.)
– Mico
1 hour ago
@lethalSinger -- Instead of inserting a second gsub step, the whitespace issue could also have been "solved" by changing s:gsub( '<U%+(.-)>' , '\\char"%1' ) to s:gsub( '<U%+(.-)>' , '\\char"%1{}' ); note the insertion of a pair of curly braces. IMNSHO, though, it's preferable -- and certainly more transparent, coding-wise -- to avoid sleights of hand such as inserting an "empty TeX group" and to perform two separate gsub operations.
– Mico
9 mins ago
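For comparison, the single-substitution variant described in the last comment might look like the sketch below. The names conv_braces and \convbraces are hypothetical and chosen only to keep this apart from the answer's conv; the font setup is copied from the question, and the example has not been checked against a particular TeX distribution.

```latex
\documentclass{article}
\usepackage{fontspec}
\newfontscript{Devanagari}{deva,dev2}
\newfontface{\hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}
\usepackage{luacode}
\begin{luacode}
-- Hypothetical name conv_braces: one substitution only.
-- The empty group ends the hexadecimal number, so a space
-- that follows it is kept as an ordinary interword space.
function conv_braces ( s )
  tex.sprint ( ( s:gsub ( '<U%+(.-)>' , '\\char"%1{}' ) ) )
end
\end{luacode}
\newcommand\convbraces[1]{\directlua{conv_braces(\luastringN{#1})}}
\begin{document}
{\hindi \convbraces{<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>}}
\end{document}
```

As the comment notes, the two-step version in the answer is arguably easier to read; this variant only shows what the empty-group alternative amounts to.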