
Why won't some Unicode characters print to my terminal? [Resolved]

I'm running Arch Linux with simple terminal using the Adobe Source Code Pro font. My locale is correctly set to LANG=en_US.UTF-8.

I want to print Unicode characters representing playing cards to my terminal. I'm using Wikipedia for reference.

The Unicode characters for card suits work fine. For example, issuing

$ printf "\u2660"

prints a black spade (♠) to the screen.

However, I'm having trouble with specific playing cards. Issuing

$ printf "\u1F0A1"

prints the symbol Ἂ1 instead of the ace of spades 🂡. What's going wrong?

This problem persists across several terminals (urxvt, xterm, termite) and every font I've tried (DejaVu, Inconsolata).
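Since the problem shows up in every terminal and font tried, one way to rule those out is to look at the raw bytes printf emits instead of the glyphs the terminal draws. This is a sketch assuming Bash, od, and a UTF-8 locale (C.UTF-8 is set only because it is commonly installed; the question's en_US.UTF-8 should behave the same). show_bytes is a hypothetical helper name, not a standard command.

```bash
#!/usr/bin/env bash
# Assumption: a UTF-8 locale. C.UTF-8 is picked only for availability;
# en_US.UTF-8 from the question should give the same bytes.
export LC_ALL=C.UTF-8

# Hypothetical helper: expand an escape with Bash's builtin printf and print
# the resulting bytes in hex, so no terminal or font is involved.
show_bytes() {
  local hex
  # $1 is deliberately used as the format string so its escapes are expanded.
  hex=$(printf "$1" | od -An -tx1)
  # Unquoted on purpose: word splitting joins od's columns with single spaces.
  echo $hex
}

show_bytes '\u2660'    # U+2660 BLACK SPADE SUIT
show_bytes '\u1F0A1'   # the escape from the question
```

If the second line comes out as e1 bc 8a 31, that is the UTF-8 encoding of U+1F0A followed by an ASCII "1" (0x31). That would mean the terminal is faithfully drawing what printf produced, and the escape itself is the thing to look at.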

Question Credit: Brian Fitzpatrick
Asked July 20, 2019
Posted Under: Unix Linux
2 Answers

help printf defers to printf(1) for the escape sequences it interprets, and the documentation for GNU printf says:

printf interprets two character syntaxes introduced in ISO C 99: \u for 16-bit Unicode (ISO/IEC 10646) characters, specified as four hexadecimal digits hhhh, and \U for 32-bit Unicode characters, specified as eight hexadecimal digits hhhhhhhh. printf outputs the Unicode characters according to the LC_CTYPE locale. Unicode characters in the ranges U+0000…U+009F, U+D800…U+DFFF cannot be specified by this syntax, except for U+0024 ($), U+0040 (@), and U+0060 (`).
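To try the GNU behaviour quoted here, as opposed to Bash's builtin printf, one can run the external binary through env. This sketch assumes GNU coreutils (as shipped on Arch Linux) and a UTF-8 locale; C.UTF-8 is an assumption chosen only because it is widely available.

```bash
#!/usr/bin/env bash
# Assumption: GNU coreutils printf and a UTF-8 locale (C.UTF-8 for availability).
export LC_ALL=C.UTF-8

# "env printf" bypasses the Bash builtin and runs the coreutils binary,
# which reads exactly 4 hex digits after \u and exactly 8 after \U.
env printf '\u2660\n'       # U+2660, four digits
env printf '\U0001F0A1\n'   # U+1F0A1, zero-padded to eight digits

# With fewer than eight digits, coreutils reports
# "missing hexadecimal number in escape" and exits non-zero.
if ! env printf '\U1F0A1\n' 2>/dev/null; then
  echo 'coreutils printf: \U needs all eight digits' >&2
fi
```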

Something similar is specified in the Bash manual for ANSI C Quoting and echo:

\uHHHH: the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHH (one to four hex digits)

\UHHHHHHHH: the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHHHHHH (one to eight hex digits)
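Because Bash accepts one to eight digits, the short spelling \U1F0A1 also works in its builtins, unlike in the coreutils binary. A minimal sketch, assuming Bash 4.2 or later (the release that added \u and \U) in a UTF-8 locale; the variable name card is illustrative only.

```bash
#!/usr/bin/env bash
# Assumption: Bash 4.2+ in a UTF-8 locale (C.UTF-8 chosen for availability).
export LC_ALL=C.UTF-8

# Builtin printf and echo -e stop reading \U digits at the first non-hex
# character, so five digits are enough here.
printf '\U1F0A1\n'
echo -e '\U1F0A1'

# ANSI-C quoting: the escape is decoded when Bash parses this line,
# so the locale must already be UTF-8 at that point.
card=$'\U1F0A1'
printf '%s\n' "$card"
```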

In short: \u is not for 5 hex digits. It's \U:

# printf "\u2660 \u1F0A1 \U1F0A1\n"
♠ Ἂ1 🂡

credit: muru
Answered July 20, 2019

Muru's answer is completely correct, but just to clarify one point:

When you're printing \u1F0A1, that's interpreted as a sixteen-bit Unicode escape \u1F0A followed by the literal character 1 (\u reads at most four hex digits, so the fifth one is left over). U+1F0A then gives Ἂ, a Greek alpha with a couple of diacritics on it (Greek Capital Letter Alpha with Psili and Varia, to be precise).
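The splitting rule can be modelled in a few lines of Bash. split_escape is a hypothetical helper, not part of printf; it follows Bash's "one to four / one to eight digits" rule, whereas coreutils printf would reject a \U escape with fewer than eight digits instead of splitting it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report how a \u or \U escape followed by hex digits is
# split. \u consumes at most 4 digits and \U at most 8; the rest is plain text.
split_escape() {
  local esc=$1 max re
  if [[ ${esc:1:1} == u ]]; then max=4; else max=8; fi
  # POSIX regex matching is leftmost-longest, so group 1 takes as many
  # digits as {1,max} allows and group 2 gets whatever is left.
  re="^([0-9A-Fa-f]{1,$max})(.*)$"
  [[ ${esc:2} =~ $re ]] || return 1
  printf 'U+%04X' "$((16#${BASH_REMATCH[1]}))"
  if [[ -n ${BASH_REMATCH[2]} ]]; then
    printf " then literal '%s'" "${BASH_REMATCH[2]}"
  fi
  printf '\n'
}

split_escape '\u1F0A1'   # U+1F0A then literal '1'
split_escape '\U1F0A1'   # U+1F0A1
```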

If you want more than sixteen bits in your Unicode escape, you need to use \U, which takes up to eight hex digits (GNU coreutils printf insists on exactly eight): \U0001F0A1 will give you the playing card.
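When the code point comes from a variable or a loop, padding it to eight digits with %08X yields an escape in the full-width form that both Bash's builtin and coreutils printf accept. A sketch under the same assumptions (Bash, UTF-8 locale); print_codepoint is a hypothetical helper, and U+1F0A1 to U+1F0AE is the spades row of the Unicode Playing Cards block, which includes a knight at U+1F0AC.

```bash
#!/usr/bin/env bash
# Assumption: Bash in a UTF-8 locale (C.UTF-8 chosen for availability).
export LC_ALL=C.UTF-8

# Hypothetical helper: print the character for a numeric code point by
# building a zero-padded eight-digit \U escape, then expanding it.
print_codepoint() {
  local esc
  printf -v esc '\\U%08X' "$1"   # 0x1F0A1 -> the text \U0001F0A1
  printf "$esc"                  # now printf expands that escape
}

# Spades row: U+1F0A1 (ace) through U+1F0AE (king).
for ((cp = 0x1F0A1; cp <= 0x1F0AE; cp++)); do
  print_codepoint "$cp"
  printf ' '
done
printf '\n'
```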

credit: Draconis
Answered July 20, 2019