Creating a Unicode character from its number

Question

I want to display a Unicode character in Java. If I do this, it works fine:

String symbol = "\u2202";

symbol is equal to "∂". That's what I want.

The problem is that I know the Unicode number and need to create the Unicode symbol from it. I tried (to me) the obvious thing:

int c = 2202;
String symbol =  "\\u" + c;

However, in this case, symbol is equal to "\u2202". That's not what I want.

How can I construct the symbol if I know its Unicode number (but only at run time; I can't hard-code it as in the first example)?

88

java string character unicode

Author: Paul Reiners, 2011-04-07

13 answers

If you want to get a UTF-16 encoded code unit as a char, you can parse the integer and cast it as others have suggested.

If you want to support all code points, use Character.toChars(int). This will handle cases where a code point cannot fit in a single char value.

The doc says:

Converts the specified character (Unicode code point) to its UTF-16 representation stored in a char array. If the specified code point is a BMP (Basic Multilingual Plane or Plane 0) value, the resulting char array has the same value as codePoint. If the specified code point is a supplementary code point, the resulting char array has the corresponding surrogate pair.
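To make the BMP versus supplementary distinction concrete, here is a minimal sketch. The class name ToCharsDemo and the helper fromCodePoint are made up for illustration; only Character.toChars comes from the documentation quoted above.

```java
class ToCharsDemo {
    // Builds a String from any valid Unicode code point (hypothetical helper).
    static String fromCodePoint(int codePoint) {
        // toChars yields one char for a BMP code point and a surrogate pair otherwise
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String partial = fromCodePoint(0x2202);  // BMP code point from the question
        String heart = fromCodePoint(0x1F495);   // supplementary code point
        System.out.println(partial + " uses " + partial.length() + " char(s)");
        System.out.println(heart + " uses " + heart.length() + " char(s)");
    }
}
```

The BMP value ends up as a single char, while the supplementary one needs two, which is why a plain cast to char is not enough in the general case.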

112

Author: McDowell, 2014-11-25 02:31:17

The other answers here either only support Unicode up to U+FFFF (the answers dealing with just one instance of char) or don't say how to get to the actual symbol (the answers stopping at Character.toChars() or using an incorrect method after that), so I'm adding my answer here too.

To support supplementary code points as well, this is what needs to be done:

// this character:
// http://www.isthisthingon.org/unicode/index.php?page=1F&subpage=4&glyph=1F495
// using code points here, not U+n notation
// for equivalence with U+n, below would be 0xnnnn
int codePoint = 128149;
// converting to char[] pair
char[] charPair = Character.toChars(codePoint);
// and to String, containing the character we want
String symbol = new String(charPair);

// symbol now holds the desired character as its first code point
// confirm that we indeed have character with code point 128149
System.out.println("First code point: " + symbol.codePointAt(0));

I also did a quick test of which conversion methods work and which don't:

int codePoint = 128149;
char[] charPair = Character.toChars(codePoint);

String str = new String(charPair, 0, 2);
System.out.println("First code point: " + str.codePointAt(0));    // 128149, worked
String str2 = charPair.toString();
System.out.println("Second code point: " + str2.codePointAt(0));  // 91, didn't work
String str3 = new String(charPair);
System.out.println("Third code point: " + str3.codePointAt(0));   // 128149, worked
String str4 = String.valueOf(codePoint);
System.out.println("Fourth code point: " + str4.codePointAt(0));  // 49, didn't work
String str5 = new String(new int[] {codePoint}, 0, 1);
System.out.println("Fifth code point: " + str5.codePointAt(0));   // 128149, worked
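One thing the test above relies on implicitly: a supplementary code point occupies two chars, so length() and the number of code points can differ. A small sketch, assuming Java 8 or later for String.codePoints(); the class and method names are hypothetical.

```java
class CodePointCountDemo {
    // Lists the code points of s in U+XXXX notation, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(128149));
        System.out.println(s.length());                      // 3 chars (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
        System.out.println(describe(s));                     // U+0061 U+1F495
    }
}
```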

17

Author: eis, 2013-04-16 10:49:48

Remember that char is an integral type, and thus can be given an integer value as well as a char constant.

char c = 0x2202; // 8706 in decimal; Unicode escapes in source code are written in hex
String s = String.valueOf(c);

5

Author: ILMTitan, 2011-04-07 21:22:00

This one worked fine for me.

  String cc2 = "2202";
  String text2 = String.valueOf(Character.toChars(Integer.parseInt(cc2, 16)));

Now text2 will contain ∂.
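The radix passed to parseInt matters here. As a rough sketch (ParseCodePointDemo and fromText are hypothetical names), the same digits give a different character depending on whether they are read as hex or as decimal:

```java
class ParseCodePointDemo {
    // Parses the digits in the given radix and returns the matching character(s).
    static String fromText(String digits, int radix) {
        int codePoint = Integer.parseInt(digits, radix);
        // toChars throws IllegalArgumentException for an invalid code point
        return String.valueOf(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(fromText("2202", 16)); // digits read as hex: U+2202
        System.out.println(fromText("8706", 10)); // the same code point in decimal
        System.out.println(fromText("2202", 10)); // decimal 2202 is U+089A, a different character
    }
}
```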

5

Author: MeraNaamJoker, 2013-11-27 10:09:57

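This is how you do it:

int cc = 2202; // the hex digits of the code point, held in a decimal int as in the question
char ccc = (char) Integer.parseInt(String.valueOf(cc), 16); // re-read those digits as hexadecimal
final String text = String.valueOf(ccc);

This solution is by Arne Vajhøj.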

2

Author: Paul Reiners, 2015-05-05 14:09:28

String st = "2202";
int cp = Integer.parseInt(st, 16); // parse st as a hexadecimal number
char[] c = Character.toChars(cp);
System.out.println(c); // prints the character corresponding to U+2202

2

Author: Kapil K. Kushwah, 2017-07-24 06:06:07

The code below writes the 4 Unicode characters (given in decimal) for the word "be" in Japanese. Yes, the verb "be" in Japanese has 4 characters! The character values are in decimal and have been read into a String[] array, using split for example. If you have octal or hex, parseInt takes a radix as well.

// pseudo code
// 1. init the String[] containing the 4 code points in decimal :: intsInStrs
// 2. allocate room for up to two chars per code point :: c2s
// 3. using Integer.parseInt (with or without a radix) get the right int value
// 4. write its char(s) at the next free position of the char array
// 5. convert the filled part of c2s to a String
// 6. print

String[] intsInStrs = {"12354", "12426", "12414", "12377"}; // 1.
char[] c2s = new char[intsInStrs.length * 2];  // 2. a code point needs at most two chars

int offset = 0; // next free index in c2s
for (String intString : intsInStrs) {
    // 3 + 4. toChars returns the number of chars written (1 for BMP, 2 for supplementary)
    offset += Character.toChars(Integer.parseInt(intString), c2s, offset);
}

String symbols = new String(c2s, 0, offset);  // 5. skip the unused tail of the array
System.out.println("\nLooooonger code point: " + symbols); // 6.
// I tested it in Eclipse and Java 7 and it works.  Enjoy

1

Author: user96265, 2015-06-09 21:38:36

Unfortunately, removing one backslash as mentioned in the first comment (newbiedoodle) does not lead to a good result. Most (if not all) IDEs report a syntax error. The reason is that the Java escaped Unicode format expects the syntax "\uXXXX", where XXXX are 4 hexadecimal digits, which are mandatory. Attempts to assemble this string from pieces fail. Of course, "\u" is not the same as "\\u". The first syntax means an escaped 'u'; the second means an escaped backslash (which is a backslash) followed by 'u'.

It is strange that the Apache pages present a utility that does exactly this. But in reality, it is an escape mimic utility. Apache has some utilities of its own (I didn't test them) that do this work for you. It may still not be what you want (Apache Escape Unicode utilities). But this utility takes a good approach to the solution, in combination with what is described above (MeraNaamJoker). My solution is to create this mimic escaped string and then convert it back to Unicode (to get around the restriction of the real escaped Unicode format). I used it for copying text, so it is possible that in the uencode method it would be better to use '\\u' instead of '\\\\u'. Try it.

  /**
   * Converts character to the mimic unicode format i.e. '\\u0020'.
   * 
   * This format is the Java source code format.
   * 
   *   CharUtils.unicodeEscaped(' ') = "\\u0020"
   *   CharUtils.unicodeEscaped('A') = "\\u0041"
   * 
   * @param ch  the character to convert
   * @return the character in the mimic escaped Unicode format
   */
  public static String unicodeEscaped(char ch) {
    String returnStr;
    //String uniTemplate = "\u0000";
    final String charEsc = "\\u"; // a local variable cannot be declared static

    if (ch < 0x10) {
      returnStr = "000" + Integer.toHexString(ch);
    }
    else if (ch < 0x100) {
      returnStr = "00" + Integer.toHexString(ch);
    }
    else if (ch < 0x1000) {
      returnStr = "0" + Integer.toHexString(ch);
    }
    else
      returnStr = "" + Integer.toHexString(ch);

    return charEsc + returnStr;
  }

  /**
   * Converts a string to the mimic Unicode format, e.g. '\\u0020'.
   * Note: I cannot use the real Unicode escape format, because it is translated
   * to the character at compile time and the editor (e.g. NetBeans) checks it.
   * Instead of the real format, e.g. '\u0020', I use the mimic format '\\u0020'
   * as a string, which of course does not give the same result.
   * 
   * This format is the Java source code format.
   * 
   *   CharUtils.unicodeEscaped(' ') = "\\u0020"
   *   CharUtils.unicodeEscaped('A') = "\\u0041"
   * 
   * @param nationalString the string to convert
   * @return the string in the Java mimic escaped Unicode format
   */
  public String encodeStr(String nationalString) throws UnsupportedEncodingException {
    String convertedString = "";

    for (int i = 0; i < nationalString.length(); i++) {
      Character chs = nationalString.charAt(i);
      convertedString += unicodeEscaped(chs);
    }
    return convertedString;
  }

  /**
   * Converts a string from the mimic Unicode format, e.g. '\\u0020', back to plain text.
   * 
   * This format is the Java source code format.
   * 
   *   CharUtils.unicodeEscaped(' ') = "\\u0020"
   *   CharUtils.unicodeEscaped('A') = "\\u0041"
   * 
   * @param escapedString the string in the Java mimic escaped Unicode format
   * @return the decoded string
   */
  public String uencodeStr(String escapedString) throws UnsupportedEncodingException {
    String convertedString = "";

    String[] arrStr = escapedString.split("\\\\u");
    String str, istr;
    for (int i = 1; i < arrStr.length; i++) {
      str = arrStr[i];
      if (!str.isEmpty()) {
        Integer iI = Integer.parseInt(str, 16);
        char[] chaCha = Character.toChars(iI);
        convertedString += String.valueOf(chaCha);
      }
    }
    return convertedString;
  }

0

Author: hariprasad, 2014-05-28 15:34:05

Although this is an old question, there is a very easy way to do this in Java 11, which was released today: you can use a new overload of Character.toString():

public static String toString(int codePoint)

Returns a String object representing the specified character (Unicode code point). The result is a string of length 1 or 2, consisting solely of the specified codePoint.

Parameters:
codePoint - the codePoint to be converted

Returns:
the string representation of the specified codePoint

Throws:
IllegalArgumentException - if the specified codePoint is not a valid Unicode code point.

Since:
11

Since this method supports any Unicode code point, the length of the returned String is not necessarily 1.

The code needed for the example given in the question is simply:

    int codePoint = '\u2202';
    String s = Character.toString(codePoint); // <<< Requires JDK 11 !!!
    System.out.println(s); // Prints ∂

This approach offers several advantages:

It works for any Unicode code point rather than just those that can be handled using a char.
It's concise, and it's easy to understand what the code is doing.
It returns the value as a String rather than a char[], which is often what you want. The answer posted by McDowell is appropriate if you want the code point returned as a char[].
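If you are stuck on a JDK older than 11, one possible stand-in with the same properties is StringBuilder.appendCodePoint, which has been available since Java 5. This is only a sketch; the class and method names below are made up for illustration.

```java
class CodePointToString {
    // Returns a String holding exactly the given code point (length 1 or 2).
    static String codePointToString(int codePoint) {
        // appendCodePoint throws IllegalArgumentException for an invalid code point
        return new StringBuilder().appendCodePoint(codePoint).toString();
    }

    public static void main(String[] args) {
        System.out.println(codePointToString(0x2202));           // the symbol from the question
        System.out.println(codePointToString(0x1F495).length()); // 2, a surrogate pair
    }
}
```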

0

Author: skomisa, 2018-09-26 02:18:14

char c = (char) 0x2202; String s = "" + c;

-1

Author: dave110022, 2016-10-20 03:34:45

Here is a block that prints the Unicode characters between \u00c0 and \u00ff:

char[] ca = {'\u00c0'};
for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 16; j++) {
        String sc = new String(ca);
        System.out.print(sc + " ");
        ca[0]++;
    }
    System.out.println();
}

-1

Author: fjiang_ca, 2016-10-29 17:05:22

(THE ANSWER IS IN DOT NET 4.5; in Java there must be a similar approach)

I am from West Bengal in INDIA. As I understand it, your problem is this: you want to produce something like 'অ' (a letter in the Bengali language), which has the Unicode HEX value 0X0985.

Now, if you know this value for your language, how do you produce that language-specific Unicode symbol?

In Dot Net it is as simple as this:

int c = 0X0985;
string x = Char.ConvertFromUtf32(c);

Now x is your answer. But this is a HEX-by-HEX conversion, and sentence-to-sentence conversion is a job for researchers :P

-4

Author: Suman Kr. Nath, 2015-06-21 15:23:31

score 55 · Accepted Answer

Just cast your int to a char. You can convert that to a String using Character.toString():

String s = Character.toString((char)c);

EDIT:

Just remember that the escape sequences in Java source code (the \u bits) are in hex, so if you're trying to reproduce an escape sequence, you'll need something like int c = 0x2202.
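A small sketch of what the cast does and where it stops working; CastDemo and viaCast are hypothetical names. The cast keeps only the low 16 bits, so it is fine for BMP values such as 0x2202 but silently mangles supplementary code points.

```java
class CastDemo {
    // Casts the int to a single char, as in the answer above.
    static String viaCast(int c) {
        return Character.toString((char) c);
    }

    public static void main(String[] args) {
        System.out.println(viaCast(0x2202)); // hex literal: the intended symbol
        System.out.println(viaCast(8706));   // the same value written in decimal
        System.out.println(viaCast(2202));   // decimal 2202 is U+089A, a different character
        // A supplementary code point loses its high bits in the cast:
        System.out.println(Integer.toHexString(viaCast(0x1F495).charAt(0))); // f495
    }
}
```

For code points above U+FFFF, Character.toChars(int) is the safer route.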