Warnings on the Use of the Online Functions of an IME
In an environment that deals with multibyte text such as Japanese, an IME (Input Method Editor) is an indispensable tool. Recently, IMEs have gained cloud-related functions that require a constant connection to the Internet. These are valuable functions if used well, but here I explain some points of caution concerning their use. The definition of a cloud function differs from IME to IME, but the functions generally fall into the following two types:
- Saving a user dictionary on an external server (dictionary synchronization)
- Obtaining conversion candidates from an external server (cloud conversion)
These functions are extremely attractive from the viewpoint of precision and efficiency of text input. However, some warnings are needed from the viewpoint of security.
Saving a User Dictionary on an External Server (Dictionary Synchronization)
Most IMEs learn automatically from the user's input, which enables them to perform effective conversion. The collection of automatically learned words and words that the user has registered manually is called a user dictionary. The cloud function for a user dictionary is synchronization across multiple terminals: by storing the user dictionary on an external server and sharing it across the terminals that the user owns, conversion becomes more efficient. To protect the dictionary, some IMEs require authentication and require the user to enable the function explicitly.
However, since data accumulates in a user dictionary through automatic learning, unintended words may be registered. If one inspects a heavily used dictionary with its search and export functions, one may find registered words that one would not want to show to others. One must also note that some words may have been registered as input shortcuts. (Example: one's own credit card number is registered as the word "credit1.")
When a dictionary synchronization function is used, the dictionary is saved on an external server, so the authentication information that protects it must be managed carefully. In addition, when handling information that would be a problem if transmitted, such as a credit card number, it is necessary either to disable the IME or to make sure that such information has not been registered in the dictionary.
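As a rough illustration of the check described above, the following Python sketch scans an exported user dictionary for entries that look like payment card numbers. The tab-separated "reading, word" export format, the function names, and the sample entries are assumptions made for illustration; actual export formats differ by IME. The Luhn checksum used here only catches card-like numbers, not passwords or other secrets.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\d(?:[ -]?\d){12,18}")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_suspicious_entries(lines):
    """Yield (reading, word) pairs whose word looks like a card number.

    Assumes each line is 'reading<TAB>word' (a hypothetical export format).
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip comments or malformed lines
        reading, word = parts[0], parts[1]
        for match in CARD_PATTERN.finditer(word):
            digits = re.sub(r"\D", "", match.group())
            if luhn_valid(digits):
                yield reading, word
                break


# Hypothetical export contents; 4111 1111 1111 1111 is a common test number.
sample = [
    "credit1\t4111 1111 1111 1111",
    "osewa\tお世話",
    "tel\t0312345678",
]
hits = list(find_suspicious_entries(sample))
print(hits)
```

Running a check like this before enabling synchronization gives one a chance to delete such entries while the dictionary is still only on the local terminal.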
Obtaining Conversion Candidates from an External Server (Cloud Conversion)
An IME includes a dictionary that contains items such as the word information used for conversion, and this may be called a "system dictionary" or a "standard dictionary." New words enter daily use all the time, while the words stored in the dictionary do not change, so the number of words that the dictionary cannot convert increases over time. To deal with this, the dictionary may be refreshed by an online update, and, in a more advanced form, there are functions that transmit the input data to an external server in order to obtain conversion candidates. The benefit is that one need not wait for an update; conversion candidates can be obtained virtually in real time. This is also a useful function for mobile devices that cannot hold large amounts of dictionary data due to capacity limitations.
Since this function only performs conversion, it seems that some IMEs make it available without user authentication. As the approach of obtaining conversion candidates from an external server implies, the input data is transmitted externally. I investigated the data transmitted by a free IME that supports cloud conversion. The input is transmitted incrementally, one phonetic symbol at a time as shown here, until finally the confirmed text is transmitted:
- o
- ose
- osewa
- osewa ni
- osewa ni na
- osewa ni nat
- osewa ni natte
- osewa ni natte o
- osewa ni natte ori
- osewa ni natte orima
- osewa ni natte orimasu
- osewa ni natte orimasu (conversion confirmed)
[Translator’s note: In the above, the Japanese original shows Japanese phonetic symbols, and they have been transliterated here. Each line contains one more Japanese phonetic symbol than the previous line.]
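The pattern in the list above can be modeled with a short Python sketch: every added phonetic symbol causes the whole unconfirmed text to be sent again, and the final text is sent once more on confirmation. The payload field names (`text`, `confirmed`) and the function name are hypothetical; the article does not describe the actual wire format of the IME that was investigated.

```python
def simulate_transmissions(units):
    """Return the payloads a per-keystroke cloud conversion client might send.

    units: the phonetic symbols in typing order (transliterated here).
    Each payload carries the entire unconfirmed text typed so far.
    """
    payloads = []
    buffer = ""
    for unit in units:
        buffer += unit
        payloads.append({"text": buffer, "confirmed": False})
    # On confirmation the complete text is sent one more time.
    payloads.append({"text": buffer, "confirmed": True})
    return payloads


# Transliterated units matching the example in the article.
units = ["o", "se", "wa", " ni", " na", "t", "te", " o", "ri", "ma", "su"]
for payload in simulate_transmissions(units):
    suffix = " (conversion confirmed)" if payload["confirmed"] else ""
    print(payload["text"] + suffix)
```

The point of the model is that an observer on the network path or at the server sees every intermediate state of the input, not only the confirmed result.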
During input, no information that can identify the client is transmitted, but at confirmation time the name of the application receiving the input and the user's security identifier (SID) are sent together with the confirmed text.
So what happens with text that is a little longer?
== Input Text ==
Company AAA Mr. BBB
Thank you for your continuing support.
The password for the attached file that I just sent by mail is
“pw123456789.”
Sorry for the trouble, but please enter it and return it by next week.
I appreciate your attention to this matter.
[Translator: In the above, the Japanese original contains normal Japanese text that is a mixture of Chinese characters and Japanese phonetic symbols. It has been translated here.]
== Text Reconstructed from the Transmitted Data (Appropriate Line Terminations Inserted) ==
AAA sha BBB sama
itsumo osewa ni natte orimasu
saki hodo me–ru de okurimashita tenpu fairu no pasuwa–do desu ga
“pw123456789” to narimasu.
otesuu desu ga raishuu made ni kinyuu shite gohensou kudasai.
ijou, yoroshiku onegai itashimasu
[Translator: In the above, the Japanese original shows Japanese phonetic characters corresponding to the same text shown previously. It has been transliterated here.]
Depending on the IME status during input (full-width or half-width), half-width text may be transmitted as full-width, but we see that the input data is transmitted mostly as is. If the IME had been disabled and only half-width alphanumeric characters had been entered, nothing would have been sent, so transmission occurs only while the IME is enabled. This investigation did not cover the cloud conversion functions of all IMEs, but as long as conversion candidates are obtained from an external server, the behavior can be assumed to be similar.
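To show how little effort the reconstruction takes, here is a minimal Python sketch that joins the confirmed payloads from a capture and folds full-width alphanumerics back to half-width with Unicode NFKC normalization. The record layout (`text`, `confirmed`) and the sample capture are assumptions for illustration, not the format observed in the investigation, and NFKC also alters other compatibility characters, so it is suitable only for this kind of analysis.

```python
import unicodedata


def reconstruct_text(records):
    """Join confirmed payloads and fold full-width forms to half-width.

    records: dicts with hypothetical keys 'text' and 'confirmed'.
    """
    confirmed = [r["text"] for r in records if r["confirmed"]]
    return unicodedata.normalize("NFKC", "\n".join(confirmed))


# Hypothetical capture: the password was typed with the IME in full-width mode.
captured = [
    {"text": "ｐ", "confirmed": False},
    {"text": "ｐｗ１２３", "confirmed": False},
    {"text": "ｐｗ１２３４５６７８９", "confirmed": True},
    {"text": "to narimasu", "confirmed": True},
]
print(reconstruct_text(captured))
```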
Conclusion
When a cloud conversion function is enabled as described above, the input data is transmitted externally even when one is editing text files, charts, or presentations on a local terminal. Unlike entering keywords into a search engine, where the transmission is obvious, a user who does not understand how the system works will find it difficult to realize that IME conversion is sending data outside.
Many IMEs require that the cloud-related functions be enabled explicitly. If the user understands that data will be transmitted externally and enables the function deliberately, then I suppose there is no problem.
However, for some free IMEs, the recommended installation settings enable the function automatically. Such IMEs may also be bundled and installed along with other free software. If one does not understand the options displayed at installation time and just pushes the OK button, the IME will be installed as well and the cloud conversion function will be enabled. Or the IME may be included in the preinstalled software of a PC from the manufacturer, in which case it may already be installed and enabled at shipping time. In either case the user has not enabled the function intentionally, and using the IME in that state means that information is sent unintentionally.
In particular, if an organization such as a business decides that information handled by an IME must be kept within the organization, it needs to take its rules for software usage and management into account and consider the following measures:
- When defining user environments, make sure that the function is not used.
- Block the function's communication, for example at the firewalls on the organization's boundary.
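As one possible way to check whether the second measure is needed, the following Python sketch looks through outbound proxy or firewall log records for destinations on a blocklist of IME cloud endpoints. The domain `ime-cloud.example.com`, the record layout (client address, destination host), and the function names are all hypothetical; the real endpoints depend on the IME and would have to be taken from vendor documentation or from one's own traffic analysis.

```python
def is_blocked(host, blocked_domains):
    """Return True if host equals a blocked domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def clients_using_blocked_hosts(log_records, blocked_domains):
    """Map each client address to the blocked hosts it contacted.

    log_records: iterable of (client_address, destination_host) tuples.
    """
    offenders = {}
    for client, host in log_records:
        if is_blocked(host, blocked_domains):
            offenders.setdefault(client, set()).add(host.lower().rstrip("."))
    return offenders


# Hypothetical blocklist and log records.
BLOCKED = {"ime-cloud.example.com"}
records = [
    ("10.0.0.5", "api.ime-cloud.example.com"),
    ("10.0.0.6", "www.example.org"),
    ("10.0.0.5", "IME-CLOUD.example.com."),
]
print(clients_using_blocked_hosts(records, BLOCKED))
```

Terminals that show up in the result are candidates for having the function enabled, whether intentionally or through a bundled or preinstalled IME.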
IIJ-SECT, Security Diary “IMEのオンライン機能利用における注意について” (The Japanese original of this article), https://sect.iij.ad.jp/d/2013/12/104971.html