HEX
Server: Apache/2.4.65 (Ubuntu)
System: Linux ielts-store-v2 6.8.0-1036-gcp #38~22.04.1-Ubuntu SMP Thu Aug 14 01:19:18 UTC 2025 x86_64
User: root (0)
PHP: 7.2.34-54+ubuntu20.04.1+deb.sury.org+1
Disabled: pcntl_alarm,pcntl_fork,pcntl_waitpid,pcntl_wait,pcntl_wifexited,pcntl_wifstopped,pcntl_wifsignaled,pcntl_wifcontinued,pcntl_wexitstatus,pcntl_wtermsig,pcntl_wstopsig,pcntl_signal,pcntl_signal_get_handler,pcntl_signal_dispatch,pcntl_get_last_error,pcntl_strerror,pcntl_sigprocmask,pcntl_sigwaitinfo,pcntl_sigtimedwait,pcntl_exec,pcntl_getpriority,pcntl_setpriority,pcntl_async_signals,
Upload Files
File: //snap/google-cloud-cli/396/platform/gsutil/third_party/charset_normalizer/docs/user/cli.rst
Command Line Interface
======================

charset-normalizer ship with a CLI that should be available as `normalizer`.
This is a great tool to fully exploit the detector capabilities without having to write Python code.

Possible use cases:

#. Quickly discover probable originating charset from a file.
#. I want to quickly convert a non Unicode file to Unicode.
#. Debug the charset-detector.

Down below, we will guide you through some basic examples.

Arguments
---------

You may simply invoke `normalizer -h` (with the h(elp) flag) to understand the basics.

::

   usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
                     file [file ...]

   The Real First Universal Charset Detector. Discover originating encoding used
   on text file. Normalize text to unicode.

   positional arguments:
     files                 File(s) to be analysed

   optional arguments:
     -h, --help            show this help message and exit
     -v, --verbose         Display complementary information about file if any.
                           Stdout will contain logs about the detection process.
     -a, --with-alternative
                           Output complementary possibilities if any. Top-level
                           JSON WILL be a list.
     -n, --normalize       Permit to normalize input file. If not set, program
                           does not write anything.
     -m, --minimal         Only output the charset detected to STDOUT. Disabling
                           JSON output.
     -r, --replace         Replace file when trying to normalize it instead of
                           creating a new one.
     -f, --force           Replace file without asking if you are sure, use this
                           flag with caution.
     -t THRESHOLD, --threshold THRESHOLD
                           Define a custom maximum amount of chaos allowed in
                           decoded content. 0. <= chaos <= 1.
     --version             Show version information and exit.

.. code:: bash

   normalizer ./data/sample.1.fr.srt

You may also run the command line interface using:

.. code:: bash

   python -m charset_normalizer ./data/sample.1.fr.srt


Main JSON Output
----------------

🎉 Since version 1.4.0 the CLI produce easily usable stdout result in
JSON format.

.. code:: json

   {
       "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
       "encoding": "cp1252",
       "encoding_aliases": [
           "1252",
           "windows_1252"
       ],
       "alternative_encodings": [
           "cp1254",
           "cp1256",
           "cp1258",
           "iso8859_14",
           "iso8859_15",
           "iso8859_16",
           "iso8859_3",
           "iso8859_9",
           "latin_1",
           "mbcs"
       ],
       "language": "French",
       "alphabets": [
           "Basic Latin",
           "Latin-1 Supplement"
       ],
       "has_sig_or_bom": false,
       "chaos": 0.149,
       "coherence": 97.152,
       "unicode_path": null,
       "is_preferred": true
   }


I recommend the `jq` command line tool to easily parse and exploit specific data from the produced JSON.

Multiple File Input
-------------------

It is possible to give multiple files to the CLI. It will produce a list instead of an object at the top level.
When using the `-m` (minimal output) it will rather print one result (encoding) per line.

Unicode Conversion
------------------

If you desire to convert any file to Unicode you will need to append the flag `-n`. It will produce another file,
it won't replace it by default.

The newly created file path will be declared in `unicode_path` (JSON output).