Quantization Error Examples

quark_docs_repo_onnx_tutorial_microscaling_quantization.rst

In this documentation, AMD Quark is sometimes referred to simply as "Quark" for ease of reference. When you encounter the term "Quark" without the "AMD" prefix, it specifically refers to the AMD Quark ...

GitHub

custom_quantization_int8_example.py

self.register_buffer("weight", torch.zeros((out_features, in_features), dtype=torch.int8))
self.register_buffer("weight_scale", torch.zeros((out_features, 1), dtype ...
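
The snippet above stores an int8 weight matrix alongside a scale buffer of shape (out_features, 1), which suggests one scale per output row. A minimal pure-Python sketch of that idea follows, assuming symmetric quantization to the range [-127, 127]. The helper names quantize_rows, dequantize_rows, and max_abs_error are hypothetical and are not taken from custom_quantization_int8_example.py; the rest of that file is not shown here, so this may differ from what it actually does.

```python
# Hypothetical sketch: per-output-row symmetric int8 quantization.
# Assumption: one scale per row, matching a (out_features, 1) scale buffer.

def quantize_rows(weight):
    """Return (integer rows, per-row scales) for a list-of-lists float matrix."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Map the largest magnitude in the row to 127; avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Round to the nearest integer and clamp to the symmetric int8 range.
        q = [max(-127, min(127, round(v / scale))) for v in row]
        q_rows.append(q)
        scales.append(scale)
    return q_rows, scales


def dequantize_rows(q_rows, scales):
    """Multiply each integer row by its scale to recover approximate floats."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


weight = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.03]]
q_rows, scales = quantize_rows(weight)
restored = dequantize_rows(q_rows, scales)
error = max_abs_error(weight, restored)
print("scales:", scales)
print("max abs quantization error:", error)
```

Because each value is rounded to the nearest integer multiple of its row's scale, the error for any element should be at most half of that row's scale. Using a separate scale per row keeps the small-magnitude second row from being crushed by the larger values in the first row.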

Hackaday

TurboQuant: Reducing LLM Memory Usage With Vector Quantization

Large language models (LLMs) aren’t actually giant computer brains. Instead, they are massive vector spaces in which the probabilities of tokens occurring in a specific order are encoded. Billions of ...
