HuggingFace Pipeline
The pipeline is a great and simple way to run inference with a model. A pipeline() can use any model from the **Hub** to run inference on any language, computer-vision, speech, or multimodal task, without you having to care about the model's underlying code.
Pipelines are objects that abstract most of the library's complex code away and provide a simple API dedicated to several tasks.
Basic usage
Each task has an associated pipeline. When you use the pipeline for a specific task, it automatically loads a default model and a preprocessing class that can handle that task.
- First, create a **pipeline()** and specify the inference task:
```python
from transformers import pipeline

generator = pipeline(task="automatic-speech-recognition")
```
- Pass your input to the **pipeline()**:
1 | generator("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac") |
- You can also swap in a model with more parameters for inference:
```python
generator = pipeline(model="openai/whisper-large")
```
- The pipeline can also run over a batch of inputs or an entire dataset:
```python
generator(
    [
        # any list of audio files works; the same sample is reused here for brevity
        "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac",
        "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac",
    ]
)
```
Parameters
The parameters of pipeline() are listed below:
- **task** (`str`) — The task defining which pipeline will be returned. Currently accepted tasks are:
  - `"audio-classification"`: will return an AudioClassificationPipeline.
  - `"automatic-speech-recognition"`: will return an AutomaticSpeechRecognitionPipeline.
  - `"conversational"`: will return a ConversationalPipeline.
  - `"depth-estimation"`: will return a DepthEstimationPipeline.
  - `"document-question-answering"`: will return a DocumentQuestionAnsweringPipeline.
  - `"feature-extraction"`: will return a FeatureExtractionPipeline.
  - `"fill-mask"`: will return a FillMaskPipeline.
  - `"image-classification"`: will return an ImageClassificationPipeline.
  - `"image-segmentation"`: will return an ImageSegmentationPipeline.
  - `"image-to-text"`: will return an ImageToTextPipeline.
  - `"mask-generation"`: will return a MaskGenerationPipeline.
  - `"object-detection"`: will return an ObjectDetectionPipeline.
  - `"question-answering"`: will return a QuestionAnsweringPipeline.
  - `"summarization"`: will return a SummarizationPipeline.
  - `"table-question-answering"`: will return a TableQuestionAnsweringPipeline.
  - `"text2text-generation"`: will return a Text2TextGenerationPipeline.
  - `"text-classification"` (alias `"sentiment-analysis"` available): will return a TextClassificationPipeline.
  - `"text-generation"`: will return a TextGenerationPipeline.
  - `"token-classification"` (alias `"ner"` available): will return a TokenClassificationPipeline.
  - `"translation"`: will return a TranslationPipeline.
  - `"translation_xx_to_yy"`: will return a TranslationPipeline.
  - `"video-classification"`: will return a VideoClassificationPipeline.
  - `"visual-question-answering"`: will return a VisualQuestionAnsweringPipeline.
  - `"zero-shot-classification"`: will return a ZeroShotClassificationPipeline.
  - `"zero-shot-image-classification"`: will return a ZeroShotImageClassificationPipeline.
  - `"zero-shot-audio-classification"`: will return a ZeroShotAudioClassificationPipeline.
  - `"zero-shot-object-detection"`: will return a ZeroShotObjectDetectionPipeline.
- **model** (`str` or PreTrainedModel or TFPreTrainedModel, *optional*) — The model that will be used by the pipeline to make predictions. This can be a model identifier or an actual instance of a pretrained model inheriting from PreTrainedModel (for PyTorch) or TFPreTrainedModel (for TensorFlow). If not provided, the default for the `task` will be loaded.
- **config** (`str` or PretrainedConfig, *optional*) — The configuration that will be used by the pipeline to instantiate the model. This can be a model identifier or an actual pretrained model configuration inheriting from PretrainedConfig. If not provided, the default configuration file for the requested model will be used. That means that if `model` is given, its default configuration will be used. However, if `model` is not supplied, this `task`'s default model's config is used instead.
- **tokenizer** (`str` or PreTrainedTokenizer, *optional*) — The tokenizer that will be used by the pipeline to encode data for the model. This can be a model identifier or an actual pretrained tokenizer inheriting from PreTrainedTokenizer. If not provided, the default tokenizer for the given `model` will be loaded (if it is a string). If `model` is not specified or not a string, then the default tokenizer for `config` is loaded (if it is a string). However, if `config` is also not given or not a string, then the default tokenizer for the given `task` will be loaded.
- **feature_extractor** (`str` or PreTrainedFeatureExtractor, *optional*) — The feature extractor that will be used by the pipeline to encode data for the model. This can be a model identifier or an actual pretrained feature extractor inheriting from PreTrainedFeatureExtractor. Feature extractors are used for non-NLP models, such as Speech or Vision models as well as multi-modal models. Multi-modal models will also require a tokenizer to be passed. If not provided, the default feature extractor for the given `model` will be loaded (if it is a string). If `model` is not specified or not a string, then the default feature extractor for `config` is loaded (if it is a string). However, if `config` is also not given or not a string, then the default feature extractor for the given `task` will be loaded.
- **framework** (`str`, *optional*) — The framework to use, either `"pt"` for PyTorch or `"tf"` for TensorFlow. The specified framework must be installed. If no framework is specified, will default to the one currently installed. If no framework is specified and both frameworks are installed, will default to the framework of the `model`, or to PyTorch if no model is provided.
- **revision** (`str`, *optional*, defaults to `"main"`) — When passing a task name or a string model identifier: the specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git.
- **use_fast** (`bool`, *optional*, defaults to `True`) — Whether or not to use a Fast tokenizer if possible (a PreTrainedTokenizerFast).
- **use_auth_token** (`str` or `bool`, *optional*) — The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated when running `huggingface-cli login` (stored in `~/.huggingface`).
- **device** (`int` or `str` or `torch.device`) — Defines the device (e.g., `"cpu"`, `"cuda:1"`, `"mps"`, or a GPU ordinal rank like `1`) on which this pipeline will be allocated.
- **device_map** (`str` or `Dict[str, Union[int, str, torch.device]]`, *optional*) — Sent directly as `model_kwargs` (just a simpler shortcut). When the `accelerate` library is present, set `device_map="auto"` to compute the most optimized `device_map` automatically. Do not use `device_map` AND `device` at the same time as they will conflict.
- **torch_dtype** (`str` or `torch.dtype`, *optional*) — Sent directly as `model_kwargs` (just a simpler shortcut) to use the available precision for this model (`torch.float16`, `torch.bfloat16`, … or `"auto"`).
- **trust_remote_code** (`bool`, *optional*, defaults to `False`) — Whether or not to allow for custom code defined on the Hub in their own modeling, configuration, tokenization or even pipeline files. This option should only be set to `True` for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- **model_kwargs** (`Dict[str, Any]`, *optional*) — Additional dictionary of keyword arguments passed along to the model's `from_pretrained(..., **model_kwargs)` function.
- **kwargs** (`Dict[str, Any]`, *optional*) — Additional keyword arguments passed along to the specific pipeline init (see the documentation for the corresponding pipeline class for possible values).
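As a quick illustration of how a few of these arguments combine when building a pipeline (the model name below is only an example, not something prescribed by this note):

```python
from transformers import pipeline

# explicit task, model, framework and revision; use_fast picks the fast (Rust) tokenizer
classifier = pipeline(
    task="text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    framework="pt",
    revision="main",
    use_fast=True,
    device=-1,  # -1 keeps the model on CPU; pass a GPU index such as 0 to use CUDA
)

print(classifier("This library makes inference very easy."))
```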
Device
- **device** (`int` or `str` or `torch.device`) — Defines the device (e.g., `"cpu"`, `"cuda:1"`, `"mps"`, or a GPU ordinal rank like `1`) on which this pipeline will be allocated.
```python
generator = pipeline(model="openai/whisper-large", device=0)
```
If the model is too large to fit on a single GPU, you can set device_map="auto" so that 🤗 Accelerate automatically decides how to allocate GPU memory and load the model weights:
```python
#!pip install accelerate
generator = pipeline(model="openai/whisper-large", device_map="auto")
```
If you use device_map="auto", you do not need to set the device argument when instantiating the pipeline (the two conflict).
- **device_map** (`str` or `Dict[str, Union[int, str, torch.device]]`, *optional*) — Sent directly as `model_kwargs` (just a simpler shortcut). When the `accelerate` library is present, set `device_map="auto"` to compute the most optimized `device_map` automatically.
Batch Size
By default, pipelines do not batch inputs, but batching can be enabled as shown below:
```python
generator = pipeline(model="openai/whisper-large", device=0, batch_size=2)
```
The code above uses the pipeline to run inference on 10 audio files, but they are passed to the model on the GPU in batches of 2.
Batching works whether the input is a list, a dataset, or a generator:
```python
from transformers import pipeline
```
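A minimal sketch of streaming a generator of inputs through a pipeline with batching (the task, device, and example texts here are illustrative, not from the original note):

```python
from transformers import pipeline

pipe = pipeline("text-classification", device=0)

def data():
    # any iterable works here: a plain list, a 🤗 Dataset, or a generator like this one
    for i in range(10):
        yield f"This is example number {i}"

# inputs are consumed lazily and grouped into batches of 8 for each forward pass
for out in pipe(data(), batch_size=8):
    print(out)
```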
However, batching does not always speed the model up; it can make inference slower and can even lead to out-of-memory failures.
Dataset1
```python
from transformers import pipeline
```
```
# On GTX 970
```
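The benchmark behind that output roughly follows the pattern below (a sketch based on the batching example in the Transformers documentation; the dataset size and batch sizes are illustrative):

```python
from torch.utils.data import Dataset
from tqdm.auto import tqdm
from transformers import pipeline

pipe = pipeline("text-classification", device=0)

class MyDataset(Dataset):
    # Dataset 1: 5000 identical short inputs, so every batch has the same shape
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        return "This is a test"

dataset = MyDataset()

# measure throughput at several batch sizes
for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
        pass
```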
Dataset2
```python
class MyDataset(Dataset):
```
Because the second dataset contains a few unusually long examples, every item in the same batch has to be padded to 400 tokens, so a batch grows from [64, 4] to [64, 400]. When the batch size is increased further to 256, the program crashes.
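For concreteness, the second dataset only needs a slightly different `__getitem__` to trigger this behaviour (a sketch; the exact multiplier is illustrative and chosen to land near 400 tokens):

```python
from torch.utils.data import Dataset

class MyOtherDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        # Dataset 2: roughly one item per batch of 64 is ~100x longer than the rest,
        # so the whole batch gets padded from ~4 tokens up to ~400 tokens
        if i % 64 == 0:
            return "This is a test" * 100
        return "This is a test"
```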
There is no general solution to this kind of problem; you can only rely on experience and test against your actual data.
Do not use batching in the following situations:
- when inference latency matters (real-time or latency-sensitive workloads);
- when you are running on CPU;
- when the sequence_length of the possible inputs is unknown.

If you do use batching, watch out for OOMs (out-of-memory errors).
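One defensive pattern when you do batch (an illustrative sketch, not from the original note; it assumes a recent PyTorch that exposes torch.cuda.OutOfMemoryError) is to fall back to smaller batches when the GPU runs out of memory:

```python
import torch

def run_with_fallback(pipe, data, batch_size=64):
    # halve the batch size until the forward passes fit into GPU memory
    while batch_size >= 1:
        try:
            return list(pipe(data, batch_size=batch_size))
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
            batch_size //= 2
    raise RuntimeError("even batch_size=1 does not fit in GPU memory")
```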
Pipeline chunk batching
For tasks like zero-shot-classification and question answering, a single input may require multiple forward passes through the model, so these tasks use a ChunkPipeline instead of a regular Pipeline.
Regular pipeline:
```python
preprocessed = pipe.preprocess(inputs)
model_outputs = pipe.forward(preprocessed)
outputs = pipe.postprocess(model_outputs)
```
Chunk pipeline:
```python
all_model_outputs = []
for preprocessed in pipe.preprocess(inputs):
    model_outputs = pipe.forward(preprocessed)
    all_model_outputs.append(model_outputs)
outputs = pipe.postprocess(all_model_outputs)
```
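In practice you never call these hooks yourself; a task backed by a ChunkPipeline is used exactly like any other pipeline. A small illustrative example with question answering, which this note lists as one such task (the checkpoint is whatever transformers picks as the task default):

```python
from transformers import pipeline

# question-answering splits long contexts into overlapping chunks, runs a forward
# pass per chunk, and merges the per-chunk answers during postprocessing
qa = pipeline("question-answering")
result = qa(
    question="What does a chunk pipeline do?",
    context="A chunk pipeline splits a single input into several pieces, runs one "
            "forward pass per piece, and combines the results in postprocessing.",
)
print(result)
```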
Using a pipeline on a dataset
Load a dataset with 🤗 Datasets and iterate over it:
```python
# KeyDataset is a util that will just output the item we're interested in.
from transformers.pipelines.pt_utils import KeyDataset
```
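A minimal sketch of iterating a 🤗 Datasets dataset through a pipeline with KeyDataset (the dataset name, column, and task below are placeholders, not from the original note):

```python
from datasets import load_dataset
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset

# any dataset with a text column works; "imdb" is just an example
dataset = load_dataset("imdb", split="test[:100]")
pipe = pipeline("text-classification", device=0)

# KeyDataset pulls out the "text" field of each example before it reaches the pipeline
for out in pipe(KeyDataset(dataset, "text"), batch_size=8):
    print(out)
```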
Using a pipeline on large models with accelerate
```python
# pip install accelerate
```
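A sketch of what this looks like end to end (the model name, dtype, and generation settings are illustrative):

```python
import torch
from transformers import pipeline

# device_map="auto" lets Accelerate spread the weights across the available devices
pipe = pipeline(
    model="facebook/opt-1.3b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
output = pipe("This is a cool example!", do_sample=True, top_p=0.95)
print(output)
```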
You can also load the model in 8-bit by passing load_in_8bit=True:
```python
# pip install accelerate bitsandbytes
```
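A sketch of the 8-bit variant (the model name is illustrative; load_in_8bit is forwarded to from_pretrained through model_kwargs and requires the bitsandbytes package):

```python
from transformers import pipeline

# load_in_8bit is passed to the model's from_pretrained via model_kwargs
pipe = pipeline(
    model="facebook/opt-1.3b",
    device_map="auto",
    model_kwargs={"load_in_8bit": True},
)
output = pipe("This is a cool example!", do_sample=True, top_p=0.95)
print(output)
```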