Sharded fork of Salesforce/codegen-6B-mono with a custom pipeline.py

This repository implements a custom text-generation pipeline task for 🤗 Inference Endpoints, serving LLM inference with bitsandbytes quantization. The code for the customized pipeline is in pipeline.py.
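A rough sketch of what such a pipeline.py can look like. It assumes the `PreTrainedPipeline` class convention (a constructor that receives the repository path and a `__call__` that receives the request payload) and 8-bit loading through transformers' `BitsAndBytesConfig`. The class body and the `split_request` helper are hypothetical illustrations, not the code shipped in this repository; check pipeline.py for the actual implementation.

```python
from typing import Any, Dict, Tuple


def split_request(data: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: separate the prompt from the generation kwargs."""
    inputs = data.get("inputs", "")
    # A missing or null "parameters" field falls back to default generation settings.
    parameters = data.get("parameters") or {}
    return inputs, dict(parameters)


class PreTrainedPipeline:
    """Hypothetical sketch of a custom pipeline entry point (not this repo's code)."""

    def __init__(self, path: str = ""):
        # Heavy imports are deferred so split_request works without transformers installed.
        from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

        self.tokenizer = AutoTokenizer.from_pretrained(path)
        # Assumption: 8-bit bitsandbytes quantization; device_map="auto" needs accelerate.
        self.model = AutoModelForCausalLM.from_pretrained(
            path,
            device_map="auto",
            quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        )

    def __call__(self, data: Dict[str, Any]) -> Dict[str, str]:
        prompt, parameters = split_request(data)
        input_ids = self.tokenizer(prompt, return_tensors="pt").input_ids.to(self.model.device)
        # Forward the request's parameters (top_k, max_length, ...) straight to generate().
        output_ids = self.model.generate(input_ids, **parameters)
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return {"generated_text": text}
```

Deferring the transformers import into `__init__` keeps the request-parsing logic usable on its own, without loading a 6B-parameter model.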

There is also a notebook included.

Expected request payload

{
    "inputs": "# load distilbert model and initialize text-classification pipeline\nmodel_id = 'distil",
    "parameters": {
        "top_k": 100,
        "max_length": 64,
        "early_stopping": true,
        "do_sample": true,
        "eos_token_id": 50256
    }
}

Below is an example of how to send a request using Python and the requests library.

Run Request

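import requests as r

ENDPOINT_URL = ""  # URL of your deployed Inference Endpoint
HF_TOKEN = ""  # Hugging Face access token with access to the endpoint

# Generation parameters forwarded to the custom pipeline
parameters = {
    "top_k": 100,
    "max_length": 64,
    "early_stopping": True,
    "do_sample": True,
    "eos_token_id": 50256,  # end-of-text token id
}


def predict(code_snippet: str) -> dict:
    """Send a code prompt to the endpoint and return the decoded JSON response."""
    payload = {"inputs": code_snippet, "parameters": parameters}
    response = r.post(
        ENDPOINT_URL, headers={"Authorization": f"Bearer {HF_TOKEN}"}, json=payload
    )
    response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
    return response.json()


prediction = predict(
    code_snippet="# load distilbert model and initialize text-classification pipeline\nmodel_id = 'distil"
)
print(prediction)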

Expected output

{'generated_text': "# load distilbert model and initialize text-classification pipeline\nmodel_id = 'distilbert-base-uncased'\nmodel_url = 'https://tfhub.dev/tensorflow/small_bert/1'\n\nmodel_dir = './distilBERT'"}
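The expected output echoes the prompt in front of the completion. A small helper can strip it; this is a minimal sketch, and the hypothetical `extract_completion` also accepts a one-element list in case an endpoint wraps its result that way (an assumption, not something this repository documents).

```python
from typing import Any, Dict, List, Union


def extract_completion(
    prediction: Union[Dict[str, Any], List[Dict[str, Any]]], prompt: str
) -> str:
    """Hypothetical helper: return only the newly generated text."""
    # Assumption: some text-generation endpoints return [{"generated_text": ...}].
    if isinstance(prediction, list):
        prediction = prediction[0]
    text = prediction["generated_text"]
    # The response echoes the prompt, so drop it when it is a prefix.
    if text.startswith(prompt):
        return text[len(prompt):]
    return text
```

For the example above, this returns the text starting at `bert-base-uncased'`, i.e. everything the model appended after `'distil`.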