llm-proxying

The following approaches are available for API proxying. The OpenAI API is the most popular format. We want a local proxy that speaks the OpenAI API but forwards requests to Gemini and Claude Sonnet.
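
As a first sketch of that local proxy (a hedged example: it assumes Flask and requests are installed, and the port, route, and single Gemini upstream are illustrative choices, not settled design), a non-streaming pass-through for the chat completions endpoint could look like this:

    # Minimal pass-through sketch, assuming Flask and requests are available.
    import os

    import requests
    from flask import Flask, Response, request

    app = Flask(__name__)

    # Gemini's OpenAI-compatible base URL (same one used in the examples below).
    GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta/openai"

    @app.post("/v1/chat/completions")
    def chat_completions():
        # Forward the OpenAI-format JSON body as-is; Gemini's compatibility
        # endpoint accepts the same request shape.
        upstream = requests.post(
            f"{GEMINI_BASE}/chat/completions",
            json=request.get_json(),
            headers={"Authorization": f"Bearer {os.environ['GEMINI_API_KEY']}"},
            timeout=60,
        )
        # Non-streaming only: relay the upstream status and body unchanged.
        return Response(
            upstream.content,
            status=upstream.status_code,
            content_type=upstream.headers.get("Content-Type", "application/json"),
        )

    if __name__ == "__main__":
        app.run(port=8000)

Any OpenAI client pointed at http://localhost:8000/v1 then reaches Gemini transparently; routing to Sonnet could hang off the model name inside the same handler.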

Proxying Gemini

Gemini exposes an OpenAI-compatible endpoint, so the standard OpenAI client libraries work against it once the base URL is changed:

  • Using Python:

    from openai import OpenAI

    client = OpenAI(
        api_key="GEMINI_API_KEY",  # replace with your Gemini API key
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
    )

    response = client.chat.completions.create(
        model="gemini-2.0-flash",
        n=1,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": "Explain to me how AI works"
            }
        ]
    )

    print(response.choices[0].message)

  • Using NodeJS:

    import OpenAI from "openai";

    const openai = new OpenAI({
        apiKey: "GEMINI_API_KEY",  // replace with your Gemini API key
        baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/"
    });

    const response = await openai.chat.completions.create({
        model: "gemini-2.0-flash",
        messages: [
            { role: "system", content: "You are a helpful assistant." },
            {
                role: "user",
                content: "Explain to me how AI works",
            },
        ],
    });

    console.log(response.choices[0].message);

  • REST API:

    curl "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer GEMINI_API_KEY" \
      -d '{
        "model": "gemini-2.0-flash",
        "messages": [
            {"role": "user", "content": "Explain to me how AI works"}
        ]
      }'
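
Proxying Sonnet

Anthropic also exposes an OpenAI-compatible layer at https://api.anthropic.com/v1/, so the same client pattern should carry over. A hedged sketch (the model name is illustrative; check Anthropic's current model list):

  • Using Python:

    from openai import OpenAI

    client = OpenAI(
        api_key="ANTHROPIC_API_KEY",  # replace with your Anthropic API key
        base_url="https://api.anthropic.com/v1/"
    )

    response = client.chat.completions.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        messages=[
            {"role": "user", "content": "Explain to me how AI works"}
        ]
    )

    print(response.choices[0].message)

With both upstreams speaking the OpenAI shape, the local proxy only needs to pick a base URL and API key per request, e.g. by model-name prefix.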