Open-source Text Embeddings

We made an OpenAI-compatible API so that you can instantly switch to open-source text embedding models that outscore OpenAI's text-embedding-ada-002 on MTEB. Like the models it is built upon, the API is fully open source on GitHub under the MIT license. We also host a free (within reason) version of the API for you to use today.

About Trelent

When we're not open-sourcing our infrastructure, Trelent uses LLMs to create complex process automation workflows. You can think of it as something similar to an agent, though we really don't like that term. We are venture-backed by multiple funds, have thousands of users, and are building fast.

We're hiring a founding engineer to help with our NextJS and Rust stack - come build with us! Please apply at jobs.trelent.com.

Usage

Integration is as easy as changing the OpenAI base URL. Here are some examples:


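# Python SDK (openai<1.0, which has the module-level api_base setting)
import openai

openai.api_base = "https://api.opentextembeddings.com/v1"
# openai<1.0 raises an error if no API key is configured. This placeholder
# assumes the hosted API does not check keys (the cURL example sends none).
openai.api_key = "not-needed"
response = openai.Embedding.create(
  model="bge-large-en",
  input="Hey, is this API working?"
)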

# Langchain
from langchain.embeddings import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings(
  openai_api_base="https://api.opentextembeddings.com/v1",
  model="bge-large-en"
)
embeddings = embeddings_model.embed_documents([
  "I think it is!",
  "Sweet, I love better embeddings!"
])
        

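// JavaScript SDK (openai v4)
import OpenAI from 'openai';

const openai = new OpenAI({
    baseURL: 'https://api.opentextembeddings.com/v1',
    // The v4 SDK throws if no API key is configured. This placeholder
    // assumes the hosted API does not check keys (the cURL example sends none).
    apiKey: 'not-needed',
});

const embedding = await openai.embeddings.create({
  model: "bge-base-en",
  input: "Yeah, it's so easy to use!",
});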

# cURL + REST API
curl https://api.opentextembeddings.com/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
        "input": "I know. Gonna go share this with everyone right now!",
        "model": "gte-large"
    }'
        
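If you would rather not pull in an SDK, the cURL call above translates directly to Python's standard library. This is a minimal sketch: build_embeddings_request, parse_embeddings and embed are hypothetical helper names, and the response parser assumes the API returns the same body shape as OpenAI's embeddings endpoint (a "data" list of objects with "index" and "embedding"), which is an assumption based on the API being OpenAI-compatible. Like the cURL example, it sends no Authorization header.

```python
import json
import urllib.request

# Base URL taken from the examples above.
API_BASE = "https://api.opentextembeddings.com/v1"


def build_embeddings_request(inputs, model="gte-large"):
    """Build (but do not send) a POST request mirroring the cURL example.

    Hypothetical helper; `inputs` may be a string or a list of strings.
    """
    body = json.dumps({"input": inputs, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(payload):
    """Return the vectors ordered by their 'index' field.

    Assumes an OpenAI-style response body:
    {"data": [{"index": 0, "embedding": [...]}, ...], ...}
    """
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(inputs, model="gte-large", timeout=30):
    """Send the request and return one vector per input (needs network access)."""
    request = build_embeddings_request(inputs, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embeddings(json.load(response))
```

For example, `vectors = embed(["first text", "second text"], model="bge-base-en")` should return two vectors of length 768 if the service behaves as described in the table below.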

Supported model info

Model                    MTEB Score   Dimensions   Seq Length   Cost per 10M tokens
bge-large-en             63.98%       1024         512          $0
bge-base-en              63.36%       768          512          $0
gte-large                63.13%       1024         512          $0
gte-base                 62.39%       768          512          $0
text-embedding-ada-002   60.99%       1536         8192         $1

Legend

Model is the name to pass as the model parameter in API requests. text-embedding-ada-002 is OpenAI's model and is listed for comparison only; this API does not serve it.

MTEB Score is the model's average score across the 56 datasets of the Massive Text Embedding Benchmark (MTEB) English leaderboard.

Dimensions is the number of dimensions of the embedding vectors. A smaller number of dimensions means the embeddings are cheaper to store in a vector DB.
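To put the dimension counts in concrete terms, the raw size of a vector is its dimension count times the bytes per value: with float32 storage, a 1024-dimensional vector takes 4 KiB and a 1536-dimensional one takes 6 KiB. The sketch below assumes float32 values and ignores IDs, metadata and index overhead, all of which vary by vector DB; vector_storage_bytes is a hypothetical helper.

```python
def vector_storage_bytes(dimensions, num_vectors, bytes_per_value=4):
    """Raw size of the vectors alone: no IDs, metadata, or index overhead.

    bytes_per_value=4 assumes float32 storage; some vector DBs also
    support float16 (2 bytes) or quantized formats.
    """
    return dimensions * bytes_per_value * num_vectors


# Dimensions copied from the table above.
MODEL_DIMENSIONS = {
    "bge-large-en": 1024,
    "bge-base-en": 768,
    "gte-large": 1024,
    "gte-base": 768,
    "text-embedding-ada-002": 1536,
}

for name, dims in MODEL_DIMENSIONS.items():
    gib = vector_storage_bytes(dims, 1_000_000) / 2**30
    print(f"{name}: {gib:.2f} GiB per million vectors (float32)")
```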

Seq Length is the maximum number of tokens the model accepts in a single input. A limit of 256 tokens or more is enough for most use cases, since most embedding models perform poorly on large chunks of text anyway.
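Long documents are therefore usually split into chunks before embedding. The sketch below chunks on whitespace-separated words with a small overlap; chunk_words is a hypothetical helper, and counting words is only a rough proxy for tokens (it assumes English prose, where subword tokenizers typically emit somewhat more tokens than words). If you need exact limits, count tokens with the model's own tokenizer instead.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into chunks of at most max_words words, overlapping by `overlap`.

    Word count is a heuristic stand-in for token count; 200 words is an
    assumed default that usually stays well under a 512-token limit.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

Keep in mind that each chunk is a separate input, so each one counts separately against the hourly limit described under Limitations.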

Cost is the price, in dollars, to embed 10M tokens.
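Since prices are quoted per 10M tokens, the cost of a job is simply the token count divided by 10M, times that price. For instance, 50M tokens through text-embedding-ada-002 at $1 per 10M tokens comes to $5. embedding_cost_usd is a hypothetical helper for illustration.

```python
def embedding_cost_usd(num_tokens, usd_per_10m_tokens):
    """Cost of embedding num_tokens at a price quoted per 10M tokens."""
    return num_tokens / 10_000_000 * usd_per_10m_tokens


# Prices from the table above.
print(embedding_cost_usd(50_000_000, 1))  # text-embedding-ada-002
print(embedding_cost_usd(50_000_000, 0))  # open-source models on this API
```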

Limitations

To keep this API free and available to as many people as possible, you may make at most 100 requests per hour. Each input counts as one request, so a single request with multiple inputs counts against this limit once per input. For example, a request with 10 inputs counts as 10 requests against the rate limit.
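Because the limit counts inputs rather than HTTP calls, it can help to track your budget on the client before sending a batch. The sketch below is hypothetical (HourlyInputBudget is not part of any SDK) and assumes a rolling 60-minute window; the server may use a fixed window instead, so treat it as a conservative guard rather than a guarantee. The clock parameter exists so the window can be tested without waiting an hour.

```python
import time
from collections import deque


class HourlyInputBudget:
    """Client-side tracker for a limit counted per input, not per HTTP call.

    Assumes a rolling window; the server's actual windowing is unknown.
    """

    def __init__(self, limit=100, window_seconds=3600, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.sent = deque()  # one timestamp per input already sent

    def _prune(self, now):
        # Forget inputs that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()

    def remaining(self):
        """Number of inputs that could still be sent right now."""
        self._prune(self.clock())
        return self.limit - len(self.sent)

    def try_consume(self, num_inputs):
        """Record num_inputs and return True if they fit; otherwise return False."""
        now = self.clock()
        self._prune(now)
        if len(self.sent) + num_inputs > self.limit:
            return False
        self.sent.extend([now] * num_inputs)
        return True
```

Call `budget.try_consume(len(inputs))` before each request; if it returns False, wait or send fewer inputs.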

If you need more than 100 requests per hour, feel free to self-host on your own infrastructure, or contact us; we may be able to provide a custom limit for good causes. For bulk workloads, please contact us so we can agree on reasonable pricing for a dedicated cluster.

Our goal is to keep this API free for as long as possible, but we reserve the right to change this at any time. If you would like to support this project, please consider reaching out to us to add nodes to our cluster.