OpenAI abuse

Using embedding models for generation

You can access one of OpenAI's hidden completion models by setting the "model" parameter in the OpenAI playground URL to its name.

This is an embedding model that's not supposed to be used to generate text, and isn't publicly listed anywhere as a completion model.

Brute-forcing model names

By taking advantage of an inconsistency in the OpenAI API's error messages, I found that you can determine whether a given name corresponds to a private model on OpenAI. After ~22k requests to their endpoint, here are some internal model names I found whose existence OpenAI has not publicly acknowledged:

code-cushman
canary-ada
gpt-4-base
babbage-v2
ada-v2
davinci-v2
curie-v2
canary-text-embedding-3-small
canary-text-embedding-3-large

After I made a tweet publicizing this exploit, OpenAI patched it. (<3)
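For the curious, the general shape of this kind of enumeration is sketched below. It's a minimal, offline illustration rather than the script I actually ran: the two error fragments, the `classify` and `enumerate_private` helpers, the `fake_probe` stand-in, and the model name `example-private-model` are all made up for the example, and nothing here touches a real endpoint.

```python
from typing import Callable, Iterable

# Hypothetical error fragments. The real messages are not reproduced here;
# all that matters is that the two cases produced different text.
NOT_FOUND_HINT = "does not exist"
NO_ACCESS_HINT = "do not have access"


def classify(error_message: str) -> str:
    """Map an error message to 'private', 'missing', or 'unknown'."""
    text = error_message.lower()
    if NO_ACCESS_HINT in text:
        return "private"
    if NOT_FOUND_HINT in text:
        return "missing"
    return "unknown"


def enumerate_private(candidates: Iterable[str],
                      probe: Callable[[str], str]) -> list[str]:
    """Return the candidates whose error message suggests a private model."""
    return [name for name in candidates if classify(probe(name)) == "private"]


def fake_probe(name: str) -> str:
    """Offline stand-in for an API call; knows one made-up private model."""
    if name == "example-private-model":
        return "You do not have access to model 'example-private-model'."
    return f"The model '{name}' does not exist."


found = enumerate_private(["example-private-model", "no-such-model"], fake_probe)
print(found)
```

Passing the probe in as a function keeps the classification logic separate from whatever actually makes the request, which is also what lets the sketch run without a network connection.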

4chan leak

I also found and brought to public attention a large 4chan leak of OpenAI model names. It included some, but not all, of the names I independently found. Here are some highlights:

gpt-3.5-turbo-16k-scientist
gpt-3.5-turbo-16k-superhuman
gpt4t-1p-231016-context-sharding-8-shard-treatment
fact-factory-robot-magic
gpt35-1p-f-240108-720-joshtest-delete-or-you-will-be-fired
onesmallstep4man
boppenheimer
dalle-3p0-inpaint
maraschino-rr-d36
test-jenia-sahara-mm
gpt-4-mem
box-dev
gpt-4-base-sft-32k

The original file is down, so I've got it hosted here: leak.txt

Timing side-channels

After OpenAI patched my previous exploit, I tried a timing side-channel attack. It was a bit of a long shot, since it only works if they authenticate API requests in a specific way. Here's a response-time histogram from ~50k requests: