<-- henry : writing

OpenAI abuse


Using embedding models for generation

You can access one of OpenAI's hidden completion models by setting the "model" parameter in the OpenAI playground URL to its name.

This is an embedding model that's not supposed to be used to generate text, and isn't publicly listed anywhere as a completion model.

Brute-forcing model names

Taking advantage of an inconsistency in the OpenAI API error messages, I noticed that you can determine whether a given name corresponds to a private model on OpenAI. After ~22k requests to their endpoint, here are some internal model names I found whose existence OpenAI has not publicly acknowledged:

code-cushman
canary-ada
gpt-4-base
babbage-v2
ada-v2
davinci-v2
curie-v2
canary-text-embedding-3-small
canary-text-embedding-3-large
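
For anyone unfamiliar with how an error-message inconsistency turns into an existence oracle, here's a toy, fully offline sketch of the general idea. It is not the requests I actually sent: `mock_api`, `patched_api`, the model names, and the error wording are all invented for illustration, and nothing here talks to a real service.

```python
from typing import Callable, Iterable, List

# Hypothetical data: names the pretend service knows about.
_PUBLIC = {"public-model"}
_PRIVATE = {"internal-model-a"}


def mock_api(model: str) -> str:
    """Pretend endpoint whose errors differ for private vs. nonexistent names."""
    if model in _PUBLIC:
        return "ok"
    if model in _PRIVATE:
        return "error: you do not have access to this model"  # invented wording
    return "error: model not found"  # invented wording


def patched_api(model: str) -> str:
    """Same pretend endpoint after a fix: every failure looks identical."""
    return "ok" if model in _PUBLIC else "error: model not found"


def find_private(names: Iterable[str], api: Callable[[str], str]) -> List[str]:
    """Return names whose error differs from the plain 'not found' error."""
    found = []
    for name in names:
        response = api(name)
        if response.startswith("error") and "not found" not in response:
            found.append(name)
    return found


candidates = ["public-model", "internal-model-a", "made-up-name"]
print(find_private(candidates, mock_api))     # the oracle leaks the private name
print(find_private(candidates, patched_api))  # uniform errors leak nothing
```

The second call shows why the fix is simple on the server side: once every failure returns the same message, there is nothing left to distinguish.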

After I tweeted about this exploit, OpenAI patched it. (<3)

4chan leak

I also found and brought to public attention a large 4chan leak of OpenAI model names. It included some, but not all, of the names I found independently. Here are some highlights:

gpt-3.5-turbo-16k-scientist
gpt-3.5-turbo-16k-superhuman
gpt4t-1p-231016-context-sharding-8-shard-treatment
fact-factory-robot-magic
gpt35-1p-f-240108-720-joshtest-delete-or-you-will-be-fired
onesmallstep4man
boppenheimer
dalle-3p0-inpaint
maraschino-rr-d36
test-jenia-sahara-mm
gpt-4-mem
box-dev
gpt-4-base-sft-32k

The original file is down, so I'm hosting a copy here: leak.txt

Timing side-channels

After OpenAI patched my previous exploit, I tried a timing side-channel attack. This was a bit of a long shot, since it only works if they authenticate API requests in a specific way. Here's a response-time histogram from ~50k requests: