fix: audio api endpoint filetype check

RFC2046 allows the Content-Type field to have additional parameters
after the main type/subtype information (Section 1).

Following RFC4281, many applications put codec information inside
parameters in the Content-Type. This is especially common for formats
that support many codecs, such as Ogg (RFC5334, Section 4).
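
As an illustration, a value such as `audio/ogg; codecs=opus` carries the
type/subtype before the first semicolon and the parameters after it. A
minimal sketch using the standard library's `email.message.EmailMessage`
parser; the header value is a made-up example, not taken from a real
request:

```python
from email.message import EmailMessage

# Hypothetical header value, for illustration only.
msg = EmailMessage()
msg["Content-Type"] = "audio/ogg; codecs=opus"

media_type = msg.get_content_type()  # type/subtype only, without parameters
codecs = msg.get_param("codecs")     # value of the "codecs" parameter

print(media_type, codecs)
```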

The `/api/audio/transcriptions` endpoint currently rejects, with a bad
request error, any file whose Content-Type field contains parameters.

This commit changes the current check in order to accept any
Content-Type field that begins with a supported type/subtype as listed
in the `supported_filetypes` tuple.
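
The difference between the two checks can be sketched as follows. The
helper names `old_check` and `new_check` are hypothetical and exist only
for this comparison; the tuple is the one introduced by this commit. The
sketch relies on `str.startswith` accepting a tuple of prefixes:

```python
supported_filetypes = ("audio/mpeg", "audio/wav", "audio/ogg", "audio/x-m4a")


def old_check(content_type: str) -> bool:
    # Previous behaviour: the whole field must match exactly.
    return content_type in supported_filetypes


def new_check(content_type: str) -> bool:
    # New behaviour: the field only has to begin with a supported type/subtype.
    return content_type.startswith(supported_filetypes)


# Hypothetical Content-Type values, for illustration only.
for value in ("audio/ogg", "audio/ogg; codecs=opus", "application/pdf"):
    print(f"{value!r}: old={old_check(value)} new={new_check(value)}")
```

Because this is a plain prefix match, a value that merely starts with a
supported type/subtype is accepted as well.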

Since Content-Type here is provided by the user, I believe this check
is meant to prevent honest mistakes, like posting a PDF to an audio
processing endpoint, not to serve as a security measure against
possibly malicious use. Therefore, I think it's OK not to validate the
rest of the field.
Hermógenes Oliveira 1 month ago
parent
commit
e936d7b53d
1 changed file with 3 additions and 1 deletion

backend/open_webui/routers/audio.py (+3 -1)

@@ -625,7 +625,9 @@ def transcription(
 ):
     log.info(f"file.content_type: {file.content_type}")
 
-    if file.content_type not in ["audio/mpeg", "audio/wav", "audio/ogg", "audio/x-m4a"]:
+    supported_filetypes = ("audio/mpeg", "audio/wav", "audio/ogg", "audio/x-m4a")
+
+    if not file.content_type.startswith(supported_filetypes):
         raise HTTPException(
             status_code=status.HTTP_400_BAD_REQUEST,
             detail=ERROR_MESSAGES.FILE_NOT_SUPPORTED,