Spaces: abhi02072005/ai-foley-studio-backend


abhi02072005 committed on Oct 28

Commit 8ca7c55

1 Parent(s): ba32884

final

Files changed (5)

FIXES_SUMMARY.md +105 -0
agent2.py +58 -13
link2.py +8 -8
qsec.py +10 -0
real.py +15 -0

FIXES_SUMMARY.md ADDED

	@@ -0,0 +1,105 @@

+# Bug Fixes Summary - Audio Video Generation
+## Date: October 29, 2025
+## Issues Fixed
+### 1. **Import Conflict in link2.py**
+**Problem:** Line 53 imported `process_video_for_footstep_audio` from `agent.py`, which overrode the correct import from `agent2.py` on line 21.
+**Fix:**
+- Removed the conflicting import from `agent.py`
+- Renamed the import from `agent2.py` to `process_video_agent2` for clarity
+- Updated the function call in `generate_audio_video_task()` to use `process_video_agent2()`
+### 2. **File Path Validation in link2.py**
+**Problem:** No validation to check if the audio file was successfully generated.
+**Fix:**
+- Added file existence check after calling `process_video_agent2()`
+- Raises an exception if the file doesn't exist, allowing proper error handling
+### 3. **Missing Error Handling in real.py AudioGenerator**
+**Problem:** `generate_footstep()` returned `None` when audio extraction failed, causing `TypeError: object of type 'NoneType' has no len()` in `create_audio_track()`.
+**Fix in `generate_footstep()`:**
+- Added fallback audio generation when `extract_second_audio_librosa()` returns `None`
+- Creates a synthetic footstep sound using a damped sine wave
+**Fix in `create_audio_track()`:**
+- Added safety check to validate `step_sound` is not `None` or empty
+- Skips problematic events with a warning instead of crashing
+### 4. **Improved Error Handling in agent2.py**
+**Problem:** `process_video_for_footstep_audio()` could return `None`, causing downstream failures.
+**Fix:**
+- Wrapped the entire function in a `try`/`except` block
+- Added `create_fallback_audio()` function that generates a synthetic footstep sound
+- Returns fallback audio path if any step fails (frame extraction, base64 conversion, LLM generation)
+- Added file existence verification before returning the path
+- Ensures `./audio` directory exists
+### 5. **Better Error Messages in qsec.py**
+**Problem:** Generic error messages didn't help identify the root cause (file not found vs. other errors).
+**Fix:**
+- Added file path validation before attempting to load audio
+- Added specific error message when file path is empty
+- Added specific error message when file doesn't exist
+- Added `os` import for path validation
+## Files Modified
+1. **link2.py**
+   - Fixed import conflict (removed `agent` import, renamed `agent2` import)
+   - Added audio file validation in `generate_audio_video_task()`
+2. **real.py**
+   - Added fallback audio generation in `generate_footstep()`
+   - Added `None` check in `create_audio_track()`
+3. **agent2.py**
+   - Added comprehensive error handling in `process_video_for_footstep_audio()`
+   - Created `create_fallback_audio()` function
+   - Ensured audio directory exists
+   - Added file existence verification
+4. **qsec.py**
+   - Added `os` import
+   - Added file path validation with specific error messages
+## Testing Recommendations
+1. Test with valid video file - should generate audio successfully
+2. Test with corrupted video file - should use fallback audio
+3. Test when LLM fails - should use fallback audio
+4. Test when audio extraction fails - should generate synthetic footstep sound
+5. Verify all error messages are clear and helpful
+## Technical Details
+### Error Flow (Before Fix)
+```
+link2.py → agent.py (wrong import) → returns string name →
+qsec.py tries to load string as file → fails → returns (None, None) →
+real.py tries len(None) → TypeError crash
+```
+### Error Flow (After Fix)
+```
+link2.py → agent2.py → generates actual audio file → returns file path →
+qsec.py validates path → loads audio → returns audio array →
+real.py uses audio (or fallback if None) → success
+```
+## Fallback Audio Details
+- **Sample Rate:** 44,100 Hz
+- **Duration:** 1 second
+- **Waveform:** Combination of:
+  - 80 Hz sine wave (low frequency thump) with exponential decay
+  - Gaussian noise burst with faster exponential decay
+  - Normalized to 80% maximum amplitude
+- **Format:** WAV file (16-bit PCM)
+- **Location:** `./audio/fallback_footstep.wav`
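
The fallback waveform listed under "Fallback Audio Details" can be reproduced without NumPy or SciPy, which may help when checking the spec in isolation. The following is a minimal sketch using only the standard library; the function names `synth_fallback_footstep` and `write_wav_16bit` and the seeded random generator are assumptions made for illustration, and the time grid is `k / sample_rate` rather than the `np.linspace` grid used in the commit, so the output is not byte-identical to the committed version.

```python
import math
import random
import struct
import wave

SAMPLE_RATE = 44_100  # Hz, as stated in the summary
DURATION_S = 1.0      # seconds, as stated in the summary
PEAK = 0.8            # normalize to 80% of full scale


def synth_fallback_footstep(seed=0):
    """Return float samples: 80 Hz damped sine plus a faster-decaying noise burst."""
    rng = random.Random(seed)  # seeded for repeatability (not part of the commit)
    n = int(SAMPLE_RATE * DURATION_S)
    samples = []
    for k in range(n):
        t = k / SAMPLE_RATE
        thump = math.sin(2 * math.pi * 80 * t) * math.exp(-8 * t)
        noise = rng.gauss(0, 0.1) * math.exp(-15 * t)
        samples.append(thump + noise)
    peak = max(abs(s) for s in samples)
    return [s / peak * PEAK for s in samples]


def write_wav_16bit(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit PCM with the standard-library wave module."""
    pcm = [int(s * 32767) for s in samples]
    frames = struct.pack(f"<{len(pcm)}h", *pcm)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
```

Reading the file back with `wave.open(path, "rb")` and comparing `getframerate()`, `getsampwidth()` and `getnframes()` against the values in the list above is one way to cover the fallback cases in the testing recommendations.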

agent2.py CHANGED

@@ -125,6 +125,9 @@ chain = (
 def analyze_image_and_generate_audio(image_base64):
     try:
+        # Ensure audio directory exists
+        os.makedirs("./audio", exist_ok=True)
         result = chain.invoke(image_base64)
         p=open("ss.txt","w")
         p.write(str(result))
@@ -155,19 +158,61 @@ def analyze_image_and_generate_audio(image_base64):
 def process_video_for_footstep_audio(video_path):
-    print("🎥 Extracting first frame...")
-    first_frame = extract_first_frame(video_path)
-    if first_frame is None:
-        print("❌ Failed to extract frame from video")
-        return None
-    image_base64 = image_to_base64(first_frame)
-    if image_base64 is None:
-        print("❌ Failed to convert image to base64")
-        return None
-    print("🤖 Generating footstep audio from LLM...")
-    audio_path = analyze_image_and_generate_audio(image_base64)
-    print(audio_path)
-    return audio_path
+    try:
+        print("🎥 Extracting first frame...")
+        first_frame = extract_first_frame(video_path)
+        if first_frame is None:
+            print("❌ Failed to extract frame from video")
+            # Return a fallback audio path
+            return create_fallback_audio()
+        image_base64 = image_to_base64(first_frame)
+        if image_base64 is None:
+            print("❌ Failed to convert image to base64")
+            return create_fallback_audio()
+        print("🤖 Generating footstep audio from LLM...")
+        audio_path = analyze_image_and_generate_audio(image_base64)
+        # Verify the file exists
+        if audio_path and os.path.exists(audio_path):
+            print(f"✅ Audio generated successfully at: {audio_path}")
+            return audio_path
+        else:
+            print("❌ Generated audio file not found")
+            return create_fallback_audio()
+    except Exception as e:
+        print(f"❌ Error in process_video_for_footstep_audio: {e}")
+        import traceback
+        traceback.print_exc()
+        return create_fallback_audio()
+def create_fallback_audio():
+    """Create a simple fallback audio file"""
+    try:
+        os.makedirs("./audio", exist_ok=True)
+        fallback_path = "./audio/fallback_footstep.wav"
+        # Create a simple footstep-like sound
+        sample_rate = 44100
+        duration = 1.0
+        t = np.linspace(0, duration, int(sample_rate * duration))
+        # Combination of low frequency thump and noise burst
+        footstep = (
+            np.sin(2 * np.pi * 80 * t) * np.exp(-8 * t) +  # Low frequency thump
+            np.random.normal(0, 0.1, len(t)) * np.exp(-15 * t)  # Noise burst
+        )
+        footstep = footstep / np.max(np.abs(footstep)) * 0.8
+        write(fallback_path, sample_rate, np.int16(footstep * 32767))
+        print(f"✅ Created fallback audio at: {fallback_path}")
+        return fallback_path
+    except Exception as e:
+        print(f"❌ Failed to create fallback audio: {e}")
+        return None
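
Note that `create_fallback_audio()` itself returns `None` when writing the file fails, so `process_video_for_footstep_audio()` can still return `None` in that case and callers need to keep checking. The control flow in this hunk (try the primary step, fall back, verify the file exists) can be sketched generically as below. `generate_or_fallback` and its two callable parameters are hypothetical names used for illustration and are not part of the commit.

```python
import os
from typing import Callable, Optional


def generate_or_fallback(
    generate: Callable[[], Optional[str]],
    fallback: Callable[[], Optional[str]],
) -> Optional[str]:
    """Return the first path that exists on disk, trying `generate`, then `fallback`."""
    for step in (generate, fallback):
        name = getattr(step, "__name__", repr(step))
        try:
            path = step()
        except Exception as exc:  # broad catch, mirroring the commit's style
            print(f"step {name} failed: {exc}")
            continue
        if path and os.path.exists(path):
            return path
        print(f"step {name} produced no usable file")
    return None  # both steps failed; the caller must still handle this
```

Passing the steps in as callables keeps the fallback policy testable without a video file or an LLM call.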

link2.py CHANGED

@@ -19,7 +19,7 @@ import asyncio
 from concurrent.futures import ThreadPoolExecutor
 import base64
 from io import BytesIO
-from agent2 import process_video_for_footstep_audio
+from agent2 import process_video_for_footstep_audio as process_video_agent2
 # Suppress warnings
@@ -53,7 +53,6 @@ from real import (
 )
 # Import your custom modules
-from agent import process_video_for_footstep_audio
 from sound_agent import main_sound
 from qsec import extract_second_audio_librosa
@@ -515,7 +514,13 @@ def generate_audio_video_task(task_id: str):
         print(f"[DEBUG] Generating audio track...")
         audio_gen = AudioGenerator()
-        aud_path = process_video_for_footstep_audio(video_path)
+        # Use agent2 to generate audio from video frame
+        aud_path = process_video_agent2(video_path)
+        # Verify audio file exists
+        if not aud_path or not os.path.exists(aud_path):
+            print(f"[WARNING] Audio file not found at {aud_path}, using fallback")
+            raise Exception(f"Failed to generate audio file from video")
         duration = results['total_frames'] / results['fps']
         audio_track = audio_gen.create_audio_track(
@@ -843,9 +848,4 @@ async def stop_live_session(session_id: str):
     }
-'''if __name__ == "__main__":
-    import uvicorn
-    uvicorn.run(app, host="0.0.0.0", port=8000)
-'''
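
The import conflict fixed here comes from how `from ... import name` works: each such statement rebinds `name` in the importing module, so the last one wins. A small self-contained sketch of that behaviour follows; the in-memory modules `demo_agent` and `demo_agent2` are stand-ins invented for the example, not the project's real `agent.py` and `agent2.py`.

```python
import sys
import types


def _make_impl(tag):
    """Build a stand-in function that reports which module it came from."""
    def process_video_for_footstep_audio(video_path):
        return f"{tag}:{video_path}"
    return process_video_for_footstep_audio


# Register two throwaway modules so the example runs without the real files.
for mod_name, tag in (("demo_agent", "agent"), ("demo_agent2", "agent2")):
    mod = types.ModuleType(mod_name)
    mod.process_video_for_footstep_audio = _make_impl(tag)
    sys.modules[mod_name] = mod

from demo_agent2 import process_video_for_footstep_audio  # earlier import
from demo_agent import process_video_for_footstep_audio   # rebinds the same name

shadowed = process_video_for_footstep_audio("clip.mp4")   # resolves to demo_agent

# An alias gives the agent2 function its own name, so a later import cannot replace it.
from demo_agent2 import process_video_for_footstep_audio as process_video_agent2

aliased = process_video_agent2("clip.mp4")                 # resolves to demo_agent2
```

The commit applies both remedies at once: it removes the later import and aliases the remaining one.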

qsec.py CHANGED

@@ -1,8 +1,18 @@
 import numpy as np
 import librosa
+import os
 def extract_second_audio_librosa(file_path, target_second=0, sample_rate=22050):
     try:
+        # Validate file path
+        if not file_path:
+            print(f"Error processing audio: No file path provided")
+            return None, None
+        if not os.path.exists(file_path):
+            print(f"Error processing audio: File not found at '{file_path}'")
+            return None, None
         # Load audio file
         audio_data, sr = librosa.load(file_path, sr=sample_rate)
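
The two path checks added in this hunk can be exercised without `librosa` if they are pulled into a small helper. The sketch below is one possible shape under that assumption; `audio_path_problem` is a hypothetical name, and the `os.path.isfile` check is an extra that the commit does not include.

```python
import os
from typing import Optional


def audio_path_problem(file_path: Optional[str]) -> Optional[str]:
    """Return a human-readable problem with `file_path`, or None if it looks loadable."""
    if not file_path:
        return "No file path provided"
    if not os.path.exists(file_path):
        return f"File not found at '{file_path}'"
    if not os.path.isfile(file_path):
        # Extra check, not in the commit: a directory exists but cannot be loaded as audio.
        return f"Path is not a regular file: '{file_path}'"
    return None
```

A caller in the style of `extract_second_audio_librosa()` could print the returned message and return `None, None` when it is not `None`, keeping the error text from the diff.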

real.py CHANGED

@@ -723,6 +723,15 @@ class AudioGenerator:
             target_second=1,
             sample_rate=self.sample_rate
         )
+        # If extraction failed, create a fallback footstep sound
+        if arr is None:
+            print(f"[WARNING] Failed to extract audio from {aud_path}, generating fallback sound")
+            # Create a simple footstep-like sound (dampened click)
+            t = np.linspace(0, 0.2, int(0.2 * self.sample_rate))
+            arr = np.sin(2 * np.pi * 100 * t) * np.exp(-10 * t)
+            arr = arr.astype(np.float32)
         return arr
     def create_audio_track(self, events, aud_path, duration=0.3):
@@ -731,6 +740,12 @@ class AudioGenerator:
         for i, event in enumerate(events):
             step_sound = self.generate_footstep(aud_path)
+            # Safety check: ensure step_sound is valid
+            if step_sound is None or len(step_sound) == 0:
+                print(f"[WARNING] Invalid step_sound for event {i}, skipping")
+                continue
             pitch_shift = 1.0 + (i % 5 - 2) * 0.03
             indices = np.arange(len(step_sound)) * pitch_shift
             indices = np.clip(indices, 0, len(step_sound) - 1).astype(int)
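
The new `len(step_sound) == 0` guard matters because the context lines that follow index into `step_sound`. As a worked example of that arithmetic, the sketch below recomputes the pitch factor and the clipped index list in plain Python. `pitch_factor` and `resample_indices` are hypothetical helper names, and the list comprehension is intended as an equivalent of the NumPy expressions for non-negative inputs rather than the project's code.

```python
from typing import List


def pitch_factor(i: int) -> float:
    """Per-event factor from the diff; it cycles through five values as i increases."""
    return 1.0 + (i % 5 - 2) * 0.03


def resample_indices(n: int, pitch_shift: float) -> List[int]:
    """Plain-Python version of arange(n) * pitch_shift, clipped to [0, n - 1], as ints."""
    return [min(max(int(k * pitch_shift), 0), n - 1) for k in range(n)]


factors = [round(pitch_factor(i), 2) for i in range(5)]
# factors == [0.94, 0.97, 1.0, 1.03, 1.06]

faster = resample_indices(100, 1.06)  # tail is clipped to the last sample (index 99)
slower = resample_indices(100, 0.94)  # never reaches the end; last index is 93
```

Because the index list always has the same length as the input, the step keeps its duration: factors above 1 hold the final sample at the tail, and factors below 1 stop before the end of the source sound.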