abhi02072005 committed
Commit 8ca7c55 · 1 Parent(s): ba32884
Files changed (5)
  1. FIXES_SUMMARY.md +105 -0
  2. agent2.py +58 -13
  3. link2.py +8 -8
  4. qsec.py +10 -0
  5. real.py +15 -0
FIXES_SUMMARY.md ADDED
@@ -0,0 +1,105 @@
+ # Bug Fixes Summary - Audio Video Generation
+
+ ## Date: October 29, 2025
+
+ ## Issues Fixed
+
+ ### 1. **Import Conflict in link2.py**
+ **Problem:** Line 56 imported `process_video_for_footstep_audio` from `agent.py`, which overrode the correct import from `agent2.py` on line 22.
+
+ **Fix:**
+ - Removed the conflicting import from `agent.py`
+ - Renamed the import from `agent2.py` to `process_video_agent2` for clarity
+ - Updated the function call in `generate_audio_video_task()` to use `process_video_agent2()`
+
+ ### 2. **File Path Validation in link2.py**
+ **Problem:** Nothing checked whether the audio file had actually been generated.
+
+ **Fix:**
+ - Added a file existence check after calling `process_video_agent2()`
+ - Raises an exception if the file doesn't exist, allowing proper error handling
+
+ ### 3. **Missing Error Handling in real.py AudioGenerator**
+ **Problem:** `generate_footstep()` returned `None` when audio extraction failed, causing `TypeError: object of type 'NoneType' has no len()` in `create_audio_track()`.
+
+ **Fix in `generate_footstep()`:**
+ - Added fallback audio generation when `extract_second_audio_librosa()` returns `None`
+ - Creates a synthetic footstep sound using a damped sine wave
+
+ **Fix in `create_audio_track()`:**
+ - Added a safety check that `step_sound` is neither `None` nor empty
+ - Skips problematic events with a warning instead of crashing
+
+ ### 4. **Improved Error Handling in agent2.py**
+ **Problem:** `process_video_for_footstep_audio()` could return `None`, causing downstream failures.
+
+ **Fix:**
+ - Wrapped the entire function body in a `try`/`except` block
+ - Added a `create_fallback_audio()` function that generates a synthetic footstep sound
+ - Returns the fallback audio path if any step fails (frame extraction, base64 conversion, LLM generation)
+ - Added file existence verification before returning the path
+ - Ensures the `./audio` directory exists
+
+ ### 5. **Better Error Messages in qsec.py**
+ **Problem:** Generic error messages didn't help identify the root cause (file not found vs. other errors).
+
+ **Fix:**
+ - Added file path validation before attempting to load audio
+ - Added a specific error message when the file path is empty
+ - Added a specific error message when the file doesn't exist
+ - Added the `os` import for path validation
+
+ ## Files Modified
+
+ 1. **link2.py**
+    - Fixed import conflict (removed `agent` import, renamed `agent2` import)
+    - Added audio file validation in `generate_audio_video_task()`
+
+ 2. **real.py**
+    - Added fallback audio generation in `generate_footstep()`
+    - Added null check in `create_audio_track()`
+
+ 3. **agent2.py**
+    - Added comprehensive error handling in `process_video_for_footstep_audio()`
+    - Created `create_fallback_audio()` function
+    - Ensured audio directory exists
+    - Added file existence verification
+
+ 4. **qsec.py**
+    - Added `os` import
+    - Added file path validation with specific error messages
+
+ ## Testing Recommendations
+
+ 1. Test with valid video file - should generate audio successfully
+ 2. Test with corrupted video file - should use fallback audio
+ 3. Test when LLM fails - should use fallback audio
+ 4. Test when audio extraction fails - should generate synthetic footstep sound
+ 5. Verify all error messages are clear and helpful
+
+ ## Technical Details
+
+ ### Error Flow (Before Fix)
+ ```
+ link2.py → agent.py (wrong import) → returns string name →
+ qsec.py tries to load string as file → fails → returns (None, None) →
+ real.py tries len(None) → TypeError crash
+ ```
+
+ ### Error Flow (After Fix)
+ ```
+ link2.py → agent2.py → generates actual audio file → returns file path →
+ qsec.py validates path → loads audio → returns audio array →
+ real.py uses audio (or fallback if None) → success
+ ```
+
+ ## Fallback Audio Details
+
+ - **Sample Rate:** 44,100 Hz
+ - **Duration:** 1 second
+ - **Waveform:** Combination of:
+   - 80 Hz sine wave (low frequency thump) with exponential decay
+   - Gaussian noise burst with faster exponential decay
+   - Normalized to 80% maximum amplitude
+ - **Format:** WAV file (16-bit PCM)
+ - **Location:** `./audio/fallback_footstep.wav`
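
The import conflict described in issue 1 is ordinary Python name rebinding: a later `from ... import name` silently replaces an earlier binding of the same name. The sketch below reproduces the effect with two throwaway in-memory modules. The module names `demo_agent2` and `demo_agent` and their return values are hypothetical stand-ins, not the project's real modules.

```python
import sys
import types

def make_module(name, return_value):
    """Build a throwaway in-memory module exposing process_video_for_footstep_audio."""
    module = types.ModuleType(name)
    module.process_video_for_footstep_audio = lambda video_path: return_value
    sys.modules[name] = module
    return module

# Hypothetical stand-ins: one returns a file path, the other a bare name.
make_module("demo_agent2", "./audio/generated.wav")
make_module("demo_agent", "footstep_on_gravel")

# Two imports of the same name: the second silently rebinds it.
from demo_agent2 import process_video_for_footstep_audio
from demo_agent import process_video_for_footstep_audio

shadowed_result = process_video_for_footstep_audio("clip.mp4")

# Aliasing the import keeps the binding unambiguous.
from demo_agent2 import process_video_for_footstep_audio as process_video_agent2

aliased_result = process_video_agent2("clip.mp4")
print(shadowed_result, aliased_result)
```

Aliasing at import time, as the commit does, means a stray import added later cannot change which function `generate_audio_video_task()` calls.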
agent2.py CHANGED
@@ -125,6 +125,9 @@ chain = (
 
 def analyze_image_and_generate_audio(image_base64):
     try:
+        # Ensure audio directory exists
+        os.makedirs("./audio", exist_ok=True)
+
         result = chain.invoke(image_base64)
         p=open("ss.txt","w")
         p.write(str(result))
@@ -155,19 +158,61 @@ def analyze_image_and_generate_audio(image_base64):
 
 
 def process_video_for_footstep_audio(video_path):
-    print("🎥 Extracting first frame...")
-    first_frame = extract_first_frame(video_path)
-    if first_frame is None:
-        print("❌ Failed to extract frame from video")
-        return None
+    try:
+        print("🎥 Extracting first frame...")
+        first_frame = extract_first_frame(video_path)
+        if first_frame is None:
+            print("❌ Failed to extract frame from video")
+            # Return a fallback audio path
+            return create_fallback_audio()
+
+        image_base64 = image_to_base64(first_frame)
+        if image_base64 is None:
+            print("❌ Failed to convert image to base64")
+            return create_fallback_audio()
+
+        print("🤖 Generating footstep audio from LLM...")
+        audio_path = analyze_image_and_generate_audio(image_base64)
+
+        # Verify the file exists
+        if audio_path and os.path.exists(audio_path):
+            print(f"✅ Audio generated successfully at: {audio_path}")
+            return audio_path
+        else:
+            print("❌ Generated audio file not found")
+            return create_fallback_audio()
+
+    except Exception as e:
+        print(f"❌ Error in process_video_for_footstep_audio: {e}")
+        import traceback
+        traceback.print_exc()
+        return create_fallback_audio()
 
-    image_base64 = image_to_base64(first_frame)
-    if image_base64 is None:
-        print("❌ Failed to convert image to base64")
-        return None
 
-    print("🤖 Generating footstep audio from LLM...")
-    audio_path = analyze_image_and_generate_audio(image_base64)
-    print(audio_path)
-    return audio_path
+def create_fallback_audio():
+    """Create a simple fallback audio file"""
+    try:
+        os.makedirs("./audio", exist_ok=True)
+        fallback_path = "./audio/fallback_footstep.wav"
+
+        # Create a simple footstep-like sound
+        sample_rate = 44100
+        duration = 1.0
+        t = np.linspace(0, duration, int(sample_rate * duration))
+
+        # Combination of low frequency thump and noise burst
+        footstep = (
+            np.sin(2 * np.pi * 80 * t) * np.exp(-8 * t) +  # Low frequency thump
+            np.random.normal(0, 0.1, len(t)) * np.exp(-15 * t)  # Noise burst
+        )
+
+        footstep = footstep / np.max(np.abs(footstep)) * 0.8
+
+        write(fallback_path, sample_rate, np.int16(footstep * 32767))
+        print(f"✅ Created fallback audio at: {fallback_path}")
+        return fallback_path
+
+    except Exception as e:
+        print(f"❌ Failed to create fallback audio: {e}")
+        return None
 
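
For reference, roughly the same decaying thump-plus-noise recipe used by `create_fallback_audio()` can be reproduced without NumPy or SciPy. The sketch below swaps in the standard-library `wave` and `struct` modules. The function name `write_fallback_wav`, the seeded random generator, the silence guard, and the temporary output path are assumptions made for illustration and are not part of the commit. Its time axis is `i / sample_rate`, which differs very slightly from the `np.linspace` grid in the diff.

```python
import math
import os
import random
import struct
import tempfile
import wave

def write_fallback_wav(path, sample_rate=44100, duration=1.0, seed=0):
    """Write a 16-bit mono WAV: 80 Hz decaying sine plus a faster-decaying noise burst."""
    rng = random.Random(seed)  # seeded so repeated runs give the same file (illustrative choice)
    n_samples = int(sample_rate * duration)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        thump = math.sin(2 * math.pi * 80 * t) * math.exp(-8 * t)
        noise = rng.gauss(0, 0.1) * math.exp(-15 * t)
        samples.append(thump + noise)

    # Normalize to 80% of full scale; guard against an all-zero signal.
    peak = max(abs(s) for s in samples)
    scale = 0.8 / peak if peak > 0 else 0.0

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        pcm = (int(s * scale * 32767) for s in samples)
        wav_file.writeframes(struct.pack(f"<{n_samples}h", *pcm))
    return path

demo_path = write_fallback_wav(os.path.join(tempfile.mkdtemp(), "fallback_footstep.wav"))
with wave.open(demo_path, "rb") as check:
    info = (check.getnchannels(), check.getsampwidth(), check.getframerate(), check.getnframes())
print(info)
```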
 
link2.py CHANGED
@@ -19,7 +19,7 @@ import asyncio
 from concurrent.futures import ThreadPoolExecutor
 import base64
 from io import BytesIO
-from agent2 import process_video_for_footstep_audio
+from agent2 import process_video_for_footstep_audio as process_video_agent2
 
 
 # Suppress warnings
@@ -53,7 +53,6 @@ from real import (
 )
 
 # Import your custom modules
-from agent import process_video_for_footstep_audio
 from sound_agent import main_sound
 from qsec import extract_second_audio_librosa
 
@@ -515,7 +514,13 @@ def generate_audio_video_task(task_id: str):
         print(f"[DEBUG] Generating audio track...")
         audio_gen = AudioGenerator()
 
-        aud_path = process_video_for_footstep_audio(video_path)
+        # Use agent2 to generate audio from video frame
+        aud_path = process_video_agent2(video_path)
+
+        # Verify audio file exists
+        if not aud_path or not os.path.exists(aud_path):
+            print(f"[WARNING] Audio file not found at {aud_path}, using fallback")
+            raise Exception(f"Failed to generate audio file from video")
 
         duration = results['total_frames'] / results['fps']
         audio_track = audio_gen.create_audio_track(
@@ -843,9 +848,4 @@ async def stop_live_session(session_id: str):
     }
 
 
-'''if __name__ == "__main__":
-    import uvicorn
-
-    uvicorn.run(app, host="0.0.0.0", port=8000)
-'''
 
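
The existence check added to `generate_audio_video_task()` could also be factored into a small helper that raises a more specific exception type. This is a minimal sketch, not the commit's code: `require_audio_file` is a hypothetical name, and choosing `FileNotFoundError` over the bare `Exception` used above is an assumption about what callers would want to catch.

```python
import os
import tempfile

def require_audio_file(aud_path):
    """Return aud_path if it names an existing file, otherwise raise FileNotFoundError."""
    if not aud_path:
        raise FileNotFoundError("Audio generation returned no path")
    if not os.path.isfile(aud_path):
        raise FileNotFoundError(f"Audio file not found at {aud_path!r}")
    return aud_path

# Usage: an existing file passes through, a missing path raises.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    existing_path = tmp.name

checked_path = require_audio_file(existing_path)

try:
    require_audio_file(None)
    missing_raised = False
except FileNotFoundError:
    missing_raised = True

os.remove(existing_path)
```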
 
qsec.py CHANGED
@@ -1,8 +1,18 @@
 import numpy as np
 import librosa
+import os
 
 def extract_second_audio_librosa(file_path, target_second=0, sample_rate=22050):
     try:
+        # Validate file path
+        if not file_path:
+            print(f"Error processing audio: No file path provided")
+            return None, None
+
+        if not os.path.exists(file_path):
+            print(f"Error processing audio: File not found at '{file_path}'")
+            return None, None
+
         # Load audio file
         audio_data, sr = librosa.load(file_path, sr=sample_rate)
 
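
Both early returns in this hunk yield the tuple `(None, None)`, so a caller that compares the whole return value with `is None` would never see the failure; the result has to be unpacked first. The rest of `extract_second_audio_librosa()` and its call site in `real.py` are not visible in this diff, so the stand-in below (`load_segment`, a hypothetical function) only illustrates the general pitfall.

```python
def load_segment(file_path):
    """Hypothetical stand-in that mimics the (None, None) failure convention."""
    if not file_path:
        return None, None
    return [0.0, 0.1, 0.2], 22050

result = load_segment("")

# A tuple is never None, even when both of its elements are.
whole_result_is_none = result is None

# Unpacking exposes the failure.
audio_data, sample_rate = result
unpacked_audio_is_none = audio_data is None
print(whole_result_is_none, unpacked_audio_is_none)
```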
 
real.py CHANGED
@@ -723,6 +723,15 @@ class AudioGenerator:
             target_second=1,
             sample_rate=self.sample_rate
         )
+
+        # If extraction failed, create a fallback footstep sound
+        if arr is None:
+            print(f"[WARNING] Failed to extract audio from {aud_path}, generating fallback sound")
+            # Create a simple footstep-like sound (dampened click)
+            t = np.linspace(0, 0.2, int(0.2 * self.sample_rate))
+            arr = np.sin(2 * np.pi * 100 * t) * np.exp(-10 * t)
+            arr = arr.astype(np.float32)
+
         return arr
 
     def create_audio_track(self, events, aud_path, duration=0.3):
@@ -731,6 +740,12 @@ class AudioGenerator:
 
         for i, event in enumerate(events):
             step_sound = self.generate_footstep(aud_path)
+
+            # Safety check: ensure step_sound is valid
+            if step_sound is None or len(step_sound) == 0:
+                print(f"[WARNING] Invalid step_sound for event {i}, skipping")
+                continue
+
             pitch_shift = 1.0 + (i % 5 - 2) * 0.03
             indices = np.arange(len(step_sound)) * pitch_shift
             indices = np.clip(indices, 0, len(step_sound) - 1).astype(int)
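
The last three context lines build a per-event index map: output position `k` points at input position `k * pitch_shift`, truncated to an integer and clipped to the last valid index. How `indices` is used afterwards is not shown in this hunk; presumably it indexes `step_sound`. A pure-Python sketch of the same index arithmetic follows, where plain lists instead of NumPy arrays and the helper name `shifted_indices` are choices made here for illustration.

```python
def shifted_indices(length, pitch_shift):
    """Map each output position k to int(k * pitch_shift), clipped to [0, length - 1]."""
    return [min(max(int(k * pitch_shift), 0), length - 1) for k in range(length)]

# pitch_shift cycles through five values as the event index i grows.
shift_values = [round(1.0 + (i % 5 - 2) * 0.03, 2) for i in range(5)]

# An exaggerated shift of 1.5 makes the clipping at the end visible.
faster = shifted_indices(10, 1.5)
print(shift_values)
print(faster)
```

With shifts between 0.94 and 1.06, consecutive footsteps differ only slightly in pitch, which is enough to keep repeated steps from sounding identical.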