Hello :) Today is Day 256!
A quick summary of today:
- using YOLO for object and violence detection
Firstly, we met with a rep from Rastech
Rastech is the company we are developing a project for in my Hannam Design Factory course. When we met today, I asked how they do computer vision and a few questions about their data. The main thing I got out of it is that they use open-source models, YOLO specifically.
Secondly, using YOLO for violence detection
There is a nice library, ultralytics, that makes using and fine-tuning YOLO very easy.
I found a violence image dataset online and fine-tuned a basic model. I trained it for just 2 epochs so that I could test it with my teammates. CV models are definitely smaller than LLMs, but it still took around 30 minutes.
To fine-tune YOLOv8, we need a data.yaml file:
train: train/images
val: valid/images
test: test/images
nc: 2
names: ['NonViolence', 'Violence']
roboflow:
  workspace: shah-xxxqs
  project: violence-3h8pw
  version: 1
  license: CC BY 4.0
  url: https://universe.roboflow.com/shah-xxxqs/violence-3h8pw/dataset/1
It specifies the classes and the paths to the train/validation/test images,
and then we can just execute:
yolo task=detect mode=train model=yolov8s.pt data=data.yaml epochs=2 imgsz=416 plots=True device=mps mlflow=False
And the training starts
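For reference, the same run can be started from Python instead of the CLI. This is a minimal sketch using the ultralytics Python API with the same arguments as the command above (device='mps' assumes an Apple Silicon Mac):

from ultralytics import YOLO

# Same starting checkpoint and dataset as the CLI command above
model = YOLO('yolov8s.pt')

# Fine-tune for 2 epochs; swap device='mps' for 0 (first CUDA GPU) or 'cpu' elsewhere
model.train(data='data.yaml', epochs=2, imgsz=416, plots=True, device='mps')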
Then, to use the model ~
import cv2
from ultralytics import YOLO

# Load the fine-tuned weights saved by the training run
model = YOLO('best.pt')
print(model.names)  # {0: 'NonViolence', 1: 'Violence'}

webcamera = cv2.VideoCapture(0)
while True:
    success, frame = webcamera.read()
    if not success:  # stop if the camera frame could not be read
        break

    # Track objects in the current frame; conf=0.8 keeps only confident detections
    results = model.track(frame, classes=list(range(80)), conf=0.8, imgsz=480)

    # Draw the detections plus a running count, then show the annotated frame
    annotated = results[0].plot()
    cv2.putText(annotated, f"Total: {len(results[0].boxes)}", (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2, cv2.LINE_AA)
    cv2.imshow("Live Camera", annotated)

    if cv2.waitKey(1) == ord('q'):
        break

webcamera.release()
cv2.destroyAllWindows()
and the results are not bad, quite promising actually.
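If you want the raw predictions instead of the plotted frame, the results object exposes the boxes directly. A minimal sketch, assuming the same model and results variables as in the loop above:

# results[0] corresponds to the single frame we passed in
for box in results[0].boxes:
    cls_id = int(box.cls[0])                 # predicted class index
    conf = float(box.conf[0])                # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()    # box corners in pixels
    print(model.names[cls_id], round(conf, 2), (x1, y1, x2, y2))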
I also found a model on Hugging Face - IDEA-Research/grounding-dino-base - but when I used it the camera was very laggy. I suspect it's because I am on a Mac and, even though I specified 'mps' as the device, it still uses the CPU (but who knows … I need to test this on my laptop with an NVIDIA GPU).
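One quick way to check whether MPS is even usable is through PyTorch itself. A small sketch (it assumes torch is installed, which it is if ultralytics and transformers work):

import torch

# True only if this PyTorch build supports MPS and the Mac exposes it
print('MPS available:', torch.backends.mps.is_available())

# And to see where a loaded model's weights actually live (hypothetical variable name;
# dino_model would be the grounding-dino model loaded via transformers):
# print(next(dino_model.parameters()).device)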
Thirdly, an intro to how computer vision models work
When I was writing my code today, my teammates kept looking over, asking questions, and just being curious about what was happening. Even though I explained a little, we had to get back to other work. So after finishing for the day, I decided to create a short document explaining how we go from raw images to real-time detection using cameras. The doc is here
That is all for today!
See you tomorrow :)