Hello :) Today is Day 256!
A quick summary of today:
- using YOLO for object and violence detection
Firstly, we met with a rep from Rastech
Rastech is the company we are developing a project for in my Hannam Design Factory course. When we met today, I asked how they do computer vision and a few questions about their data. The main thing I got out of it is that they use open-source models, YOLO specifically.
Secondly, using YOLO for violence detection
There is a nice library, ultralytics, that makes using and fine-tuning YOLO very easy.
I found a violence image dataset online and fine-tuned a basic model. I trained it for just 2 epochs so that I could test it with my teammates. CV models are definitely smaller than LLMs, but it still took around 30 minutes.
To fine-tune YOLOv8, we need a data.yaml file:
train: train/images
val: valid/images
test: test/images
nc: 2
names: ['NonViolence', 'Violence']
roboflow:
  workspace: shah-xxxqs
  project: violence-3h8pw
  version: 1
  license: CC BY 4.0
  url: https://universe.roboflow.com/shah-xxxqs/violence-3h8pw/dataset/1
It specifies the classes and the paths to the train/validation/test images,
and then we can just execute:
yolo task=detect mode=train model=yolov8s.pt data=data.yaml epochs=2 imgsz=416 plots=True device=mps mlflow=False
And the training starts
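For reference, the same run can be started from Python instead of the CLI. This is a minimal sketch using the ultralytics Python API with the same arguments as the command above (device='mps' assumes an Apple Silicon Mac):

from ultralytics import YOLO

# Same starting checkpoint and dataset as the CLI command above
model = YOLO('yolov8s.pt')

# Fine-tune for 2 epochs; swap device='mps' for 0 (first CUDA GPU) or 'cpu' elsewhere
model.train(data='data.yaml', epochs=2, imgsz=416, plots=True, device='mps')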
Then, to use the model ~
import cv2
from ultralytics import YOLO

# Load the fine-tuned weights saved by the training run
model = YOLO('best.pt')
print(model.names)  # {0: 'NonViolence', 1: 'Violence'}

webcamera = cv2.VideoCapture(0)
while True:
    success, frame = webcamera.read()
    if not success:  # stop if the camera frame could not be read
        break

    # Track objects in the current frame; conf=0.8 keeps only confident detections
    results = model.track(frame, classes=list(range(80)), conf=0.8, imgsz=480)

    # Draw the detections plus a running count, then show the annotated frame
    annotated = results[0].plot()
    cv2.putText(annotated, f"Total: {len(results[0].boxes)}", (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2, cv2.LINE_AA)
    cv2.imshow("Live Camera", annotated)

    if cv2.waitKey(1) == ord('q'):
        break

webcamera.release()
cv2.destroyAllWindows()
and the results are not bad, quite promising actually.
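If you want the raw predictions instead of the plotted frame, the results object exposes the boxes directly. A minimal sketch, assuming the same model and results variables as in the loop above:

# results[0] corresponds to the single frame we passed in
for box in results[0].boxes:
    cls_id = int(box.cls[0])                 # predicted class index
    conf = float(box.conf[0])                # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()    # box corners in pixels
    print(model.names[cls_id], round(conf, 2), (x1, y1, x2, y2))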
I also found a model on Hugging Face - IDEA-Research/grounding-dino-base - but when I used it the camera was very laggy. I suspect it's because I am on a Mac and, even though I specified 'mps' as the device, it still uses the CPU (but who knows … I need to test this on my laptop with an NVIDIA GPU).
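One quick way to check whether MPS is even usable is through PyTorch itself. A small sketch (it assumes torch is installed, which it is if ultralytics and transformers work):

import torch

# True only if this PyTorch build supports MPS and the Mac exposes it
print('MPS available:', torch.backends.mps.is_available())

# And to see where a loaded model's weights actually live (hypothetical variable name;
# dino_model would be the grounding-dino model loaded via transformers):
# print(next(dino_model.parameters()).device)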
Thirdly, an intro to how computer vision models work
When I was writing my code today, my teammates kept looking over, asking questions, and just being curious about what was happening. Even though I explained a little, we had to get back to other work. So after finishing for the day, I decided to create a short document explaining how we go from raw images to real-time detection using cameras. The doc is here
That is all for today!
See you tomorrow :)