- 231
- 3 037 333
Roboflow
United States
Joined Apr 12, 2020
Roboflow shares videos on using computer vision!
Florence-2: Fine-tune Microsoft’s Multimodal Model
Learn how to fine-tune Microsoft's Florence-2, a powerful open-source Vision Language Model, for custom object detection tasks. This in-depth tutorial guides you through setting up your environment in Google Colab, preparing datasets, and optimizing the model using LoRA.
Chapters:
- 00:00 Introduction: Unlock the Power of Florence-2
- 01:09 Getting Started: Prepare for VLM Fine-Tuning
- 03:55 Florence-2 in Action: Explore Pre-trained Capabilities
- 07:00 Dataset Deep Dive: PyTorch Data Loading for Florence-2
- 13:02 LoRA: Optimize Your VLM Training
- 14:21 Fine-Tuning: Unleash Florence-2's Custom Object Detection
- 17:30 Model Evaluation: Measure Your VLM's Success
- 21:37 Florence-2 vs Other Computer Vision Models
- 24:09 Conclusion and Next Steps
Resources:
- Roboflow: roboflow.com
- 🔴 Community Session July 3rd, 2024 at 08:00 AM PST / 11:00 AM EST / 05:00 PM CET: roboflow.stream
- ⭐ Notebooks GitHub: github.com/roboflow/notebooks
- 📓 Florence notebook: colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-finetune-florence-2-on-detection-dataset.ipynb
- 🗞 Florence-2 arXiv paper: arxiv.org/abs/2311.06242
- 🗞 Florence-2 overview blog post: blog.roboflow.com/florence-2
- 🗞 Florence-2 fine-tuning blog post: blog.roboflow.com/fine-tune-florence-2-object-detection
- 🔗 Florence-2 HF Space: huggingface.co/spaces/gokaygokay/Florence-2
- 🗞 Mean Average Precision (mAP) blog post: blog.roboflow.com/mean-average-precision
- 🗞 Confusion Matrix blog post: blog.roboflow.com/what-is-a-confusion-matrix
Stay updated with the projects I'm working on at github.com/roboflow and github.com/SkalskiP! ⭐
Views: 2,220
Videos
PaliGemma by Google: Train Model on Custom Detection Dataset
6K views · 28 days ago
Learn how to fine-tune PaliGemma, Google's open-source Vision-Language Model, for custom object detection tasks. This step-by-step tutorial walks you through modifying Google's notebook to train PaliGemma on your dataset. We'll use the handwritten digits and math operations dataset from RF100, explore the JSONL format, and demonstrate how to deploy your fine-tuned model for real-world inference...
Dwell Time Analysis with Computer Vision | Real-Time Stream Processing
12K views · 2 months ago
Learn how to use computer vision to analyze wait times and optimize processes. This tutorial covers object detection, tracking, and calculating time spent in designated zones. Use these techniques to improve customer experience in retail, traffic management, or other scenarios. Chapters: - 00:00 Intro - 00:41 Static File Processing vs. Stream Processing: Time Calculation Explained - 04:29 Time ...
YOLOv9 Tutorial: Train Model on Custom Dataset | How to Deploy YOLOv9
36K views · 3 months ago
Description: Get hands-on with YOLOv9! This video dives into the architecture, setup, and how to train YOLOv9 on your custom datasets. Chapters: - 00:00 Intro - 00:36 Setting Up YOLOv9 - 03:29 YOLOv9 Inference with Pre-Trained COCO Weights - 06:35 Training YOLOv9 on Custom Dataset - 10:44 YOLOv9 Model Evaluation - 13:53 YOLOv9 Inference with Fine-Tuned Model - 15:18 Model Deployment with Infere...
YOLO-World: Real-Time, Zero-Shot Object Detection Explained
32K views · 4 months ago
In this video, you’ll learn how to use YOLO-World, a cutting-edge zero-shot object detection model. We'll cover its speed, compare it to other models, and run a live code demo for image AND video analysis. Chapters: - 00:00 Intro - 00:42 YOLO-World vs. Traditional Object Detectors: Speed and Accuracy - 02:26 YOLO-World Architecture - prompt-then-detect - 03:59 Setting Up and Running YOLO-World ...
Speed Estimation & Vehicle Tracking | Computer Vision | Open Source
34K views · 5 months ago
Learn how to track and estimate the speed of vehicles using YOLO, ByteTrack, and Roboflow Inference. This comprehensive tutorial covers object detection, multi-object tracking, filtering detections, perspective transformation, speed estimation, visualization improvements, and more. Use this knowledge to enhance traffic control systems, monitor road conditions, and gain valuable insights into ve...
GPT-4V Alternative (Self-Hosted): Deploy CogVLM on AWS
4.3K views · 6 months ago
Deploy CogVLM, a powerful GPT-4V alternative, on AWS with this step-by-step technical guide. Learn how to set up and run a self-hosted AI model, gaining independence from standard APIs and enhancing your computer vision capabilities. Chapters: - 00:00 Intro - 00:40 Introduction to CogVLM - 01:43 Setting Up the AWS Infrastructure - 03:56 Configuring the Inference Server - 05:41 Running Inference...
AI.engineer 2023: Live Coding a Multimodal Game, paint.wtf
2.5K views · 8 months ago
Roboflow's CEO re-creates our hit drawing game, paint.wtf, live on stage in 5 minutes at the 2023 AI.engineer conference. The game is powered by the OpenAI CLIP model and attracted over 100,000 players in its first week. paint.wtf: paint.wtf Inference: roboflow.com/inference
Top Object Detection Models in 2023 | Model Selection Guide sponsored by Intel
21K views · 9 months ago
Description: Discover the top object detection models in 2023 in this comprehensive video. We compare models like YOLOv8, YOLOv7, RTMDet, DETA, DINO, and GroundingDINO based on metrics like Mean Average Precision, community support, packaging, and licensing for you to decide which is best for your production AI applications. The video also details the challenges in comparing model speed and hig...
Traffic Analysis with YOLOv8 and ByteTrack - Vehicle Detection and Tracking
26K views · 9 months ago
In this video, we explore real-time traffic analysis using YOLOv8 and ByteTrack to detect and track vehicles on aerial images. Harnessing the power of Python and Supervision, we delve deep into assigning cars to specific entry zones and understanding their direction of movement. By visualizing their paths, we gain insights into traffic flow across bustling roundabouts. All resources, including ...
How to Use MMDetection | Train RTMDet on a Custom Dataset
14K views · 10 months ago
Dive into the world of computer vision with this comprehensive tutorial on training the RTMDet model using the renowned MMDetection library. Whether you're just starting out or looking to refine your skills, this guide offers a deep dive into the OpenMMLab ecosystem, hands-on installation steps, and practical insights into training on custom datasets. Chapters: - 00:00 Introduction - 00:29 What...
Open Source Computer Vision Deployment with Roboflow Inference
7K views · 10 months ago
Fast Segment Anything (FastSAM) vs SAM | Is it 50x faster?
15K views · 11 months ago
CVPR 2023 - Top Papers & Highlights (My first time!)
6K views · 11 months ago
Autodistill: Label and Train a Computer Vision Model in Under 20 Minutes
6K views · 1 year ago
Autodistill: Train YOLOv8 with ZERO Annotations
35K views · 1 year ago
How to Choose the Best Computer Vision Model for Your Project
14K views · 1 year ago
Train YOLO-NAS - SOTA Object Detection Model - on Custom Dataset
18K views · 1 year ago
CLIP, T-SNE, and UMAP - Master Image Embeddings & Vector Analysis
12K views · 1 year ago
Accelerate Image Annotation with SAM and Grounding DINO | Python Tutorial
41K views · 1 year ago
Label Data with Segment Anything Model (SAM) in Roboflow
19K views · 1 year ago
SAM - Segment Anything Model by Meta AI: Complete Guide | Python Setup & Applications
61K views · 1 year ago
AWS Startup Showcase - AI/ML Top Startups: Roboflow sponsored by Intel
661 views · 1 year ago
Segment Anything Model (SAM) Breakdown | Computer Vision Breakthrough
13K views · 1 year ago
Grounding DINO: Automated Dataset Annotation and Evaluation | SOTA Zero-Shot Object Detector
10K views · 1 year ago
Detect Anything You Want with Grounding DINO | Zero Shot Object Detection SOTA
30K views · 1 year ago
Build Computer Vision Applications Faster with Supervision
4.2K views · 1 year ago
Roboflow 6 Minute Intro | Build a Coin Counter with Computer Vision
55K views · 1 year ago
YOLOv8 native tracking | Step-by-step tutorial | Tracking with Live Webcam Stream
34K views · 1 year ago
How can I train this model on a custom dataset for OCR?
Please, sir, also teach us how to annotate with it.
You mean how to automatically annotate images?
Thanks, sir. Please do fine-tuning for OCR, captioning, and segmentation tasks.
Did you try running OCR with the pre-trained model?
Thanks a ton for this awesome video! Every single term is explained so clearly; it's super helpful. I can't wait to dive into the code and start putting this knowledge to use!
Thanks a lot! I really put in the effort and try not to assume that people already know these things.
Is this applicable to grade handwritten pdf math assignments?
Florence-2 can be really good at OCR processing of handwritten text. Not sure about math equations. We would need to confirm that.
@Roboflow {'<DETAILED_CAPTION>': 'In this image we can see a book with some text on it.'} This is the test output for a handwritten math problem derivation. Is there some way to get a more detailed caption or the OCR output?
Thank you
Very informative video. Thanks for making such a valuable video free of cost. Just one request: when you make tutorials, if possible, try to do inference, training, or fine-tuning on agricultural or satellite data.
Next time I will try to find some cool datasets from these domains.
Nice video, as usual
Thanks a lot!
I've been waiting for this tutorial for days. Thank you again for being the first to comprehensively review this new model. Super excited! 🎉🥳
As usual you are the first one to comment on the video! Thanks a lot for all the support! 🔥
Should we do this for every single image in the batch, or can we do it for one image and then replace it with a batch of images?
Amazing intro to Roboflow and Object detection. Very intuitive platform for dataset creation.
Thank you!
Awesome! Just Awesome straight to point tutorial.
Very good explanation. One question about real-time detection: what is the maximum distance at which YOLOv9 can detect an object? Can it detect at 30 or 40 feet? Could you please provide these details?
I am facing an issue when I try this notebook:
AttributeError                            Traceback (most recent call last)
<ipython-input-50-314f4612d4ab> in <cell line: 12>()
     10
     11 # annotators configuration
---> 12 thickness = sv.calculate_dynamic_line_thickness(
     13     resolution_wh=video_info.resolution_wh
     14 )
AttributeError: module 'supervision' has no attribute 'calculate_dynamic_line_thickness'
Hi, I have a problem: when training, YOLO creates a new dataset by itself and trains on that instead of on my own custom dataset. How do I fix it?
Hello, how can I get performance metrics such as precision and recall while training and evaluating the model?
All at once in the sense, rather than annotating all the images manually is there any way we could do it faster?
I want to use this project. It works on Hugging Face, but strangely it doesn't work in my environment on my PC. I want to clone it from Hugging Face. Is there a way?
Yes. HF Spaces work like git. You can clone the entire project to your local machine.
There needs to be a correction factor along the path…it’s like drawing the globe on a flat piece of paper. If you watch cars driving away on the right side, their speed is 140 kph and “reduces” to 133 kph, which is very unlikely. I know the trapezoid can be limited to those vehicles closest to the camera, but I thought you might like to tweak your algorithm. 👍
Sure 👍🏻 The whole algorithm is a bit of a simplification, as we only have 4 points. If the road is not perfectly flat and straight, some deviations may occur. Still, I think the complexity/accuracy tradeoff is okay.
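The 4-point simplification discussed in this thread can be sketched in a few lines. This is a minimal illustration, not the code from the video: it assumes each vehicle position has already been mapped by a perspective transform onto a flat road plane measured in metres, and the function name `estimate_speed_kmh` and the sample track are hypothetical.

```python
import math


def estimate_speed_kmh(positions_m, fps):
    """Average speed from per-frame road-plane positions given in metres.

    Assumes one position per consecutive video frame.
    """
    if len(positions_m) < 2:
        raise ValueError("need at least two positions")
    (x0, y0), (x1, y1) = positions_m[0], positions_m[-1]
    # Straight-line distance between the first and last observed position.
    distance_m = math.hypot(x1 - x0, y1 - y0)
    # N positions span N - 1 frame intervals.
    elapsed_s = (len(positions_m) - 1) / fps
    # Convert m/s to km/h.
    return distance_m / elapsed_s * 3.6


# Hypothetical track: 26 frames at 25 fps (1 second), moving 1.4 m per frame.
track = [(0.0, 1.4 * i) for i in range(26)]
speed = estimate_speed_kmh(track, fps=25)
```

With a transform fitted from only four points, a pixel near the far edge of the trapezoid covers more road than a pixel near the camera, so small detection jitter there turns into larger speed errors. That is one plausible reason the estimate drifts for vehicles driving away.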
How to annotate all the images at once? I couldn't find that option
Could you be a bit more precise? What do you mean by "at once"?
Can't load the Kaggle dataset. Is there another way?
What’s the problem with Kaggle?
@Roboflow It shows as out of date, so I can't use the dataset loading command, which is: !kaggle competitions files -c dfl-bundesliga-data-shootout | grep clips | head -10. I checked on Kaggle, and it shows no dataset available for DFL Bundesliga.
I get this error:
SupervisionWarnings: The `frame_resolution_wh` parameter is no longer required and will be dropped in version supervision-0.24.0. The mask resolution is now calculated automatically based on the polygon coordinates.
SupervisionWarnings: red is deprecated: `Color.red()` is deprecated and will be removed in `supervision-0.22.0`. Use `Color.RED` instead.
Traceback (most recent call last):
  File "f:\PYTHON\smartCCTV-YoloV8\main.py", line 81, in <module>
    main()
  File "f:\PYTHON\smartCCTV-YoloV8\main.py", line 58, in main
    result = model(frame, agnostic_nms=True)[0]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\laragon\bin\python\python-3.11\Lib\site-packages\ultralytics\yolo\engine\model.py", line 58, in __call__
    return self.predict(source, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\laragon\bin\python\python-3.11\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\laragon\bin\python\python-3.11\Lib\site-packages\ultralytics\yolo\engine\model.py", line 130, in predict
    predictor.setup(model=self.model, source=source)
  File "D:\laragon\bin\python\python-3.11\Lib\site-packages\ultralytics\yolo\engine\predictor.py", line 111, in setup
    source = str(source or self.args.source)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Please help me.
I want to ask if the dataset I have doesn't have a map label, can it be used for this code?
Hello, how can I see validation loss?
This is cool! You might like PlateCatcher.
I keep VS Code on half the screen and YouTube on the other half, but your code is not properly visible; it's too small to copy from the video. Also, I don't want GitHub links to supervision and inference but a direct link to the script file that you used in this video.
github.com/roboflow/supervision/tree/develop/examples/speed_estimation
Are there any footage requirements for inputting into YOLOv8? I am trying to use it for sports analysis and wondered whether you need the whole pitch in a tactical wide-lens view. How zoomed in does it need to be? Will it capture a ball being hit at really high speeds?
Nice tutorial. The training status has been stuck at "Training machine starting..." for hours. I have 10 custom images, each a JPEG of around 2 MB. Could overlapping bounding boxes be the root cause?
I am getting this error: AttributeError: module 'supervision' has no attribute 'BoxAnnotator'
THANK YOUU SO MUCHH BROO 👍👍👍👍👍👍
I want to learn AI. Please make a playlist.
Can you make a video on person re-identification?
It's the best.
How do I add speed detection to this?
Very nice!
Hello, Piotr Skalski! Hello everyone... I am diving a little into the code here... 😁 Quick question - how do I add an image into a detection-box from Supervision? Thanks
Hey, I am using YOLOv9 to detect disease, and I want to use PaliGemma to give me features and generate a PDF report. Is this fine?
Can you do this for VQA as well 😊, using PyTorch code?
Can we get an updated version of this? Training TensorFlow with Roboflow.
The fact that you care about licensing helps us all out. Instant subscribe. Pretty tired of seeing AGPL-licensed code/models being used over and over.
detections = sv.Detections.from_yolov8(result) AttributeError: type object 'Detections' has no attribute 'from_yolov8' error
Great work, thank you for sharing!
Great tutorial @roboflow !!! Is it possible to train the same model for VQA as well as object detection? Can you provide an example of what the JSONL file should look like in such cases?
We will soon add support for VQA datasets in Roboflow. I plan to roll out tutorials covering this topic soon.
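For the detection case, one JSONL line can be sketched as follows. This is an illustration under stated assumptions, not the exact format used in the video: the `image` / `prefix` / `suffix` keys and the `<locNNNN>` token layout (y_min, x_min, y_max, x_max scaled to 0-1023) reflect my understanding of the PaliGemma detection format and should be checked against the notebook. The helper names, file name, and class names are hypothetical. A VQA entry would presumably put the question in `prefix` and the answer in `suffix`, but that too is an assumption.

```python
import json


def to_loc_tokens(box_xyxy, image_wh):
    """Encode a pixel box as four <locNNNN> tokens (assumed order: y, x, y, x)."""
    x_min, y_min, x_max, y_max = box_xyxy
    width, height = image_wh
    normalized = (y_min / height, x_min / width, y_max / height, x_max / width)
    # Assumed scaling: map [0, 1] onto 1024 bins and clamp to the last bin.
    return "".join(f"<loc{min(1023, int(v * 1024)):04d}>" for v in normalized)


def detection_entry(image_name, boxes, labels, image_wh, classes):
    """Build one JSONL record for a detection sample."""
    suffix = " ; ".join(
        f"{to_loc_tokens(box, image_wh)} {label}" for box, label in zip(boxes, labels)
    )
    return {
        "image": image_name,
        "prefix": "detect " + " ; ".join(classes),
        "suffix": suffix,
    }


# Hypothetical sample: one box on a 256x256 image.
entry = detection_entry(
    "digits_001.jpg",
    boxes=[(64, 32, 192, 160)],
    labels=["seven"],
    image_wh=(256, 256),
    classes=["seven", "plus"],
)
line = json.dumps(entry)  # one line of the JSONL file
```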
Hi, Can I fine-tune the model on a medical dataset? Currently, the model is not performing well on this data, and the results indicate that it is not suitable for medical data out-of-the-box. If I fine-tune the model on my dataset, which consists of approximately 200 to 300 images, will it work better? Additionally, is it possible to quantize this model to reduce its size from 3B to something smaller without significantly compromising its performance? Thank you.
It is possible to fine-tune on medical images. I have done that for several use cases, such as tumor detection.
@Roboflow okay
Around 1:06:46 you present the non max merging in supervision. So far it only works for bboxes? Would be cool to have something for the masks from instance segmentation as well. But thanks for the hint
Yes. For now we have that working only with boxes. But the long-term plan is to roll it out for masks as well ;)
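The general idea behind merging overlapping boxes, rather than discarding them as in non-max suppression, can be sketched in plain Python. This is not supervision's implementation; `iou` and `merge_boxes` are hypothetical helpers, and the rule used here (merge into the enclosing box, keep the highest score) is just one reasonable choice.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_boxes(boxes, scores, iou_threshold=0.5):
    """Greedily merge boxes that overlap a higher-scoring seed box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    used = set()
    merged = []
    for i in order:
        if i in used:
            continue
        used.add(i)
        group = list(boxes[i])
        for j in order:
            if j in used:
                continue
            # Overlap is measured against the original seed box, not the grown one.
            if iou(boxes[i], boxes[j]) >= iou_threshold:
                used.add(j)
                group = [
                    min(group[0], boxes[j][0]),
                    min(group[1], boxes[j][1]),
                    max(group[2], boxes[j][2]),
                    max(group[3], boxes[j][3]),
                ]
        merged.append((tuple(group), scores[i]))
    return merged


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
result = merge_boxes(boxes, scores)
```

The same pattern could in principle be applied to masks by computing IoU over pixels and taking the union of the masks, though that is a sketch of the idea and not a description of any planned feature.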