Roboflow
  • 231
  • 3 037 333
Florence-2: Fine-tune Microsoft’s Multimodal Model
Learn how to fine-tune Microsoft's Florence-2, a powerful open-source Vision Language Model, for custom object detection tasks. This in-depth tutorial guides you through setting up your environment in Google Colab, preparing datasets, and optimizing the model using LoRA.
Chapters:
- 00:00 Introduction: Unlock the Power of Florence-2
- 01:09 Getting Started: Prepare for VLM Fine-Tuning
- 03:55 Florence-2 in Action: Explore Pre-trained Capabilities
- 07:00 Dataset Deep Dive: PyTorch Data Loading for Florence-2
- 13:02 LoRA: Optimize Your VLM Training
- 14:21 Fine-Tuning: Unleash Florence-2's Custom Object Detection
- 17:30 Model Evaluation: Measure Your VLM's Success
- 21:37 Florence-2 vs Other Computer Vision Models
- 24:09 Conclusion and Next Steps
Resources:
- Roboflow: roboflow.com
- 🔴 Community Session July 3rd, 2024 at 08:00 AM PST / 11:00 AM EST / 05:00 PM CET: roboflow.stream
- ⭐ Notebooks GitHub: github.com/roboflow/notebooks
- 📓 Florence notebook: colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-finetune-florence-2-on-detection-dataset.ipynb
- 🗞 Florence-2 arXiv paper: arxiv.org/abs/2311.06242
- 🗞 Florence-2 overview blog post: blog.roboflow.com/florence-2
- 🗞 Florence-2 fine-tuning blog post: blog.roboflow.com/fine-tune-florence-2-object-detection
- 🔗 Florence-2 HF Space: huggingface.co/spaces/gokaygokay/Florence-2
- 🗞 Mean Average Precision (mAP) blog post: blog.roboflow.com/mean-average-precision
- 🗞 Confusion Matrix blog post: blog.roboflow.com/what-is-a-confusion-matrix
Stay updated with the projects I'm working on at github.com/roboflow and github.com/SkalskiP! ⭐
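As a rough illustration of why LoRA makes fine-tuning a model like Florence-2 cheaper, here is a minimal sketch in plain Python. It is not the code from the notebook: the layer shape (1024 x 1024), the rank (8), and the helper names `lora_param_count`, `matvec`, and `lora_forward` are assumptions chosen for illustration. The idea is that the original weight matrix W stays frozen and only a low-rank update B·A is trained, so a layer needs rank * (d_in + d_out) trainable parameters instead of d_in * d_out.

```python
import random


def lora_param_count(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable parameter counts for one linear layer."""
    full = d_in * d_out            # training W directly
    lora = rank * (d_in + d_out)   # training A (rank x d_in) and B (d_out x rank)
    return full, lora


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha: float, rank: int):
    """Compute y = W x + (alpha / rank) * B (A x); W is frozen, A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Assumed layer size and rank, for illustration only.
full, lora = lora_param_count(1024, 1024, 8)
print(f"full: {full}, LoRA: {lora}")

# B is initialised to zeros, so before training the adapted layer
# behaves exactly like the frozen layer.
random.seed(0)
d_in, d_out, r = 4, 3, 2
W = [[random.random() for _ in range(d_in)] for _ in range(d_out)]
A = [[random.random() for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]
x = [1.0, 2.0, 3.0, 4.0]
assert lora_forward(W, A, B, x, alpha=16.0, rank=r) == matvec(W, x)
```

With these assumed numbers the LoRA update trains 16,384 parameters for a layer that holds 1,048,576 weights. In practice a library handles this wiring for you; the sketch only shows the arithmetic behind it.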
Views: 2,220

Videos

PaliGemma by Google: Train Model on Custom Detection Dataset
6K views · 28 days ago
Learn how to fine-tune PaliGemma, Google's open-source Vision-Language Model, for custom object detection tasks. This step-by-step tutorial walks you through modifying Google's notebook to train PaliGemma on your dataset. We'll use the handwritten digits and math operations dataset from RF100, explore the JSONL format, and demonstrate how to deploy your fine-tuned model for real-world inference...
Dwell Time Analysis with Computer Vision | Real-Time Stream Processing
12K views · 2 months ago
Learn how to use computer vision to analyze wait times and optimize processes. This tutorial covers object detection, tracking, and calculating time spent in designated zones. Use these techniques to improve customer experience in retail, traffic management, or other scenarios. Chapters: - 00:00 Intro - 00:41 Static File Processing vs. Stream Processing: Time Calculation Explained - 04:29 Time ...
YOLOv9 Tutorial: Train Model on Custom Dataset | How to Deploy YOLOv9
36K views · 3 months ago
Description: Get hands-on with YOLOv9! This video dives into the architecture, setup, and how to train YOLOv9 on your custom datasets. Chapters: - 00:00 Intro - 00:36 Setting Up YOLOv9 - 03:29 YOLOv9 Inference with Pre-Trained COCO Weights - 06:35 Training YOLOv9 on Custom Dataset - 10:44 YOLOv9 Model Evaluation - 13:53 YOLOv9 Inference with Fine-Tuned Model - 15:18 Model Deployment with Infere...
YOLO-World: Real-Time, Zero-Shot Object Detection Explained
32K views · 4 months ago
In this video, you’ll learn how to use YOLO-World, a cutting-edge zero-shot object detection model. We'll cover its speed, compare it to other models, and run a live code demo for image AND video analysis. Chapters: - 00:00 Intro - 00:42 YOLO-World vs. Traditional Object Detectors: Speed and Accuracy - 02:26 YOLO-World Architecture - prompt-then-detect - 03:59 Setting Up and Running YOLO-World ...
Speed Estimation & Vehicle Tracking | Computer Vision | Open Source
34K views · 5 months ago
Learn how to track and estimate the speed of vehicles using YOLO, ByteTrack, and Roboflow Inference. This comprehensive tutorial covers object detection, multi-object tracking, filtering detections, perspective transformation, speed estimation, visualization improvements, and more. Use this knowledge to enhance traffic control systems, monitor road conditions, and gain valuable insights into ve...
GPT-4V Alternative (Self-Hosted): Deploy CogVLM on AWS
4.3K views · 6 months ago
Deploy CogVLM, a powerful GPT-4V alternative, on AWS with this step-by-step technical guide. Learn how to set up and run a self-hosted AI model, gaining independence from standard APIs and enhancing your computer vision capabilities. Chapters: - 00:00 Intro - 00:40 Introduction to CogVLM - 01:43 Setting Up the AWS Infrastructure - 03:56 Configuring the Inference Server - 05:41 Running Inference...
AI.engineer 2023: Live Coding a Multimodal Game, paint.wtf
2.5K views · 8 months ago
Roboflow's CEO re-creates our hit drawing game, paint.wtf, live on stage in 5 minutes at the 2023 AI.engineer conference. The game is powered by the OpenAI CLIP model and attracted over 100,000 players in its first week. paint.wtf: paint.wtf Inference: roboflow.com/inference
Top Object Detection Models in 2023 | Model Selection Guide sponsored by Intel
21K views · 9 months ago
Description: Discover the top object detection models in 2023 in this comprehensive video. We compare models like YOLOv8, YOLOv7, RTMDet, DETA, DINO, and GroundingDINO based on metrics like Mean Average Precision, community support, packaging, and licensing for you to decide which is best for your production AI applications. The video also details the challenges in comparing model speed and hig...
Traffic Analysis with YOLOv8 and ByteTrack - Vehicle Detection and Tracking
26K views · 9 months ago
In this video, we explore real-time traffic analysis using YOLOv8 and ByteTrack to detect and track vehicles on aerial images. Harnessing the power of Python and Supervision, we delve deep into assigning cars to specific entry zones and understanding their direction of movement. By visualizing their paths, we gain insights into traffic flow across bustling roundabouts. All resources, including ...
How to Use MMDetection | Train RTMDet on a Custom Dataset
14K views · 10 months ago
Dive into the world of computer vision with this comprehensive tutorial on training the RTMDet model using the renowned MMDetection library. Whether you're just starting out or looking to refine your skills, this guide offers a deep dive into the OpenMMLab ecosystem, hands-on installation steps, and practical insights into training on custom datasets. Chapters: - 00:00 Introduction - 00:29 What...
Open Source Computer Vision Deployment with Roboflow Inference
7K views · 10 months ago
Fast Segment Anything (FastSAM) vs SAM | Is it 50x faster?
15K views · 11 months ago
CVPR 2023 - Top Papers & Highlights (My first time!)
6K views · 11 months ago
Autodistill: Label and Train a Computer Vision Model in Under 20 Minutes
6K views · 1 year ago
Autodistill: Train YOLOv8 with ZERO Annotations
35K views · 1 year ago
How to Choose the Best Computer Vision Model for Your Project
14K views · 1 year ago
Train YOLO-NAS - SOTA Object Detection Model - on Custom Dataset
18K views · 1 year ago
CLIP, T-SNE, and UMAP - Master Image Embeddings & Vector Analysis
12K views · 1 year ago
Accelerate Image Annotation with SAM and Grounding DINO | Python Tutorial
41K views · 1 year ago
Label Data with Segment Anything Model (SAM) in Roboflow
19K views · 1 year ago
SAM - Segment Anything Model by Meta AI: Complete Guide | Python Setup & Applications
61K views · 1 year ago
AWS Startup Showcase - AI/ML Top Startups: Roboflow sponsored by Intel
661 views · 1 year ago
Segment Anything Model (SAM) Breakdown | Computer Vision Breakthrough
13K views · 1 year ago
Grounding DINO: Automated Dataset Annotation and Evaluation | SOTA Zero-Shot Object Detector
10K views · 1 year ago
Detect Anything You Want with Grounding DINO | Zero Shot Object Detection SOTA
30K views · 1 year ago
Build Computer Vision Applications Faster with Supervision
4.2K views · 1 year ago
GPT 4: Will We Ever Train Again?
4.8K views · 1 year ago
Roboflow 6 Minute Intro | Build a Coin Counter with Computer Vision
55K views · 1 year ago
YOLOv8 native tracking | Step-by-step tutorial | Tracking with Live Webcam Stream
34K views · 1 year ago

COMMENTS

  • @SridharanS-vz7re · 5 hours ago

    How can I train this model on a custom dataset for OCR?

  • @geniusxbyofejiroagbaduta8665 · 7 hours ago

    Please, sir, also teach us how to annotate with it.

    • @Roboflow · 5 hours ago

      You mean how to automatically annotate images?

  • @geniusxbyofejiroagbaduta8665 · 7 hours ago

    Thanks, sir. Please also cover fine-tuning for OCR, captioning, and segmentation tasks.

    • @Roboflow · 5 hours ago

      Did you try running OCR with the pre-trained model?

  • @SatyamKumar-cb2mt · 8 hours ago

    Thanks a ton for this awesome video! Every single term is explained so clearly; it's super helpful. I can't wait to dive into the code and start putting this knowledge to use!

    • @Roboflow · 7 hours ago

      Thanks a lot! I really put in the effort and try not to fall into the bias of assuming that people already know these things.

  • @bladethirst1 · 14 hours ago

    Is this applicable to grading handwritten PDF math assignments?

    • @Roboflow · 12 hours ago

      Florence-2 can be really good at OCR processing of handwritten text. Not sure about math equations. We would need to confirm that.

    • @bladethirst1 · 8 hours ago

      @Roboflow {'<DETAILED_CAPTION>': 'In this image we can see a book with some text on it.'} This is the test output for a handwritten math problem deduction. Is there some way to get a more detailed caption or the OCR output?

  • @suphotnarapong355 · 15 hours ago

    Thank you

  • @VLM234 · 23 hours ago

    Very informative video. Thanks for making such a valuable video free of cost. Just one request: when you make tutorials, if possible, try to do inference, training, or fine-tuning on agricultural or satellite-related data.

    • @Roboflow · 22 hours ago

      Next time I will try to find some cool datasets from these domains.

  • @Jordufi · 1 day ago

    Nice video, as usual

  • @abdshomad · 1 day ago

    I've been waiting for this tutorial for days. Thank you again for being the first to comprehensively review this new model. Super excited! 🎉🥳

    • @Roboflow · 1 day ago

      As usual you are the first one to comment on the video! Thanks a lot for all the support! 🔥

  • @karthickkuduva9819 · 1 day ago

    Should we do this for every single image in the batch, or can we do it for one image and then replace it with a batch of images?

  • @danielisflying · 1 day ago

    Amazing intro to Roboflow and Object detection. Very intuitive platform for dataset creation.

  • @muzammilomarzoy6616 · 1 day ago

    Awesome! Just an awesome, straight-to-the-point tutorial.

  • @user-tm8kr8gc7e · 2 days ago

    Very good explanation. One question about real-time detection: what is the maximum distance at which YOLOv9 can detect an object? Can it detect at 30 or 40 feet? Could you please provide these details?

  • @m.hassanmaqsood6642 · 3 days ago

    I am facing an issue when I try this notebook:

    AttributeError                            Traceback (most recent call last)
    <ipython-input-50-314f4612d4ab> in <cell line: 12>()
         10
         11 # annotators configuration
    ---> 12 thickness = sv.calculate_dynamic_line_thickness(
         13     resolution_wh=video_info.resolution_wh
         14 )

    AttributeError: module 'supervision' has no attribute 'calculate_dynamic_line_thickness'

  • @uknowngirl2531 · 3 days ago

    Hi, I have a problem: during training, YOLO creates a new dataset by itself and trains on that instead of on my own custom dataset. How can I fix it?

  • @nidhaljegham8485 · 5 days ago

    Hello, how can I get performance metrics such as precision and recall while training and evaluating the model?

  • @navyabaireddy5920 · 5 days ago

    By "all at once" I mean: rather than annotating all the images manually, is there any way we could do it faster?

  • @iconolk7338 · 5 days ago

    I want to use this project. It works on Hugging Face, but strangely it doesn't work in my environment on my PC. I want to clone the version on Hugging Face. Is there a way?

    • @Roboflow · 5 days ago

      Yes. HF Spaces work like git. You can clone the entire project to your local machine.

  • @jeffcampsall5435 · 6 days ago

    There needs to be a correction factor along the path… it’s like drawing the globe on a flat piece of paper. If you watch cars driving away on the right side, their speed is 140 kph and “reduces” to 133 kph, which is very unlikely. I know the trapezoid can be limited to those vehicles closest to the camera, but I thought you might like to tweak your algorithm. 👍

    • @Roboflow · 5 days ago

      Sure 👍🏻 The whole algorithm is a bit of a simplification, as we only have 4 points. If the road is not perfectly flat and straight, some deviations may occur. Still, I think the complexity/accuracy tradeoff is okay.

  • @BaireddyNavyace21b029 · 6 days ago

    How to annotate all the images at once? I couldn't find that option

    • @Roboflow · 5 days ago

      Could you be a bit more precise? What do you mean by "at once"?

  • @awezsheikh8962 · 7 days ago

    Can't load the Kaggle dataset. Is there any other way?

    • @Roboflow · 5 days ago

      What’s the problem with Kaggle?

    • @awezsheikh8962 · 5 days ago

      @Roboflow It shows as out of date, and I can't use the dataset loading command, which is:

      !kaggle competitions files -c dfl-bundesliga-data-shootout | grep clips | head -10

      I checked on Kaggle and it shows no dataset available for DFL Bundesliga.

  • @AgungPrasetyo-gw6ee · 8 days ago

    I get this error:

    SupervisionWarnings: The `frame_resolution_wh` parameter is no longer required and will be dropped in version supervision-0.24.0. The mask resolution is now calculated automatically based on the polygon coordinates.
    SupervisionWarnings: red is deprecated: `Color.red()` is deprecated and will be removed in `supervision-0.22.0`. Use `Color.RED` instead.
    Traceback (most recent call last):
      File "f:\PYTHON\smartCCTV-YoloV8\main.py", line 81, in <module>
        main()
      File "f:\PYTHON\smartCCTV-YoloV8\main.py", line 58, in main
        result = model(frame, agnostic_nms=True)[0]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "D:\laragon\bin\python\python-3.11\Lib\site-packages\ultralytics\yolo\engine\model.py", line 58, in __call__
        return self.predict(source, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "D:\laragon\bin\python\python-3.11\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "D:\laragon\bin\python\python-3.11\Lib\site-packages\ultralytics\yolo\engine\model.py", line 130, in predict
        predictor.setup(model=self.model, source=source)
      File "D:\laragon\bin\python\python-3.11\Lib\site-packages\ultralytics\yolo\engine\predictor.py", line 111, in setup
        source = str(source or self.args.source)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

    Please help me.

  • @ayamnoob4602 · 8 days ago

    I want to ask: if the dataset I have doesn't have a map label, can it be used with this code?

  • @rox4333 · 8 days ago

    Hello, how can I see validation loss?

  • @PlateCatcher · 8 days ago

    This is cool! You might like PlateCatcher.

  • @11aniketkumar · 12 days ago

    I keep VS Code on half the screen and YouTube on the other half, but your code is not properly visible; it's too small to copy from the video. Also, I don't want GitHub links to supervision and inference, but a direct link to the script file that you used in this video.

    • @Roboflow · 5 days ago

      github.com/roboflow/supervision/tree/develop/examples/speed_estimation

  • @robparatore1940 · 12 days ago

    Are there any footage requirements for input to YOLOv8? I am trying to use it for sports analysis and wondered whether you need the whole pitch (a tactical wide lens). How zoomed in does it need to be? Will it capture a ball being hit at really high speeds?

  • @oxydol3456 · 12 days ago

    Nice tutorial. The training status has been stuck at "Training machine starting..." for hours. I have 10 custom images, each around 2 MB, in JPEG format. Could overlapping bounding boxes be the root cause?

  • @deepakkarmaDK · 13 days ago

    I am getting this error: AttributeError: module 'supervision' has no attribute 'BoxAnnotator'

  • @muhammadhilman1045 · 13 days ago

    THANK YOUU SO MUCHH BROO 👍👍👍👍👍👍

  • @chandanchakma2875 · 13 days ago

    I want to learn AI. Please make a playlist.

  • @arpitapujapanda8415 · 13 days ago

    Can you make a video on person re-identification?

  • @rawansaid3072 · 13 days ago

    It's the best

  • @anz918 · 14 days ago

    How do I add speed detection to this?

  • @Maxik1787 · 14 days ago

    Very nice!

  • @fredericocaixeta9015 · 14 days ago

    Hello, Piotr Skalski! Hello everyone... I am diving a little into the code here... 😁 Quick question - how do I add an image into a detection-box from Supervision? Thanks

  • @abdultaj3754 · 16 days ago

    Hey, I am using YOLOv9 to detect disease, and I want to use PaliGemma to give me features and generate a PDF report. Is this fine?

  • @thesuriya_3 · 16 days ago

    Can you do this for VQA as well 😊, using PyTorch code?

  • @nithinpb7042 · 17 days ago

    Can we get an updated version of this? Training TensorFlow with Roboflow

  • @JemiloII · 17 days ago

    The fact that you care about licensing helps us all out. Instant subscribe. Pretty tired of seeing AGPL Licensed code/models being used over and over.

  • @ROHITHJVECE · 18 days ago

    I get this error:

    detections = sv.Detections.from_yolov8(result)
    AttributeError: type object 'Detections' has no attribute 'from_yolov8'

  • @alirezaee · 19 days ago

    Great work, thank you for sharing!

  • @uttamdwivedi7709 · 19 days ago

    Great tutorial @roboflow !!! Is it possible to train the same model for VQA as well as object detection? Can you provide an example of what the JSONL file should look like in such cases?

    • @Roboflow · 19 days ago

      We will soon add support for VQA datasets in Roboflow. I plan to roll out tutorials covering this topic soon.

  • @safiraghulam1862 · 19 days ago

    Hi, Can I fine-tune the model on a medical dataset? Currently, the model is not performing well on this data, and the results indicate that it is not suitable for medical data out-of-the-box. If I fine-tune the model on my dataset, which consists of approximately 200 to 300 images, will it work better? Additionally, is it possible to quantize this model to reduce its size from 3B to something smaller without significantly compromising its performance? Thank you.

    • @Roboflow · 19 days ago

      It is possible to fine-tune on medical images. I have done that for several use cases, such as tumor detection.

    • @safiraghulam1862 · 18 days ago

      @Roboflow okay

  • @moritzh7118 · 20 days ago

    Around 1:06:46 you present the non-max merging in supervision. So far it only works for bboxes? Would be cool to have something for the masks from instance segmentation as well. But thanks for the hint

    • @Roboflow · 20 days ago

      Yes. For now we have that working only with boxes. But the long-term plan is to roll it out for masks as well ;)
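
For readers wondering what non-max merging on boxes looks like, here is a minimal sketch in plain Python. It is not supervision's implementation: the function names `iou` and `non_max_merge`, the greedy strategy, and the 0.5 threshold are assumptions made for illustration. Instead of discarding overlapping boxes as non-max suppression does, it visits boxes from highest to lowest score and folds every overlapping box into one enclosing box.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_merge(
    boxes: list[Box], scores: list[float], iou_threshold: float = 0.5
) -> list[Box]:
    """Greedily merge overlapping boxes into their enclosing box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    used: set[int] = set()
    merged: list[Box] = []
    for i in order:
        if i in used:
            continue
        used.add(i)
        current = boxes[i]
        for j in order:
            if j in used:
                continue
            if iou(current, boxes[j]) > iou_threshold:
                other = boxes[j]
                # Grow the current box so that it encloses the absorbed one.
                current = (
                    min(current[0], other[0]),
                    min(current[1], other[1]),
                    max(current[2], other[2]),
                    max(current[3], other[3]),
                )
                used.add(j)
        merged.append(current)
    return merged


boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (50.0, 50.0, 60.0, 60.0)]
scores = [0.9, 0.8, 0.7]
print(non_max_merge(boxes, scores))
# [(0.0, 0.0, 11.0, 11.0), (50.0, 50.0, 60.0, 60.0)]
```

Extending this idea to instance segmentation masks would presumably mean replacing the box IoU with a mask IoU and the enclosing box with a union of masks, but that is speculation rather than a description of any planned supervision feature.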