Need help with setting up Tensorboard for RVC!
Hello all, I need some help with Runpod. I am trying to get tensorboard to work when using either a secure or community GPU pod. I have no idea how to get tensorboard working. Do I need an SSH Server? I was trying to follow this guide - https://blog.runpod.io/how-to-achieve-true-ssh-on-runpod/ but I have no idea where to get a public key. If anyone here is experienced with tensorboard and knows how to make it work on Runpod, then I'd be grateful.
When I do get it installed and running on a GPU pod, it gives me a localhost link (which is something I cannot use on a remote server).
I will be playing around with it for just a few more minutes, before I just give up.
RunPod Blog
How to Achieve True SSH in RunPod
RunPod allows you to get terminal access pretty easily, but it does not run a true SSH daemon by default. There are plenty of use cases, like needing to SCP or connecting an IDE, that would warrant running a true SSH daemon inside the pod.
https://discord.com/channels/912829806415085598/1194711850223415348
I just updated my resource for this! LOL
GREAT TIMING
@Smack Me Harder ❤ u can read the SSH section of my github repo
and skip the whole "How to Achieve True SSH" setup where u make ur own private/public keys, and just use their CLI tool
@justin okay, guessing it is this link - https://github.com/justinwlin/Runpod-Tips-and-Tricks/tree/main/SSH%20On%20Runpod will be reading it and see if it works. This is all new to me and I am not very familiar with the process. Will let you know whether it is working or not. Thank you.
GitHub
Runpod-Tips-and-Tricks/SSH On Runpod at main · justinwlin/Runpod-Ti...
@justin are you familiar with what Tensorboard is? If not, this might not be able to help. Still reading through it.
@justin not sure what to do here
this isn't working. I don't know what I am doing, so I am just going to close the pod and wait for an answer from someone
@Smack Me Harder ❤ that isn't ur pod id
ur pod id is this
underneath ur pod there's an ID: <POD ID>
No not familiar with it
Yeah, I appreciate you helping. I need Tensorboard to monitor the training of an AI model. I get a localhost link that I cannot open because the connection is refused (guessing it is for security reasons). Based on Google searches, I figured I needed an SSH server to host the tensorboard link so I can actually use it. Guess tensorboard can't be used on Runpod.
no u can
what are u using as ur template?
lol
u can definitely open local host
and connect to it
if tensorboard can run on linux
it can run on runpod
If you know the steps on how to set it up on a fresh machine
you can use a runpod pytorch template
when u connect to it, it will open a jupyter notebook
on port 8888
Normally, I use pytorch 2.0.1
and run whatever terminal commands and stuff u need for additional stuff
yea
yea use the runpod template?
if ur not already
then u can get a nice web gui
to do whatever u need there
then do I try --port 8888?
don't worry about that first I guess
first just get it running lol
and then if u get it launched
u can restart the pod, get it running, and do the rebindings later
https://discord.com/channels/912829806415085598/1207848538629742623
here is an example I have for Ollama which launches on some port
for a backend server
and I bound to it
Expose ports | RunPod Documentation
Learn to expose your ports.
but in ur current situation I'd just start up a pytorch template
when u do the connect button, it will have a jupyter labs button u can connect to
and probably if u know how to install it on some other machine
im guessing u used terminal /
some jupyter lab
since this sounds related to tensorflow
go through the setup on runpod
get it just running somewhere
and worry about binding and hosting later
https://ngrok.com/
If need be, an easier way to get something on a public link is ngrok
ngrok | Unified Application Delivery Platform for Developers
i saw someone else use this
then u dont need to mess with port configurations and stuff
and just let it tunnel your traffic in and out for u
if u get something running on some port
it sounds like
@justin It's sort of similar to this - https://stackoverflow.com/questions/38464559/how-to-locally-view-tensorboard-of-remote-server but it's for RVC - https://github.com/Mangio621/Mangio-RVC-Fork If you look at the first link, you'll see some ssh stuff and where they talk about ports. I just added that 6006 port under TCP (may not be right idk). If I were to do tensorboard --logdir logs --port 6006 (in the directory where the folder is) then I am hoping that it will open the page up. I am basically doing this all blind. For months I was just training without the tensorboard (couldn't get it working), and now I am trying to see if it is actually possible. So, while I don't necessarily need it, it would be helpful for monitoring the training process. Otherwise, I just launch my python script, open the web-ui for RVC and start training.
Stack Overflow
How to locally view tensorboard of remote server
Using my own laptop to run Tensorflow on remote server of lab
I used tensorboard --logdir=./log try to view curves of the running results
I got:
Starting TensorBoard on port 6006 (You can
GitHub
GitHub - Mangio621/Mangio-RVC-Fork: CREPE+HYBRID TRAINING A very ...
CREPE+HYBRID TRAINING A very experimental fork of the Retrieval-based-Voice-Conversion-WebUI repo that incorporates a variety of other f0 methods, along with a hybrid f0 nanmedian method. - Mangi...
Will look at the ports exposed link real quick
i see, to be honest. yea. idk. if u can run a script and if it does bind on port 6006, then u would just follow the expose port guide on runpod for the TCP
as long as it's bound to 0.0.0.0:6006 specifically
if it is on 127.0.0.1, it only accepts connections from inside the pod, not through outside networking ports
i am not too familiar with networking
that is just what ik
lol
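A minimal Python sketch of the 0.0.0.0 vs 127.0.0.1 point above, using only the standard library. `describe_bind` and `open_listener` are hypothetical helper names for illustration, not part of TensorBoard or RunPod. A server listening on a loopback address only accepts connections from inside the pod, which is why a localhost-only TensorBoard can't be reached through an exposed port.

```python
import ipaddress
import socket


def describe_bind(host: str) -> str:
    """Classify a listen address (hypothetical helper for illustration)."""
    addr = ipaddress.ip_address(host)
    if addr.is_loopback:
        # 127.0.0.1 / ::1: only processes on the same machine (the pod) can connect
        return "local only"
    if addr.is_unspecified:
        # 0.0.0.0 / ::: listen on every interface, so a forwarded port can reach it
        return "all interfaces"
    return "single interface"


def open_listener(host: str, port: int = 0) -> socket.socket:
    """Open a TCP listening socket on host:port (port 0 lets the OS pick a free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


if __name__ == "__main__":
    for host in ("127.0.0.1", "0.0.0.0"):
        with open_listener(host) as sock:
            bound_host, bound_port = sock.getsockname()
            print(f"{bound_host}:{bound_port} -> {describe_bind(bound_host)}")
```

TensorBoard's `--bind_all` flag is what moves it from the first case to the second.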
lol all good. Going to see if the tcp port 6006 works and if not, then I'll just go back to how I use runpod normally. Appreciate it.
you do not need true ssh for tensorboard
Yeah tensorboard is a web app, not a terminal app
@Madiator2011 [EU] @ashleyk Well idk how to launch it/get it working. If either of you do, then I'd appreciate it if you can explain how to get it working. Like I've said before, I go to start tensorboard. It gives me a localhost link, I then click on it and it tells me connection refused.
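"Connection refused" means nothing was listening at the address the browser tried; a localhost link opened on your own computer points at your computer, not the pod. A small Python check to run in the pod's terminal to confirm TensorBoard is actually up. `is_listening` is a hypothetical helper, and port 6006 is assumed because it is TensorBoard's default.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (a subclass of OSError) is the "connection refused" case
        return False


if __name__ == "__main__":
    # Run this inside the pod, not on your own computer.
    print("TensorBoard on 6006:", is_listening("127.0.0.1", 6006))
```

A `True` here only shows TensorBoard is running inside the pod; reaching it from outside still depends on the bind address and the exposed port.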
Did you add --bind_all?
I did.
I tried changing the port too
same thing
tensorboard --logdir=path/to/your/log-directory --bind_all
then press the button on the connect page
I can try that. I did go into the folder where the log folder is (so in this case it would be the Mangio-RVC-Fork folder) and then I did tensorboard --logdir logs --bind_all (gave me the same issue). I will do a quick test to see if it works (could be something I am doing wrong, so no promises it'll work).
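A small Python sketch of the command above. `tensorboard_cmd` and `launch` are hypothetical helpers; `--logdir`, `--port` and `--bind_all` are real TensorBoard flags, and `logs` is assumed to be the RVC log folder. Passing `--port=6006` is optional since 6006 is TensorBoard's default.

```python
from __future__ import annotations

import shlex
import subprocess


def tensorboard_cmd(logdir: str, port: int = 6006, bind_all: bool = True) -> list[str]:
    """Build the TensorBoard command line (hypothetical helper, not a TensorBoard API)."""
    cmd = ["tensorboard", f"--logdir={logdir}", f"--port={port}"]
    if bind_all:
        # Without --bind_all, TensorBoard serves on localhost only,
        # so the port exposed in RunPod has nothing to forward to.
        cmd.append("--bind_all")
    return cmd


def launch(logdir: str = "logs") -> subprocess.Popen:
    """Start TensorBoard in the background (requires `pip install tensorboard`)."""
    return subprocess.Popen(tensorboard_cmd(logdir))


if __name__ == "__main__":
    # Only prints the command; call launch() inside the pod to actually start it.
    print(shlex.join(tensorboard_cmd("logs")))
    # tensorboard --logdir=logs --port=6006 --bind_all
```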
btw you should not use the link you get in the terminal
I literally spent hours trying to get it to work just to see if it could, because of RVC training. The only option I see for using tensorboard is to go to Paperspace and follow this guide https://aihubdocs.github.io/en/rvc/cloud/training/paperspace/ (if you scroll down, you'll see the tensorboard section). I did try this on RunPod; it didn't work.
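The terminal prints a localhost address, which only works inside the pod. The port button on the connect page opens RunPod's HTTP proxy instead. A Python sketch of that address, assuming the `https://<POD ID>-<port>.proxy.runpod.net` format and a `RUNPOD_POD_ID` environment variable inside the pod (both are assumptions here; check RunPod's Expose ports docs). `proxy_url` is a hypothetical helper.

```python
import os


def proxy_url(pod_id: str, port: int = 6006) -> str:
    """Assumed RunPod HTTP proxy address for an exposed HTTP port."""
    if not pod_id:
        raise ValueError("pod_id is empty; copy it from under your pod on the RunPod dashboard")
    return f"https://{pod_id}-{port}.proxy.runpod.net"


if __name__ == "__main__":
    # Assumption: RunPod sets RUNPOD_POD_ID for processes running inside the pod.
    pod_id = os.environ.get("RUNPOD_POD_ID", "")
    if pod_id:
        print(proxy_url(pod_id))
    else:
        print("RUNPOD_POD_ID not set; are you running inside a RunPod pod?")
```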
Paperspace
Last update: Feb 10, 2024
@Madiator2011 [EU] then how would I launch it after starting it?
I did many RVC trainings
and use the tensorboard on RunPod?
haven't used tensorboard that much, but I did a few times
I also tried Applio (Github), and whenever I try to start training it tells me something about wavs not being found or valid, even though they were placed where I told it to look. I do the training on RunPod (faster than Colab, but Tensorboard would be nice if possible).
Like I said, not a big deal. It does help tho.
are you using custom template?
I mainly stick with this
works fine here
you want to edit the pod or template and expose port 6006 like this
ah ok let me try that real quick on a new pod
then install tensorboard with pip
Then in the terminal you can start it with this command
Make sure to run it in tmux or screen so it won't get killed.
Then on the connect page click the 6006 button
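The tmux step above can be scripted. A Python sketch, assuming tmux is installed in the pod (`apt-get install tmux` if not) and reusing the `--bind_all` command from earlier in the thread; `tmux_cmd`, `start_in_tmux` and the session name `tensorboard` are hypothetical. `tmux new-session -d -s NAME COMMAND` runs the command in a detached session, so it keeps running after the web terminal or Jupyter tab is closed.

```python
from __future__ import annotations

import shlex
import shutil
import subprocess


def tmux_cmd(session: str, command: list[str]) -> list[str]:
    """Build a `tmux new-session` call that runs `command` in a detached session."""
    # -d: don't attach, the session keeps running in the background
    # -s: session name, used later with `tmux attach -t <name>`
    return ["tmux", "new-session", "-d", "-s", session, shlex.join(command)]


def start_in_tmux(logdir: str = "logs", session: str = "tensorboard") -> None:
    """Start TensorBoard inside a detached tmux session (requires tmux and tensorboard)."""
    if shutil.which("tmux") is None:
        raise RuntimeError("tmux not found; install it first (e.g. apt-get install tmux)")
    tb = ["tensorboard", f"--logdir={logdir}", "--bind_all"]
    subprocess.run(tmux_cmd(session, tb), check=True)


if __name__ == "__main__":
    # Only prints the command; call start_in_tmux() inside the pod to run it.
    print(shlex.join(tmux_cmd("tensorboard", ["tensorboard", "--logdir=logs", "--bind_all"])))
    # tmux new-session -d -s tensorboard 'tensorboard --logdir=logs --bind_all'
```

To check on it later, run `tmux attach -t tensorboard`, and detach again with Ctrl+b then d. `screen` works on the same idea.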
@Madiator2011 [EU] Alright, thank you. I'll let you know in a few minutes if it works.
"Make sure to run in tmux or screen" - Could you explain this a little bit more?
NetworkChuck
YouTube
you need to learn tmux RIGHT NOW!!
Should I do --port 6006 as well or just what you put?
nope, 6006 is the default port
same thing
do not use the link from the terminal
got you, sorry
it's working
just going to see if it loads data when I train, but this is perfect, thank you.
wow congrats!
@Madiator2011 [EU] on 🔥 haha
actual legend
what is rvc?
@Smack Me Harder ❤ it's late for me but if you get issues remind me tomorrow
Thank you. Will do.
It's an AI model training service/software (don't know the right term). Basically you install some stuff from github plus some needed packages, and then you throw in .wav files of what you want to clone.
I usually go with batch size 10 on an A6000 GPU and 250 epochs
Batch size 8, 16 or 20 depends on the GPU.
Alright, it's working. You can consider this solved. Again, thank you for helping me solve this.