Weil Jimmer's BlogWeil Jimmer's Blog


Category:Functions

Found 36 records. At Page 2 / 8.

PHP 利用 cUrl 抓取 HTTPS/HTTP 網頁

No Comments
-
發布於 2016-04-25 20:16:50

最近不知道要發什麼,這篇這是純當筆記用。

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $data);
if (stripos($data,'https://')===0){
	//網址是https,設定SSL。
	curl_setopt($ch, CURLOPT_SSLVERSION,CURL_SSLVERSION_DEFAULT);
	curl_setopt($ch,CURLOPT_SSL_VERIFYHOST,0);
	curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,0);
}
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36');
curl_setopt($ch, CURLOPT_MAXREDIRS, 999);
$html = curl_exec($ch);
$header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$header = substr($html, 0, $header_size);
$html_body = substr($html, $header_size);
curl_close($ch);

 


This entry was posted in General, Functions, The Internet, Note, PHP By Weil Jimmer.

超商繳費儲錢進去Paypal帳號

No Comments
-
更新於 2017-10-11 21:43:51

此方法已過時!請勿使用!未來將會可能重新更新此文章!


首先,這個方法透過不少外國管道!請別擔心。

此方法僅提供一個讓你透過超商繳費的方法可以付款用於大多數線上網站。


第一步先註冊BitoEx的帳號:https://www.bitoex.com

註冊帳號是為了購買全球通用的國際虛擬貨幣比特幣。

而比特幣可以在 BitoEx 購買到。建議購買至少 500 NT$。(可全家超商繳費付款,目前只能"全家"!)

買比特幣教學參考:https://www.bitoex.com/fami?locale=zh-tw

購買完成之後,通常會收到簡訊。如下:

然後,請先開好瀏覽器,登入進去BitoEx,接著點手機裡收到的連結。依照指示儲金額進入BitoEx。

兌換到我的錢包就可以了!(畫面僅供參考)

然後就去 Cryptopay 註冊帳號:https://cryptopay.me/join/f526d895

(使用上面的連結買金融卡可以打 75 折。)

註冊好帳號後,記得驗證 Email,然後登入,就會看到如下介面。

這四組比特幣地址都是你自己帳號的錢,只是貨幣單位不同,而且轉換過去後,就不會被比特幣價格升降而影響到幣值大小。建議直接複製 Bitcoin Account 的地址。之後要儲錢的話,可以直接寄到 USD Account。

就可以直接透過 BitoEx 寄送金額到這個國外網站。

這步驟!請注意!比特幣寄送的過程需要耗費十五分鐘至數小時,可能要等待很長一段時間才會入帳!

然後可以開始購買虛擬卡。

個人建議是買 USD 的虛擬卡拉,有其他需求就依照你自己的想法。

填入好基本資料後就可以送出了,基本上是可以亂填的。

然後選擇剛剛儲進金額的 Bitcoin Account,除非你儲在別的地方,切換到有錢的帳號即可。

接著購買完成後,就可以正式來使用了。先儲錢進去虛擬卡。

建議點 Maximum amount,直接儲最大金額,帳戶類別一樣要選有錢的帳戶,最後你就可以擁有比特幣的金融卡了!

點那個按鈕可以知道你的卡號和安全碼及有效年月。

就會有紀錄在你的交易項目了!而那四位數字就是你的驗證碼。刷新Paypal,填入四位數字驗證碼就完成驗證了!

接著你就可以利用 Paypal進行匿名全球交易,而且只靠"超商繳費"與假身分。

最後我還是要說一件重要的事情,每個月維持卡的費用是1美金,以及這張卡有一些使用上限,可以參考:https://cryptopay.me/bitcoin-debit-card

基本上,使用上限是 2500 €/£/$ ,除非你驗證你的帳號,上傳身分驗證。

以上。

By Weil Jimmer


This entry was posted in General, Experience, Functions, The Internet, Note By Weil Jimmer.

利用 Flickr API 取得原始圖片網址

No Comments
-
更新於 2017-05-19 18:53:20

鑒於網站上的空間有限,存放圖片又非常不方便,存本站會耗費空間與流量,存外站就不會。而 Flickr 支持 1 TB 的免費圖片空間!還支援原圖外連,這對我來說,非常好用!

為了把外站同步相簿到本站,需要研讀一下 Flickr API ,請參考:
https://www.flickr.com/services/api/

由於本教學是不需要"簽發授權的",所以原圖只能是上傳者公開,才能取得,否則只能其他解析度的圖片。但拿不到原圖,基本上還是很高清的。

不然只能用Flickr的OAuth取得Token,不過太麻煩我就不講了。可以參考:
https://www.flickr.com/services/api/explore/flickr.photosets.getPhotos

必須要自己設定成公開,API才可以取得原始圖片。

第一步、要先建立應用程式,https://www.flickr.com/services/apps/create/apply/

隨便填一下資料後,可以取得 應用程式的 key 值與金鑰。

然後使用:http://idgettr.com/

輸入自己個人頁面網址,可以取得 自己的 User ID,長得像是: 123456789@N01。

接著用「程式」訪問 Flickr 的 API 特殊網址 (要記得變更網址 { } 大括號夾住部分):

API_KEY,USER_ID 請照上述步驟取得
https://api.flickr.com/services/rest/?method=flickr.photosets.getList&api_key={API_KEY}&user_id={USER_ID}&format=json

可以得到如下圖的頁面:這是JSON格式的資料。請尋找您要取得網址的相簿ID。(長得像是下圖橘色所示)

然後再用程式訪問

{API_KEY},{USER_ID},{PHOTOSET_ID}記得替換
https://api.flickr.com/services/rest/?method=flickr.photosets.getPhotos&api_key={API_KEY}&user_id={USER_ID}&photoset_id={PHOTOSET_ID}&extras=url_o&format=json

就可取得 原始 資料。

範例程式碼:

<textarea name="contents" id="contents" style="resize:vertical;height:150px;font-size:12pt;"></textarea>
<script type="text/javascript">
var http = new XMLHttpRequest("Microsoft.XMLHTTP");
var url_='https://api.flickr.com/services/rest/?method=flickr.photosets.getPhotos&api_key={API_KEY}&user_id={USER_ID}&photoset_id={PHOTOSET_ID}&extras=url_o&format=json';
function connect(){
	http.onreadystatechange = function(){
		if (http.readyState==4){
			if (http.status==200){
				var str = http.responseText.toString();
				str = str.substring(14,str.length-1);
				var obj = JSON.parse(str);
				var html_code_to_load = "";
				for (var i=0;i<obj['photoset']['photo'].length;i++){
					html_code_to_load += obj['photoset']['photo'][i]['url_o'] + "\n";
				}
				document.getElementById('contents').innerHTML=html_code_to_load.substring(0,html_code_to_load.length-1);
			}else{
				//Connect Failed.
			}
		}
	}
	http.open("GET",url_,true);
	http.send();
}
connect();
</script>

By Weil Jimmer


This entry was posted in General, Experience, Functions, HTML, JS, XML By Weil Jimmer.

PHP string to html characters 字串轉HTML碼

No Comments
-
更新於 2016-01-11 20:09:41

緣由:

最近因為裝了PHP7的緣故,又加上有一點點的閒,故研究PHP這塊。尤其是運算耗時方面,因為我本身就是要測試PHP7可以快到何種地步。聽說是快20倍的樣子。

結果,到最後我反而開始寫程式碼,昨天修正BBcode檢查的函數,運行約一萬字的代碼,結果速度慢到極致,被我修改後,倒加速不少,總速度提升82倍,原本12.3秒,修正到0.15秒(如果僅比較檢查BBcode的函數,加快上萬倍。),後來遇上了這個轉HTMLcode的問題,原先寫得太慢了,這是我第三次修正!

//目前我寫的最快的版本。

function string_to_utf16_ascii_HTML_new2($str){
	$str=mb_convert_encoding($str, 'ucs-2', 'utf-8');
	for($i=0;$i<strlen($str);$i+=2){
		$str2.='&#'.str_pad((ord($str[$i])*256+ord($str[$i+1])),5,'0',STR_PAD_LEFT).';';
	}
	return $str2;
}

//稍慢的版本(舊版)

function string_to_utf16_ascii_HTML_new($str){
	for ($i=0;$i<strlen($str);$i++){
		$k=ord($str[$i]);
		if ($k >= 192 and $k <= 223){
			$k=$k-192;
			$k2=ord($str[$i+1])-128;
			$c=$c.'&#'.str_pad($k*64+$k2,5,'0',STR_PAD_LEFT).';';
		}elseif ($k >= 224 and $k <= 239){
			$k=$k-224;
			$k2=ord($str[$i+1])-128;
			$k3=ord($str[$i+2])-128;
			$c=$c.'&#'.str_pad($k*4096+$k2*64+$k3,5,'0',STR_PAD_LEFT).';';
		}elseif ($k >= 128 and $k <= 191){
		}else{
			$c=$c.'&#'.str_pad($k,5,'0',STR_PAD_LEFT).';';
		}
	}
	return $c;
}

//極慢的版本(初版)

function string_to_utf16_ascii_HTML($str){
	for ($i=0;$str[$i]!="";$i++){
		$b[]=ord($str[$i]);
	}
	$str2=',';
	for ($i=0;$str[$i]!="";$i++){
		$xa=str_pad(decbin($b[$i]),8,'0',STR_PAD_LEFT);
		if (substr($xa,0,3) == "110"){
			$xb=str_pad(decbin($b[$i+1]),8,'0',STR_PAD_LEFT);
			$xx1=substr($xa,3);
			$xx2=substr($xb,2);
			$c=$c.$str2.str_pad(bindec($xx1.$xx2),5,'0',STR_PAD_LEFT);
		}elseif (substr($xa,0,4) == "1110"){
			$xb=str_pad(decbin($b[$i+1]),8,'0',STR_PAD_LEFT);
			$xc=str_pad(decbin($b[$i+2]),8,'0',STR_PAD_LEFT);
			$xx1=substr($xa,3);
			$xx2=substr($xb,2);
			$xx3=substr($xc,2);
			$c=$c.$str2.str_pad(bindec($xx1.$xx2.$xx3),5,'0',STR_PAD_LEFT);
		}elseif (substr($xa,0,2) == "10"){
		}else{
			$c=$c.$str2.str_pad($b[$i],5,'0',STR_PAD_LEFT);
		}
	}
	return substr($c,1);
}

//將HTMLcode轉回UTF-8字串。

function utf16_ascii_HTML_to_string($str){
	return mb_convert_encoding($str, 'UTF-8', 'HTML-ENTITIES'); 
}

/********

使用方法:

*********

$str='這是測試123';
$encode_str=string_to_utf16_ascii_HTML_new2($str);
echo $encode_str;

//output:
//&#36889;&#26159;&#28204;&#35430;&#00049;&#00050;&#00051;
//以瀏覽器顯示,等同於「這是測試123」。

echo utf16_ascii_HTML_to_string($encode_str);

//output:
//這是測試123

*/

 


This entry was posted in General, Experience, Functions, Note, PHP By Weil Jimmer.

Python 進階下載器

No Comments
-
更新於 2015-11-20 18:47:40

我會寫這個,一來是為了讓朋友羨慕也想學程式,二來是我自己要用。我不會閒閒開發一個我用不到的東西。總是開發者要支持一下自己的產品嘛。

這是用Python3寫成的程式,主要針對手機而使用。需要安裝BeautifulSoup4。

(在Windows下的命令提示字元顯示很醜,沒有顏色,實際上在手機Linux終端機裡面跑的時候,會有顏色。)

主要支援以下功能:

一、抓取目標URL的目標連結

例如我要抓取IG某個頁面的所有圖片。

二、載入網頁列表,抓取目標連結

例如:載入某網站相簿的第一頁,抓取圖片,然後載入第二頁,抓取圖片,載入第三頁……以此類推。

三、規律網址抓取,這個算是最低階的方法吧。

例如:下載http://example.com/1.jpg,下載http://example.com/2.jpg,下載http://example.com/3.jpg,下載/4.jpg下載/5.jpg……

四、顯示目標清單

五、下載清單上的連結

至於抓圖功能,我可以稱進階抓圖器是沒有講假的,雖然還比不上我用VB.NET寫出來的 強大。那種仿一般正常用戶框架又有COOKIE、HEADER、還解析JS,Python很難辦得到。

所以,頂多次級一點。

支援:

一、抓取頁面上所有「看起來是網址」的連結。(即便它沒有被鑲入在任何標籤內)(採用正規表達式偵測)

二、抓取A標籤的屬性HREF。(超連結)

三、抓取IMG標籤的屬性SRC。(圖片)

四、抓取SOURCE標籤的屬性SRC。(HTML5的audio、movie)

五、抓取EMBED標籤的SRC屬性。(FLASH)

六、抓取OBJECT標籤的DATA屬性。(網頁插件)

七、LINK標籤的HREF屬性。(CSS)

八、SCRIPT標籤的SRC屬性。(JS)

九、FRAME標籤的SRC屬性。(框架)

十、IFRAME標籤的SRC屬性。(內置框架)

十一、以上全部。

十二、自訂抓取標籤名稱與屬性名稱。(這個我VB板的進階抓圖器沒有這項功能)

支援 過濾關鍵字,包刮AND、OR邏輯閘,一定要全部包刮關鍵字,或是命中其一關鍵字。

規律網址下載則支援,起始數字、終止數字、每次遞增多少、補位多少。

※這個有相對位置的處理。

****************************************

* 名稱:進階下載器

* 團隊:White Birch Forum Team

* 作者:Weil Jimmer

* 網站:http://0000.twgogo.org/

* 時間:2015.09.26

****************************************

Source Code

# coding: utf-8
"""Weil Jimmer For Safe Test Only"""
import os,urllib.request,shutil,sys,re
from threading import Thread
from time import sleep
from sys import platform as _platform

GRAY = "\033[1;30m"
RED = "\033[1;31m"
LIME = "\033[1;32m"
YELLOW = "\033[1;33m"
BLUE = "\033[1;34m"
MAGENTA = "\033[1;35m"
CYAN = "\033[1;36m"
WHITE = "\033[1;37m"
BGRAY = "\033[1;47m"
BRED = "\033[1;41m"
BLIME = "\033[1;42m"
BYELLOW = "\033[1;43m"
BBLUE = "\033[1;44m"
BMAGENTA = "\033[1;45m"
BCYAN = "\033[1;46m"
BDARK_RED = "\033[1;48m"
UNDERLINE = "\033[4m"
END = "\033[0m"

if _platform.find("linux")<0:
	GRAY = ""
	RED = ""
	LIME = ""
	YELLOW = ""
	BLUE = ""
	MAGENTA = ""
	CYAN = ""
	WHITE = ""
	BGRAY = ""
	BRED = ""
	BLIME = ""
	BYELLOW = ""
	BBLUE = ""
	BMAGENTA = ""
	BCYAN = ""
	UNDERLINE = ""
	END = ""
	os.system("color e")

try:
    import pip
except:
	print(RED + "錯誤沒有安裝pip!" + END)
	input()
	exit()

try:
    from bs4 import BeautifulSoup
except:
	print(RED + "錯誤沒有安裝bs4!嘗試安裝中...!" + END)
	pip.main(["install","beautifulsoup4"])
	from bs4 import BeautifulSoup

global phone_
phone_ = False

try:
	import android
	droid = android.Android()
	phone_ = True
except:
	try:
		import clipboard
	except:
		print(RED + "錯誤沒有安裝clipboard!嘗試安裝中...!" + END)
		pip.main(["install","PyGTK"])
		pip.main(["install","clipboard"])
		import clipboard

def get_clipboard():
	global phone_
	if phone_==True:
		return str(droid.getClipboard().result)
	else:
		return clipboard.paste()

global target_url
target_url = [[],[],[],[],[],[],[],[],[]]

def __init__(self):
	print("")

print (RED)
print ("*" * 40)
print ("*  Name:\tWeil_Advanced_Downloader")
print ("*  Team:" + LIME + "\tWhite Birch Forum Team" + RED)
print ("*  Developer:\tWeil Jimmer")
print ("*  Website:\thttp://0000.twgogo.org/")
print ("*  Date:\t2015.10.09")
print ("*" * 40)
print (END)

root_dir = "/sdcard/"
print("根目錄:" + root_dir)
global save_temp_dir
global save_dir
save_dir=str(input("存檔資料夾:"))
save_temp_dir=str(input("暫存檔資料夾(會自動刪除):"))

global target_array_index
target_array_index = 0

def int_s(k):
	try:
		return int(k)
	except:
		return -1

def reporthook(blocknum, blocksize, totalsize):
	readsofar = blocknum * blocksize
	if totalsize > 0:
		percent = readsofar * 1e2 / totalsize
		s = "\r%5.1f%% %*d / %d bytes" % (percent, len(str(totalsize)), readsofar, totalsize)
		sys.stderr.write(s)
		if readsofar >= totalsize:
			sys.stderr.write("\r" + MAGENTA + "%5.1f%% %*d / %d bytes" % (100, len(str(totalsize)), totalsize, totalsize))
	else:
		sys.stderr.write("\r未知檔案大小…下載中…" + str(readsofar) + " bytes")
		#sys.stderr.write("read %d\n" % (readsofar,))

def url_encode(url_):
	if url_.startswith("http://"):
		return 'http://' + urllib.parse.quote(url_[7:])
	elif url_.startswith("https://"):
		return 'https://' + urllib.parse.quote(url_[8:])
	elif url_.startswith("ftp://"):
		return 'ftp://' + urllib.parse.quote(url_[6:])
	elif ((not url_.startswith("ftp://")) and (not url_.startswith("http"))):
		return 'http://' + urllib.parse.quote(url_)
	return url_

def url_correct(url_):
	if ((not url_.startswith("ftp://")) and (not url_.startswith("http"))):
		return 'http://' + (url_)
	return url_

def download_URL(url,dir_name,ix,total,encode,return_yes_no):
	global save_temp_dir
	prog_str = "(" + str(ix) + "/" + str(total) + ")"
	if (total==0):
		prog_str=""
	file_name = url.split('/')[-1]
	file_name=file_name.replace(":","").replace("*","").replace('"',"").replace("\\","").replace("|","").replace("?","").replace("<","").replace(">","")
	if file_name=="":
		file_name="NULL"
	try:
		print(YELLOW + "下載中…" + prog_str + "\n" + url + "\n" + END)
		if not os.path.exists(root_dir + dir_name + "/") :
			os.makedirs(root_dir + dir_name + "/")
		opener = urllib.request.FancyURLopener({})
		opener.version = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'
		opener.addheader("Referer", url)
		opener.addheader("X-Forwarded-For", "0.0.0.0")
		opener.addheader("Client-IP", "0.0.0.0")
		local_file,response_header=opener.retrieve(url_encode(url), root_dir + dir_name + "/" + str(ix) + "-" + file_name, reporthook)
		print(MAGENTA + "下載完成" + prog_str + "!" + END)
	except urllib.error.HTTPError as ex:
		print(RED + "下載失敗" + prog_str + "!" + str(ex.code) + END)
	except:
		print(RED + "下載失敗" + prog_str + "!未知錯誤!" + END)
	if return_yes_no==0:
		return ""
	try:
		k=open(local_file,encoding=encode).read()
	except:
		k="ERROR"
		print(RED + "讀取失敗!" + END)
	try:
		if dir_name==save_temp_dir:
			shutil.rmtree(root_dir + save_temp_dir + "/")
	except:
		print(RED + "刪除暫存資料夾失敗!" + END)
	return k

def check_in_filter(url_array,and_or,keyword_str):
	if keyword_str=="":
		return url_array
	url_filter_array = []
	s = keyword_str.split(',')
	for array_x in url_array:
		ok = True
		for keyword_ in s:
			if str(array_x).find(keyword_)>=0:
				if and_or==0:
					url_filter_array.append(array_x)
					ok=False
					break
			else:
				if and_or==1:
					ok=False
					break
		if ok==True:
			url_filter_array.append(array_x)
	return url_filter_array

def handle_relative_url(handle_url,ori_url):
	handle_url=str(handle_url)
	if handle_url=="":
		return ori_url
	if handle_url.startswith("?"):
		temp_form_url = ori_url
		search_A = ori_url.find("?")
		if search_A<0:
			return ori_url + handle_url
		else:
			return ori_url[0:search_A] + handle_url
	if handle_url.startswith("//"):
		return "http:" + handle_url
	if (handle_url.startswith("http://") or handle_url.startswith("https://") or handle_url.startswith("ftp://")):
		return handle_url
	root_url = ori_url
	search_ = root_url.find("//")
	if search_<0:
		return handle_url
	search_x = root_url.find("/", search_+2);
	if (search_x<0):
		root_url = ori_url
	else:
		root_url = ori_url[0:search_x]
	same_dir_url = ori_url[search_+2:]
	search_x2 = same_dir_url.rfind("/")
	if search_x2<0:
		same_dir_url = ori_url
	else:
		same_dir_url = ori_url[0:search_x2+search_+2]
	if handle_url.startswith("/"):
		return (root_url + handle_url)
	if handle_url.startswith("./"):
		return (same_dir_url + handle_url[1:])
	return (same_dir_url + "/" + handle_url)

def remove_duplicates(values):
	output = []
	seen = set()
	for value in values:
		if value not in seen:
			output.append(value)
			seen.add(value)
	return output

def get_text_url(file_content):
	url_return_array = re.findall('(http|https|ftp)://([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&amp;\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?', file_content)
	return url_return_array

def get_url_by_tagname_attribute(file_content,tagname,attribute,url_):
	soup = BeautifulSoup(file_content,'html.parser')
	url_return_array = []
	for link in soup.find_all(tagname):
		if link.get(attribute)!=None:
			url_return_array.append(handle_relative_url(link.get(attribute),url_))
	return url_return_array

def get_url_by_targetid_attribute(file_content,tagname,attribute,url_):
	soup = BeautifulSoup(file_content,'html.parser')
	url_return_array = []
	for link in soup.find_all(id=tagname):
		if link.get(attribute)!=None:
			url_return_array.append(handle_relative_url(link.get(attribute),url_))
	return url_return_array

def get_url_by_targetname_attribute(file_content,tagname,attribute,url_):
	soup = BeautifulSoup(file_content,'html.parser')
	url_return_array = []
	for link in soup.find_all(name=tagname):
		if link.get(attribute)!=None:
			url_return_array.append(handle_relative_url(link.get(attribute),url_))
	return url_return_array

def run_functional_get_url(way_X,html_code,target_array_index,and_or,keywords,ctagename,cattribute):
	global target_url
	if (way_X==1):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_text_url(html_code)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==2):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"a","href",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==3):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"img","src",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==4):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"source","src",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==5):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"embed","src",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==6):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"object","data",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==7):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"link","href",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==8):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"script","src",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==9):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"frame","src",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==10):
		ori_size=len(target_url[target_array_index])
		get_array_ = get_url_by_tagname_attribute(html_code,"iframe","src",temp_url)
		target_url[target_array_index].extend(get_array_)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==11):
		ori_size=len(target_url[target_array_index])
		get_array_1 = get_text_url(html_code)
		get_array_2 = get_url_by_tagname_attribute(html_code,"a","href",temp_url)
		get_array_3 = get_url_by_tagname_attribute(html_code,"img","src",temp_url)
		get_array_4 = get_url_by_tagname_attribute(html_code,"source","src",temp_url)
		get_array_5 = get_url_by_tagname_attribute(html_code,"embed","src",temp_url)
		get_array_6 = get_url_by_tagname_attribute(html_code,"object","data",temp_url)
		get_array_7 = get_url_by_tagname_attribute(html_code,"link","href",temp_url)
		get_array_8 = get_url_by_tagname_attribute(html_code,"script","src",temp_url)
		get_array_9 = get_url_by_tagname_attribute(html_code,"frame","src",temp_url)
		get_array_10 = get_url_by_tagname_attribute(html_code,"iframe","src",temp_url)
		target_url[target_array_index].extend(get_array_1)
		target_url[target_array_index].extend(get_array_2)
		target_url[target_array_index].extend(get_array_3)
		target_url[target_array_index].extend(get_array_4)
		target_url[target_array_index].extend(get_array_5)
		target_url[target_array_index].extend(get_array_6)
		target_url[target_array_index].extend(get_array_7)
		target_url[target_array_index].extend(get_array_8)
		target_url[target_array_index].extend(get_array_9)
		target_url[target_array_index].extend(get_array_10)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==12):
		ori_size=len(target_url[target_array_index])
		get_array_custom = get_url_by_tagname_attribute(html_code,ctagename,cattribute,temp_url)
		target_url[target_array_index].extend(get_array_custom)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==13):
		ori_size=len(target_url[target_array_index])
		get_array_custom = get_url_by_targetid_attribute(html_code,ctagename,cattribute,temp_url)
		target_url[target_array_index].extend(get_array_custom)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)
	elif(way_X==14):
		ori_size=len(target_url[target_array_index])
		get_array_custom = get_url_by_targetname_attribute(html_code,ctagename,cattribute,temp_url)
		target_url[target_array_index].extend(get_array_custom)
		target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
		target_url[target_array_index]=check_in_filter(target_url[target_array_index],and_or,keywords)
		print( LIME + "抓取完成!共抓取到:" + str(len(target_url[target_array_index])-ori_size) + "個URL" + END)

while True:	
	while True:
		method_X=int_s(input("\n\n" + CYAN + "要執行的動作:\n(1)抓取目標網頁資料\n(2)載入網址列表抓取資料\n(3)規律網址抓取\n(4)顯示目前清單\n(5)下載目標清單\n(6)清空目標清單\n(7)複製清單\n(8)刪除目標清單的特定值\n(9)從剪貼簿貼上網址(每個一行)" + END + "\n\n"))
		if method_X<=9 and method_X>=1:
			break
		else:
			print("輸入有誤!\n\n")
	if (method_X==1):
		while True:
			target_array_index=int_s(input("\n\n" + CYAN + "您要「存入」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
			if target_array_index<=8 and target_array_index>=1:
				break
			else:
				print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
		while True:
			way_X=int_s(input("\n\n" + CYAN + "抓取方式:\n(1)搜尋頁面所有純文字網址\n(2)抓取所有A標籤HREF屬性\n(3)抓取所有IMG標籤SRC屬性\n(4)抓取所有SOURCE標籤SRC屬性\n(5)抓取所有EMBED標籤SRC屬性\n(6)抓取所有OBJECT標籤DATA屬性\n(7)抓取所有LINK標籤HREF屬性\n(8)抓取所有SCRIPT標籤SRC屬性\n(9)抓取所有FRAME標籤SRC屬性\n(10)抓取所有IFRAME標籤SRC屬性\n(11)使用以上所有方法\n(12)自訂找尋標籤名及屬性名\n(13)自訂找尋ID及屬性名\n(14)自訂找尋Name及屬性名" + END + "\n\n"))
			if way_X<=14 and way_X>=1:
				break
		else:
			print("輸入有誤!")
		if way_X==12 or way_X==13 or way_X==14:	
			target_tagname_=str(input("\n目標標籤名稱/ID/Name:"))
			target_attribute_=str(input("\n目標屬性名稱:"))
		else:
			target_tagname_=""
			target_attribute_=""
		temp_url=url_correct(str(input("\n目標網頁URL:")))
		temp_url_code=str(input("\n目標網頁編碼(請輸入utf-8或big5或gbk…):"))
		keywords=str(input("\n" + CYAN + "請輸入過濾關鍵字(可留空,可多個,逗號為分隔符號):" + END + "\n\n"))
		and_or=0
		if keywords!="":
			while True:
				and_or=(-1)
				try:
					and_or=int_s(input("\n" + CYAN + "請輸入關鍵字邏輯閘:(1=and、0=or)" + END + "\n\n"))
				except:
					print("")
				if and_or==0 or and_or==1:
					break
				else:
					print("輸入有誤!\n")
		html_code=download_URL(temp_url,save_temp_dir,0,0,temp_url_code,1)
		if html_code=="ERROR":
			continue
		run_functional_get_url(way_X,html_code,target_array_index,and_or,keywords,target_tagname_,target_attribute_)
		input("\n\n完成!請輸入ENTER鍵跳出此功能...");
	elif(method_X==2):
		while True:
			RUN_array_index=int_s(input("\n\n" + CYAN + "您要「載入」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
			if RUN_array_index<=8 and RUN_array_index>=1:
				break
			else:
				print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
		while True:
			target_array_index=int_s(input("\n\n" + CYAN + "您要「存入」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
			if target_array_index<=8 and target_array_index>=1:
				break
			else:
				print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
		while True:
			way_X=int_s(input("\n\n" + CYAN + "抓取方式:\n(1)搜尋頁面所有純文字網址\n(2)抓取所有A標籤HREF屬性\n(3)抓取所有IMG標籤SRC屬性\n(4)抓取所有SOURCE標籤SRC屬性\n(5)抓取所有EMBED標籤SRC屬性\n(6)抓取所有OBJECT標籤DATA屬性\n(7)抓取所有LINK標籤HREF屬性\n(8)抓取所有SCRIPT標籤SRC屬性\n(9)抓取所有FRAME標籤SRC屬性\n(10)抓取所有IFRAME標籤SRC屬性\n(11)使用以上所有方法\n(12)自訂找尋標籤名及屬性名\n(13)自訂找尋ID及屬性名\n(14)自訂找尋Name及屬性名" + END + "\n\n"))
			if way_X<=14 and way_X>=1:
				break
		else:
			print("輸入有誤!")
		if way_X==12 or way_X==13 or way_X==14:
			target_tagname_=str(input("\n目標標籤名稱/ID/Name:"))
			target_attribute_=str(input("\n目標屬性名稱:"))
		else:
			target_tagname_=""
			target_attribute_=""
		keywords=str(input("\n" + CYAN + "請輸入過濾關鍵字(可留空,可多個,逗號為分隔符號):" + END + "\n\n"))
		while True:
			and_or=int_s(input("\n" + CYAN + "請輸入關鍵字邏輯閘:(1=and、0=or)" + END + "\n\n"))
			if and_or==0 or and_or==1:
				break
			else:
				print("輸入有誤!\n\n")
		temp_url_code=str(input("\n集合的網頁編碼(請輸入utf-8或big5或gbk…):"))
		for x in range(0,(len(target_url[RUN_array_index]))):
			html_code=download_URL(str(target_url[RUN_array_index][x]),save_temp_dir,(x+1),len(target_url[RUN_array_index]),temp_url_code,1)
			if html_code=="ERROR":
				continue
			run_functional_get_url(way_X,html_code,target_array_index,and_or,keywords,target_tagname_,target_attribute_)
		input("\n\n完成!請輸入ENTER鍵跳出此功能...");
	elif(method_X==3):
		start_number=int_s(input("起始點(數字):"))
		end_number=int_s(input("終止點(數字):"))
		step_ADD=int_s(input("每次遞增多少:"))
		str_padx=int_s(input("補滿位數至:"))
		if not os.path.exists('/sdcard/' + save_dir) :
			os.makedirs('/sdcard/' + save_dir )
		print(LIME + "※檔案將存在/sdcard/" + save_dir + "資料夾。" + END)
		while True:
			url=url_correct(input(LIME + "目標URL({i}是遞增數):" + END))
			if url.find("{i}")>=0:
				break
			else:
				print("網址未包含遞增數,請重新輸入網址。")
		for x in range(start_number,(end_number+1),step_ADD):
			download_URL(url.replace("{i}",str(x).zfill(str_padx)),save_dir,x,(end_number),"utf-8",0)
		input("\n\n完成!請輸入ENTER鍵跳出此功能...")
	elif(method_X==4):
		while True:
			RUN_array_index=int_s(input("\n\n" + CYAN + "您要「顯示」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
			if RUN_array_index<=8 and RUN_array_index>=1:
				break
			else:
				print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
		for x in range(0,(len(target_url[RUN_array_index]))):
			print("URL (" + str(x+1) + "/" + str(len(target_url[RUN_array_index])) + "):" + str(target_url[RUN_array_index][x]))
		input("\n\n完成!請輸入ENTER鍵跳出此功能...")
	elif(method_X==5):
		while True:
			RUN_array_index=int_s(input("\n\n" + CYAN + "您要「下載」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
			if RUN_array_index<=8 and RUN_array_index>=1:
				break
			else:
				print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
		for x in range(0,(len(target_url[RUN_array_index]))):
			download_URL(str(target_url[RUN_array_index][x]),save_dir,(x+1),len(target_url[RUN_array_index]),"utf-8",0)
		input("\n\n完成!請輸入ENTER鍵跳出此功能...")
	elif(method_X==6):
		ver = str(input("\n\n" + RED + "確定清空目標清單?(y/n)" + END + "\n\n"))
		if ver.lower()=="y":
			while True:
				RUN_array_index=int_s(input("\n\n" + CYAN + "您要「清空」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
				if RUN_array_index<=8 and RUN_array_index>=1:
					break
				else:
					print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
			target_url[RUN_array_index]=[]
			input("\n\n完成!請輸入ENTER鍵跳出此功能...")
	elif(method_X==7):
		ver = str(input("\n\n" + RED + "確定複製目標清單?(y/n)" + END + "\n\n"))
		if ver.lower()=="y":
			while True:
				RUN_array_index=int_s(input("\n\n" + CYAN + "您要「複製」的來源清單:(請輸入編號1~8)" + END + "\n\n"))
				if RUN_array_index<=8 and RUN_array_index>=1:
					break
				else:
					print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
			while True:
				target_array_index=int_s(input("\n\n" + CYAN + "您要「存入」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
				if target_array_index<=8 and target_array_index>=1:
					break
				else:
					print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
			target_url[target_array_index]=target_url[RUN_array_index]
			input("\n\n完成!請輸入ENTER鍵跳出此功能...")
	elif(method_X==8):
		ver = str(input("\n\n" + RED + "確定刪除目標清單特定值?(y/n)" + END + "\n\n"))
		if ver.lower()=="y":
			while True:
				RUN_array_index=int_s(input("\n\n" + CYAN + "您要「刪除值」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
				if RUN_array_index<=8 and RUN_array_index>=1:
					break
				else:
					print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
			if len(target_url[RUN_array_index])!=0:
				while True:
					target_array_index=int_s(input("\n\n" + CYAN + "您要「刪除的值編號」:(請輸入編號0~" + str(len(target_url[RUN_array_index])-1) + ")" + END + "\n\n"))
					if target_array_index>=0 and target_array_index<=(len(target_url[RUN_array_index])-1):
						break
					else:
						print("輸入有誤!請輸入0~" + str(len(target_url[RUN_array_index])-1) + "之間的號碼!")
				del target_url[RUN_array_index][target_array_index]
			else:
				print("空清單!無任何值!故無法刪除。")
			input("\n\n完成!請輸入ENTER鍵跳出此功能...")
	elif(method_X==9):
		ver = str(input("\n\n" + RED + "確定從剪貼簿貼上目標清單?(y/n)" + END + "\n\n"))
		if ver.lower()=="y":
			while True:
				target_array_index=int_s(input("\n\n" + CYAN + "您要「存入」的目標清單:(請輸入編號1~8)" + END + "\n\n"))
				if target_array_index<=8 and target_array_index>=1:
					break
				else:
					print("輸入有誤!只開放8個清單,請輸入1~8之間的號碼!")
			kk=get_clipboard()
			if kk=="" or kk==None:
				print(RED + "剪貼簿是空的!" + END)
			else:
				ori_size=len(target_url[target_array_index])
				target_url[target_array_index].extend(kk.split("\n"))
				target_url[target_array_index]=remove_duplicates(target_url[target_array_index])
				print(LIME + "已添加進去 " + str(len(target_url[target_array_index])-ori_size) + " 個不重複的URL。" + END)
			input("\n\n完成!請輸入ENTER鍵跳出此功能...")
input("\n\n請輸入ENTER鍵結束...")

 


This entry was posted in General, Experience, Free, Functions, Note, Product, Python By Weil Jimmer.

上一頁  1 2 3 4 5 6 7 8 /8 頁)下一頁 最終頁

Visitor Count

pop
nonenonenone

Note

台灣假新聞橫行,沒一家霉體能信的,網軍側翼到處洗風向,堪憂。

毋忘初心,
絕不利慾薰心。

支持網路中立性.
Support Net Neutrality.

飽暖思淫欲,
饑寒起盜心。

支持臺灣實施無條件基本收入

歡迎前來本站。

Words Quiz


Search

Music

Blogging Journey

4480days

since our first blog post.

The strong do what they can and the weak suffer what they must.

Privacy is your right and ability to be yourself and express yourself without the fear that someone is looking over your shoulder and that you might be punished for being yourself, whatever that may be.

It is quality rather than quantity that matters.

I WANT Internet Freedom.

Reality made most of people lost their childishness.

Justice,Freedom,Knowledge.

Without music life would be a mistake.

Support/Donate

This site also need a little money to maintain operations, not entirely without any cost in the Internet. Your donations will be the best support and power of the site.
MethodBitcoin Address
bitcoin1gtuwCjjVVrNUHPGvW6nsuWGxSwygUv4x
buymeacoffee
Register in linode via invitation link and stay active for three months.Linode

Support The Zeitgeist Movement

The Zeitgeist Movement

The Lie We Live

The Lie We Live

The Questions We Never Ask

The Questions We Never Ask

Man

Man

THE EMPLOYMENT

Man

In The Fall

In The Fall

Facebook is EATING the Internet

Facebook

Categories

Android (7)

Announcement (4)

Arduino (2)

Bash (2)

C (3)

C# (5)

C++ (1)

Experience (52)

Flash (2)

Free (13)

Functions (36)

Games (13)

General (60)

Git (2)

HTML (7)

Java (13)

JS (7)

Mood (24)

NAS (2)

Note (32)

Office (1)

OpenWrt (6)

PHP (9)

Privacy (4)

Product (12)

Python (4)

Software (11)

The Internet (25)

Tools (16)

VB.NET (8)

WebHosting (7)

Wi-Fi (5)

XML (4)