Apache: Are square brackets allowed in URLs?

Are square brackets permitted in URLs?

I noticed that Apache Commons HttpClient (3.0.1) throws an IOException, while wget and Firefox accept square brackets.

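Example URL:

http://example.com/path/to/file[3].html

My HTTP client ran into a URL like this, but I am not sure whether to patch my code or throw an exception (which is what it should actually do).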


RFC 3986 states:

A host identified by an Internet
Protocol literal address, version 6
[RFC3513] or later, is distinguished
by enclosing the IP literal within
square brackets ("[" and "]"). This
is the only place where square bracket
characters are allowed in the URI
syntax.

So, in theory, you should not see such URIs in the wild, because the brackets should have been percent-encoded.
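As a minimal sketch of how a client could decide between patching and rejecting, the check below flags square brackets that appear in the path, query, or fragment, while tolerating the one place RFC 3986 permits them (an IP literal host). It uses Python's standard urllib.parse; the function name has_raw_brackets is made up for this illustration.

```python
from urllib.parse import urlsplit


def has_raw_brackets(url):
    """Return True if '[' or ']' appears in the path, query, or fragment."""
    parts = urlsplit(url)
    # The host (netloc) is deliberately excluded: brackets are legal there
    # when they enclose an IPv6 literal.
    rest = parts.path + parts.query + parts.fragment
    return "[" in rest or "]" in rest
```

A client could call this first and then either percent-encode the offending characters or raise an error, depending on how strict it wants to be.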


I know this question is a bit old, but I just wanted to point out that PHP uses brackets to pass arrays in a URL.

http://www.example.com/foo.php?bar[]=1&bar[]=2&bar[]=3

In this case, $_GET['bar'] will contain array(1, 2, 3).
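For comparison, here is a rough sketch of how the same query string looks to a non-PHP parser, using Python's standard urllib.parse. Python does not apply PHP's array convention, so the key simply keeps its brackets; the variable names are chosen for this example only.

```python
from urllib.parse import parse_qs, urlencode

query = "bar[]=1&bar[]=2&bar[]=3"

# parse_qs groups repeated keys into a list; the key is literally "bar[]"
# and the values are strings, not integers.
parsed = parse_qs(query)
values = parsed["bar[]"]

# Re-encoding the pairs percent-encodes the brackets in the key.
encoded = urlencode([("bar[]", v) for v in values])
```

Parsing the encoded form gives back the same keys and values, so the brackets survive a round trip even though they travel as %5B%5D.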


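Square brackets [] are generally not supported in URLs.

Replace them with %5B and %5D:

  • Using the command line (the following example relies on bash and sed):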

    url='http://example.com?day=[0-3][0-9]'
    encoded_url="$( sed 's/\[/%5B/g;s/]/%5D/g' <<<"$url")"
  • Using Java: URLEncoder.encode(String s, String enc)

  • Using PHP: rawurlencode() or urlencode()

    <?php
    echo '<a href="http://example.com/day/',
        rawurlencode('[0-3][0-9]'), '">';
    ?>

    Output:

    <a href="http://example.com/day/%5B0-3%5D%5B0-9%5D">

    or:

    <?php
    $query_string = 'day=' . urlencode('[0-3][0-9]') .
                    '&month=' . urlencode('[0-1][0-9]');
    echo '<a href="http://example.com?',
          htmlentities($query_string), '">';
    ?>
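  • Using your favorite programming language... Please extend this answer, by posting a comment or editing it directly, with the function you use in your programming language ;-)

For more details, see the URL syntax specified in RFC 3986. Appendix A covers %-encoding in the query string (square brackets belong to "gen-delims" and are %-encoded there).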
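In the spirit of the list above, a Python counterpart to the sed one-liner might look like the following. It is only a sketch: the helper name encode_square_brackets is hypothetical, and it does a plain textual replacement without parsing the URL.

```python
def encode_square_brackets(url):
    """Replace every '[' with %5B and every ']' with %5D."""
    return url.replace("[", "%5B").replace("]", "%5D")


url = "http://example.com?day=[0-3][0-9]"
encoded_url = encode_square_brackets(url)
```

Like the sed version, this would also rewrite the brackets around an IPv6 literal host, so it is only suitable for URLs that are known not to contain one.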


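Any browser or web-enabled software that accepts a URL and does not throw an exception when it meets special characters is almost certainly encoding those characters behind the scenes. Curly braces, square brackets, spaces, and so on all have specific encodings that represent them without causing conflicts. As the earlier answers say, the safest way to handle them is to URL-encode them before handing them to whatever will parse the URL.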
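One way to do that encoding "behind the scenes" is to split the URL first and escape each component separately, so that an IPv6 host keeps its brackets. The sketch below uses Python's standard urllib.parse; the function name encode_url_components and the choice of "safe" characters are assumptions made for this example, not a complete normalizer.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_components(url):
    """Percent-encode unsafe characters in path, query, and fragment only."""
    parts = urlsplit(url)
    # '%' stays safe so existing escapes such as %20 are not double-encoded.
    path = quote(parts.path, safe="/%")
    # Keep the usual query delimiters so key=value pairs stay intact.
    query = quote(parts.query, safe="=&+%")
    fragment = quote(parts.fragment, safe="%")
    # The scheme and netloc (including any [IPv6] host) pass through untouched.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))
```

For instance, encode_url_components("http://example.com/path/to/file[3].html") yields "http://example.com/path/to/file%5B3%5D.html", which a strict client should then accept.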


The only characters not allowed in a path name are # and ?, because they mark the end of the path.

The URI RFC has the definitive answer:

http://www.ietf.org/rfc/rfc1738.txt

Unsafe:

Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".

All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.

The answer is that they should be hex-encoded, but given Postel's law, most software will accept them verbatim.
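To make the RFC 1738 list concrete, here is a small sketch that hex-encodes exactly the characters the quoted passage calls unsafe. The names UNSAFE and encode_unsafe are invented for this example, and it assumes an ASCII path segment or query value that has not been encoded yet (since "%" itself is in the list, running it twice would double-encode).

```python
# The characters RFC 1738 lists as unsafe: space, < > " # % { } | \ ^ ~ [ ] `
UNSAFE = ' <>"#%{}|\\^~[]`'


def encode_unsafe(text):
    """Replace each unsafe character with its %XX hex escape."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in UNSAFE else ch
        for ch in text
    )
```

With this, "file[3].html" becomes "file%5B3%5D.html", while text made up only of safe characters is returned unchanged.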


StackOverflow does not seem to encode them:

https://stackoverflow.com/search?q=square+brackets+[url]


To handle this with Commons HttpClient, look at the org.apache.commons.httpclient.util.URIUtil class, in particular its encode() method. Use it to URI-encode the URL before attempting to fetch it.


According to the URL specification, square brackets are not valid URL characters.

Here is the relevant excerpt:

The "national" and "punctuation" characters do not appear in any
productions and therefore may not appear in URLs.
national { | } | vline | [ | ] | \ | ^ | ~
punctuation < | >


It is best to URL-encode them, because apparently not all web servers support them. Even when there is a standard, not everyone follows it.


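Square brackets are considered unsafe, but most browsers will parse them correctly. That said, it is better to replace square brackets with other characters.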

